Triangle Hadoop Users Group

Feb 22

Slides from Feb. 16 talk by Adam Gugliciello of Datameer

Thanks to Adam Gugliciello of Datameer for the talk at TriHUG last week!  Here are the slides:

Financial services trihug
View more presentations from trihug.

Feb 01

Next Meeting: Feb. 16 @ Bronto Software

Title: Financial Data Analytics with Hadoop

Sponsored By: Datameer

RSVP here

Abstract: 

Hadoop-based applications are becoming critical in the financial services arena for the analysis and correlation of large volumes of structured and unstructured data. In addition, the Dodd-Frank Act represents the largest US financial regulatory change in several decades and requires much greater transparency on financial data. In this session, we will answer common questions and demonstrate use cases showing how Hadoop and Datameer help with asset management, risk management, fraud detection, and data security.

Leave this session knowing about:

Bio: Adam Gugliciello, a 15-year veteran in software engineering and systems architecture, specializes in highly available, parallel systems. Most recently he has been developing grid computing solutions to enable deep analyses and intelligence gathering on huge software systems for technical debt and functional mapping. Adam is a Solution Engineer at Datameer and helps bring Financial and Telco applications expertise to the use of the Datameer business intelligence suite.

Jan 18

Slides from Intro to HBase presentation January 2012

Thanks to Chris Shain from Tresata for coming to Durham last night to talk about HBase.



TriHUG January 2012 Talk by Chris Shain

Jan 08

Next Meeting: January 17, 2012 @ Bronto Software

Title: Intro to Apache HBase by Chris Shain of Tresata

Location: Bronto Software in Durham, NC

RSVP

Abstract: Chris will provide an introduction to Apache HBase, aiming to discuss:

  1. What is HBase? (High level overview)
  2. Details of the HBase architecture
  3. How do clients interact with HBase?
  4. Some general HBase patterns and anti-patterns
  5. What are the use cases for HBase vs. Relational DB?

Bio: Chris Shain is the software development lead at Tresata, a provider of Big Data solutions for the financial industry in Charlotte, NC. His background includes 7+ years of software development experience in the financial services industry, with a focus on customer-facing data management applications and data warehousing. Lately he works with Hadoop and HBase on data volumes in the multi-terabyte range, and tinkers with geographic information systems. He lives in Charlotte, NC, and can be reached at chris@tresata.com or on Twitter at @ChrisShain.

Nov 18

Slides from Alan Gates’s Presentations on Nov. 15, 2011

Thanks to Alan Gates of Hortonworks for the two excellent presentations on Apache Pig and Apache HCatalog. Links to the slides for the two talks are included below and are also available on Slideshare.

TriHUG November Pig Talk by Alan Gates
TriHUG November HCatalog Talk by Alan Gates

Oct 12

Slides from Oct. 11 TriHUG meeting featuring Josh Patterson of Cloudera

OSCON Data 2011 - Lumberyard

Next Meeting: November 15, 2011 @ Bronto Software

Our next meeting will be November 15 at Bronto Software.  The speaker will be Alan Gates, the author of Programming Pig and a member of the Hortonworks team.  RSVP here.

————-

Title: New Features in Pig 0.9 and Introducing HCatalog

Abstract:  Pig 0.9 added several features to make Pig a more powerful data processing platform, including macros, include statements, and the ability to embed Pig in Python for control flow.  We’ll cover these, talk about some new features that have been added since 0.9, and what’s next on Pig’s roadmap.

HCatalog is a table management and storage management layer for Hadoop that enables users with different data processing tools – Pig, MapReduce, Hive, Streaming – to more easily read and write data on the grid. HCatalog’s table abstraction presents users with a relational view of data in the Hadoop Distributed File System (HDFS) and ensures that users need not worry about where or in what format their data is stored, whether RCFile format, text files, or sequence files. This talk will include an overview of HCatalog’s features and a discussion of its current roadmap.

Bio: Alan is a co-founder of Hortonworks as well as an original member of the engineering team that took Pig from a Yahoo! Labs research project to a successful Apache open source project. Alan also designed HCatalog and guided its adoption as an Apache Incubator project. Alan has a BS in Mathematics from Oregon State University and an MA in Theology from Fuller Theological Seminary. He is also the author of Programming Pig, a forthcoming book from O’Reilly Press. Follow Alan on Twitter: @alanfgates.

Sep 14

TriHUG Next Meeting featuring Josh Patterson of Cloudera set for Oct. 11

The next Triangle Hadoop User Group meeting will be October 11th at Bronto Software and will be featuring Josh Patterson of Cloudera.  RSVP here.

Title: Lumberyard: Time series Indexing at Scale

Abstract: 

As time series data explodes in volume in the genomic, sensor, and financial realms [1], companies are looking for more effective ways to store and query this data. To handle this explosion in scale, systems are looking to the Hadoop, HBase, and NoSQL domain for components to build on. In this talk we introduce Lumberyard [3], a system which can potentially (1) store terabytes of time series data and (2) allow this data to be interactively queried at low latencies to provide real-time access. Lumberyard stores iSAX [4] indexes in HBase’s multi-dimensional sorted map storage system, which gives Lumberyard the reliability of HDFS yet the low latencies of HBase. Our approach leverages a multidimensional indexing structure which is stored in HBase’s highly available distributed multi-dimensional sorted map. We present the design of Lumberyard’s implementation and illustrate the differences between an in-memory iSAX index and a persisted HBase-backed iSAX index.

Sponsored by Cloudera and Bronto Software.

More info at www.trihug.org.

Bio:

Master’s Thesis: self-organizing mesh networks. Published in IAAI-09: TinyTermite: A Secure Routing Algorithm

Conceived, built, and led Hadoop integration for the openPDC project at TVA (Smartgrid stuff). Led small team which designed classification techniques for timeseries and Map Reduce. Open source work at http://openpdc.codeplex.com

Now: Sr. Solutions Architect at Cloudera

Slides from Ted Dunning’s Sept. 2011 talk

Thanks to everyone for attending last night’s talk!  Ted’s slides are available for download below.

MapR, Implications for Integration

Sep 09

Starfish Talk Slides from April 2011

Under the better-late-than-never category, here are the slides from the April 2011 TriHUG meeting on Starfish.

Starfish: A Self-tuning System for Big Data Analytics