Thanks to Adam Gugliciello of Datameer for the talk at TriHUG last week! Here are the slides:
Title: Financial Data Analytics with Hadoop
Abstract:
Hadoop-based applications are becoming critical in the financial services arena for the analysis and correlation of large volumes of structured and unstructured data. In addition, the Dodd-Frank Act represents the largest US financial regulatory change in several decades and requires much greater transparency in financial data. In this session, we will answer common questions and demonstrate use cases showing how Hadoop and Datameer help with asset management, risk management, fraud detection, and data security.
Leave this session knowing about:
Bio: Adam Gugliciello, a 15-year veteran in Software Engineering and Systems Architecture, specializes in highly available, parallel systems. Most recently he has been developing grid computing solutions to enable deep analyses and intelligence gathering on huge software systems for technical debt and functional mapping. Adam is a Solution Engineer at Datameer and helps bring Financial and Telco applications expertise to the utilization of the Datameer business intelligence suite.
Title: Intro to Apache HBase by Chris Shain of Tresata
Location: Bronto Software in Durham, NC
Abstract: Chris will provide an introduction to Apache HBase, aiming to discuss:
Bio: Chris Shain is the software development lead at Tresata, a provider of Big Data solutions for the financial industry in Charlotte, NC. His background includes 7+ years of software development experience in the financial services industry, with a focus on customer-facing data management applications and data warehousing. Lately he works with Hadoop and HBase on data volumes in the multi-terabyte range, and tinkers with geographic information systems. He lives in Charlotte, NC, and can be reached at chris@tresata.com or on Twitter at @ChrisShain.
Thanks to Alan Gates of Hortonworks for the two excellent presentations on Apache Pig and Apache HCatalog. Links to the slides for the two talks are included below and are also available on Slideshare.
OSCON Data 2011 - Lumberyard (slides by Josh Patterson)
Our next meeting will be November 15 at Bronto Software. The speaker will be Alan Gates, the author of Programming Pig and a member of the Hortonworks team. RSVP here.
————-
Title: New Features in Pig 0.9 and Introducing HCatalog
Abstract: Pig 0.9 added several features to make Pig a more powerful data processing platform, including macros, include statements, and the ability to embed Pig in Python for control flow. We’ll cover these, talk about some new features that have been added since 0.9, and look at what’s next on Pig’s roadmap.
HCatalog is a table management and storage management layer for Hadoop that enables users with different data processing tools – Pig, MapReduce, Hive, Streaming – to more easily read and write data on the grid. HCatalog’s table abstraction presents users with a relational view of data in the Hadoop distributed file system (HDFS) and ensures that users need not worry about where or in what format their data is stored – RCFile format, text files, sequence files. This talk will include an overview of HCatalog’s features and a discussion of its current roadmap.
Bio: Alan is a co-founder of Hortonworks as well as an original member of the engineering team that took Pig from a Yahoo! Labs research project to a successful Apache open source project. Alan also designed HCatalog and guided its adoption as an Apache Incubator project. Alan has a BS in Mathematics from Oregon State University and an MA in Theology from Fuller Theological Seminary. He is also the author of Programming Pig, a forthcoming book from O’Reilly Press. Follow Alan on Twitter: @alanfgates.
The next Triangle Hadoop User Group meeting will be October 11th at Bronto Software and will be featuring Josh Patterson of Cloudera. RSVP here.
Title: Lumberyard: Time series Indexing at Scale
Abstract: As time series data explodes in volume in the genomic, sensor, and financial realms [1], companies are looking for more effective ways to store and query this data. To handle this explosion in scale, systems are looking to the Hadoop, HBase, and NoSQL domain for components to build on. In this talk we introduce Lumberyard [3], a system which can potentially (1) store terabytes of time series data and allow this data to be queried interactively at low latencies to provide real-time access. Lumberyard stores iSAX [4] indexes in HBase’s multi-dimensional sorted map storage system, which gives Lumberyard the reliability of HDFS along with the low latencies of HBase. Our approach leverages a multidimensional indexing structure which is stored in HBase’s highly available distributed multi-dimensional sorted map. We present the design of Lumberyard’s implementation and illustrate the differences between an in-memory iSAX index and a persisted HBase-backed iSAX index.
Sponsored by Cloudera and Bronto Software. More info at www.trihug.org.
Bio:
Master’s Thesis: self-organizing mesh networks
Published in IAAI-09: TinyTermite: A Secure Routing Algorithm
Conceived, built, and led Hadoop integration for the openPDC project at TVA (Smartgrid stuff).
Led small team which designed classification techniques for timeseries and Map Reduce.
Open source work at
Now: Sr. Solutions Architect at Cloudera
Thanks to everyone for attending last night’s talk! Ted’s slides are available for download below.
MapR, Implications for Integration
Under the "better late than never" category, here are the slides from the April 2011 TriHUG meeting on Starfish.
Starfish: A Self-tuning System for Big Data Analytics