Triangle Hadoop Users Group

Slides From Last Night’s Talk on Spark and Shark

Slides from last night’s talk on Spark and Shark have been posted.

Also note that we are now using Meetup to announce upcoming events, in addition to our other channels. See http://www.meetup.com/TriHUG/ for more info.

TriHUG talk on Spark and Shark from trihug

TriHUG February: Real-Time Scalable Data Applications on HBase with Kiji

Build Real-Time Scalable Data Applications on Apache HBase with Kiji

Speaker: Aaron Kimball, WibiData
Date: February 19th, 6:30PM
Location: Bronto Software, Durham, NC

Large-scale data has challenges at every phase in its lifecycle: capture, storage, processing, and result serving. Depending on the nature of the data and the analysis goals of a data team, imposing the right schema on a NoSQL storage system such as Apache HBase can enable more efficient storage, retrieval, and analysis of relevant information, and can make the system easier to maintain going forward.

In this talk, we present the Kiji framework for building real-time scalable data applications on Apache HBase. Kiji is a collection of Apache2-licensed open source components that extend the Hadoop ecosystem and help developers with schema management, MapReduce processing, and data integration tasks.

To download or learn more about Kiji, visit www.kiji.org.

About the speaker:

Aaron founded WibiData in 2010. He has worked with Hadoop since 2007 and is a committer on the Apache Hadoop project. In addition, Aaron founded the Apache Sqoop data import tool and Apache MRUnit Hadoop testing library projects. Previously he was the first engineer hired by Cloudera, the leading provider of Apache Hadoop-based software and services. Aaron holds a B.S. in Computer Science from Cornell University and an M.S. in Computer Science from the University of Washington. When not thinking about Hadoop, Aaron is an avid sailor, Burning Man devotee, and player of board and video games.

Slides from Cloudera Impala talk on Jan. 29, 2013

Slides from Ricky Saltzer’s talk on Jan. 29 are now posted:

Impala presentation from trihug

TriHUG January: Cloudera Impala

Cloudera Impala: Real-time queries with Apache Hadoop

Speaker: Ricky Saltzer, Cloudera

Date: January 29, 6:30PM

Location: Bronto Software, Durham, NC


Join us for this technical deep dive into Cloudera Impala, the project that makes scalable parallel database technology available to the Hadoop community for the first time. Impala is an open source codebase that allows users to issue low-latency queries against data stored in HDFS and Apache HBase using familiar SQL operators.


RSVP here: http://trihug-01-2013.eventbrite.com/

TriHUG November: Beyond Batch - HBase, Drill and Storm

Brad Anderson of MapR will be taking us “Beyond Batch” with HBase, Drill, and Storm. We’ll also get a look at MapR’s latest release, M7.

Date: November 8th, 6:30PM

Sponsors: Bronto, LucidWorks, MapR, Zaloni

Location: Bronto Software (American Tobacco Campus, Durham NC)


RSVP here: http://trihug-11-2012.eventbrite.com/

TriHUG September: Large Scale Search, Discovery and Analytics in Action

For September, Grant Ingersoll of LucidWorks will be talking about “Large Scale Search, Discovery and Analytics in Action.”

In this talk, you’ll learn how a single platform can enable large-scale search, discovery, and analytics over a wide variety of content, using tools such as Solr, Hadoop, and Mahout.

Date: September 18th, 6:30PM
Location: Bronto Software (American Tobacco Campus, Durham NC)


RSVP here: http://trihug-09-2012.eventbrite.com/

TriHUG August: Intro to Hive

After a summer break, TriHUG is back with a lineup of great speakers this fall, starting on Tuesday, August 14th, with Ricky Saltzer from Cloudera!

Ricky will be talking about Hive, a data warehouse system for Hadoop that provides a SQL-like interface for ad-hoc querying.

RSVP here: http://trihug-08-2012.eventbrite.com/


Topic: Intro to Hive

Speaker: Ricky Saltzer, Cloudera

Date: August 14th, 6:30PM

Location: Bronto Software, Durham, NC

Slides from May 22nd talk by David Arthur

Thanks to David Arthur of Lucid Imagination for his presentation on Apache ZooKeeper!

Intro to Apache ZooKeeper

Topic: “ZooKeeper - An Introduction and Practical Use Cases”

Speaker: David Arthur, Lucid Imagination

Date: May 22nd, 6:30PM

Location: Bronto Software, Durham, NC

RSVP: http://trihug-05-2012.eventbrite.com/

Abstract: ZooKeeper is a distributed coordination service with a strong emphasis on update consistency. It can be used for simple tasks such as configuration management and distributed ID assignment, as well as more complex ones such as distributed locking and service discovery. The API provided by ZooKeeper is rather low-level and requires some boilerplate code, so we will also look at some higher-level frameworks.

David Arthur is a software engineer at Lucid Imagination working on the new “Big Data” team. He has spent the last two years building distributed systems in which Hadoop is a central component. Before working in the “big data” space, he focused mainly on the data side of applications: schema optimization, data warehousing, etc. He attended Florida State University, where he received a B.S. in Physics and completed two years of graduate studies in Scientific Computing.

Next Meeting: IBM Watson on April 12, 2012 @ Bronto Software

Title: IBM Watson: Big Data Text Analytics

Location: Bronto Software, Durham, NC

Speaker: John Gerken

RSVP: http://trihug-04-2012.eventbrite.com/

Abstract: What is IBM Watson? It defeated two previously undefeated human opponents on a game show, and it has shown how supercomputers are becoming increasingly able to understand and answer questions in ways previously reserved for human thought. What most people don’t realize, however, is how integral the synergy between Big Data and text analytics was to enabling IBM Watson’s capabilities. Curious how Watson leveraged Big Data and text analytics technologies? Wondering what the future may hold for applying them? If so, attend TriHUG on April 12 to find out.

Speaker Bio: John Gerken is a Senior Software Architect in IBM’s Emerging Technologies jStart Team, where he is responsible for recognizing, promoting, and developing prototypes of software technologies and trends that could positively impact IBM’s customers. John is an IBM Watson Team Leader working to enable customers to use Watson. He is also a recognized thought leader in the area of Situational Applications and mashup ecosystems and is a principal evangelist for these technologies. John is a member of the North Carolina Technical Experts Council (NC TEC), an IBM Academy-affiliated technical advisory and vitality organization serving the RTP, NC area. He also holds a Bachelor of Science in Jazz Performance and plays at every opportunity.


Sponsors: Bronto Software, Lucid Imagination