Triangle Hadoop Users Group
5 months ago

Next Meeting: Sept. 13 @ Bronto Software

We trust everyone has had a good summer and is equally excited to get back into learning more about Apache Hadoop and scaling.  Our next meeting will be Sept. 13 at Bronto Software.  Food and drinks start at 6:30 and the talks start at 7.

We are pleased to announce that our speaker will be Ted Dunning from MapR Technologies.   See below for more details.  Please RSVP here.

Title: MapR, Architecture and Implications

Abstract:

The talk will describe how MapR’s architectural advances allow significant improvements in speed, reliability and scalability over stock Hadoop.  This will include a dive into the MapR file system, a discussion of how the map-reduce layer has been changed, and the impact on other Hadoop ecosystem components.  Actual test results will be included.

In the second section of my talk, I will describe how this new architecture has surprising consequences.  In particular, I will show how tasks like machine learning, data visualization and search indexing can all work better on the MapR platform.

Ted’s Bio:

Ted has held Chief Scientist positions at Veoh Networks, ID Analytics and MusicMatch (now Yahoo Music). Ted is responsible for building the most advanced identity theft detection system on the planet, as well as one of the largest peer-assisted video distribution systems and ground-breaking music and video recommendation systems. Ted has 15 issued and 15 pending patents and contributes to several Apache open source projects including Hadoop, ZooKeeper and HBase. He is also a committer for Apache Mahout. Ted earned a BS degree in electrical engineering from the University of Colorado, an MS degree in computer science from New Mexico State University, and a Ph.D. in computing science from Sheffield University in the United Kingdom. Ted also bought the drinks at one of the very first Hadoop User Group meetings.

9 months ago

RTP Scaling Hackathon (Planning Stages)

Some TriHUG members are in the early stages of putting together an all-day hackathon on all things scaling (Hadoop, Cassandra, Hive, Pig, Mahout, etc.) and wanted to get some info out to the community, along with a call for volunteers and sponsors.

The basic gist is that we get together and spend the day hacking and learning about writing scalable, fault-tolerant systems.  All ranges of experience are welcome and we fully expect that one of the groups that forms will be a “tutorial” group, while other groups will be doing more advanced things.  The key is to get lots of interaction and cross-fertilization of ideas.

Our tentative plan is that we will make available:

1. Compute cluster time (likely Amazon EC2) along with ready-to-use instances with the appropriate things already installed.  (More later)
2. Some public data sets, but feel free to bring your own, as long as they are publicly available on Amazon S3
3. Food, drinks, etc. including pizza/beer at the end
4. Network connectivity
5. Space to work in
6. (TBD) Machine to submit jobs using a fair scheduler

You need to bring your laptop and an open mind.  Also having your favorite tools on your machine would be good.  A github account or something similar would also be useful.

Our likely date for this is June 18th with a backup date of June 25 (pending space availability) from 9 AM - 6 (?) PM.    Attendance will require RSVP and we will send out sign up info later.  For now, we are targeting it to be free (including EC2 compute time), but that is predicated on us getting sponsorships to cover costs, so if you think you or your company can sponsor, please let us know ASAP.

Tentative Schedule (strawman):  
8:30:  Doors open/networking/coffee/snacks
9 - 9:30: Idea pitches, seed projects announced and teams formed.  People can stand up and say what they are interested in, and then we imagine people can team up based on their interests (for instance, I will probably work on Mahout and machine learning)
9:30 - 12: Hack
12-1: Food/networking/hacking
1-5(?): Hack
5-6 (no firm cutoff time): Share what you learned with the group over pizza and drinks.  Demo if you have one.

How you can help:

- Help us get data sets organized and a Chef/Puppet recipe set up with all the appropriate tools/languages/SCM/etc.  Also, think of interesting problems to work on.
- Sponsor food/coffee/drinks/t-shirts/compute time/etc.  Please contact Grant Ingersoll at info@trihug.org.  I don’t think we are talking about a super lot of money here (maybe $1000-1500 total; more on this as things develop)
- Let us know you are interested.  The sooner we hear from you, the better we can plan space.  Please reply on this list if you are interested.
- Are you graphically capable?  Help us design a t-shirt.
- Once we firm up some details, help us spread the word

11 months ago

Next TriHUG: Monday April 4th

Our next meeting will be Monday, April 4th at Bronto. Food and drinks at 6:30pm. Talk starts at 7:00pm.

*** Note: this is a Monday, not our usual Tuesday ***


Title: Starfish: A Self-tuning System for Big Data Analytics

Presenter: Shivnath Babu, Duke University

Shivnath Babu, assistant professor of Computer Science at Duke University, will help demystify Hadoop performance tuning. Practical tips on tuning Hadoop for specific workloads will be discussed. Details will be provided on the Starfish research project: a system for self-tuning big-data analytics.

RSVP HERE

1 year ago

February 2011 Meeting - Apache Mahout: Driving the Yellow Elephant

Our next meeting will be Tuesday, February 1st at Bronto. Food and drinks at 6:30pm. Talk starts at 7:00pm.

Apache Mahout: Driving the Yellow Elephant

Apache Mahout co-founder and committer Grant Ingersoll will give an introduction to Apache Mahout and machine learning.  We will also spend some time looking at how Mahout leverages Apache Hadoop to implement a scalable clustering algorithm.

REGISTER HERE

Help spread the word! Print out this flyer and post it at your office, school or local coffee shop.


1 year ago

December 2010 Meeting Followup

We closed out the year with a talk by Brian O’Connor (seen above at the MacBook Air). He outlined how the UNC Lineberger Comprehensive Cancer Center is using Hadoop and HBase in research. Once again Bronto graciously hosted us and provided food and drinks. Slides for Brian’s talk and the New and Noteworthy segment are below.

2010 was a good year for Hadoop in the Triangle. Looking forward to 2011.

-ryan

1 year ago

December TriHUG Meeting


We will be talking HBase and biotech. Come join us on Tuesday, December 7th at Bronto. Food and drinks at 6:30pm. Talk starts at 7:00pm.

Brian O’Connor from the UNC Lineberger Comprehensive Cancer Center will discuss their work using HBase and Hadoop MapReduce to store and query information from large cancer resequencing projects.  He will provide an overview of HBase along with the problems they are working on. The relative merits of the technology will be explored in addition to alternative approaches. 


REGISTER HERE


More info:

Help us spread the word on Twitter and Facebook or by printing out this flyer and posting it at your university or office.

    1 year ago

    September 2010 Meeting Followup

    Another fantastic meeting. Again, it was a diverse mix of people from startups, big companies and academia, many of whom are doing some *really* interesting stuff. Thanks again to our sponsor Lucid Imagination for the food and drinks and thanks to Bronto for the great facilities.

    We have one speaker slot available for our next meeting. Interested? Hit us up @trihug.

    -ryan

    1 year ago

    September TriHUG Meeting

    Speakers for the next meeting are lined up. We will meet at Bronto Software, Tuesday September 14th at 6:30pm. Details in the invite link below.


    REGISTER HERE


    Where it all began: Using Apache Hadoop for Search with Apache Lucene and Solr

    Grant Ingersoll

    Apache Hadoop was originally created by Doug Cutting and Mike Cafarella as part of the Apache Nutch project’s need for large-scale crawling and indexing.  In this talk, Lucene/Solr committer Grant Ingersoll will show how to use Hadoop to build a search index in Apache Lucene and Solr.


    Practical Hadoop Security

    Wei Wei

    NCSU PhD candidate Wei Wei explores attack vectors unique to distributed computing along with mitigation techniques. He will present lessons learned from his work on SecureMR: a practical service integrity assurance framework for MapReduce.




    Looking ahead to October, we are aiming for a biotech / pharma theme. So far we have Brian O’Connor from UNC lined up to talk about how they are using HBase in cancer research. If you want to fill the second speaker slot, head on over to the mailing list and let us know.


    -ryan


    1 year ago

    July Meeting Follow-up

    Our first meeting was a great success with somewhere around 25 attendees. It turns out that we have people in the Triangle using Hadoop for cancer research, quantitative finance, web analytics and a variety of other applications. Grant was furiously taking notes as ideas for future meetings were discussed. Look for announcements here on the blog, Twitter and the mailing list. Thanks again to the other presenters, Jeff and Chad, as well as Bronto for the facilities and jStart for the beer and pizza.

    -ryan

    1 year ago

    July TriHUG Meeting

    Interested in learning more about Hadoop or finding out what’s going on with Hadoop in the Triangle?

    Come out for the first meeting of the Triangle Hadoop Users Group scheduled for 7pm on Tuesday July 20th at Bronto Software.

    Register HERE

    Three presenters are lined up:

    Join the mailing list and follow us @trihug.