Slides from Feb. 16 talk by Adam Gugliciello of Datameer
Thanks to Adam Gugliciello of Datameer for the talk at TriHUG last week! Here are the slides:
Some TriHUG members are in the early stages of putting together an all-day hackathon on all things scaling (Hadoop, Cassandra, Hive, Pig, Mahout, etc.) and wanted to get some info out to the community, along with a call for volunteers and sponsors.
The basic gist of the day is that we get together and spend the day hacking and learning about writing scalable, fault-tolerant systems. All levels of experience are welcome, and we fully expect that one of the groups that forms will be a “tutorial” group, while other groups will be doing more advanced things. The key is to get lots of interaction and cross-fertilization of ideas.
Our tentative plan is that we will make available:
1. Compute cluster time (likely Amazon EC2), along with ready-to-use instances with the appropriate tools already installed. (More later)
2. Some public data sets, but feel free to bring your own if they are publicly available on Amazon S3
3. Food, drinks, etc. including pizza/beer at the end
4. Network connectivity
5. Space to work in
6. (TBD) Machine to submit jobs using a fair scheduler
You need to bring your laptop and an open mind. Having your favorite tools on your machine would also be good, and a GitHub account or something similar would be useful.
Our likely date for this is June 18th, with a backup date of June 25th (pending space availability), from 9 AM - 6 (?) PM. Attendance will require an RSVP, and we will send out sign-up info later. For now, we are aiming to make it free (including EC2 compute time), but that depends on us getting sponsorships to cover costs, so if you think you or your company can sponsor, please let us know ASAP.
Tentative Schedule (strawman):
8:30: Doors open/networking/coffee/snacks
9 - 9:30: Idea pitches, seed project announcements, and team formation. People can stand up and say what they are interested in, and then team up based on shared interests. For instance, I will probably work on Mahout and machine learning.
9:30 - 12: Hack
12-1: Food/networking/hacking
1-5(?): Hack
5-6 (no firm cutoff time): Share what you learned with the group over pizza and drinks. Demo if you have one.
How you can help:
- Help us get data sets organized and a Chef/Puppet recipe set up with all the appropriate tools/languages/SCM/etc. Also, think of interesting problems to work on.
- Sponsor food/coffee/drinks/t-shirts/compute time/etc. Please contact Grant Ingersoll at info@trihug.org. I don’t think we are talking about a lot of money here (maybe $1000-1500 total? More on this as things develop.)
- Let us know you are interested. The sooner we hear from you, the better we can plan space. Please reply on this list if you are interested.
- Are you graphically capable? Help us design a t-shirt.
- Once we firm up some details, help us spread the word

We closed out the year with a talk by Brian O’Connor (seen above at the MacBook Air). He outlined how UNC Lineberger Comprehensive Cancer Center is using Hadoop and HBase in research. Once again, Bronto graciously hosted us and provided food and drinks. Slides for Brian’s talk and the New and Noteworthy segment are below.
2010 was a good year for Hadoop in the Triangle. Looking forward to 2011.
-ryan
Speakers for the next meeting are lined up. We will meet at Bronto Software on Tuesday, September 14th at 6:30pm. Details are in the invite link below.
Where it all began: Using Apache Hadoop for Search with Apache Lucene and Solr
Apache Hadoop was originally created by Doug Cutting and Mike Cafarella to meet the Apache Nutch project’s need for large-scale crawling and indexing. In this talk, Lucene/Solr committer Grant Ingersoll will show how to use Hadoop to build a search index in Apache Lucene and Solr.
Practical Hadoop Security
NCSU PhD candidate Wei Wei explores attack vectors unique to distributed computing, along with mitigation techniques. He will present lessons learned from his work on SecureMR: a practical service integrity assurance framework for MapReduce.
Looking ahead to October, we are aiming for a biotech / pharma theme. So far we have Brian O’Connor from UNC lined up to talk about how they are using HBase in cancer research. If you want to fill the second speaker slot, head on over to the mailing list and let us know.
-ryan