Takeaways from OSCON 2011

For the first time in its history, OSCON, the premier open-source conference, hosted a dedicated OSCON Data sub-conference, and Apache Hadoop had a full track of its own there. This clearly reflects the interest in Big Data and the central role Apache Hadoop plays in the space. A special shout-out to Bradford Stephens and Sarah Novotny, the program chairs, who did a fantastic job with OSCON Data.

Hortonworks was well represented at OSCON Data 2011. Owen O’Malley and I presented talks, and Alan Gates took a short break from his vacation to stop by.

Owen presented a very interesting talk on ‘Developing and Deploying Hadoop Security’. The presentation covered the goals of Hadoop Security and how administrators can use the new features to secure their HDFS and MapReduce clusters. Owen also talked about Yahoo’s experience deploying the back-ported Hadoop Security features on its science and production clusters. Finally, he described the several man-years of effort that the Hortonworks team (formerly at Yahoo!) spent developing this comprehensive, well-integrated security work.

I presented a talk on ‘Next Generation Apache Hadoop MapReduce’. The talk explained how the Apache Hadoop MapReduce framework has hit a scalability limit of around 4,000 machines. We are developing the next generation of Apache Hadoop MapReduce, which factors the framework into a generic resource scheduler and a per-job, user-defined component that manages the application's execution. Since downtime is more expensive at scale, high availability is built in from the beginning, as are security and multi-tenancy to support many users on larger clusters. The new architecture should also improve hardware utilization and make it easier to innovate and iterate quickly.
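To make that split concrete, here is a minimal Python sketch, under heavily simplified assumptions, of a generic scheduler that only tracks free capacity and a per-job component that decides what to run in the capacity it is granted. All class and method names (`ResourceScheduler`, `JobManager`, `allocate`, `release`) are hypothetical and are not the actual Hadoop APIs; capacity is modeled as whole slots per node, and tasks are assumed to finish instantly.

```python
from dataclasses import dataclass


@dataclass
class Container:
    """One slot on one node, granted to a single job (hypothetical model)."""
    node: str
    job_id: str


class ResourceScheduler:
    """Generic, job-agnostic scheduler: it only tracks free slots per node
    and hands out containers. It knows nothing about MapReduce."""

    def __init__(self, slots_per_node: dict[str, int]):
        self.free = dict(slots_per_node)

    def allocate(self, job_id: str, count: int) -> list[Container]:
        # Grant up to `count` containers, filling nodes in name order.
        granted = []
        for node in sorted(self.free):
            while self.free[node] > 0 and len(granted) < count:
                self.free[node] -= 1
                granted.append(Container(node, job_id))
        return granted

    def release(self, container: Container) -> None:
        # Return the slot to the pool so other jobs can use it.
        self.free[container.node] += 1


class JobManager:
    """Per-job, user-defined component (hypothetical): it decides how many
    containers its job needs and what runs in each one."""

    def __init__(self, job_id: str, tasks: list[str]):
        self.job_id = job_id
        self.pending = list(tasks)
        self.completed: list[str] = []

    def run(self, scheduler: ResourceScheduler) -> None:
        while self.pending:
            containers = scheduler.allocate(self.job_id, len(self.pending))
            if not containers:
                raise RuntimeError("cluster has no free capacity")
            for c in containers:
                task = self.pending.pop(0)
                self.completed.append(f"{task}@{c.node}")
                scheduler.release(c)  # task assumed finished; free the slot


# Usage: one small cluster with three slots, one job with four tasks.
sched = ResourceScheduler({"node-1": 2, "node-2": 1})
job = JobManager("job-42", ["map-0", "map-1", "map-2", "reduce-0"])
job.run(sched)
print(job.completed)
# ['map-0@node-1', 'map-1@node-1', 'map-2@node-2', 'reduce-0@node-1']
```

Because the scheduler never inspects the tasks it places, a different framework could, in principle, supply its own job manager against the same scheduler; that separation is what lets the generic layer stay small while each job handles its own execution logic.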

I also had great fun attending various talks, such as Benoit Sigoure’s OpenTSDB talk, a very interesting use of HBase as the backend for a time-series database, and Greg Luck’s Theory of Caching. My personal highlight was the coming-out party for JDK 7, along with more details from Joe Darcy on the plans for JDK 8.

Overall it was a fantastic opportunity to meet folks and share ideas.

— Arun C. Murthy
