The meetup kicked off with YARN committers from Yahoo presenting on current Hadoop 2.0 deployments at Yahoo. As part of the presentation, the following were covered.
Chris Riccomini from LinkedIn then presented about his experience in “Building Applications on YARN”. He briefly covered the anatomy of a YARN application and then jumped into various dimensions a YARN application developer should think about – deployment, metrics, logging, application specific configuration to name a few.
The most interesting bits about his presentation include how, pre-production, small instances of YARN clusters can be utilized to develop applications in an agile manner. For example, one could start with using local file system and avoiding HDFS to minimize the operational effort, and then switch over to a full-blown distributed file system when the desire for scalability crosses a threshold. Also worth attention is how YARN’s web-service APIs can be exploited to build custom dashboards.
After that, Arun recapped the YARN’s powerful scheduling API available to the application developers for using the cluster resources. He walked us through the scheduling concepts, and rounded it up with how scheduling happens in the context of an example MapReduce job.
Bikas and I then proceeded to give a brief overview of what all APIs are available to application developers. We described some of the pain points with the APIs that various users indicated in the recent past and efforts underway to address some of them. To enumerate a few:
We opened the API discussion for further feedback. This exercise was very fulfilling. We discovered how various users were experimenting with the APIs and what pitfalls and limitations they ran into. Some concrete suggestions include:
Our slides are available here.
After a short break, Alejandro Abdelnur from Cloudera briefly talked about the efforts underway to augment YARN with cpu-isolation using cgroups.
Finally, Siddarth Seth from Hortonworks talked about his work on modifying MR application to reuse containers for jobs both large and small. This exciting development opens new innovations in the MapReduce land like intermediate output aggregation. You can read through Sid’s presentation below. The core points covered are:
His slides are available here.
The success of this meetup reaffirmed the excitement of the community about YARN. This also strengthened our desire to make it a recurring event. We look forward to the next one, with hopefully more turnout, extended brainstorming, and of course, more pizza and beer 🙂