Four years ago, Arun Murthy entered a JIRA ticket (MAPREDUCE-279) that proposed a re-architecture of the original MapReduce. In the ticket, he described a set of capabilities that would let processes share resources more effectively, and an architecture that would allow Hadoop to extend beyond batch data processing.
This ticket proved prescient about real enterprise requirements for Hadoop. As enterprise adoption accelerated, it became even clearer that supporting multiple processing models, beyond batch alone, was critical for Hadoop to broaden its applicability for mainstream usage in the modern enterprise architecture. The common pattern: enterprises want to store data in HDFS and then access it in a variety of ways, simultaneously, and with a consistent level of service. Hadoop must therefore support a range of interaction patterns, from batch to streaming to MPI and more.
This JIRA ticket ultimately resulted in a new branch of the open source Apache code trunk (Hadoop 2.0) and a new sub-project, Apache Hadoop YARN.
We’ve posted a series of blogs on the technical aspects of YARN, but in simplest terms YARN separates out the resource management capabilities that previously lived inside MapReduce, and thereby provides a framework for introducing a whole new range of processing engines. A simple graphical depiction is below, and shows that the YARN-based architecture of Hadoop 2.x is fundamentally different from the architecture of Hadoop 1.x.
With Hadoop 2.0 working its way through the community process at the Apache Software Foundation and soon to be released as Beta, today we are excited to make two significant announcements:
The HDP 2.0 Community Preview is the first delivery to include YARN. It will enable us to engage an ecosystem of partners to advance this new technology over the coming weeks and months and ensure it is ready for mainstream usage.
We have already seen several announcements of YARN enablement from the community, including Storm/YARN from Yahoo! and Weave from Continuuity, and anticipate that the Hortonworks Certification Program for Apache Hadoop YARN will further expand the range of applications that can run natively in Hadoop.
Most of all, we are excited about the ecosystem of applications that will result from this program and are proud to announce over 15 partners who have already joined, including Altiscale, Concurrent, Continuuity, DataTorrent, Elasticsearch, Karmasphere, Microsoft, MicroStrategy, Platfora, Red Hat, SAS, Splunk, Sqrrl, Tableau Software and TIBCO.
Today, the HDP 2.0 Community Preview is available for download as a single-node VM, and we will release a full preview distribution within the next week. The VM package, available for download from our website in our Sandbox form factor, also includes two tutorials: one on Apache Tez and one on YARN. You will also find the essential tutorials from Hortonworks Sandbox 1.3, which let you see the Stinger initiative in action.