HDP 2.0 and its YARN-based architecture…delivered!
Typical delivery of enterprise software involves a tightly controlled release date and a secret roadmap designed to wow prospects, customers, press and analysts…or at least that is the way it usually works. Open source, however, changes this equation.
As described here, the vision for extending Hadoop beyond its batch-only roots to support interactive and real-time workloads was set by Arun Murthy back in 2008. Work on YARN, the key technology for enabling this vision, started in earnest in 2011. YARN was declared GA by the community in the recent Apache Hadoop 2.2 release, and it is now delivered for mainstream enterprises and the broader commercial ecosystem with the release of Hortonworks Data Platform 2.0.
HDP 2.0, with its YARN foundation, is a huge milestone for the Hadoop market since it unlocks that vision of gathering all data in Hadoop and interacting with that data in many ways, with predictable performance levels…but you know this because Apache Hadoop 2.2 went GA last week.
So, what else is delivered in HDP 2.0?
Just as the vision for YARN was publicly described and then delivered, the same can be said for Phase 2 of the Stinger Initiative. In February of 2013 we described three phases of investment aimed at improving Apache Hive’s Speed, Scale, and SQL, and the release of Apache Hive 0.12 included in HDP 2.0 delivers on Phase 2 of that vision. Apache HBase 0.96.0 is also in HDP 2.0; it is the culmination of more than a year’s worth of effort that has delivered important enterprise features such as Snapshots and improved MTTR (mean time to recovery); read more here. But wait…there’s more! You can also read about the latest Apache Pig 0.12 here and Apache Ambari 1.4.1 here.
The point is that there are LOTS of new features and capabilities delivered in HDP 2.0. Since a picture is worth a thousand words, let’s look at what I fondly refer to as the “asparagus chart”, which highlights the fact that HDP 2.0 contains the latest stable innovations developed within the Apache community, all nicely integrated, tested and made ready for the enterprise.
Our Model: Working within the Community for the Enterprise
The chart above also helps illustrate our 100% open source model. All of those components, including management and monitoring a la Apache Ambari, are Apache open source projects integrated into HDP, making HDP the most complete 100% open source enterprise Hadoop product available. And if you look at the version numbers, you will notice that HDP 2.0 includes the very latest releases of the components you rely on.
How are we able to deliver the latest innovation from the community within HDP? Simple. We do ALL of our engineering work (feature development, bug fixes, patches) within those Apache Software Foundation projects…no holdbacks. So we understand where each project is in its release cycle and maturity curve. We then integrate and test the latest stable releases of all of the projects that are ready for the enterprise into HDP. This means that if a component has not been declared stable within the community, then it’s implicitly not ready for the enterprise and we don’t ship it.
Moreover, this means that when we deliver the GA version of Apache Hadoop 2.2…it’s Hadoop 2.2 and not an early branch that’s been forked early and patched often. While Halloween is next week and Frankenstein monsters may be seen roaming the streets, you won’t see us shipping a product that resembles a Frankenpatch monster. That’s just not how we do our work. Managing lots of patches outside of the Apache process negates the benefits of the open source project development model, causes confusion and incompatibilities for the broader ecosystem, and can lead to lock-in.
Bottom line: Staying in sync with what the Apache community builds and declares GA is critical for ensuring speedy and stable enterprise adoption. And HDP 2.0 brings the freshest innovations from the community to you, so try it out and let us know what you think!
Finally, we are not the only ones excited about the release. Check out the list of ecosystem partners certified on HDP 2.0.
We have a lot of information to help you learn more about Hadoop 2, YARN, HBase, Stinger and all the exciting new developments in HDP 2.0, but here are a few quick links that you might find useful: