During 2015, the pace of adoption of the Hortonworks Data Platform (HDP) continued to accelerate. Both existing and new customers brought massive amounts of data under management using Apache Hadoop technologies. I found it really interesting that the new requirements our customers discussed with us were much less consistent than in previous years. In particular, we heard what sounded like two very different and perhaps even conflicting requirements.
First, many of our customers are deploying more and more mission critical workloads onto Hadoop. This originally started with the introduction of YARN in October 2013. Of course, there are still a large number of traditional, batch-style workloads (more typically associated with the Hadoop 1 era) which continue to run on HDP, such as an active archive that offloads data from traditional enterprise data warehouses. However, the number of interactive and real-time use cases has increased dramatically in the past year. Due to this shift, customers must now apply more rigor and precision when upgrading HDP itself. This means that customers would like to reduce the frequency with which they need to perform these upgrade cycles. An annual refresh is about as fast as most enterprises wish to move.
Customers are asking Hortonworks to help them manage the pace at which they must adopt the innovations coming from the core Hadoop project (HDFS and YARN in particular, which form the foundation for HDP). A slower, more predictable pace gives customers a stable platform for their mission critical workloads, one that continues to safeguard all the data stored within their data lake.
Second, and seemingly contradicting this first requirement, we have customers who are absolutely demanding access to the latest innovations being developed within the open source community. This requirement is related to the services that access the data stored in HDP. HDP now consists of more than 20 projects from across the Hadoop ecosystem including Hive, HBase, Storm, Spark and more. While each of these projects continues to move forward, they are not all progressing at the same rate. During 2015, for example, the Spark community set a torrid pace by driving significant, new features and enterprise readiness improvements every few months. There were times when customers could not extract the maximum value from the latest Spark features because we had not completed our testing and certification activities associated with these new releases. In some cases, this was actually advantageous to our customers as issues were uncovered and addressed by the time we delivered updates for Spark. In fact, Hortonworks ended up delivering 4 different versions of Spark to customers across various releases during 2015.
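To summarize, at the end of 2015, customers were asking us to do two things: slow the rate of change in the core of the platform (HDFS and YARN) so that mission critical workloads stay stable, and speed up access to the latest innovations in the services that run on top of that core.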
Did you notice I wrote “seemingly contradicting”? One of the things that product managers do is carefully parse these types of requirements, particularly in situations where they seem to conflict. We ask questions and listen carefully to the responses. In this case, what we were hearing from our customers was remarkably similar to what we had been hearing from our partner ecosystem. Prior to the creation of ODPi (http://www.ODPI.org/), ISV partners wanted to integrate with HDP and other Hadoop distributions. However, the variability of project versions across distributions and the pace of change occurring within those projects were too much for our partners to deal with. One of the primary goals of establishing ODPi was to create an industry standard and open data management core. Initially focused on Apache Hadoop®, ODPi will develop and promote a set of open, enterprise focused Hadoop® standards and technologies. This translates into higher stability, richer capabilities, and stronger compatibility among the Hadoop® distributions. Does that last part about increased stability sound familiar? We thought so too. By following the guidance already established via ODPi, Hortonworks believes we can deliver the stability and consistency, along with the reduction in overall platform churn, that both our partners and customers are requesting.
Unveiling our new HDP release strategy
Starting with HDP 2.4, Hortonworks plans to release platform updates in two different cadences:
Hortonworks has aligned the core of HDP around the common core established via ODPi. HDP 2.4 is based on Hadoop 2.7.1. This is the same core delivered as part of HDP 2.3.4 in December 2015. As we continue to develop releases for the HDP 2 line, the core will remain Hadoop 2.7.1 and will only include maintenance updates for this critical component. This approach provides customers who have adopted HDP 2.3 (or beyond) with the stability they are requesting for the remainder of the updates associated with the HDP 2.x line. We will shift to HDP 3 when we adopt the next stable Hadoop version, align with the ODPi common core, and then maintain that version until the next change, when we will then shift to HDP 4 and so on.
Feature bearing releases that allow us to match the pace of innovation of the various Apache Software Foundation components can then be delivered on top of that common core. As new feature bearing releases focused on extended services that run on top of the common core are subsequently delivered, we will change the second digit of the HDP version. In that context, today, we are announcing general availability of HDP 2.4 along with Ambari 2.2.1.
What’s new in HDP 2.4? Spark 1.6 GA
HDP 2.4 delivers support for Spark 1.6. Spark is one of the extended services that run on top of Hadoop. As those services evolve, customers who require the latest innovations should be able to easily adopt them in a non-disruptive manner. Hortonworks provided a technical preview of Spark 1.6 within hours of the community approving the release. However, for many of our customers, simply having access to a technical preview isn’t sufficient for them to deploy it into their production environments. Over the past few weeks, we rapidly worked through our testing and certification process, and we can now meet the requirements of customers who are ready to adopt the latest Spark innovations from the open source community.
Customers whose requirements are met by Spark 1.5.2 or earlier can remain on HDP 2.3.x, with the understanding that HDP 2.4 and HDP 2.3.4 share the same core. There is no need to upgrade. If any additional issues are found in Hadoop 2.7.1, maintenance releases will be created to address these defects for both the HDP 2.3.x release and the HDP 2.4.x release, keeping the core aligned and maintaining stability.
What else is new? Ambari 2.2 with Express Upgrade
The latest release of Ambari now delivers a new upgrade mechanism called “Express Upgrade.” In early 2015, Ambari introduced automation for a rolling upgrade capability to take advantage of innovations delivered throughout the 20 plus components included with HDP 2.2. During 2015, we continued to refine and improve the rolling upgrade capability, which allows customers to apply both maintenance releases and feature bearing releases of HDP onto their running cluster, eliminating downtime for their mission critical applications. Yet, as we continued to work with customers throughout the year, it became clear that additional styles of upgrade were also desired. Express Upgrade automates the application of both maintenance and feature bearing releases in a rapid manner while the cluster is down. For organizations with extremely large clusters, or for environments where the prerequisites for rolling upgrades are not in place, Express Upgrade now provides the best and quickest path to getting the latest HDP bits in place. There are a number of additional capabilities provided with Ambari 2.2, but Express Upgrade is a clear highlight.
Where do we go from here?
We are empowering customers to match the pace of adoption for the latest innovations to their own requirements while also providing maximum stability. We are cleanly and clearly declaring a common core around HDP major versions (HDP 2, HDP 3, HDP 4, and so on) and availability of the latest service innovations via minor versions on top of that common core (HDP 2.4, HDP 2.5, HDP 3.1, HDP 3.2, etc.). Subsequent digits within the released versions of HDP will only deliver defect fixes. Coordinated maintenance releases will be made available to ensure that the common core is maintained. For example, HDP 2.3.4 and HDP 2.4.0 are based on the same common core, and when defects need to be addressed on that common core, future releases of HDP (such as 2.3.5 and 2.4.1) will be coordinated so that both lines stay in sync.
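For readers who like to see rules written down as code, here is a minimal sketch of how the version digits described above could be interpreted. It is purely illustrative and not a Hortonworks tool: the names HdpVersion, parse_hdp_version and classify_change are hypothetical, and the sketch assumes version strings are plain dotted integers such as "2.4.0".

```python
from typing import NamedTuple, Tuple


class HdpVersion(NamedTuple):
    """Hypothetical model of an HDP version string (illustration only)."""
    major: int                    # first digit: the common core (HDP 2, HDP 3, ...)
    minor: int                    # second digit: feature bearing releases
    maintenance: Tuple[int, ...]  # subsequent digits: defect fixes only


def parse_hdp_version(text: str) -> HdpVersion:
    """Parse a dotted version such as '2.4' or '2.3.4'."""
    parts = [int(p) for p in text.split(".")]
    if len(parts) < 2:
        raise ValueError(f"expected at least major.minor, got {text!r}")
    rest = parts[2:]
    # Treat '2.4' and '2.4.0' as the same release by dropping trailing zeros.
    while rest and rest[-1] == 0:
        rest.pop()
    return HdpVersion(parts[0], parts[1], tuple(rest))


def classify_change(old: str, new: str) -> str:
    """Describe what differs between two HDP versions under the scheme above."""
    a, b = parse_hdp_version(old), parse_hdp_version(new)
    if a.major != b.major:
        return "core change"          # a different common core
    if a.minor != b.minor:
        return "feature release"      # same core, newer extended services
    if a.maintenance != b.maintenance:
        return "maintenance release"  # defect fixes only
    return "same release"


if __name__ == "__main__":
    for old, new in [("2.3.4", "2.4.0"), ("2.4.0", "2.4.1"), ("2.4", "3.0")]:
        print(f"{old} -> {new}: {classify_change(old, new)}")
```

Under these assumptions, moving from 2.3.4 to 2.4.0 is classified as a feature release on the same core, 2.4.0 to 2.4.1 as a maintenance release, and 2.4 to 3.0 as a core change.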
At Hortonworks, we believe in driving innovation. This is at the core of all of the engineering activities we engage in, and we know that we are the stewards of open enterprise Apache Hadoop. We are listening to our customers and partners as HDP continues to evolve. We hope that these changes to our release strategy help deliver stability for the core of HDP and faster uptake of the latest innovations for the extended services, thereby allowing customers to maximize the value of their data.