December 02, 2014

Available Now: HDP 2.2

We are very pleased to announce that the Hortonworks Data Platform (HDP) version 2.2 is now generally available for download. With thousands of enhancements across all elements of the platform, spanning data access, security, governance, rolling upgrades and more, HDP 2.2 makes it even easier for our customers to incorporate HDP as a core component of a Modern Data Architecture (MDA).

HDP 2.2 represents the very latest innovation from across the Hadoop ecosystem, where literally hundreds of developers have been collaborating with us to evolve each of the individual Apache Software Foundation (ASF) projects from the broader Apache Hadoop ecosystem. These projects have now been brought together into the complete and open Hortonworks Data Platform (HDP) delivering more than 100 new features and closing out thousands of issues across Apache Hadoop and its related projects.

These distinct ASF projects from across the Hadoop ecosystem span every aspect of the data platform and are easily categorized into:

  • Data management: this is the core of the platform, Apache Hadoop with its subcomponents HDFS and YARN; YARN is the architectural center of HDP.
  • Data access: this represents the broad range of options developers have to access and process data stored in HDFS, depending on their application requirements.
  • Enterprise services: the supporting services of governance, operations and security that are fundamental to any enterprise data platform.

A simple architectural rendering of those capabilities across the 5 elements of HDP is below:

Enterprise Blueprint for Hadoop

Seen another way, the chart below captures the evolution of HDP over the past two years and illustrates the synchronization of the core ASF projects into a single enterprise data platform. While others choose to fork the work done in the community into their own proprietary versions that quickly diverge from the trunk, with HDP you can be sure you are always leveraging the very latest innovation from the Apache community rather than being limited to the capacity of any single vendor.


HDP 2.2 Release Highlights

Every component in the HDP stack has been updated, and we have added several key new technologies and capabilities in HDP 2.2.

Enterprise SQL at Scale in Hadoop

While YARN has allowed new engines to emerge for Hadoop, the most popular integration point with Hadoop continues to be SQL, and Apache Hive remains the de facto standard. This release delivers phase 1 of the Stinger.next initiative, a broad, open, community-based effort to improve speed, scale and SQL semantics.

  • Updated SQL semantics for Hive transactions with UPDATE and DELETE: ACID transactions provide atomicity, consistency, isolation, and durability. This helps with streaming ingest and baseline update scenarios for Hive, such as modifying dimension or fact tables in place (see the sketch after this list).
  • Improved performance of Hive with a cost-based optimizer: the cost-based optimizer for Hive uses table and column statistics to generate several execution plans and then chooses the most efficient one based on the system resources required to complete the operation. This represents a major performance increase for Hive.
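
To make the transaction and optimizer improvements concrete, here is a minimal sketch of issuing ACID DML and gathering statistics over HiveServer2 from Python. The pyhive client, host name, and dim_customer table are illustrative assumptions rather than part of HDP; ACID DML requires a bucketed ORC table marked transactional and a transaction manager enabled on the server.

```python
# Minimal sketch, assuming the pyhive client and an illustrative dim_customer table.
from pyhive import hive

conn = hive.connect(host="hiveserver2.example.com", port=10000)
cur = conn.cursor()

# ACID UPDATE/DELETE requires a bucketed ORC table marked transactional.
cur.execute("""
    CREATE TABLE IF NOT EXISTS dim_customer (
        id INT, name STRING, city STRING)
    CLUSTERED BY (id) INTO 4 BUCKETS
    STORED AS ORC
    TBLPROPERTIES ('transactional'='true')
""")

# Modify a dimension table in place instead of rewriting whole partitions.
cur.execute("UPDATE dim_customer SET city = 'Palo Alto' WHERE id = 42")
cur.execute("DELETE FROM dim_customer WHERE city = 'Unknown'")

# The cost-based optimizer relies on statistics, so compute them per table.
cur.execute("SET hive.cbo.enable=true")
cur.execute("ANALYZE TABLE dim_customer COMPUTE STATISTICS FOR COLUMNS")
```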

Data Science within Hadoop with Spark on YARN

Apache Spark provides an elegant, attractive development API that lets developers rapidly iterate over data using machine learning and other data science techniques. In this release, we plan to deliver an integrated Spark on YARN experience, with improved Hive 0.14 integration and ORCFile support by year-end. These improvements allow Spark to easily share data with the rest of the platform.
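
As a rough illustration of the Spark on YARN experience, the sketch below queries a Hive table from PySpark via HiveContext, as in the Spark 1.x line of that era; the table name and the spark-submit invocation are illustrative assumptions.

```python
# Minimal sketch, assuming an illustrative dim_customer Hive table.
# Submitted to the cluster with, for example: spark-submit --master yarn-client top_cities.py
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="top-cities")
hc = HiveContext(sc)  # shares the Hive metastore, so ORC-backed Hive tables are visible

# Run SQL against the Hive table, then work with the result through the Spark API.
top = hc.sql("""
    SELECT city, COUNT(*) AS customers
    FROM dim_customer
    GROUP BY city
    ORDER BY customers DESC
    LIMIT 10
""")

for row in top.collect():
    print("%s\t%d" % (row.city, row.customers))

sc.stop()
```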

Kafka for processing the Internet of Things

Apache Kafka, now included in HDP 2.2, has quickly become the standard high-scale, fault-tolerant, publish-subscribe messaging system for Hadoop. It is often used with Storm and Spark to stream events into Hadoop in real time, and its applicability to “Internet of Things” use cases is tremendous.
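
As a sketch of how device events might be published into a Kafka topic for Storm or Spark to pick up, the example below uses the kafka-python client; the client library, broker address, and topic name are assumptions for illustration only.

```python
# Minimal sketch, assuming the kafka-python client and an illustrative "sensor-events" topic.
import json
import time

from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="kafka-broker.example.com:9092")

for reading in ({"device": "thermostat-7", "temp_c": 21.5},
                {"device": "thermostat-7", "temp_c": 22.1}):
    reading["ts"] = int(time.time() * 1000)
    # Key by device id so readings from one device land in the same partition, preserving order.
    producer.send("sensor-events",
                  key=reading["device"].encode("utf-8"),
                  value=json.dumps(reading).encode("utf-8"))

producer.flush()  # block until the brokers have acknowledged the batch
```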

Ensure uptime with Rolling Upgrades

In HDP 2.2, the rolling upgrade feature takes advantage of versioned packages, investments in the core of many of the projects, and the underlying HDFS High Availability configuration to let you upgrade your cluster software and restart upgraded services without taking the entire cluster down.

Extensive improvements to manage and monitor Hadoop

Managing and monitoring a cluster continues to be a high priority for organizations adopting Hadoop. Our completely open approach via Apache Ambari is unique, and we are excited to have Pivotal and HP join other data center leaders like Microsoft and Teradata in supporting Ambari. HDP 2.2 adds over a dozen new features to manage Hadoop; two of the biggest are:

  • Extend Ambari with custom Views: the Ambari Views Framework offers a systematic way to plug in UI capabilities that surface custom visualization, management and monitoring features in the Ambari Web console. A “view” extends Ambari to allow third parties to plug in new resource types along with the APIs, providers and UI to support them. In other words, a view is an application that is deployed into the Ambari container. Customers and partners now have an open source, open community framework for building new user interfaces and unique customer experiences with consistent management and security.
  • Ambari Blueprints deliver a template approach to cluster deployment: Ambari Blueprints provide a declarative definition of a cluster. With a Blueprint, you specify a Stack (in this case a version of HDP), the component layout and the configurations to materialize a Hadoop cluster instance (via a REST API) without having to use the Ambari Cluster Install Wizard (see the sketch after this list). You can define any Stack to be deployed, which means you can extend the Stack definitions to include your own componentry, and partners are already working within the community to define their own Stacks based on this critical extensibility point.
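
To illustrate the Blueprint workflow, here is a minimal sketch that registers a blueprint and then instantiates a cluster from it through the Ambari REST API; the host names, credentials, and component layout below are illustrative assumptions, and a real blueprint would enumerate a full HDP stack.

```python
# Minimal sketch, assuming illustrative hosts and credentials for an Ambari server.
import requests

AMBARI = "http://ambari.example.com:8080/api/v1"
AUTH = ("admin", "admin")
HEADERS = {"X-Requested-By": "ambari"}  # Ambari requires this header on POST/PUT

# Declarative definition of the stack version and component layout.
blueprint = {
    "Blueprints": {"blueprint_name": "small-hdp",
                   "stack_name": "HDP", "stack_version": "2.2"},
    "host_groups": [{
        "name": "master",
        "cardinality": "1",
        "components": [{"name": "NAMENODE"},
                       {"name": "RESOURCEMANAGER"},
                       {"name": "HIVE_SERVER"}],
    }],
}

# Maps concrete hosts onto the blueprint's host groups.
cluster_template = {
    "blueprint": "small-hdp",
    "host_groups": [{"name": "master",
                     "hosts": [{"fqdn": "node1.example.com"}]}],
}

# Register the blueprint, then materialize a cluster instance from it.
requests.post(AMBARI + "/blueprints/small-hdp", json=blueprint,
              auth=AUTH, headers=HEADERS).raise_for_status()
requests.post(AMBARI + "/clusters/demo", json=cluster_template,
              auth=AUTH, headers=HEADERS).raise_for_status()
```

The second call returns once Ambari has accepted the request; provisioning itself proceeds asynchronously and can be tracked through the Ambari Web console or the requests resource of the API.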

Automated cloud backup for Microsoft Azure and Amazon S3

Data architects require Hadoop to act like other systems in the data center, and business continuity through replication across on-premises and cloud-based storage targets is a critical requirement. In HDP 2.2, we extend the capabilities of Apache Falcon to establish an automated policy for cloud backup to Microsoft Azure or Amazon S3. This is the first step in a broader vision to enable heterogeneous deployment models for Hadoop spanning cloud and on-premises environments.

A more detailed list of features

Hortonworks is 100% committed to open source and to the value provided by an active and open community of developers. HDP is the ONLY 100% open source Hadoop distribution, and our code goes back into open, ASF-governed projects with a live and broad community. Over the past few weeks, and continuing into the next, we have released a series of blog posts outlining in more detail the features found within each of the projects that comprise the HDP stack. We invite you to explore these highlight blogs as well as the complete list of new features and JIRA tickets closed.


HDP 2.2 is now available at hortonworks.com/hdp.

