October 15, 2014

HDP 2.2 – A Major Step Forward for Enterprise Hadoop

Hortonworks Data Platform Version 2.2 represents yet another major step forward for Hadoop as the foundation of a Modern Data Architecture. This release incorporates the last six months of innovation, adding more than a hundred new features and resolving thousands of issues across Apache Hadoop and its related projects.


Our approach at Hortonworks is to enable a Modern Data Architecture with YARN as the architectural center, supported by the key capabilities required of an enterprise data platform, spanning Governance, Security, and Operations. To this end, we work within the governance model of the Apache Software Foundation, contributing to and advancing the individual components from the Hadoop ecosystem and ultimately integrating them into the Hortonworks Data Platform (HDP).


Our investment across all these technologies follows the same pattern.

  • VERTICAL: We integrate the projects within our Hadoop distribution with YARN and HDFS so that HDP can span batch, interactive, and real-time workloads across both open source and other data access technologies. The work we deliver in this release to deeply integrate Apache Storm and Apache Spark with Hadoop is representative of this approach.
  • HORIZONTAL: We also ensure the key enterprise requirements of governance, security, and operations can be applied consistently and reliably across all the components within the platform. This allows HDP to meet the same requirements as any other technology in the data center. In HDP 2.2, our work within the Apache Ambari community helped extend integrated operations, and we contributed Apache Ranger (Argus) to drive consistent security across Hadoop.
  • AT DEPTH: We deeply integrate HDP with the technologies already in the data center, augmenting and enhancing their capabilities so you can reuse existing skills and resources.

A Comprehensive Data Platform

With YARN as its architectural center, Hadoop continues to attract new engines to run within the data platform, as organizations want to efficiently store their data in a single repository and interact with it simultaneously in different ways. They want SQL, streaming, machine learning, along with traditional batch and more… all in the same cluster. To this end, HDP 2.2 packages many new features. Every component is updated, and we have added some key technologies and capabilities to HDP 2.2.


HDP 2.2 Release Highlights

NEW: Enterprise SQL at Scale in Hadoop

While YARN has allowed new engines to emerge for Hadoop, the most popular integration point with Hadoop continues to be SQL, and Apache Hive is still the de facto standard. While many SQL engines for Hadoop have emerged, their differentiation is being rendered obsolete as the open source community surrounds and advances this key engine at an accelerated rate. This release delivers phase 1 of the Stinger.next initiative, a broad, open, community-based effort to improve speed, scale, and SQL semantics.

  • Updated SQL Semantics for Hive Transactions for Update and Delete
    ACID transactions provide atomicity, consistency, isolation, and durability. This helps with streaming and baseline update scenarios for Hive, such as modifying dimension tables or fact tables.
  • Improved Performance of Hive with a Cost-Based Optimizer
    The cost-based optimizer for Hive uses statistics to generate several execution plans and then chooses the most efficient one based on the system resources required to complete the operation. This delivers a major performance increase for Hive (a minimal JDBC sketch follows this list).
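
To make these two features concrete, here is a minimal sketch of exercising Hive's new UPDATE/DELETE semantics and the cost-based optimizer from a Java client over the Hive JDBC driver. The HiveServer2 host, credentials, table name, and session settings are illustrative assumptions, not part of the release notes.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Minimal sketch: Hive ACID DML and the cost-based optimizer over JDBC.
// Host, port, and table names are illustrative assumptions. ACID DML also
// requires the table to be bucketed, stored as ORC, and created with
// transactional=true, and the Hive server side must have transactions enabled.
public class HiveAcidExample {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://hiveserver2.example.com:10000/default", "hive", "");
             Statement stmt = conn.createStatement()) {

            // Enable the cost-based optimizer for this session; it uses
            // table and column statistics to pick join orders and plans.
            stmt.execute("SET hive.cbo.enable=true");
            stmt.execute("SET hive.stats.fetch.column.stats=true");

            // A transactional (ACID) table: bucketed, ORC, transactional=true.
            stmt.execute("CREATE TABLE IF NOT EXISTS dim_customer ("
                + "id INT, name STRING, segment STRING) "
                + "CLUSTERED BY (id) INTO 4 BUCKETS "
                + "STORED AS ORC TBLPROPERTIES ('transactional'='true')");

            // The new SQL semantics: UPDATE and DELETE.
            stmt.execute("UPDATE dim_customer SET segment = 'enterprise' WHERE id = 42");
            stmt.execute("DELETE FROM dim_customer WHERE segment = 'inactive'");
        }
    }
}
```

Note that this sketch assumes the HiveServer2/metastore side is already configured for transactions (transaction manager and compaction settings in hive-site.xml).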

NEW: Data Science within Hadoop with Spark on YARN

Apache Spark has emerged as an elegant, attractive development API that allows developers to rapidly iterate over data via machine learning and other data science techniques. While we have supported Spark as a tech preview for the past few months, in this release we plan to deliver an integrated Spark on YARN, with improved Hive 0.13 integration and support for ORCFile, by year-end. These improvements allow Spark applications to easily share data with the rest of the Hadoop platform.
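
As a rough illustration of what Spark on YARN looks like to a developer, the following is a minimal word-count sketch against the Spark 1.x Java API, assuming Java 8 lambdas. The HDFS paths and application name are invented for the example; the master is normally supplied at launch time rather than hard-coded.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

// Minimal sketch of a Spark job intended to run on YARN. The HDFS paths are
// illustrative assumptions; in practice the job is packaged as a jar and
// launched with spark-submit against YARN.
public class SparkOnYarnWordCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("hdp-spark-on-yarn-sketch");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String> lines = sc.textFile("hdfs:///tmp/input/events.txt");
        JavaPairRDD<String, Integer> counts = lines
            .flatMap(line -> Arrays.asList(line.split("\\s+")))   // Spark 1.x: flatMap returns an Iterable
            .mapToPair(word -> new Tuple2<>(word, 1))
            .reduceByKey((a, b) -> a + b);

        counts.saveAsTextFile("hdfs:///tmp/output/word-counts");
        sc.stop();
    }
}
```

In a cluster deployment this would typically be launched with spark-submit in a YARN mode so the driver and executors are scheduled by the ResourceManager alongside other workloads.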

NEW: Kafka for processing the Internet of Things

Apache Kafka has quickly become the standard high-scale, fault-tolerant, publish-subscribe messaging system for Hadoop. It is often used with Storm and Spark to stream events into Hadoop in real time, and its applicability to “Internet of Things” use cases is tremendous.
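
For illustration, here is a minimal Java producer that publishes events to a Kafka topic using the 0.8.x producer API of the era. The broker address, topic name, and message payload are illustrative assumptions.

```java
import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

// Minimal sketch of publishing sensor events to Kafka with the 0.8.x
// producer API. Broker host/port and the topic name are illustrative
// assumptions.
public class SensorEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "broker1.example.com:6667");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("request.required.acks", "1");

        Producer<String, String> producer =
            new Producer<String, String>(new ProducerConfig(props));

        // Key by device id so events from the same device land in the same
        // partition and can be consumed in order by Storm or Spark Streaming.
        producer.send(new KeyedMessage<String, String>(
            "sensor-events", "device-42", "{\"temp\": 21.5, \"ts\": 1413360000}"));

        producer.close();
    }
}
```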

NEW: Apache Ranger (Argus) for comprehensive cluster security policy

With increased adoption of Hadoop, a heightened requirement for a centralized approach to security policy definition and coordinated enforcement has surfaced. As part of HDP 2.2, Apache Ranger (formerly known as Argus) delivers a comprehensive approach to central security policy administration addressing authorization and auditing. Some of the work we have delivered extends Ranger to integrate with Storm and Knox while deepening existing policy enforcement capabilities with Hive and HBase.

NEW: Extensive improvements to manage & monitor Hadoop

Managing and monitoring a cluster continues to be a high priority for organizations adopting Hadoop. Our completely open approach via Apache Ambari is unique, and we are excited to have Pivotal and HP join other data center leaders like Microsoft and Teradata in supporting Ambari. In HDP 2.2 we have added over a dozen new features to help enterprises manage Hadoop; some of the biggest include:

  • Extend Ambari with Custom Views
    The Ambari Views Framework offers a systematic way to plug in UI capabilities that surface custom visualization, management, and monitoring features in the Ambari Web console. A “view” extends Ambari to allow 3rd parties to plug in new resource types along with the APIs, providers, and UI to support them. In other words, a view is an application that is deployed into the Ambari container.
  • Ambari Blueprints deliver a template approach to cluster deployment
    Ambari Blueprints are a declarative definition of a cluster. With a Blueprint, you specify a Stack, the Component layout, and the Configurations to materialize a Hadoop cluster instance (via a REST API) without having to use the Ambari Cluster Install Wizard. You can define any stack to be deployed (a minimal REST sketch follows this list).
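
As a sketch of the Blueprint workflow, the following registers a deliberately trimmed blueprint with Ambari over REST. The Ambari host, credentials, blueprint name, and single host group are illustrative assumptions; a real blueprint would carry the full component layout and configurations.

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Minimal sketch of registering an Ambari Blueprint over REST. Host,
// credentials, and the single host group are illustrative assumptions;
// the component list is trimmed for readability.
public class RegisterBlueprint {
    public static void main(String[] args) throws Exception {
        String blueprint = "{"
            + "\"Blueprints\": {\"stack_name\": \"HDP\", \"stack_version\": \"2.2\"},"
            + "\"host_groups\": [{"
            + "  \"name\": \"all-in-one\", \"cardinality\": \"1\","
            + "  \"components\": [{\"name\": \"NAMENODE\"}, {\"name\": \"SECONDARY_NAMENODE\"},"
            + "                   {\"name\": \"DATANODE\"}, {\"name\": \"RESOURCEMANAGER\"},"
            + "                   {\"name\": \"NODEMANAGER\"}, {\"name\": \"ZOOKEEPER_SERVER\"}]"
            + "}]}";

        URL url = new URL("http://ambari.example.com:8080/api/v1/blueprints/single-node");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("X-Requested-By", "ambari");
        conn.setRequestProperty("Authorization", "Basic "
            + Base64.getEncoder().encodeToString("admin:admin".getBytes(StandardCharsets.UTF_8)));
        try (OutputStream out = conn.getOutputStream()) {
            out.write(blueprint.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("Blueprint registration returned HTTP " + conn.getResponseCode());

        // A second POST to /api/v1/clusters/<name> with a cluster-creation
        // template (blueprint name plus host-to-host-group mapping) then
        // materializes the cluster.
    }
}
```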

NEW: Ensure uptime with Rolling Upgrades

In HDP 2.2, the rolling upgrade feature takes advantage of versioned packages, investments in the core of many of the projects, and the underlying HDFS High Availability configuration to let you upgrade your cluster software and restart upgraded services without taking the entire cluster down.

NEW: Automated cloud backup for Microsoft Azure and Amazon S3

Data architects require Hadoop to act like other systems in the data center, and business continuity through replication across on-premises and cloud-based storage targets is a critical requirement. In HDP 2.2 we extend the capabilities of Apache Falcon to establish an automated policy for cloud backup to Microsoft Azure or Amazon S3. This is the first step in a broader vision to enable extensive heterogeneous deployment models for Hadoop.

Value in a Completely Open Approach

Hortonworks is 100% committed to open source and the value provided by an active and open community of developers. HDP is the ONLY 100% open source Hadoop distribution, and our code goes back into open, ASF-governed projects with live and broad communities.

Hortonworks’ leadership is not just in the number of committers but in the depth and diversity of involvement across the numerous open source projects that comprise our distribution. We are architects and builders, and many of our developers are involved across multiple projects, either directly as committers or by partnering with developers across cube walls and across the Apache community. Our investment in Enterprise Hadoop starts with YARN, which allows us to integrate applications vertically within the stack, tying them to the data operating system, while also allowing us to apply consistent capabilities for the key enterprise requirements of governance, security, and operations.


A tech preview of HDP 2.2 is available today at hortonworks.com/hdp

Complete List of HDP 2.2 New Features

Apache Hadoop YARN

  • Slide existing services onto YARN through ‘Slider’
  • GA release of HBase, Accumulo, and Storm on YARN
  • Support for long-running services: handling of logs, containers not killed when the AM dies, secure token renewal, YARN Labels for tagging nodes for specific workloads
  • Support for CPU Scheduling and CPU Resource Isolation through CGroups

Apache Hadoop HDFS

  • Heterogeneous storage: Support for archival tier
  • Rolling Upgrade (this applies to the entire HDP Stack: YARN, Hive, HBase, everything; comprehensive Rolling Upgrade is now supported across the HDP Stack)
  • Multi-NIC Support
  • Heterogeneous storage: Support memory as a storage tier (Tech Preview)
  • HDFS Transparent Data Encryption (Tech Preview)

Apache Hive, Apache Pig, and Apache Tez

  • Hive Cost-Based Optimizer: Function Pushdown & Join re-ordering support for additional join types (star & bushy)
  • Hive SQL Enhancements including:
    • ACID Support: Insert, Update, Delete
    • Temporary Tables
  • Metadata-only queries return instantly
  • Pig on Tez
  • Including DataFu for use with Pig
  • Vectorized shuffle
  • Tez Debug Tooling & UI

Apache HBase, Apache Phoenix, & Apache Accumulo

  • HBase & Accumulo on YARN via Slider
  • HBase HA (a timeline-read sketch follows this list)
    • Replicas update in real-time
    • Fully supports region split/merge
    • Scan API now supports standby RegionServers
  • HBase Block cache compression
  • HBase optimizations for low latency
  • Phoenix Robust Secondary Indexes
  • Performance enhancements for bulk import into Phoenix
  • Hive over HBase Snapshots
  • Hive Connector to Accumulo
  • HBase & Accumulo wire-level encryption
  • Accumulo multi-datacenter replication
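
To illustrate the HBase HA read path mentioned above, here is a minimal sketch of a timeline-consistent Get, assuming the region replica (Consistency) client API shipped with HDP 2.2’s HBase. The table, row, and column names are invented for the example, and the table is assumed to have been created with REGION_REPLICATION greater than 1.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Consistency;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

// Minimal sketch of a timeline-consistent read against HBase region
// replicas. Table, row, and column names are illustrative assumptions,
// and the row is assumed to exist.
public class TimelineConsistentRead {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (HTable table = new HTable(conf, "user_profiles")) {
            Get get = new Get(Bytes.toBytes("user-42"));
            // TIMELINE allows the read to be answered by a standby replica
            // if the primary RegionServer is slow or down.
            get.setConsistency(Consistency.TIMELINE);
            Result result = table.get(get);
            System.out.println("name = "
                + Bytes.toString(result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name")))
                + (result.isStale() ? " (served by a secondary replica)" : ""));
        }
    }
}
```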

Apache Storm

  • Storm-on-YARN via Slider
  • Ingest & notification for JMS (IBM MQ not supported)
  • Kafka bolt for Storm – supports sophisticated chaining of topologies through Kafka (a spout/bolt wiring sketch follows this list)
  • Kerberos support
  • Hive update support – Streaming Ingest
  • Connector improvements for HBase and HDFS
  • Deliver Kafka as a companion component
  • Kafka install, start/stop via Ambari
  • Security Authorization Integration with Ranger
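
As a sketch of how Kafka and Storm fit together in this release, the following wires the storm-kafka spout into a trivial topology. The ZooKeeper address, topic, and parallelism values are illustrative assumptions, and a storm-kafka KafkaBolt could be attached the same way to publish results back to another topic.

```java
import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Tuple;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;

// Minimal sketch of a Storm topology consuming a Kafka topic via the
// storm-kafka spout (Storm 0.9.x package layout). ZooKeeper host, topic,
// and parallelism are illustrative assumptions.
public class SensorTopology {

    // Trivial terminal bolt that just logs each event.
    public static class LogBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            System.out.println("event: " + tuple.getString(0));
        }
        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // terminal bolt: no output stream declared
        }
    }

    public static void main(String[] args) throws Exception {
        SpoutConfig spoutConfig = new SpoutConfig(
            new ZkHosts("zk1.example.com:2181"), "sensor-events", "/kafka-spout", "sensor-reader");
        spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), 2);
        builder.setBolt("logger", new LogBolt(), 2).shuffleGrouping("kafka-spout");

        Config conf = new Config();
        conf.setNumWorkers(2);
        StormSubmitter.submitTopology("sensor-topology", conf, builder.createTopology());
    }
}
```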

Apache Spark

  • Refreshed Tech Preview to Spark 1.1.0 (available now)
  • ORC File support & Hive 0.13 integration
  • Planned for GA of Spark 1.2.0
  • Operations integration via YARN ATS and Ambari
  • Security: Authentication

Apache Solr

  • Added Banana, a rich and flexible UI for visualizing time series data indexed in Solr


Cascading

  • Cascading 3.0 on Tez distributed with HDP — coming soon


Hue

  • Support for HiveServer 2
  • Support for Resource Manager HA

Apache Falcon

  • Authentication Integration
  • Lineage – now GA (previously a tech preview feature)
  • Improve UI for pipeline management & editing: list, detail, and create new (from existing elements)
  • Replicate to Cloud – Azure & S3

Apache Sqoop, Apache Flume & Apache Oozie

  • Sqoop import support for Hive types via HCatalog
  • Secure Windows cluster support: Sqoop, Flume, Oozie
  • Flume streaming support: sink to HCat on secure cluster
  • Oozie HA now supports secure clusters
  • Oozie Rolling Upgrade
  • Operational improvements for Oozie to better support Falcon
  • Capture workflow job logs in HDFS
  • Don’t start new workflows for re-run
  • Allow job property updates on running jobs

Apache Knox & Apache Ranger (Argus) & HDP Security

  • Apache Ranger – Support authorization and auditing for Storm and Knox
  • Introducing REST APIs for managing policies in Apache Ranger
  • Apache Ranger – Support native grant/revoke permissions in Hive and HBase
  • Apache Ranger – Support for Oracle DB and for storing audit logs in HDFS
  • Apache Ranger support for Windows environments
  • Apache Knox to protect YARN RM
  • Apache Knox support for HDFS HA (a WebHDFS-through-Knox sketch follows this list)
  • Apache Ambari install, start/stop of Knox
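
To show what perimeter access through Knox looks like in practice, here is a minimal sketch of listing an HDFS directory via WebHDFS proxied by the Knox gateway. The gateway host, topology name, credentials, and path are illustrative assumptions, and the client JVM is assumed to already trust the gateway’s SSL certificate.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Minimal sketch of reaching WebHDFS through the Knox gateway from outside
// the cluster. Gateway host, topology ("default"), and LDAP credentials are
// illustrative assumptions; the Knox SSL certificate is assumed to be in
// the client truststore.
public class KnoxWebHdfsList {
    public static void main(String[] args) throws Exception {
        URL url = new URL(
            "https://knox.example.com:8443/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Authorization", "Basic "
            + Base64.getEncoder().encodeToString(
                "guest:guest-password".getBytes(StandardCharsets.UTF_8)));

        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);   // JSON listing of /tmp
            }
        }
    }
}
```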

Apache Slider

  • Allow on-demand creation and running of different versions of heterogeneous applications
  • Allow users to configure different application instances differently
  • Manage operational lifecycle of application instances
  • Expand / shrink application instances
  • Provide application registry for publish and discovery

Apache Ambari

  • Support for HDP 2.2 Stack, including support for Kafka, Knox and Slider
  • Enhancements to Ambari Web configuration management including: versioning, history and revert, setting final properties and downloading client configurations
  • Launch and monitor HDFS rebalance
  • Perform Capacity Scheduler queue refresh
  • Configure High Availability for ResourceManager
  • Ambari Administration framework for managing user and group access to Ambari
  • Ambari Views development framework for customizing the Ambari Web user experience
  • Ambari Stacks for extending Ambari to bring custom Services under Ambari management
  • Ambari Blueprints for automating cluster deployments
  • Performance improvements and enterprise usability guardrails

