Hortonworks Data Platform

Completely open source Apache Hadoop data platform, architected for the enterprise

HDP 2.2 - A major step forward for Enterprise Hadoop.

Hortonworks Data Platform Version 2.2 represents another major step forward for Hadoop as the foundation of a Modern Data Architecture. This release incorporates the most recent innovations from Hadoop and its supporting ecosystem of projects. HDP 2.2 packages more than a hundred new features across our existing projects: every component has been updated, and we have added some key new technologies and capabilities.

[Figure: HDP 2.2 "asparagus" stack diagram]

The key highlights of HDP 2.2 are described in the sections below.

Our Development Approach

Our approach at Hortonworks is to enable a Modern Data Architecture with YARN as the architectural center, supported by key capabilities required of an enterprise data platform -- spanning Governance, Security and Operations.

To this end, we work within the governance model of the Apache Software Foundation contributing to and progressing the individual components from the Hadoop ecosystem and ultimately integrating them into the Hortonworks Data Platform (HDP).

Enterprise SQL at Scale in Hadoop

While YARN has allowed new engines to emerge for Hadoop, the most popular integration point with Hadoop continues to be SQL, and Apache Hive is still the de facto standard.

New capabilities in HDP 2.2 include:

  • Updated SQL semantics for Hive transactions, with Update and Delete. ACID transactions provide atomicity, consistency, isolation, and durability. This helps with streaming and baseline update scenarios for Hive, such as modifying dimension tables or other fact tables.
  • Improved performance of Hive with a cost-based optimizer. The cost-based optimizer for Hive uses statistics to generate several execution plans and then chooses the most efficient one based on the system resources required to complete the operation. This delivers a major performance increase for Hive. (A sketch of both capabilities follows this list.)
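As a rough sketch of both capabilities, the example below drives HiveServer2 over JDBC. The connection URL, table name, and data are illustrative, and the ACID prerequisites (a bucketed, ORC-backed table marked transactional, plus the transaction-manager settings on the server) should be checked against the Hive documentation shipped with HDP 2.2.

    // Illustrative sketch only: host, user, and table names are made up.
    // Assumes a Hive 0.14-level HiveServer2 with ACID transactions enabled
    // (the Hive JDBC driver must be on the classpath).
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class HiveAcidAndCboSketch {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection(
                    "jdbc:hive2://hiveserver.example.com:10000/default", "hive", "");
            try (Statement stmt = conn.createStatement()) {
                // Row-level UPDATE/DELETE requires a bucketed, ORC-backed, transactional table.
                stmt.execute("CREATE TABLE IF NOT EXISTS dim_customer (id INT, city STRING) "
                        + "CLUSTERED BY (id) INTO 4 BUCKETS "
                        + "STORED AS ORC TBLPROPERTIES ('transactional'='true')");

                // New SQL semantics in HDP 2.2: UPDATE and DELETE against Hive tables.
                stmt.execute("UPDATE dim_customer SET city = 'Palo Alto' WHERE id = 42");
                stmt.execute("DELETE FROM dim_customer WHERE id = 43");

                // The cost-based optimizer relies on table and column statistics.
                stmt.execute("SET hive.cbo.enable=true");
                stmt.execute("ANALYZE TABLE dim_customer COMPUTE STATISTICS FOR COLUMNS");
            }
            conn.close();
        }
    }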

A Complete Update of the Stack and NEW engines

Building on YARN as its architectural center, Hadoop continues to attract new engines. As organizations strive to efficiently store their data in a single repository and interact with it simultaneously in different ways, they need SQL, streaming, data science, batch and more… all in the same cluster. HDP 2.2 adds new engines including:

A SQL skin over HBase called Apache Phoenix

In HDP 2.2, we include Apache Phoenix, a SQL skin over HBase delivered as a client-embedded JDBC driver targeting low-latency queries over HBase data. Recent improvements to this project include substantially improved secondary indexing, which speeds up query performance.
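A minimal sketch of what this looks like from a client follows; the ZooKeeper quorum, table, and index names are assumptions for this example.

    // Illustrative sketch: the ZooKeeper quorum and table names are made up.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class PhoenixSketch {
        public static void main(String[] args) throws Exception {
            // Phoenix ships as a client-embedded JDBC driver; the URL points at the
            // ZooKeeper quorum used by HBase rather than at a separate server process.
            Connection conn = DriverManager.getConnection("jdbc:phoenix:zk1.example.com:2181");
            try (Statement stmt = conn.createStatement()) {
                stmt.execute("CREATE TABLE IF NOT EXISTS web_event ("
                        + "event_id BIGINT NOT NULL PRIMARY KEY, host VARCHAR, latency INTEGER)");
                // A secondary index lets filters on "host" avoid a full table scan.
                stmt.execute("CREATE INDEX IF NOT EXISTS idx_host ON web_event (host)");
                stmt.execute("UPSERT INTO web_event VALUES (1, 'web01', 120)");
                conn.commit();  // Phoenix connections do not autocommit by default

                try (ResultSet rs = stmt.executeQuery(
                        "SELECT host, COUNT(*) FROM web_event GROUP BY host")) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1) + " -> " + rs.getLong(2));
                    }
                }
            }
            conn.close();
        }
    }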

Kafka for Processing the Internet of Things

Apache Kafka has quickly become the standard high-scale, fault-tolerant publish-subscribe messaging system for Hadoop. It is often used with Storm and Spark so that you can stream events into Hadoop in real time, and its application within “internet of things” use cases is tremendous.
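As a rough sketch, the producer below publishes JSON sensor readings to a topic using Kafka's Java producer API. The broker address, port, topic name, and payload are assumptions, and the exact producer API available depends on the Kafka version shipped with HDP 2.2.

    // Illustrative sketch using Kafka's Java producer API; the broker address
    // and topic name ("sensor-readings") are assumptions for this example.
    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class SensorEventProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "kafka1.example.com:6667");
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Each device publishes readings; Storm or Spark Streaming can subscribe
                // to the same topic and process the events as they arrive.
                producer.send(new ProducerRecord<>("sensor-readings",
                        "device-17", "{\"temp\": 21.4, \"ts\": 1417429261}"));
            }
        }
    }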

Data Science within Hadoop, with Spark on YARN

Apache Spark has emerged as an elegant, attractive development API allowing developers to rapidly iterate over data via machine learning and other data science techniques. In HDP 2.2 we deliver an integrated Spark on YARN with improved Hive 0.13 integration and support for ORCFile. These improvements allow data to be easily shared and exchanged between Spark and the rest of the platform.
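As a rough illustration, the sketch below uses Spark's Java API with a HiveContext to query a Hive table stored as ORC. The class names follow the Spark 1.x API, and the application name, table name, and query are assumptions for this example; submitting it with spark-submit --master yarn-cluster runs it on YARN.

    // Illustrative sketch: assumes a Hive table named "events" stored as ORC and
    // the Spark 1.x Java API as bundled with HDP 2.2.
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.hive.HiveContext;

    public class SparkOnYarnOrcSketch {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("spark-orc-sketch");
            JavaSparkContext jsc = new JavaSparkContext(conf);

            // HiveContext reads the Hive metastore, so ORC-backed Hive tables defined
            // by other engines are directly queryable from Spark.
            HiveContext hive = new HiveContext(jsc.sc());
            long rows = hive.sql("SELECT * FROM events WHERE event_type = 'click'").count();
            System.out.println("click events: " + rows);

            jsc.stop();
        }
    }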

Introducing Apache Ranger (incubating) for comprehensive cluster security policy

With increased adoption of Hadoop, a heightened requirement for a centralized approach to security policy definition and coordinated enforcement has surfaced. As part of HDP 2.2, Apache Ranger delivers a comprehensive approach to central security policy administration addressing authorization and auditing. Some of the work we have delivered extends Ranger to integrate with Storm and Knox while deepening existing policy enforcement capabilities with Hive and HBase.

Extensive improvements to manage and monitor Hadoop

Managing and monitoring a cluster continues to be a high priority for organizations adopting Hadoop. HDP 2.2 adds over a dozen new features to help enterprises manage Hadoop; some of the biggest include:

Extend Ambari with Custom Views

The Ambari Views Framework offers a systematic way to plug UI capabilities into the Ambari Web console, surfacing custom visualization, management and monitoring features. A "view" extends Ambari to allow third parties to plug in new resource types along with the APIs, providers and UI to support them. In other words, a view is an application that is deployed into the Ambari container.

Ambari Blueprints deliver a template approach to cluster deployment

Ambari Blueprints are a declarative definition of a cluster. With a Blueprint, you specify a Stack, the Component layout and the Configurations to materialize a Hadoop cluster instance (via a REST API) without having to use the Ambari Cluster Install Wizard. You can define any stack to be deployed.
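As a rough sketch, the example below registers a small blueprint against Ambari's Blueprint REST API and notes the follow-up call that materializes a cluster from it. The Ambari host, credentials, blueprint name, and component layout are all assumptions for this example.

    // Illustrative sketch: Ambari host, credentials, and component layout are made up.
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;
    import java.util.Base64;

    public class RegisterBlueprintSketch {
        public static void main(String[] args) throws Exception {
            // A blueprint names a stack and lays out components into host groups.
            String blueprint = "{"
                    + "\"Blueprints\": {\"stack_name\": \"HDP\", \"stack_version\": \"2.2\"},"
                    + "\"host_groups\": [{"
                    + "  \"name\": \"master\", \"cardinality\": \"1\","
                    + "  \"components\": [{\"name\": \"NAMENODE\"}, {\"name\": \"RESOURCEMANAGER\"}]"
                    + "}]}";

            post("http://ambari.example.com:8080/api/v1/blueprints/small-cluster", blueprint);
            // A second POST to /api/v1/clusters/<name>, mapping each host group to real
            // FQDNs, then instantiates the cluster from this blueprint.
        }

        static void post(String endpoint, String body) throws Exception {
            HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("X-Requested-By", "ambari");  // required by Ambari for POSTs
            conn.setRequestProperty("Authorization", "Basic " + Base64.getEncoder()
                    .encodeToString("admin:admin".getBytes(StandardCharsets.UTF_8)));
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body.getBytes(StandardCharsets.UTF_8));
            }
            System.out.println("HTTP " + conn.getResponseCode());
        }
    }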

Business Continuity & Rolling Upgrades

Ensure uptime with Rolling Upgrades

In HDP 2.2, the rolling upgrade feature takes advantage of versioned packages, investments in the core of many of the projects, and the underlying HDFS High Availability configuration to let you upgrade your cluster software and restart upgraded services without taking the entire cluster down.

Automated cloud backup for Microsoft Azure and Amazon S3

Data architects require Hadoop to act like other systems in the data center, and business continuity through replication across on-premises and cloud-based storage targets is a critical requirement. In HDP 2.2 we extend the capabilities of Apache Falcon to establish an automated policy for cloud backup to Microsoft Azure or Amazon S3. This is the first step in a broader vision to enable extensive heterogeneous deployment models for Hadoop.
