Hortonworks Data Platform
HDP 2.2 - A major step forward for Enterprise Hadoop.
Hortonworks Data Platform Version 2.2 represents yet another major step forward for Hadoop as the foundation of a Modern Data Architecture. This release incorporates the most recent innovations from Hadoop and its supporting ecosystem of projects, packaging more than a hundred new features across our existing projects. Every component is updated, and we have added some key new technologies and capabilities to HDP 2.2.
Key highlights of HDP 2.2 include:
Our Development Approach
Our approach at Hortonworks is to enable a Modern Data Architecture with YARN as the architectural center, supported by key capabilities required of an enterprise data platform -- spanning Governance, Security and Operations.
To this end, we work within the governance model of the Apache Software Foundation contributing to and progressing the individual components from the Hadoop ecosystem and ultimately integrating them into the Hortonworks Data Platform (HDP).
Enterprise SQL at Scale in Hadoop
While YARN has allowed new engines to emerge for Hadoop, the most popular integration point with Hadoop continues to be SQL, and Apache Hive is still the de facto standard.
New capabilities in HDP 2.2 include:
- Updated SQL semantics for Hive transactions: Update and Delete. ACID transactions provide atomicity, consistency, isolation, and durability. This supports streaming ingest and baseline update scenarios for Hive, such as modifying dimension or fact tables.
- Improved performance of Hive with a cost-based optimizer. The cost-based optimizer for Hive uses statistics to generate several candidate execution plans and then chooses the most efficient one, as measured by the system resources required to complete the operation. This represents a major performance increase for Hive.
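The Update and Delete semantics above follow standard SQL. As a rough illustration of what ACID update/delete enables (using Python's built-in sqlite3 purely as a stand-in transactional store, not Hive itself; the table and values are hypothetical):

```python
import sqlite3

# Stand-in for a Hive dimension table; sqlite3 is used here only to
# illustrate transactional UPDATE/DELETE semantics, not Hive's engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer (id INTEGER PRIMARY KEY, region TEXT)")
conn.executemany("INSERT INTO dim_customer VALUES (?, ?)",
                 [(1, "EMEA"), (2, "APAC"), (3, "EMEA")])

# Atomic correction of a dimension table: both statements commit
# together or, on error, roll back together.
with conn:
    conn.execute("UPDATE dim_customer SET region = 'AMER' WHERE id = 2")
    conn.execute("DELETE FROM dim_customer WHERE id = 3")

print(conn.execute("SELECT id, region FROM dim_customer ORDER BY id").fetchall())
# → [(1, 'EMEA'), (2, 'AMER')]
```

In Hive the same UPDATE and DELETE statements now run against ORC-backed transactional tables, which is what makes the streaming and dimension-maintenance scenarios above practical.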
A Complete Update of the Stack and New Engines
Building on YARN as its architectural center, Hadoop continues to attract new engines. As organizations strive to efficiently store their data in a single repository and interact with it simultaneously in different ways, they need SQL, streaming, data science, batch and more… all in the same cluster. HDP 2.2 adds new engines including:
Data Science within Hadoop, with Spark on YARN
Apache Spark has emerged as an elegant, attractive development API that allows developers to rapidly iterate over data via machine learning and other data science techniques. In HDP 2.2 we deliver an integrated Spark on YARN with improved integration with Hive 0.13 and support for ORCFile. These improvements allow Spark to easily share and deliver data to and from the rest of the platform.
Introducing Apache Ranger (incubating) for comprehensive cluster security policy
With increased adoption of Hadoop, a heightened requirement for a centralized approach to security policy definition and coordinated enforcement has surfaced. As part of HDP 2.2, Apache Ranger delivers a comprehensive approach to central security policy administration addressing authorization and auditing. Some of the work we have delivered extends Ranger to integrate with Storm and Knox while deepening existing policy enforcement capabilities with Hive and HBase.
Extensive improvements to manage and monitor Hadoop
Managing and monitoring a cluster continues to be a high priority for organizations adopting Hadoop. HDP 2.2 adds over a dozen new features to help enterprises manage Hadoop; some of the biggest include:
Ambari Blueprints deliver a template approach to cluster deployment
Ambari Blueprints are a declarative definition of a cluster. With a Blueprint, you specify a Stack, the Component layout and the Configurations to materialize a Hadoop cluster instance (via a REST API) without having to use the Ambari Cluster Install Wizard. You can define any stack to be deployed.
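As a sketch, a Blueprint is just a JSON document naming the stack and the component layout. The structure below follows the Ambari Blueprints REST format, but the blueprint name, host group, and component layout are illustrative, not a recommended deployment:

```python
import json

# A minimal illustrative Blueprint: one host group running HDFS and
# ZooKeeper daemons on the HDP 2.2 stack.
blueprint = {
    "Blueprints": {
        "blueprint_name": "single-node",   # hypothetical name
        "stack_name": "HDP",
        "stack_version": "2.2",
    },
    "host_groups": [
        {
            "name": "master",
            "cardinality": "1",
            "components": [
                {"name": "NAMENODE"},
                {"name": "DATANODE"},
                {"name": "ZOOKEEPER_SERVER"},
            ],
        }
    ],
}

# Registering this document via the Blueprints REST API, then posting a
# matching cluster-creation template that maps hosts to host groups,
# materializes the cluster without the Cluster Install Wizard.
print(json.dumps(blueprint, indent=2))
```

Because the layout is declarative, the same Blueprint can be replayed to stamp out identical dev, test, and production clusters.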
Business Continuity & Rolling Upgrades
Automated cloud backup for Microsoft Azure and Amazon S3
Data architects require Hadoop to act like other systems in the data center, and business continuity through replication across on-premises and cloud-based storage targets is a critical requirement. In HDP 2.2 we extend the capabilities of Apache Falcon to establish an automated policy for cloud backup to Microsoft Azure or Amazon S3. This is the first step in a broader vision to enable extensive heterogeneous deployment models for Hadoop.
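Such a policy is expressed in Falcon as a feed entity with a source and a target cluster. The sketch below shows the general shape of such a feed; the element names follow Falcon's feed schema, but the cluster names, paths, and retention values are hypothetical:

```xml
<!-- Illustrative sketch only: a daily feed replicated from an
     on-premises cluster to a cloud-backed target cluster, with
     independent retention policies on each side. -->
<feed name="raw-events-backup" xmlns="uri:falcon:feed:0.1">
  <frequency>days(1)</frequency>
  <clusters>
    <cluster name="primary" type="source">
      <validity start="2014-12-01T00:00Z" end="2099-12-31T00:00Z"/>
      <retention limit="months(3)" action="delete"/>
    </cluster>
    <cluster name="cloud-backup" type="target">
      <validity start="2014-12-01T00:00Z" end="2099-12-31T00:00Z"/>
      <retention limit="months(36)" action="delete"/>
    </cluster>
  </clusters>
  <locations>
    <location type="data" path="/data/raw-events/${YEAR}-${MONTH}-${DAY}"/>
  </locations>
  <ACL owner="falcon" group="users" permission="0755"/>
  <schema location="/none" provider="none"/>
</feed>
```

Falcon schedules the replication on the target cluster and enforces the retention limits automatically, which is what turns the cloud target into a managed backup rather than a one-off copy.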