July 29, 2015

Introducing Availability of HDP 2.3 – Part 3

Last week, on July 22nd, we announced the general availability of HDP 2.3. In this three-part blog series, the first post summarized the key innovations in the release (ease of use and enterprise readiness) and how they help deliver transformational outcomes, while the second focused on data access innovation. In this final part, we cover cloud provisioning, proactive support, and other general improvements across the platform.

Automated Provisioning with Cloudbreak

Since Hortonworks’ acquisition of SequenceIQ, the integrated team has been working hard to complete the deployment automation for public clouds including Microsoft Azure, Amazon EC2, and Google Cloud. We are pleased to deliver Cloudbreak 1.0 along with HDP 2.3. Support and guidance are available to all Hortonworks customers who have an active Enterprise Plus support subscription, and we’ve published an initial set of installation and administrative documentation.

Cloudbreak is a cloud-agnostic tool for provisioning, managing, and monitoring on-demand Hadoop clusters. For administrators, it provides scripting functionality to automate tasks. Through its easy-to-use interface, administrators can manage services for any configuration.

Cloudbreak can be used to provision Hadoop across major cloud providers: Microsoft Azure, Amazon Web Services, and Google Cloud Platform. It enables efficient use of cloud platforms via policy-based autoscaling that can expand and contract the cluster based on Hadoop usage metrics and defined policies. It also provides a centralized and secure user experience for Hadoop clusters through a rich web interface, a REST API, and a CLI shell across all cloud providers. It is tightly integrated with Apache Ambari and relies heavily on Ambari Blueprints, allowing users to stand up clusters reliably and repeatably based on their needs.
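To make policy-based autoscaling more concrete, here is a minimal Python sketch of how a policy might map a usage metric to a target cluster size. It is not Cloudbreak's implementation or API: the `ScalingPolicy` fields, the thresholds, and the `pending_containers` metric are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    """Hypothetical policy for illustration; not a Cloudbreak type."""
    scale_up_threshold: int    # pending containers above which nodes are added
    scale_down_threshold: int  # pending containers below which nodes are removed
    step: int                  # nodes added or removed per adjustment
    min_nodes: int
    max_nodes: int


def target_node_count(current_nodes: int, pending_containers: int,
                      policy: ScalingPolicy) -> int:
    """Return the node count the policy asks for, clamped to its bounds."""
    if pending_containers > policy.scale_up_threshold:
        desired = current_nodes + policy.step
    elif pending_containers < policy.scale_down_threshold:
        desired = current_nodes - policy.step
    else:
        desired = current_nodes
    return max(policy.min_nodes, min(policy.max_nodes, desired))


policy = ScalingPolicy(scale_up_threshold=50, scale_down_threshold=5,
                       step=2, min_nodes=3, max_nodes=10)
print(target_node_count(4, 80, policy))   # busy cluster: expand
print(target_node_count(4, 0, policy))    # idle cluster: contract, bounded by min_nodes
print(target_node_count(10, 80, policy))  # already at max_nodes: no change
```

The clamp to `min_nodes` and `max_nodes` is the part worth noting: whatever the metric says, the policy bounds keep the cluster from shrinking below a working size or growing past a cost ceiling.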


While Cloudbreak’s primary role is to launch on-demand Hadoop clusters in the cloud, the underlying technology actually does more. It can, for example, launch on-demand Hadoop clusters in any environment that supports Docker – in a dynamic way. Because all the setup, orchestration, networking, and cluster membership are done dynamically, there is no need for a predefined configuration.

While we are focused initially on the public cloud deployment options and flexibility, we are excited about future possibilities of leveraging Docker and Cloudbreak to deliver the maximum deployment choice for our customers within public clouds and within their data centers.

Proactive Support with Hortonworks SmartSense™

As we’ve seen the tremendous appetite for the adoption of Hadoop over the past two years, we have also observed more and more mission-critical applications and workloads being placed on top of Hadoop. Not surprisingly, our rapidly growing base of customers looks to Hortonworks for guidance and best practices to minimize their operational risk and make the most of their resources and staff for Hadoop operations. To meet that demand, we have developed Hortonworks SmartSense. It enriches our already world-class support offering for Hadoop by:

  • Providing proactive insights and recommendations to customers about their cluster utilization and its health.
  • Quickly and easily capturing log files and metrics for faster support case resolution.
  • Delivering ongoing recommendations, suggestions and analytics to proactively prevent configuration problems.

Today, we are delivering a new user experience for SmartSense via the Ambari Views Framework, in addition to completing the integration of the corresponding recommendations through our support portal. The SmartSense View plugs seamlessly into Ambari and allows Hadoop operators to easily configure and manage how information is gathered from the cluster.



SmartSense’s capabilities, says Cheolho Minale, vice president of technology at The Mobile Majority, will help his Hadoop team keep its HDP cluster optimized as it grows:

At The Mobile Majority, we have been using Hortonworks Data Platform to optimize ad performance on behalf of our customers. We’re excited to look into Hortonworks SmartSense as a way to continuously optimize our HDP cluster as it grows over time.

This is only the beginning for Hortonworks SmartSense. We believe that we can share valuable insights with our customers as we gain a deeper understanding of how they use HDP within their environments, how their performance and usage peak and ebb, and how they optimize their HDP clusters using SmartSense.

General Platform Improvements

Finally, I want to wrap up the HDP 2.3 blog series with a set of selected improvements to key components of HDP. Each of these improvements makes a difference in terms of ease of use, enterprise readiness, and simplification. Notable enhancements in this release that we haven’t yet touched on elsewhere are described below:

Apache Hadoop 2.7.0 was released back in April, and with HDP 2.3 we are shipping Hadoop 2.7.1. The engineering work completed as part of Hadoop 2.7.1 ensures that it is stable and ready-to-use. Across its many components, here are some notable enhancements:


  • Non-exclusive Node Labels: applications are given preference for the label they specify, but not exclusive access (YARN-3214). This allows for greater resource sharing within a single cluster and is particularly useful for organizations where workload types shift at different times of day. With a non-exclusive label, nodes that are typically dedicated to interactive workloads during the day can also be used to support nightly batch processing.
  • Fair sharing across applications for the same user in the same queue, with per-queue scheduling policies (YARN-3306). This allows a user to submit multiple queries within the same queue and have the resources allocated to her shared fairly across the jobs she has submitted.
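The non-exclusive label behavior described above can be sketched as a toy placement model in Python. This is not YARN scheduler code. It assumes a simplified borrowing rule in which idle capacity on a non-exclusive labeled node is lent to applications that request no label; the `Node` type, node names, and label names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    label: Optional[str]  # None means the node is unlabeled
    exclusive: bool       # only meaningful when label is set
    free_slots: int


def candidate_nodes(nodes: List[Node], requested_label: Optional[str]) -> List[str]:
    """List the nodes an application may run on, preferred nodes first."""
    # Nodes whose label matches the request are always preferred.
    preferred = [n for n in nodes
                 if n.label == requested_label and n.free_slots > 0]
    # Assumed rule: idle capacity on non-exclusive labeled nodes is lent
    # to applications that did not request any label.
    borrowed = []
    if requested_label is None:
        borrowed = [n for n in nodes
                    if n.label is not None and not n.exclusive and n.free_slots > 0]
    return [n.name for n in preferred + borrowed]


nodes = [
    Node("n1", None, False, 2),           # unlabeled node
    Node("n2", "interactive", False, 4),  # non-exclusive label
    Node("n3", "gpu", True, 4),           # exclusive label
]
print(candidate_nodes(nodes, "interactive"))  # only the matching labeled node
print(candidate_nodes(nodes, None))           # unlabeled node, then borrowed n2
print(candidate_nodes(nodes, "gpu"))          # exclusive node, for gpu requests only
```

In this model a nightly batch job that requests no label can spill onto the idle `interactive` node, while the exclusive `gpu` node stays reserved for applications that ask for it.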


  • Improved distcp efficiency: reduced time and processing power needed to mirror datasets across clusters (HDFS-7535, MAPREDUCE-6248)
  • Support for variable-length blocks (HDFS-3689)
  • Provide storage quotas per heterogeneous storage types (HDFS-7584)
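As an illustration of what a quota per storage type means, the following Python sketch tracks usage for one directory and rejects writes that would exceed the limit for the targeted storage type. It is a toy model, not HDFS code: the `StorageTypeQuota` class and its treatment of types without a quota (unrestricted) are assumptions made for the example.

```python
from typing import Dict


class StorageTypeQuota:
    """Toy per-storage-type quota tracker for one directory (not HDFS code)."""

    def __init__(self, quotas_bytes: Dict[str, int]) -> None:
        self.quotas = dict(quotas_bytes)          # storage type -> byte limit
        self.used = {t: 0 for t in quotas_bytes}  # storage type -> bytes consumed

    def try_write(self, storage_type: str, num_bytes: int) -> bool:
        """Record the write and return True if it fits the quota, else False."""
        limit = self.quotas.get(storage_type)
        if limit is None:
            # Assumption: a type with no quota set is unrestricted in this model.
            return True
        if self.used[storage_type] + num_bytes > limit:
            return False
        self.used[storage_type] += num_bytes
        return True


quota = StorageTypeQuota({"SSD": 100, "DISK": 1000})
print(quota.try_write("SSD", 80))    # fits within the SSD limit
print(quota.try_write("SSD", 30))    # would exceed the SSD limit, so it is rejected
print(quota.try_write("DISK", 30))   # DISK is tracked separately from SSD
```

The point of tracking each type separately is that a directory can exhaust its fast-storage allowance while still having room on slower tiers.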


Pig also gained a number of improvements, including:

  • Ability to call Hive UDFs from Pig (PIG-3294)
  • Dynamic Parallelism via Tez (PIG-4434)


Sqoop is used to move data from existing structured sources into and out of Hadoop. Two enhancements related to mainframe datasets and Netezza have been delivered in Sqoop 1.4.6:

  • Import sequential datasets from mainframe (SQOOP-1272) – eases movement of data between mainframes and HDFS. Thank you to Mariappan Asokan of SyncSort for the contribution!
  • Netezza enhancements: skip control codes, write logs to HDFS (SQOOP-2164)


Oozie is the de facto job scheduler for Hadoop. In Oozie 4.2.0, two additional actions have been added, allowing users to define workflows that include HiveServer2 and Spark. In addition, a key enhancement for stopping (both kill and suspend) jobs by their coordinator name has been added.

What’s Next?

This year Hortonworks has focused on three key themes: ease of use, enterprise readiness, and simplification. We want to make HDP easy to use for all types of users. This means continuing to deliver breakthrough user experiences for cluster administrators, developers, and data workers, from data scientists to architects. We want to increase the adoption of HDP within the enterprise, and this means improving ease of operations, increasing security, and providing comprehensive data governance.

Lastly, as we bring together the various Apache projects that make up HDP, we want to ensure that they work together as a seamless, integrated, and simple-to-use data processing platform.

We are excited about the progress we’ve made with the arrival of HDP 2.3 and we hope you enjoy the results of all of the open source developers within the community who made this possible.
