A tool for provisioning and managing Apache Hadoop clusters in the cloud

Cloudbreak automates the launch of elastic Hadoop clusters with policy-based autoscaling on the major cloud infrastructure platforms, including Microsoft Azure, Amazon Web Services, Google Cloud Platform and OpenStack, as well as on platforms that support Docker containers for greater application mobility.

What Cloudbreak Does

Enterprises are embracing Hadoop to enable their modern data architecture and power new analytic applications. The freedom to choose the on-premises or cloud infrastructure that best fulfills their needs is a critical requirement. Open Enterprise Hadoop gives administrators and developers the flexibility to choose the right deployment option and the tools to make it intuitive.  Hadoop has been deployed across Windows and Linux x86 servers, integrated hardware appliances, cloud infrastructure-as-a-service platforms, and managed cloud services.

Cloudbreak is a cloud-agnostic tool for provisioning, managing and monitoring on-demand clusters. You can use its scripting functionality to automate tasks and its UI to manage services for any configuration. Cloudbreak can provision Hadoop on the following major cloud providers: Microsoft Azure, Amazon Web Services, Google Cloud Platform and OpenStack. It makes more efficient use of cloud platforms through policy-based autoscaling, which expands and contracts the cluster based on Hadoop usage metrics and defined policies. It also provides centralized, secure access to Hadoop clusters through a rich web interface, a REST API and a CLI shell across all cloud providers.

How Cloudbreak Works

Cloudbreak is built on the foundation of cloud provider APIs (Microsoft Azure, Amazon Web Services, Google Cloud Platform, OpenStack), Apache Ambari, Docker containers, Swarm and Consul. It launches on-demand Hadoop clusters in the cloud in three steps:


  1. Create a template and provide a credential: A template is an easy way to create and manage a collection of cloud infrastructure resources, maintaining and updating them in an orderly and predictable fashion. Cloudbreak supports heterogeneous Hadoop clusters by combining different templates. The credential contains the user's access information for a specific cloud provider.
  2. Provide Ambari Blueprints: An Ambari Blueprint is a declarative definition of a Hadoop cluster. Blueprints can target specialized applications or specific use cases.
  3. Launch the cluster: In this step, Hadoop clusters are launched based on the templates, credentials and blueprints provided. Once a Hadoop cluster is created and launched, its components can be accessed using the credentials.
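
The three inputs above can be sketched as plain data. This is a minimal illustration, not Cloudbreak's actual API: the `Template` and `Credential` classes, their field names, and `build_launch_request` are assumptions made for this example. Only the blueprint dictionary follows the layout Ambari uses for Blueprint JSON (`Blueprints`, `host_groups`, `components`), and even that one is trimmed to a few components, so it is not a deployable cluster definition.

```python
from dataclasses import dataclass, field


@dataclass
class Template:
    """Hypothetical description of the cloud resources behind one host group."""
    name: str
    cloud_platform: str   # e.g. "AWS", "AZURE", "GCP", "OPENSTACK" (illustrative labels)
    instance_type: str
    volume_count: int
    volume_size_gb: int


@dataclass
class Credential:
    """Hypothetical holder for provider-specific access information."""
    name: str
    cloud_platform: str
    parameters: dict = field(default_factory=dict)


def make_blueprint(name, stack_version):
    """Build a small two-host-group definition in the Ambari Blueprint JSON layout."""
    return {
        "Blueprints": {
            "blueprint_name": name,
            "stack_name": "HDP",
            "stack_version": stack_version,
        },
        "host_groups": [
            {
                "name": "master",
                "cardinality": "1",
                "components": [{"name": "NAMENODE"}, {"name": "RESOURCEMANAGER"}],
            },
            {
                "name": "worker",
                "cardinality": "3",
                "components": [{"name": "DATANODE"}, {"name": "NODEMANAGER"}],
            },
        ],
    }


def build_launch_request(credential, templates, blueprint):
    """Combine the three inputs into one request (hypothetical shape).

    Every host group in the blueprint needs a template, and every template
    must target the same platform as the credential.
    """
    groups = [group["name"] for group in blueprint["host_groups"]]
    missing = [group for group in groups if group not in templates]
    if missing:
        raise ValueError(f"no template for host groups: {missing}")
    for template in templates.values():
        if template.cloud_platform != credential.cloud_platform:
            raise ValueError(f"template {template.name!r} targets a different platform")
    return {
        "credential": credential.name,
        "blueprint": blueprint["Blueprints"]["blueprint_name"],
        "host_groups": {group: templates[group].name for group in groups},
    }
```

Mapping a different template to each host group is how this sketch models the heterogeneous clusters mentioned in step 1: master and worker nodes can use different instance sizes and storage while sharing one credential and one blueprint.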

Internally, Cloudbreak uses Docker container technology to deploy Hadoop clusters in a cloud-agnostic way, and then uses Apache Ambari to define the Hadoop cluster declaratively with application-specific or use-case-specific blueprints.


Cloudbreak also optimizes cloud infrastructure usage by providing policy-based autoscaling. These policies can be static and time-based, or they can be based on cluster metrics captured by Ambari.
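
The difference between the two policy types can be sketched as follows. This is an illustration only, not Cloudbreak's implementation: the class names, the `pending_containers` metric name used in the usage note, and the first-match evaluation order are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class TimePolicy:
    """Static, time-based rule: hold a fixed node count inside a daily window."""
    start: time
    end: time
    node_count: int

    def desired(self, now, current_nodes, metrics):
        # Fires only while the clock is inside [start, end).
        if self.start <= now < self.end:
            return self.node_count
        return None


@dataclass
class MetricPolicy:
    """Metric-based rule: adjust the node count when a metric crosses a threshold."""
    metric: str
    threshold: float
    adjustment: int   # positive scales up, negative scales down

    def desired(self, now, current_nodes, metrics):
        value = metrics.get(self.metric)
        if value is None:
            return None
        # Scale-up rules fire above the threshold, scale-down rules below it.
        if self.adjustment > 0:
            crossed = value > self.threshold
        else:
            crossed = value < self.threshold
        return current_nodes + self.adjustment if crossed else None


def evaluate(policies, now, current_nodes, metrics, min_nodes, max_nodes):
    """Return the node count from the first policy that fires, clamped to the limits."""
    for policy in policies:
        target = policy.desired(now, current_nodes, metrics)
        if target is not None:
            return max(min_nodes, min(max_nodes, target))
    return current_nodes
```

For example, with `TimePolicy(time(8), time(18), 10)` listed ahead of `MetricPolicy("pending_containers", 50, 2)`, the cluster holds ten nodes during working hours and grows by two nodes outside them whenever the hypothetical metric exceeds 50. Clamping to `min_nodes` and `max_nodes` keeps a noisy metric from scaling the cluster without bound.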


Hortonworks Collaboration for Cloudbreak

Hortonworks is focused on going to market with a 100% open source solution. This focus allows us to provide product management guidance for enterprise-grade Hadoop to mainstream enterprises and our partner ecosystem, and to further innovate at the core of Hadoop.

  • Open: Deliver a complete set of features for Hadoop cloud deployment, in public and with the community, by defining the operational framework and lifecycle.
  • Flexible: Support a wider array of cloud providers with a common set of APIs to deploy Hadoop.
  • Integrated: Ensure that Hadoop cloud deployment can be integrated with existing IT tools, behind a single pane of glass, by providing REST APIs and multiple views of the cluster.

Recent Improvements to Hadoop Cloud Provisioning and Management

The following features are available in HDP 2.3:

  • Integrate Ambari alerting and Ambari Metrics System with auto-scaling functionality
    Ambari 2.0 introduced a new alerting and monitoring system that can monitor HDP KPIs for the entire Hadoop ecosystem. It consists of both individual Hadoop project metric alerts and advanced alerting. Advanced alerting provides visibility into aggregated, service-level, host-level and script-based metrics. Cloudbreak's auto-scaling feature enables Hadoop clusters to scale up and down based on SLA policies and demand. This functionality is now integrated with Ambari's new alerting and monitoring system.
  • Customize Hadoop cluster provisioning using Cloudbreak Recipes and Plugins
    Cloudbreak introduced a new concept called Cloudbreak Recipes that enables Hadoop operators to deploy custom scripts at any stage of deployment via Ambari. This provides additional flexibility to customize cloud HDP provisioning. For example, Cloudbreak Recipes can be used to make changes on the host nodes, such as putting an additional JAR file on the Hadoop classpath or running custom scripts.
  • Hadoop Cluster Security and Network customization
    Cloudbreak lets system administrators customize the network and security settings used when provisioning HDP in the cloud, using new functionality called Network Resources and Security Group. With these features, cloud network and security settings can be left at their defaults or set to custom levels.
  • Technical Preview of OpenStack Hadoop cluster provisioning for private cloud
    Cloudbreak now has a Technical Preview for provisioning HDP on OpenStack.  Cloudbreak is generally available to provision HDP on Amazon Web Services (AWS), Google Cloud Platform (GCP) and Microsoft Azure public clouds.
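
The idea behind Cloudbreak Recipes, custom scripts attached to stages of a deployment, can be sketched as below. This is an illustration under stated assumptions, not the real recipe format: the stage names, the `Recipe` class, and `plan` are hypothetical, and the sketch only computes which script would run on which host rather than executing anything.

```python
from dataclasses import dataclass, field

# Hypothetical stage names; the real set of stages is defined by Cloudbreak.
STAGES = ("pre-install", "post-install")


@dataclass
class Recipe:
    """A named bundle of scripts, grouped by the deployment stage they run in."""
    name: str
    scripts: dict = field(default_factory=dict)   # stage -> list of script texts

    def add(self, stage, script):
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        self.scripts.setdefault(stage, []).append(script)


def plan(recipes, stage, hosts):
    """List the (host, script) pairs to run at one stage, host by host, in recipe order."""
    return [
        (host, script)
        for host in hosts
        for recipe in recipes
        for script in recipe.scripts.get(stage, [])
    ]
```

A recipe that places an extra JAR on the Hadoop classpath, as in the example above, would register a copy command under the post-install stage; `plan` then pairs that script with every host that should receive it.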

Download Cloudbreak 1.0

For additional details about this release, review the following resources:

Learn more about Ambari Views, Ambari Blueprints and Ambari Stacks

Hortonworks Focus for Cloudbreak

Hortonworks supports application developers and operators in deploying Hadoop anywhere. We are already working with these technologies and defining the path forward to enrich the deployment automation and auto-scaling capabilities of HDP for all of our customers and partners. We have received requests to add support for additional cloud providers, and the integration between Periscope, Cloudbreak's autoscaling component, and the new Ambari 2.0 alerts subsystem is complete.

Given our strong open source heritage, we believe Hortonworks is uniquely qualified to ensure that the Cloudbreak technologies continue to flourish in the open. The source code has already been available under the Apache License, Version 2.0 since the SequenceIQ acquisition, and we plan to contribute it to the Apache Software Foundation sometime in 2015, either as a new project or as part of an existing project.

This move is in line with our belief that the fastest path to innovation is through developing in open source within an open community. Since our strategy is squarely focused on a 100% open-source model with no proprietary extensions, we are never conflicted about which capabilities, features, or components to incorporate within the Hortonworks Data Platform (HDP). We listen to our customers’ and partners’ requirements and work together with them in the open to deliver the best the community has to offer.

If you have questions or feedback on Cloudbreak, please post them to the Cloudbreak Forum.
