Ambari Blueprints Provision Clusters with Greater Speed and Ease

An immense step forward for dev-ops teams that need to provision clusters quickly

Apache Ambari has always given operators the ability to provision an Apache Hadoop cluster using an intuitive Cluster Install Wizard web interface, guiding the user through a series of steps:

  • confirming the list of hosts
  • assigning master, slave, and client components to hosts and configuring services, and
  • installing, starting and testing the cluster.

With Ambari Blueprints, system administrators and dev-ops engineers can expedite the process of provisioning a cluster. Once defined, a Blueprint can be reused, making configuration and automation easy for each successive cluster creation.

Best Practices, From Experience

Hortonworks has worked with countless enterprise customers and partners deploying Hadoop within their data centers. From this experience, we have distilled many best practices for distributing the Hadoop components and standing up clusters.

The number of Hadoop deployments continues to grow within the enterprise. These clusters come in many types: production, development, test and discovery. Some clusters are permanent and some are ad hoc, launched for a short period of time for a specific task.

This variability means that installing clusters through a web interface can be time-consuming for an administrator, who must make sure each cluster is configured correctly for its intended use case. Combine that with the Apache Hadoop ecosystem’s rapid innovation and expansion, and there is ever greater demand for consistent, automated cluster provisioning.

Ambari Blueprints Make Cluster Provisioning Easier & Repeatable

The Ambari Blueprints feature in the recently released Apache Ambari 1.6.0 is a significant step forward. Blueprints make it possible to instantiate a Hadoop cluster quickly and without user interaction. And because a Blueprint captures the service component layout for a particular Stack definition, it preserves best practices across environments: layout and configuration are applied consistently across clusters in multiple environments (dev, test, prod) and multiple data centers.

How Blueprints Work

A blueprint document is written in JSON, as shown below. It defines which Stack to use, the service components to install on each host group, and the configurations to apply.

Figure 1 Blueprint Structure

{
  "configurations" : [
    {
      "configuration-type" : {
          "property-name"  : "property-value",
          "property-name2" : "property-value"
      }
    },
    {
      "configuration-type2" : {
          "property-name" : "property-value"
      }
    }
    ...
  ],
  "host_groups" : [
    {
      "name" : "host-group-name",
      "components" : [
        {
          "name" : "component-name"
        },
        {
          "name" : "component-name2"
        }
        ...
      ],
      "configurations" : [
        {
          "configuration-type" : {
            "property-name" : "property-value"
          }
        }
        ...
      ],
      "cardinality" : "1"
    }
  ],
  "Blueprints" : {
    "stack_name" : "HDP",
    "stack_version" : "2.1"
  }
}
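
To make the structure concrete, here is a minimal sketch of a single-node blueprint, written as a Python dict that mirrors the JSON structure above. The host group name, component list, and sample property are illustrative choices, not requirements.

# A minimal single-node blueprint mirroring the structure above.
# Host group name, components, and the sample property are illustrative.
minimal_blueprint = {
    "configurations": [
        {"core-site": {"fs.trash.interval": "360"}}
    ],
    "host_groups": [
        {
            "name": "host_group_1",
            "components": [
                {"name": "NAMENODE"},
                {"name": "SECONDARY_NAMENODE"},
                {"name": "DATANODE"},
                {"name": "HDFS_CLIENT"},
            ],
            "cardinality": "1",
        }
    ],
    "Blueprints": {"stack_name": "HDP", "stack_version": "2.1"},
}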

To create a new cluster, an Ambari Blueprint must be combined with environment-specific host information and configuration in a cluster creation template. The template is also written in JSON, as shown below. It names the blueprint to follow, maps specific hosts to each host group, and supplies the configurations to use.

Figure 2 Cluster Creation Template Structure

{
  "blueprint" : "blueprint-name",
  "default_password" : "super-secret-password",
  "configurations" : [
    {
      "configuration-type" : {
        "property-name" : "property-value"
      }
    }
    ...
  ],
  "host_groups" :[
    {
      "name" : "host-group-name",
      "configurations" : [
        {
          "configuration-type" : {
            "property-name" : "property-value"
          }
        }
      ],
      "hosts" : [
        {
          "fqdn" : "host.domain.com"
        },
        {
          "fqdn" : "host2.domain.com"
        }
        ...
      ]
    }
    ...
  ]
}
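
Continuing the example, a matching cluster creation template sketch maps one concrete host onto the host_group_1 defined in the blueprint sketch above; the blueprint name, password, and FQDN are placeholders.

# A matching cluster creation template: maps one concrete host onto
# "host_group_1" from the blueprint sketch above.  The blueprint name,
# password, and FQDN are placeholders.
cluster_template = {
    "blueprint": "single-node-hdfs",
    "default_password": "super-secret-password",
    "host_groups": [
        {
            "name": "host_group_1",
            "hosts": [{"fqdn": "host.domain.com"}],
        }
    ],
}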

When the cluster is created, configurations defined in the cluster creation template override any duplicates specified at the blueprint level. Not all configuration properties have valid defaults, so the Blueprint user must supply the required ones: Ambari validates that required non-password properties are present at blueprint creation time, and that required password properties are present at cluster creation time.
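
As a toy illustration of that precedence rule (the property name is just an example), the template's value for a duplicated property wins:

# Toy illustration: a cluster-creation-template value overrides the
# blueprint-level value for the same property.
blueprint_cfg = {"core-site": {"fs.trash.interval": "360"}}
template_cfg = {"core-site": {"fs.trash.interval": "4320"}}

effective = {**blueprint_cfg["core-site"], **template_cfg["core-site"]}
print(effective)  # {'fs.trash.interval': '4320'} -- the template wins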

Monitor Cluster Provisioning Progress via the Blueprint API

Once you have those blueprint assets, it’s time to call the API. Ambari returns a request href that lets you “watch” the progress of the installation and configuration tasks.

Figure 3 Blueprint API Calls

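Sketched with Python’s requests library, the call sequence looks roughly like this, reusing the minimal_blueprint and cluster_template dicts from the sketches above. The server address, credentials, and blueprint/cluster names are placeholders, and Ambari expects an X-Requested-By header on write requests.

import requests

AMBARI = "http://ambari-server.domain.com:8080/api/v1"
AUTH = ("admin", "admin")                      # placeholder credentials
HEADERS = {"X-Requested-By": "ambari-script"}  # Ambari requires this on POSTs

# 1. Register the blueprint under a name of your choosing.
requests.post(AMBARI + "/blueprints/single-node-hdfs",
              json=minimal_blueprint, auth=AUTH, headers=HEADERS)

# 2. Create the cluster; the response carries an href for the async request.
resp = requests.post(AMBARI + "/clusters/MyCluster",
                     json=cluster_template, auth=AUTH, headers=HEADERS)
request_href = resp.json()["href"]

# 3. Poll that href to watch the install and configuration tasks progress.
progress = requests.get(request_href, auth=AUTH).json()
print(progress["Requests"]["request_status"])  # e.g. "IN_PROGRESS"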

Ambari Server provides the following API resources for managing blueprints and creating clusters from them.

Figure 4 Basic API Resources

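The basic resources include the standard Blueprint endpoints documented in the Ambari wiki (paths shown here follow the usual Ambari REST conventions):

  • POST /api/v1/blueprints/:blueprintName - register a new blueprint
  • GET /api/v1/blueprints - list all registered blueprints
  • GET /api/v1/blueprints/:blueprintName - retrieve a blueprint definition
  • POST /api/v1/clusters/:clusterName - create a cluster from a blueprint and cluster creation template
  • GET /api/v1/clusters/:clusterName/requests/:requestId - monitor a cluster creation request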

Ambari Blueprints address the need to codify and enforce best practices when deploying Hadoop clusters within the enterprise. They keep environment-specific information and configuration separate from the blueprint itself, making it easy to manage consistent deployments across clusters of all sizes. Finally, the API removes the administrator from the provisioning loop and fully automates the steps involved in creating a cluster.

Three Steps to Get Started

  1. Grab the Ambari 1.6.0 release
  2. Dig into the Ambari Blueprints wiki
  3. Start building new clusters!
