July 30, 2014

Searching for the Apache Hadoop Provisioning Swiss Army Knife

SequenceIQ provides an API and platform to build predictive applications and turn data into tangible assets. In this guest blog, SequenceIQ Co-founder and CTO Janos Matyas (@sequenceiq) explains why his team chose Apache Ambari for provisioning Hadoop clusters and how they contributed to the Ambari project.

At SequenceIQ, we frequently provision Hadoop clusters on different environments. For a long time, we searched for the right provisioning and management tool.

In this blog post we’ll describe why we chose Apache Ambari for provisioning and configuration and how we contributed to the Apache Ambari project.

Why We Chose Ambari

We are building an open source, cloud-agnostic, Docker-based, Hadoop-as-a-Service API called Cloudbreak. To bring up dynamic Apache Hadoop clusters, we needed a provisioning tool.
After reviewing all available cluster management alternatives, we chose Apache Ambari. While Ambari brings multiple benefits (and there have been many good posts about this), for us the most important points were:

  • 100% open source under Apache 2 license
  • Highly active, agile development
  • Extensive provisioning functionality
  • Available REST API
  • Support of blueprints
  • Ability to add custom stacks

The Contribution Process

We are a company with a very strong focus on DevOps, so we always automate everything and try to use CLI/shells. Once we made the strategic decision to use Apache Ambari, the first thing we looked for was a command line shell (and a REST client to be used from Java/Scala). We soon realized that it was missing.
We quickly engaged with the Apache Ambari community and a few engineers from Hortonworks to present our idea. Once we agreed on the details, we broke those down into requirements for tracking in the Apache Jira. From there the process accelerated.

We appreciated the clean, thorough and well-documented Apache Ambari REST API. Since the shell and Groovy-based REST client are built on its foundation, it made development easier. Nevertheless, some questions arose which the community quickly answered on the mailing lists.

At a high level, the contribution process is the same as for most Apache projects. The details are described extensively on the Ambari project wiki, but here are the highlights:

  1. Create a JIRA issue and discuss it with the community
  2. Fork the GitHub repository
  3. Write your code/contribution
  4. Create test(s) for the new code
  5. Create documentation
  6. File a patch
  7. Follow up with your JIRA issue

A Quick Introduction to the Ambari Shell

We set the goal for the Apache Ambari shell to provide an interactive command line tool that supports all functionality available through the REST API or Ambari web UI.

The shell enables complete automation of management tasks via scripts and offers the features most DevOps engineers expect from a command shell, such as:

  • context-aware commands
  • tab completion
  • required/optional parameter support
  • a hint command to guide you along the usual path

Download Apache Ambari 1.6.1, install it, and you are ready to go.

Connect Ambari Shell to the Server

Once the server is up and running (which takes 10-20 seconds), you can connect to it with the shell:
Usage:
java -jar ambari-shell.jar : Starts Ambari Shell in interactive mode.
java -jar ambari-shell.jar --cmdfile={FILE} : Ambari Shell executes commands read from the file.
Options:
--ambari.host={HOSTNAME} Hostname of the Ambari Server [default: localhost].
--ambari.port={PORT} Port of the Ambari Server [default: 8080].
--ambari.user={USER} Username of the Ambari admin [default: admin].
--ambari.password={PASSWORD} Password of the Ambari admin [default: admin].

Note: At least one option is mandatory.
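
As a rough illustration of the options above, the following Python sketch assembles the launch command line. The helper name build_shell_command and the jar location are assumptions made for this example; only the option names and defaults come from the usage text.

```python
from typing import List, Optional


def build_shell_command(
    jar: str = "ambari-shell.jar",   # assumed to be in the current directory
    host: str = "localhost",
    port: int = 8080,
    user: str = "admin",
    password: str = "admin",
    cmdfile: Optional[str] = None,
) -> List[str]:
    """Return an argv list for launching Ambari Shell (hypothetical helper)."""
    argv = [
        "java", "-jar", jar,
        f"--ambari.host={host}",
        f"--ambari.port={port}",
        f"--ambari.user={user}",
        f"--ambari.password={password}",
    ]
    if cmdfile is not None:
        # With --cmdfile the shell executes the commands read from the file.
        argv.append(f"--cmdfile={cmdfile}")
    return argv


if __name__ == "__main__":
    print(" ".join(build_shell_command(host="server.ambari.com")))
```

Passing every option explicitly keeps the invocation unambiguous and also satisfies the note that at least one option must be given. The resulting list can be handed to subprocess.run on a machine where Java and the jar are available.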

Create a Cluster

All commands are context-aware and available only when they make sense. For example, the cluster create command is not available until a blueprint has been added or selected. A good approach is to use the hint command, which suggests the available commands and the usual flow for creating or configuring a cluster. You can always press TAB to complete a command or list the available parameters.
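
To make the idea of context-aware commands concrete, here is a toy Python model. It is not the shell's actual implementation: the class name ShellContext, the exact availability rules, and the hint texts are assumptions chosen to mirror the behavior described above.

```python
from typing import Optional, Set


class ShellContext:
    """Toy model of context-aware commands; not the actual Ambari Shell code."""

    # Commands assumed to be available in every context.
    ALWAYS = {"blueprint add", "blueprint defaults", "blueprint list", "hint", "help"}

    def __init__(self) -> None:
        self.blueprints: Set[str] = set()
        self.building: Optional[str] = None  # blueprint chosen via `cluster build`

    def add_blueprint(self, name: str) -> None:
        self.blueprints.add(name)

    def build(self, blueprint: str) -> None:
        if blueprint not in self.blueprints:
            raise ValueError(f"unknown blueprint: {blueprint}")
        self.building = blueprint

    def available_commands(self) -> Set[str]:
        commands = set(self.ALWAYS)
        if self.blueprints:
            # Assumption: building requires at least one known blueprint.
            commands.add("cluster build")
        if self.building is not None:
            # A blueprint is selected, so cluster commands make sense now.
            commands.update({"cluster assign", "cluster create"})
        return commands

    def hint(self) -> str:
        if not self.blueprints:
            return "Add a blueprint first, e.g. with `blueprint defaults`."
        if self.building is None:
            return "Select a blueprint with `cluster build --blueprint <name>`."
        return "Assign hosts with `cluster assign`, then run `cluster create`."


if __name__ == "__main__":
    ctx = ShellContext()
    print(ctx.hint())
    ctx.add_blueprint("single-node-hdfs-yarn")
    ctx.build("single-node-hdfs-yarn")
    print(sorted(ctx.available_commands()))
```

Deriving the command set from the current state, rather than rejecting commands after the fact, is what lets tab completion and hints stay in sync with what the user can actually do next.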

Initially, you can add blueprints from a file or a URL. For your convenience, we’ve added two blueprints as defaults. You can get these blueprints by using the blueprint defaults command, as shown below:
ambari-shell> blueprint defaults
ambari-shell> blueprint list

The output of the above commands:

BLUEPRINT STACK
multi-node-hdfs-yarn HDP:2.0
single-node-hdfs-yarn HDP:2.0
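
For readers new to blueprints, the sketch below shows roughly what a single-node blueprint document might look like and how the BLUEPRINT and STACK columns relate to it. The post does not show the contents of the default blueprints, so the component list here is only a guess based on the task names that appear later, and the field names reflect my understanding of the Ambari blueprint JSON format; check both against the Ambari documentation for your version.

```python
import json

# Assumed shape of an Ambari blueprint; the component list is illustrative only.
blueprint = {
    "Blueprints": {
        "blueprint_name": "single-node-hdfs-yarn",
        "stack_name": "HDP",
        "stack_version": "2.0",
    },
    "host_groups": [
        {
            "name": "host_group_1",
            "cardinality": "1",
            "components": [
                {"name": "NAMENODE"},
                {"name": "SECONDARY_NAMENODE"},
                {"name": "DATANODE"},
                {"name": "RESOURCEMANAGER"},
                {"name": "NODEMANAGER"},
                {"name": "HISTORYSERVER"},
                {"name": "ZOOKEEPER_SERVER"},
            ],
        }
    ],
}


def list_row(bp: dict) -> str:
    """Format a blueprint like one row of the `blueprint list` output."""
    meta = bp["Blueprints"]
    return f'{meta["blueprint_name"]} {meta["stack_name"]}:{meta["stack_version"]}'


if __name__ == "__main__":
    print(list_row(blueprint))
    print(json.dumps(blueprint, indent=2))
```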

Once the blueprints are added, you can use them to create a cluster by typing:
ambari-shell> cluster build --blueprint single-node-hdfs-yarn

Now that the blueprint is selected, you have to assign the hosts to the available host groups. To do so, use cluster assign:
ambari-shell> cluster build --blueprint single-node-hdfs-yarn
CLUSTER_BUILD:single-node-hdfs-yarn> cluster assign --hostGroup host_group_1 --host server.ambari.com

The output of the above command is shown below:

HOSTGROUP HOST
host_group_1 server.ambari.com
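
Conceptually, the host assignments are what turn a blueprint into a concrete cluster request. The Python sketch below builds such a request from a mapping of host groups to hosts. The function name cluster_template is hypothetical, and the field names ("blueprint", "host_groups", "hosts", "fqdn") are my understanding of Ambari's cluster creation template, so treat them as assumptions rather than as what the shell sends.

```python
import json
from typing import Dict, List


def cluster_template(blueprint_name: str, assignments: Dict[str, List[str]]) -> dict:
    """Build a cluster creation request body (assumed format) from host assignments.

    `assignments` maps a host group name to the hosts assigned to it.
    """
    return {
        "blueprint": blueprint_name,
        "host_groups": [
            {"name": group, "hosts": [{"fqdn": host} for host in hosts]}
            for group, hosts in sorted(assignments.items())
        ],
    }


if __name__ == "__main__":
    template = cluster_template(
        "single-node-hdfs-yarn",
        {"host_group_1": ["server.ambari.com"]},
    )
    print(json.dumps(template, indent=2))
```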

Once you are happy with the host and host group associations, you can run cluster create to start building the cluster. You can check progress either in the Ambari UI or with the tasks command.

Here is an example of the type of progress messages you will see.

CLUSTER_BUILD:single-node-hdfs-yarn> cluster create

Successfully created the cluster

CLUSTER:single-node-hdfs-yarn> tasks

The output of the tasks command is shown below:

TASK                        STATUS
HISTORYSERVER INSTALL       QUEUED
ZOOKEEPER_SERVER START      PENDING
ZOOKEEPER_CLIENT INSTALL    PENDING
HDFS_CLIENT INSTALL         PENDING
HISTORYSERVER START         PENDING
NODEMANAGER INSTALL         QUEUED
NODEMANAGER START           PENDING
ZOOKEEPER_SERVER INSTALL    QUEUED
YARN_CLIENT INSTALL         PENDING
NAMENODE INSTALL            QUEUED
RESOURCEMANAGER INSTALL     QUEUED
NAMENODE START INSTALL      QUEUED
RESOURCEMANAGER START       PENDING
DATANODE START              PENDING
DATANODE START              PENDING
SECONDARY_NAMENODE START    PENDING
SECONDARY_NAMENODE START    PENDING
DATANODE INSTALL            QUEUED
MAPREDUCE2_CLIENT INSTALL   PENDING
SECONDARY_NAMENODE INSTALL  QUEUED
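
If you capture the tasks output in a script, a small parser makes it easy to summarize progress. The Python sketch below assumes each data row consists of exactly three whitespace-separated fields (component, command, status) and skips the header and any row that does not fit that shape; the sample rows are a subset of the output above.

```python
from collections import Counter
from typing import Dict, List, Tuple

Task = Tuple[str, str, str]  # (component, command, status)

SAMPLE = """\
TASK STATUS
HISTORYSERVER INSTALL QUEUED
ZOOKEEPER_SERVER START PENDING
NODEMANAGER INSTALL QUEUED
DATANODE START PENDING
"""


def parse_tasks(text: str) -> List[Task]:
    """Parse `tasks` output, keeping only rows with exactly three fields."""
    tasks = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # header, blank line, or malformed row
        component, command, status = parts
        tasks.append((component, command, status))
    return tasks


def count_by_status(tasks: List[Task]) -> Dict[str, int]:
    """Count how many tasks are in each status."""
    return dict(Counter(status for _, _, status in tasks))


if __name__ == "__main__":
    print(count_by_status(parse_tasks(SAMPLE)))
```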

Each time you start the shell, the executed commands are logged in a file. Later, you can execute the same commands again, either with the script command or by specifying the --cmdfile option.
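
A command file for the walkthrough above could also be generated programmatically. This Python sketch renders the same sequence of shell commands as text; it assumes the file format is simply one shell command per line, which the post does not spell out, and the function name render_cmdfile is made up for the example.

```python
from typing import Dict


def render_cmdfile(blueprint: str, assignments: Dict[str, str]) -> str:
    """Render the cluster creation steps as a command file.

    `assignments` maps a host name to the host group it should join.
    Assumes one shell command per line (not confirmed by the post).
    """
    lines = [
        "blueprint defaults",
        f"cluster build --blueprint {blueprint}",
    ]
    for host, group in assignments.items():
        lines.append(f"cluster assign --hostGroup {group} --host {host}")
    lines.append("cluster create")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_cmdfile("single-node-hdfs-yarn", {"server.ambari.com": "host_group_1"}), end="")
```

Saving the rendered text to a file and passing it with --cmdfile={FILE} would then replay the steps without any interactive input.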

Ambari Shell Commands

These commands are currently supported:

Command             Description
blueprint add       Adds a new blueprint with either --url or --file
blueprint defaults  Adds the default blueprints to Ambari
blueprint list      Lists all known blueprints
blueprint show      Shows the blueprint by its id
cluster assign      Assigns a host to a host group
cluster build       Starts to build a cluster
cluster create      Creates a cluster based on the current blueprint and assigned hosts
cluster delete      Deletes the cluster
cluster preview     Shows the currently assigned hosts
cluster reset       Clears the host to host group assignments
debug off           Stops showing the URL of the API calls
debug on            Shows the URL of the API calls
exit                Exits the shell
hello               Prints a simple elephant to the console
help                Lists the usage of all commands
hint                Shows some hints
host components     Lists the components assigned to the selected host
host focus          Sets the useHost to the specified host
host list           Lists the available hosts
quit                Exits the shell
script              Parses the specified resource file and executes its commands
service components  Lists all services with their components
service list        Lists the available services
tasks               Lists the Ambari tasks
version             Displays shell version

What’s Next

As our Cloudbreak project evolves, we are constantly adding new features and upgrading the Apache Ambari shell and REST client. Here’s what we’re planning to contribute next:

  • Add new nodes to the cluster and install host components on them
  • Decommission nodes and completely remove them from the cluster
  • Modify configuration

If you’d like to have new features in the shell, please submit a Jira task. We strongly encourage you to collaborate and join the Apache Ambari team.

Summary

To sum it up in less than two minutes watch this video:
