Cluster Operations

Simplified & Consistent Hadoop Operations for Enterprise IT Teams

An Apache Hadoop cluster presents a new challenge to IT operators. It is a collection of anywhere from a handful to thousands of machines, all working together to solve a problem at scale. Initial provisioning can be difficult even with only a handful of nodes, and ongoing management and monitoring of the environment require coordinating many networked resources and software components.

With hundreds of years of combined experience, members of the Hadoop community have answered the call to deliver the key services required for enterprise Hadoop. Now they have rallied again to solve operational challenges.

At Hortonworks, we are helping to lead this effort within the community and completely in the open. We believe that the best experience for provisioning, managing, and monitoring Hadoop clusters should be available for everyone, not as an optional extra, but as a core requirement for integrating Hadoop with existing IT technologies and operations.

Initiative Goals

Deliver a complete set of features for Hadoop operations, in public and with the community, by defining the operational framework and lifecycle.
Ensure that Hadoop operations can be integrated with existing IT tools, behind a single pane of glass, by providing REST APIs and multiple views of the cluster.
Make Hadoop’s most complex operational challenges easy to manage with more insight and visibility into cluster performance.
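To make the REST API goal concrete, here is a minimal sketch of how an existing IT tool might prepare a call to Ambari's REST API. It assumes an Ambari Server on its default port 8080 serving the API under /api/v1 with HTTP basic authentication. The host name, credentials, and the helper name build_ambari_request are hypothetical placeholders. The request is only built, not sent, so the sketch runs without a live server.

```python
import base64
import urllib.request

# Hypothetical placeholders: replace with your Ambari Server host and port.
AMBARI_HOST = "ambari.example.com"
AMBARI_PORT = 8080  # assumption: Ambari Server's default web/API port


def build_ambari_request(path, user, password, method="GET"):
    """Build (but do not send) an authenticated Ambari REST API request."""
    url = f"http://{AMBARI_HOST}:{AMBARI_PORT}/api/v1/{path.lstrip('/')}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, method=method)
    # Ambari authenticates API calls with HTTP basic auth.
    req.add_header("Authorization", f"Basic {token}")
    # Ambari expects this header on modifying requests (POST/PUT/DELETE);
    # sending it on every request keeps the helper uniform.
    req.add_header("X-Requested-By", "ambari")
    return req


req = build_ambari_request("clusters", "admin", "admin")
print(req.get_method(), req.full_url)
# Against a real server, urllib.request.urlopen(req) would return a JSON body.
```

A monitoring or CMDB tool could poll endpoints like this one on a schedule. That is the "single pane of glass" integration the goals describe, with Ambari as the source of cluster state.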

Already Delivered

Hortonworks introduced Apache Ambari to open source in 2011. The goal was to create a single framework for IT administrators that they could use to easily provision, manage and monitor Hadoop clusters. Since the beginning, the Ambari team has focused on making it easy to integrate other technologies with Apache Hadoop.

Hortonworks has invested heavily in the Apache Ambari and Apache ZooKeeper projects, and we have been joined by many folks in the community, from enterprise IT contributors to large ISVs that recognize Ambari as the operational plug-point into the Hadoop ecosystem.

Ambari now supports Hadoop 2 and its YARN-based architecture. It integrates with Kerberos for security and supports Hadoop High Availability. Most recently, the community delivered heterogeneous cluster configurations and flexible component controls with rolling restarts to minimize cluster downtime during maintenance.
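A rolling restart limits downtime by restarting a service's components a few hosts at a time rather than all at once. The following is a generic sketch of that batching idea, not Ambari's implementation. The function name rolling_restart and its parameters (batch_size, wait_seconds, max_failures) are hypothetical knobs chosen for illustration, and restart stands in for whatever actually restarts a component on a host.

```python
import time
from typing import Callable, List


def rolling_restart(hosts: List[str], restart: Callable[[str], bool],
                    batch_size: int = 1, wait_seconds: float = 0.0,
                    max_failures: int = 0) -> List[str]:
    """Restart hosts in fixed-size batches (hypothetical helper).

    Stops before the next batch once failures exceed max_failures,
    and returns the hosts that restarted successfully.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    restarted: List[str] = []
    failures = 0
    for start in range(0, len(hosts), batch_size):
        if start > 0 and wait_seconds > 0:
            # Give the previous batch time to rejoin the cluster.
            time.sleep(wait_seconds)
        for host in hosts[start:start + batch_size]:
            if restart(host):
                restarted.append(host)
            else:
                failures += 1
        # Abort rather than risk taking more of the cluster down.
        if failures > max_failures:
            break
    return restarted


hosts = [f"dn{i}" for i in range(1, 7)]
# Simulate a restart that fails on dn3; with no tolerated failures,
# the run stops after the batch containing dn3.
result = rolling_restart(hosts, restart=lambda h: h != "dn3", batch_size=2)
print(result)  # ['dn1', 'dn2', 'dn4']
```

In this sketch, the failure check runs after each whole batch, so a batch in progress is always finished. A smaller batch_size trades a longer maintenance window for less capacity offline at any moment.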

Coming Next

The following features are next on the Cluster Operations roadmap:

  • Ubuntu support
  • Install, manage and monitor Apache Flume agents
  • Customizable web experience
  • Configuration history and rollback
Phase 1 (HDP 2.0)
  • Support for Hadoop 2 and YARN
  • High Availability
  • Kerberos Cluster Security
Phase 2: Ambari 1.5.1, 1.6.0 & 1.6.1 (HDP 2.1)
  • New Data Access Engines
  • Stack Extensibility
  • Cluster Blueprints
  • Rolling Restarts
  • Maintenance Mode
  • Custom JDK Checking
  • Simplified Database Connections
  • Blueprints Topology Validation
Phase 3
  • Expanded Platform Support
  • Apache Flume Agents
  • Customizable Web Experience
  • Configuration History & Rollback