Cluster Operations

Simplified & Consistent Hadoop Operations for Enterprise IT Teams

An Apache Hadoop cluster presents a new challenge to IT operators. It is a collection of a handful (or thousands) of machines, all working collectively to solve a problem at scale. Initial provisioning can be difficult even with only a handful of nodes, and ongoing management and monitoring of the environment requires complex networking of resources and software.

With hundreds of years of combined experience, members of the Hadoop community have answered the call to deliver the key services required for enterprise Hadoop. Now they have rallied again to solve operational challenges.

At Hortonworks, we are helping to lead this effort within the community and completely in the open. We believe that the best experience for provisioning, managing, and monitoring Hadoop clusters should be available for everyone, not as an optional extra, but as a core requirement for integrating Hadoop with existing IT technologies and operations.

Initiative Goals

Open
Deliver a complete set of features for Hadoop operations, in public and with the community, by defining the operational framework and lifecycle.
Integrated
Ensure that Hadoop operations can be integrated with existing IT tools, behind a single pane of glass, by providing REST APIs and multiple views of the cluster.
Intuitive
Make Hadoop’s most complex operational challenges easy to manage with more insight and visibility into cluster performance.
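As an illustration of the REST integration point described above, an external IT tool could query cluster state over HTTP. This is only a sketch: the host, port, credentials, and cluster name below are placeholders, not details from this document.

```shell
# Query a cluster's basic state through the Ambari REST API.
# "ambari-host", "admin:admin", and "MyCluster" are illustrative placeholders.
curl -u admin:admin \
  http://ambari-host:8080/api/v1/clusters/MyCluster
```

Because the interface is plain HTTP and JSON, responses like this can be fed into an existing dashboard or monitoring system rather than requiring a separate console.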

Already Delivered

Hortonworks introduced Apache Ambari to open source in 2011. The goal was to create a single framework that IT administrators could use to easily provision, manage, and monitor Hadoop clusters. Since the beginning, the Ambari team has focused on making it easy to integrate other technologies with Apache Hadoop.

Hortonworks has invested heavily in the Apache Ambari and Apache ZooKeeper projects, and we have been joined by many folks in the community, from enterprise IT contributors to large ISVs that recognize Ambari as the operational plug-point into the Hadoop ecosystem.

Now Ambari supports Hadoop 2 and its YARN-based architecture. It integrates with Kerberos for security and supports Hadoop High Availability. Most recently, the community delivered heterogeneous cluster configurations and flexible component controls with rolling restarts to minimize cluster downtime during maintenance.

Coming Next

The following features are next on the Cluster Operations roadmap:

  • Blueprints for repeatable and consistent cluster installations
  • Install, manage and monitor Apache Flume agents
  • Ubuntu support
  • Configuration history and rollback
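
As a sketch of what a cluster blueprint looks like, the JSON below defines a minimal single-host-group layout. The blueprint name, stack version, and component list are illustrative assumptions, not taken from this document.

```json
{
  "Blueprints": {
    "blueprint_name": "example-cluster",
    "stack_name": "HDP",
    "stack_version": "2.1"
  },
  "host_groups": [
    {
      "name": "master",
      "cardinality": "1",
      "components": [
        { "name": "NAMENODE" },
        { "name": "RESOURCEMANAGER" },
        { "name": "ZOOKEEPER_SERVER" }
      ]
    }
  ]
}
```

Submitting such a blueprint to Ambari, then submitting a mapping of hosts to host groups, lets the same layout be provisioned repeatedly, which is what makes installations consistent across environments.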
Phase 1: Delivered (HDP 2.0)
  • Support for Hadoop 2 and YARN
  • High Availability
  • Kerberos Cluster Security

Phase 2: Delivered in Ambari 1.5.1 & 1.6.0 (HDP 2.1)
  • New Data Access Engines
  • Stack Extensibility
  • Cluster Blueprints
  • Rolling Restarts
  • Maintenance Mode

Phase 3
  • Expanded Platform Support
  • Apache Flume Agents
  • Centralized Log Access and Search
  • Historic View of Cluster Performance
  • Capacity Planning Tools

Contact Us
Hortonworks provides enterprise-grade support, services and training. Discuss how to leverage Hadoop in your business with our sales team.
Get started with Sandbox
Hortonworks Sandbox is a self-contained virtual machine with Apache Hadoop pre-configured alongside a set of hands-on, step-by-step Hadoop tutorials.
HDP 2.1 Webinar Series
Join us for a series of talks on some of the new enterprise functionality available in HDP 2.1, including data governance, security, operations, and data access.