Elastic Hadoop on OpenStack
Apache Hadoop and OpenStack are two of the largest open source communities, and both are relatively new to the data center. Hadoop benefits from the operational agility that OpenStack provides and, in turn, serves as an excellent use case for OpenStack.
To accelerate the adoption of Hadoop over OpenStack, we partnered with Mirantis and Red Hat to collaborate on Project Savanna (since renamed to Project Sahara).
Our initiative targets the following use cases:
- One-Click Provisioning
- Enable self-service provisioning for frequent requests
- Simplify migrations from development to production
- Reduce operator error in provisioning
- Facilitate migration from Amazon EMR for ad-hoc analytics
- Vary cluster compute capacity based on factors such as time of day, resource utilization, and user job requirements
- Provide transient Hadoop clusters for analyzing data stored in the Swift object store
- Simplify upgrade and maintenance by running multiple Hadoop versions over common server pools
- Improve server utilization by sharing resources with non-Hadoop workloads
- Simplify chargeback/showback
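As a rough illustration of the elasticity use case above, the following Python sketch picks a target worker count from the time of day, resource utilization, and queued jobs. It is not part of Sahara: the `ScalingInputs` type, the `target_worker_count` function, and every threshold are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class ScalingInputs:
    """Hypothetical snapshot of the signals a scaling rule might read."""
    now: time               # current wall-clock time
    cpu_utilization: float  # cluster-wide average, 0.0 to 1.0
    pending_jobs: int       # jobs waiting in the queue


def target_worker_count(current: int, inputs: ScalingInputs,
                        min_workers: int = 2, max_workers: int = 20) -> int:
    """Return the desired number of worker nodes under an illustrative policy."""
    target = current

    # Grow by roughly 50% when the cluster is busy or jobs are queued.
    if inputs.cpu_utilization > 0.80 or inputs.pending_jobs > 0:
        target = current + max(1, current // 2)
    # Shrink by one node at a time when the cluster is mostly idle.
    elif inputs.cpu_utilization < 0.30:
        target = current - 1

    # Overnight (22:00 to 06:00), cap the cluster at half the maximum
    # so capacity can be shared with other workloads.
    if inputs.now >= time(22, 0) or inputs.now < time(6, 0):
        target = min(target, max_workers // 2)

    # Always stay within the configured bounds.
    return max(min_workers, min(max_workers, target))


if __name__ == "__main__":
    busy_daytime = ScalingInputs(now=time(10, 0), cpu_utilization=0.9, pending_jobs=0)
    print(target_worker_count(4, busy_daytime))
```

Shrinking one node at a time while growing in larger steps is a deliberately conservative choice here, since removing data nodes from a Hadoop cluster generally costs more than adding them.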
The core of Sahara, called the ‘controller’, serves as the glue between Hadoop and OpenStack. It manages the provisioning and orchestration of virtual machines by working with underlying OpenStack projects such as Nova, Quantum (since renamed Neutron), Cinder and Glance. The Hortonworks OpenStack plugin for Sahara will configure and manage the Hadoop cluster using Ambari. It will also set up the HDFS and Swift object store connectors.
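To make the first half of that division of labor more concrete, here is a minimal Python sketch of how a cluster template might be expanded into a list of virtual machines to provision. The `NodeGroup` and `ClusterTemplate` types, the field names, and the `plan_instances` function are assumptions made for illustration. They do not reflect Sahara's actual template schema or API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NodeGroup:
    """Hypothetical group of identical nodes within a cluster."""
    name: str             # e.g. "master" or "worker"
    flavor: str           # VM size label (illustrative value)
    count: int            # number of VMs in this group
    processes: List[str]  # Hadoop services to run on each VM


@dataclass
class ClusterTemplate:
    """Hypothetical cluster description a user might submit."""
    name: str
    hadoop_version: str
    node_groups: List[NodeGroup]


def plan_instances(template: ClusterTemplate) -> List[Dict[str, object]]:
    """Expand a template into one provisioning request per VM."""
    plan = []
    for group in template.node_groups:
        for index in range(1, group.count + 1):
            plan.append({
                "hostname": f"{template.name}-{group.name}-{index:03d}",
                "flavor": group.flavor,
                # Copy so later edits to the plan do not alter the template.
                "processes": list(group.processes),
            })
    return plan


if __name__ == "__main__":
    template = ClusterTemplate(
        name="analytics",
        hadoop_version="example-version",  # placeholder, not a real release
        node_groups=[
            NodeGroup("master", "large", 1, ["namenode", "jobtracker"]),
            NodeGroup("worker", "medium", 3, ["datanode", "tasktracker"]),
        ],
    )
    for instance in plan_instances(template):
        print(instance["hostname"], instance["flavor"], instance["processes"])
```

In this sketch, a controller-like component would hand each planned instance to the compute service, and a plugin would then install and configure the listed processes on the resulting hosts.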
Project Sahara is currently under incubation in the OpenStack community. Hortonworks is working with the community to help Sahara mature into a top-level OpenStack project. The most recent version of Sahara is 0.3, released on October 17, 2013.
The HDP OpenStack plugin is being developed in the open community and has been included in Project Sahara since the 0.2 release. With the current version, users can provision a simple HDP cluster on OpenStack to run basic MapReduce jobs and use Ambari to manage the cluster.
We are targeting the HDP OpenStack plugin to be generally available in Q1-2014.
- Provisioning & Management
- Template-based self-provisioning
- Job flow based provisioning (Savanna EDP)
- Ambari-based cluster management
- OpenStack HEAT support
- Manual compute & data node elasticity
- OpenStack Swift to HDFS data movement support
- VM-based CPU, memory & I/O isolation
- OpenStack Neutron support for network isolation
- Dedicated Ambari & Hue per cluster
- Savanna template to Ambari template conversion
- Additional data sources and job types with Savanna EDP
- Automatic rule-based cluster elasticity
- Improved OpenStack Ceilometer integration
- Single Ambari instance per tenant with multi-cluster support
- OpenStack Horizon to Ambari single sign-on
- Hadoop node VM to physical server pinning
- Project Savanna announcement
- [VIDEO] Savanna Demo (Hadoop Summit 2013)
- [SLIDES] Savanna Demo (Hadoop Summit 2013)
- [WIKI] Savanna Project
- [DOWNLOAD] Github Repos