Elastic Hadoop on OpenStack
Hadoop and OpenStack are two of the largest open source communities and are both new to the data center. Hadoop can benefit from the operational agility provided by OpenStack and serves as an excellent use case for OpenStack.
To accelerate the adoption of Hadoop on OpenStack, we announced a partnership with Mirantis and Red Hat at the OpenStack Summit in Portland, Oregon in April 2013 to collaborate on Project Savanna.
Below are some of the use cases the initiative targets:
- One-Click Provisioning.
- Enable self-service provisioning for frequent requests.
- Simplify migrations from development to production.
- Reduce operator error in provisioning.
- Job flow based cluster provisioning to enable migration from Amazon EMR for ad-hoc analytics.
- Vary cluster compute capacity based on factors like time of day, resource utilization, user job requirements etc.
- Provide transient Hadoop clusters for analyzing data stored in Swift object store.
- Simplify upgrade and maintenance by running multiple Hadoop versions over common server pool.
- Improve server utilization by sharing resources with non-Hadoop workloads.
- Simplify chargeback/showback.
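The use case of varying cluster capacity by time of day and resource utilization can be illustrated with a small rule function. This is only a sketch under stated assumptions: `ClusterState`, `target_worker_count`, the thresholds, and the night-time window are all hypothetical names and values chosen for illustration, and none of them are part of Savanna's API.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    """Hypothetical snapshot of a Hadoop cluster (not a Savanna type)."""
    worker_count: int
    cpu_utilization: float  # fraction between 0.0 and 1.0
    hour_of_day: int        # 0-23, local time


def target_worker_count(state, min_workers=2, max_workers=20):
    """Return the desired number of workers under illustrative rules."""
    target = state.worker_count

    # Rule 1 (assumed thresholds): grow when busy, shrink when idle.
    if state.cpu_utilization > 0.80:
        target = state.worker_count + 2
    elif state.cpu_utilization < 0.20:
        target = state.worker_count - 2

    # Rule 2 (assumed window): between 22:00 and 06:00, cap the cluster
    # at half its maximum so servers are free for other workloads.
    if state.hour_of_day >= 22 or state.hour_of_day < 6:
        target = min(target, max_workers // 2)

    # Always stay within the configured bounds.
    return max(min_workers, min(max_workers, target))


daytime_busy = ClusterState(worker_count=4, cpu_utilization=0.9, hour_of_day=12)
print(target_worker_count(daytime_busy))
```

A real policy would also need to account for HDFS data placement before removing data nodes; the sketch only computes a target size.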
The core of Savanna, called the ‘controller’, serves as the glue between Hadoop and OpenStack. It manages the provisioning and orchestration of virtual machines by working with underlying OpenStack projects such as Nova, Quantum, Cinder and Glance. The Hortonworks OpenStack plugin for Savanna takes care of configuring and managing the Hadoop cluster itself using Ambari. It also sets up the HDFS and Swift object store connectors.
Project Savanna is currently under incubation in the OpenStack community. Hortonworks is working with the community to help mature Savanna into a top-level OpenStack project. The most recent version of Project Savanna is 0.3, released on Oct 17th.
Because the HDP OpenStack plugin is being developed in the open community, it has been included in Project Savanna since the 0.2 release. With the version currently available in the community, users can provision a simple HDP cluster on OpenStack to run basic MapReduce jobs and use Ambari to manage the cluster.
We are targeting general availability of the HDP OpenStack plugin in Q1 2014.
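The split of responsibilities described above, with a controller that provisions virtual machines and a plugin that configures Hadoop on them, can be sketched roughly as follows. This is a simplified illustration and not Savanna's actual code: `Controller`, `ProvisioningPlugin`, `FakeHadoopPlugin`, and `fake_boot_vm` are hypothetical names, and the VM boot and cluster configuration steps are stand-ins for the real calls to OpenStack and Ambari.

```python
from abc import ABC, abstractmethod


class ProvisioningPlugin(ABC):
    """Hypothetical plugin interface: turns booted hosts into a cluster."""

    @abstractmethod
    def configure_cluster(self, cluster_name, hosts):
        """Configure Hadoop on the given hosts and return a description."""


class FakeHadoopPlugin(ProvisioningPlugin):
    """Stand-in for a vendor plugin; a real one would drive Ambari."""

    def configure_cluster(self, cluster_name, hosts):
        # Assumption for illustration: first host is master, rest are workers.
        return {"cluster": cluster_name, "master": hosts[0], "workers": hosts[1:]}


class Controller:
    """Hypothetical controller: boots VMs, then hands off to the plugin."""

    def __init__(self, plugin, boot_vm):
        self.plugin = plugin
        self.boot_vm = boot_vm  # callable standing in for the compute service

    def create_cluster(self, name, node_count):
        hosts = [self.boot_vm(f"{name}-node-{i}") for i in range(node_count)]
        return self.plugin.configure_cluster(name, hosts)


def fake_boot_vm(vm_name):
    """Pretend to boot a VM and return its hostname."""
    return vm_name + ".example.internal"


controller = Controller(FakeHadoopPlugin(), fake_boot_vm)
layout = controller.create_cluster("demo", 3)
print(layout)
```

Keeping the plugin behind an interface is what would let different Hadoop distributions share the same provisioning path.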
- Provisioning & Management
- Template-based self-provisioning
- Job flow based provisioning (Savanna EDP)
- Ambari-based cluster management
- OpenStack Heat support
- Manual compute & data node elasticity
- OpenStack Swift to HDFS data movement support
- VM-based CPU, memory & I/O isolation
- OpenStack Neutron support for network isolation
- Dedicated Ambari & Hue per cluster
- Savanna template to Ambari template conversion
- Additional data sources and job types with Savanna EDP
- Automatic rule-based cluster elasticity
- Improved OpenStack Ceilometer integration
- Single Ambari instance per tenant with multi-cluster support
- OpenStack Horizon to Ambari single sign-on
- Hadoop node VM to physical server pinning
- Project Savanna announcement
- [VIDEO] Savanna Demo (Hadoop Summit 2013)
- [SLIDES] Savanna Demo (Hadoop Summit 2013)
- [WIKI] Savanna Project
- [DOWNLOAD] Github Repos