Hadoop on OpenStack
Apache Hadoop and OpenStack represent two of the largest open source communities, and both are relatively new to the data center. Hadoop can benefit from the operational agility that OpenStack provides, and in turn it serves as an excellent use case for OpenStack.
To accelerate the adoption of Hadoop over OpenStack, we partnered with Mirantis and Red Hat to collaborate on Project Savanna (since renamed to Project Sahara).
Our initiative targets the following use cases:
- One-Click Provisioning
- Enable self-service provisioning for frequent requests
- Simplify migrations from development to production
- Reduce operator error in provisioning
- Facilitate migration from Amazon EMR for ad-hoc analytics
- Vary cluster compute capacity based on factors such as time of day, resource utilization, and user job requirements
- Provide transient Hadoop clusters for analyzing data stored in the Swift object store
- Simplify upgrade and maintenance by running multiple Hadoop versions over common server pools
- Improve server utilization by sharing resources with non-Hadoop workloads
- Simplify chargeback/showback
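To make the capacity use case above concrete, here is a minimal sketch of how a worker count might be chosen from the time of day and current resource utilization. It is an illustration only: `ScalingPolicy`, `target_workers`, and every threshold are hypothetical names and values, not part of Sahara or any OpenStack API, and the sketch assumes a peak window that does not cross midnight.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class ScalingPolicy:
    """Hypothetical policy; none of these names come from Sahara."""
    min_workers: int
    max_workers: int
    peak_start: time                 # start of the busy window
    peak_end: time                   # end of the busy window (same day)
    peak_workers: int                # minimum worker count during the busy window
    high_utilization: float = 0.80   # add a worker above this fraction
    low_utilization: float = 0.30    # remove a worker below this fraction


def target_workers(policy: ScalingPolicy, now: time,
                   current_workers: int, utilization: float) -> int:
    """Return the worker count the cluster should be resized to."""
    in_peak = policy.peak_start <= now < policy.peak_end
    # Time of day sets the floor: keep at least peak_workers during the peak.
    floor = policy.peak_workers if in_peak else policy.min_workers

    # Utilization nudges the count up or down by one worker per evaluation.
    desired = current_workers
    if utilization > policy.high_utilization:
        desired += 1
    elif utilization < policy.low_utilization:
        desired -= 1

    # Clamp the result to the range [floor, max_workers].
    return max(floor, min(policy.max_workers, desired))
```

A scheduler could evaluate such a function periodically and pass the result to whatever mechanism resizes the cluster. Changing by one worker per evaluation is a deliberately conservative choice that avoids large swings from a single utilization sample.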
The core of Sahara, called the ‘controller’, serves as the glue between Hadoop and OpenStack. It manages the provisioning and orchestration of virtual machines by working with underlying OpenStack projects such as Nova, Neutron (formerly Quantum), Cinder and Glance.
Hortonworks is developing (in the OpenStack community) a plugin for Project Sahara that leverages Apache Ambari to configure and manage Hadoop clusters in the cloud. The HDP Plugin also configures HDFS and Swift object store connectors.
Project Sahara is currently under incubation in the OpenStack community. Hortonworks is working with the community to help mature Sahara to become an integrated OpenStack project in the Juno release cycle.
The HDP Plugin is under development in the OpenStack community and has been included with Project Sahara since the 0.3 release. With the current HDP Plugin, users can provision an HDP cluster on OpenStack and manage the cluster with Apache Ambari.
The most recent version of Project Sahara is 0.3, released on October 17, 2013.
- Provisioning & Management
- Template-based self-provisioning
- Elastic Data Processing
- Ambari-based cluster management
- Heat- and Nova-based provisioning
- Manual compute & data node elasticity
- OpenStack Swift to HDFS data movement support
- VM-based CPU, memory & I/O isolation
- OpenStack Neutron support for network isolation
- Dedicated Ambari per cluster
- Provisioning & Management
- Native support for Ambari Blueprints
- Command line interface
- Kerberos cluster support
- Platform Support
- Support for HDP 2.1 Stack
- Data Worker
- Support for Ambari Views
- Additional Elastic Data Processing capabilities