December 03, 2014

Hybrid Deployment Options for Hadoop and HDP

Our customers have many infrastructure choices for deploying HDP: on-premises, in the cloud, virtualized, or even as an appliance. Further, they can choose to deploy on either Linux or Windows operating systems. You can easily see this creates a complex matrix. At Hortonworks, we believe you should not be limited to a single option, but should be able to choose the best combination of infrastructure and operating system for each usage scenario. That means: in a hybrid deployment model, you should have all of these options.

Why would an organization use a hybrid deployment model to deploy HDP and Hadoop? Our customers come to us asking us to meet their organizations' requirements for the following three basic scenarios:

  • Cluster Backup

    Data architects require Hadoop to act like other systems in the data center, and business continuity through replication across on-premises and cloud-based storage targets is a critical requirement. In HDP 2.2 we extend the capabilities of Apache Falcon to establish an automated policy for cloud backup to Microsoft Azure or Amazon S3. For example, this tutorial shows how to incrementally back up data to Microsoft Azure using Apache Falcon.
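    To make the shape of such a policy concrete, here is a rough sketch of a Falcon replication feed pairing an on-premises source cluster with a cloud-backed target. The feed name, cluster names, account, paths, and retention windows are all hypothetical; the tutorial linked above has the authoritative steps.

    ```xml
    <!-- Hypothetical Falcon feed: replicates a dataset from an on-premises
         source cluster to an Azure-backed target on a daily schedule. -->
    <feed name="rawDataBackup" description="daily cloud backup" xmlns="uri:falcon:feed:0.1">
      <frequency>days(1)</frequency>
      <clusters>
        <cluster name="primaryCluster" type="source">
          <validity start="2014-12-01T00:00Z" end="2016-12-01T00:00Z"/>
          <retention limit="days(30)" action="delete"/>
        </cluster>
        <cluster name="cloudCluster" type="target">
          <validity start="2014-12-01T00:00Z" end="2016-12-01T00:00Z"/>
          <retention limit="days(90)" action="delete"/>
          <locations>
            <!-- wasb:// is the Azure blob storage scheme; account name is made up -->
            <location type="data" path="wasb://backup@myaccount.blob.core.windows.net/falcon/raw/${YEAR}-${MONTH}-${DAY}"/>
          </locations>
        </cluster>
      </clusters>
      <locations>
        <location type="data" path="/apps/falcon/raw/${YEAR}-${MONTH}-${DAY}"/>
      </locations>
      <ACL owner="falcon" group="users" permission="0755"/>
      <schema location="/none" provider="none"/>
    </feed>
    ```

    Once submitted and scheduled (via the Falcon CLI or REST API), Falcon takes care of running the incremental replication on the declared frequency.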

  • Development

    A development environment is always separate from the production environment, and today many organizations rely on a cloud-based option for their development teams. It allows them to manage multiple environments more easily and to spin up temporary environments to fulfill a short-term development requirement. As a hybrid option, you need to be able to port not just the data but the Hadoop applications as well.

  • Burst

    Data science continues to be of great interest within many of the organizations we help with Hadoop. With Apache Hadoop YARN acting as a data operating system for the production cluster, some teams want to spin up a temporary cluster (on-premises or in the cloud) to explore data via machine learning. To do so, they need data and some of the application logic from their existing production Hadoop environment.

It’s all about Portability

In all three of these deployment models, the key to making it work is portability. You need to be able not only to move data back and forth, but also to keep data sets synchronized. Even more complex is keeping the “bits” consistent across environments: the same version of the entire Hadoop stack must be deployed in each environment, or you risk a job failing as it migrates from one to the next. This portability is a CRITICAL requirement for hybrid deployment of Hadoop.

Ok, it’s all about Portability and ease of management

Setting up a cluster is not a simple task. There are hundreds of options: not only choices of which components to deploy within the stack, but also configuration settings that optimize the cluster for your particular use.

Two new features in Apache Ambari provide a very broad set of options to simplify deployment, not just in the cloud but on premises as well.

  • Ambari Blueprints

    Ambari Blueprints makes it easy to take a template of one cluster and apply it to another for seamless portability. With a Blueprint, you specify the HDP version, the component layout, and the configurations needed to materialize a Hadoop cluster instance (via a REST API) without any user interaction.
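    For illustration, the skeleton of a Blueprint is just a JSON document naming the stack and the host groups. The blueprint name, host-group layout, and component list below are hypothetical, as is the Ambari host in the comments, but the top-level fields follow the Ambari Blueprints REST API:

    ```python
    import json

    # A minimal, hypothetical blueprint: one master group and one worker group.
    # The "Blueprints" section pins the stack name and version, so the same bits
    # are materialized in every environment the blueprint is applied to.
    blueprint = {
        "Blueprints": {
            "blueprint_name": "hdp-dev",
            "stack_name": "HDP",
            "stack_version": "2.2",
        },
        "host_groups": [
            {
                "name": "master",
                "cardinality": "1",
                "components": [{"name": "NAMENODE"}, {"name": "RESOURCEMANAGER"}],
            },
            {
                "name": "workers",
                "cardinality": "3",
                "components": [{"name": "DATANODE"}, {"name": "NODEMANAGER"}],
            },
        ],
    }

    payload = json.dumps(blueprint, indent=2)
    print(payload)

    # Registering the blueprint and instantiating a cluster from it are both
    # REST calls against the Ambari server (hostname here is made up):
    #   POST http://ambari-host:8080/api/v1/blueprints/hdp-dev    <- this payload
    #   POST http://ambari-host:8080/api/v1/clusters/dev-cluster  <- host mapping
    ```

    The second POST supplies the blueprint name plus a mapping of real hosts to host groups, which is what lets one template materialize clusters across cloud and on-premises environments alike.
    
    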

  • Ambari Stacks

    The “stack” for a cluster is defined by the set of components running in the environment. This might comprise Hadoop, Pig, and Hive (and more). It is typically a fairly complex list and can even be extended to non-Apache projects. With Apache Ambari you can define a stack once and have the same definition deployed across environments.
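    In Ambari, a stack is defined declaratively on disk under the server's stack-definitions directory, with each service described by a metainfo.xml. As a hedged sketch of the shape of such a definition (the service name, version, and component here are hypothetical), a custom service entry looks roughly like:

    ```xml
    <!-- Hypothetical service entry in a custom stack's metainfo.xml -->
    <metainfo>
      <schemaVersion>2.0</schemaVersion>
      <services>
        <service>
          <name>MYSERVICE</name>
          <displayName>My Service</displayName>
          <comment>A non-Apache component added to the stack definition</comment>
          <version>1.0.0</version>
          <components>
            <component>
              <name>MYSERVICE_SERVER</name>
              <category>MASTER</category>
              <cardinality>1</cardinality>
            </component>
          </components>
        </service>
      </services>
    </metainfo>
    ```

    Because the definition is just files, the same stack definition can be installed on every Ambari server you run, keeping the component list identical across environments.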

Making it simple with HDP

Only HDP provides the wide array of options necessary to deploy the same bits across operating systems and environments. Further, we have gone to great lengths to automate the movement of data and to manage each of these environments. You can download HDP today or try some of these features out in our HDP sandbox.

