November 13, 2018

Easily Deploy Hortonworks on Amazon Web Services to Enable a Data Driven Business

This blog was co-authored by Ryan Peterson, Head of Global Data Segment at AWS.

Big Data and Cloud technologies are accelerating enterprise data insights. Scott Gnau, Chief Technology Officer at Hortonworks, points out in his recent Hortonworks blog: “Successful businesses understand how interconnected data and cloud strategies are key to a cohesive business strategy.” A well-executed combination of Big Data and Cloud technologies gives users quick access to data when they need it, with the tools they want, so they can make effective, data-driven business decisions.

Hortonworks is an AWS Partner Network (APN) Advanced Technology Partner with Data & Analytics Competency status, a Public Sector Partner and ISV Migration Partner. Hortonworks platforms provide data processing & management capabilities with Hortonworks Data Platform (for data at-rest), Hortonworks DataFlow (for data in-motion and at the edge) and Hortonworks DataPlane (for consistent data security, governance and operations across a hybrid architecture).

Running Hortonworks on AWS

Cloudbreak makes it easy to deploy Hortonworks platforms on AWS. It handles the complex technical setup and service management for you, significantly reducing the effort required to deploy and run data workloads that leverage AWS services.

Cloudbreak gives you a guided experience to get data workload environments configured & running on AWS, and the ability to customize those environments based on your specific needs. The end state is an optimized, performant and secure deployment of your data workload on AWS.

  • Prescriptive data workload setups
  • Simplified choices of infrastructure and supporting AWS services
  • Built-in integrations to work with Amazon S3
  • Critical options for workload security including the network perimeter

Learn more and get started using the Cloudbreak AWS guide >>

Getting Started with Cloudbreak

After installing the Cloudbreak application, you need to configure an AWS credential in Cloudbreak to allow Cloudbreak to authenticate with your AWS account and provision resources on your behalf. Cloudbreak supports two types of cloud credential authentication for AWS: Key-based and Role-based.

  • Key-based: This is the simpler option and does not require additional configuration. You provide your AWS access key and secret key pair in the Cloudbreak web UI later; for now, just verify in your AWS account that this key pair has the permissions needed to request AWS resources.
  • Role-based: This requires that you or your AWS admin create an IAM role to allow Cloudbreak to assume AWS roles (the “AssumeRole” policy). Cloudbreak will assume this role when requesting AWS resources.
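For the role-based option, the cross-account IAM role must trust Cloudbreak to call `sts:AssumeRole`. A minimal sketch of such a trust policy follows; the account ID and external ID shown are placeholders, and the Cloudbreak AWS guide specifies the exact values to use:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111111111111:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": { "sts:ExternalId": "EXTERNAL-ID-FROM-GUIDE" }
      }
    }
  ]
}
```

The role itself is then attached to a permissions policy granting Cloudbreak the rights it needs (for example, to provision EC2 instances), so that Cloudbreak never handles your long-lived access keys directly.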

Creating a Workload Cluster

Once your cloud credentials are configured, you are ready to create a workload cluster. Cloudbreak provides a set of workload configurations out-of-the-box.

  • EDW Analytics: Useful for EDW and SQL analytics using Apache Hive LLAP
  • Data Science: Useful for data science with Apache Spark and Apache Zeppelin
  • ETL: Useful for ETL job processing with Apache Hive and Apache Spark
  • Flow Management: Useful for flow management with Apache NiFi
  • Messaging: Useful for messaging management with Apache Kafka

Network and Gateway Configurations

Cloudbreak exposes networking options to create your workload cluster in a new VPC or in an existing VPC. A best practice when running on AWS is to configure a protected gateway around the cluster, which minimizes the network surface area and provides a common access point to cluster resources. Cloudbreak can automate the setup of this gateway (powered by Apache Knox) during cluster creation.

Accessing Amazon S3 Data

You can configure your workload clusters to access data in S3 by specifying an instance profile for the Amazon EC2 instances. You can optionally create or attach an existing instance profile during cluster creation as well. Cloudbreak helps configure the most common storage location settings such as where to store Apache Ranger audit logs and the default Hive Warehouse Directory location.
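The role behind the instance profile needs permissions on the S3 buckets the cluster will read and write. A minimal sketch of such a permissions policy, assuming a hypothetical bucket named `my-workload-bucket`:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::my-workload-bucket"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::my-workload-bucket/*"
    }
  ]
}
```

Locations such as the Ranger audit log directory and the Hive warehouse directory would then be configured as paths under this bucket (for example, an `s3a://my-workload-bucket/...` URL); the bucket name here is illustrative, not prescribed.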

Scaling and Sizing

Once you have your cluster up and running, you can always tune the use of AWS resources by scaling the cluster up or down, either manually or automatically.

Enabling Customer Success

Many customers are successfully running Hortonworks platforms on AWS to get quick access to data when they need it, with the tools they want, to make effective, data-driven business decisions. Hilton is one such customer, successfully leveraging a modern data architecture of Hortonworks Data Platform and Hortonworks DataFlow running on a private cloud on AWS.

Next Steps

  • Visit our partner page to learn more about the Hortonworks and AWS partnership.
  • Read the Cloudbreak AWS guide on how to get started with Hortonworks on AWS.
  • Stop by the Hortonworks booth (#629) in the Sands Expo Center at The Venetian during AWS re:Invent 2018 to learn more about Hortonworks, Cloudbreak and the exciting combination of Big Data and AWS.
