This blog was also authored by Ryan Peterson, Head of Global Data Segment at AWS.
Big Data and Cloud technologies are accelerating enterprise data insights. Scott Gnau, Chief Technology Officer at Hortonworks, points out in his recent Hortonworks blog: “Successful businesses understand how interconnected data and cloud strategies are key to a cohesive business strategy.” A well-executed combination of Big Data and Cloud technologies gives users quick access to data when they need it, with the tools they want, so they can make effective data-driven business decisions.
Hortonworks is an AWS Partner Network (APN) Advanced Technology Partner with Data & Analytics Competency status, a Public Sector Partner, and an ISV Migration Partner. Hortonworks platforms provide data processing and management capabilities through Hortonworks Data Platform (for data at rest), Hortonworks DataFlow (for data in motion and at the edge), and Hortonworks DataPlane (for consistent data security, governance, and operations across a hybrid architecture).
Cloudbreak simplifies the deployment of Hortonworks platforms on AWS. It handles the complex technical setup and service management for you, which significantly reduces the effort needed to deploy and run data workloads that use AWS services.
Cloudbreak guides you through configuring and launching data workload environments on AWS, and lets you customize those environments for your specific needs. The result is an optimized, performant, and secure deployment of your data workload on AWS.
After installing Cloudbreak, you need to configure an AWS credential in Cloudbreak so that it can authenticate with your AWS account and provision resources on your behalf. Cloudbreak supports two types of AWS credential: key-based, which uses an AWS access key and secret key, and role-based, which points Cloudbreak at an IAM role that it assumes in your account.
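For a role-based credential, the IAM role in your account needs a trust policy that allows the account running Cloudbreak to assume it. The sketch below builds such a trust policy as a Python dictionary, assuming a standard cross-account trust with an optional external ID condition. The function name `build_trust_policy` and the account ID `111122223333` (the placeholder AWS uses in its own documentation) are hypothetical; check the Cloudbreak documentation for the exact principal and conditions your Cloudbreak version expects.

```python
import json
from typing import Optional


def build_trust_policy(trusted_account_id: str, external_id: Optional[str] = None) -> dict:
    """Return an IAM trust policy that lets principals in `trusted_account_id`
    assume the role. Both arguments are placeholders for illustration."""
    statement = {
        "Effect": "Allow",
        # Trust the whole account that runs Cloudbreak (assumption).
        "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
        "Action": "sts:AssumeRole",
    }
    if external_id is not None:
        # An external ID condition guards against the "confused deputy" problem.
        statement["Condition"] = {"StringEquals": {"sts:ExternalId": external_id}}
    return {"Version": "2012-10-17", "Statement": [statement]}


if __name__ == "__main__":
    # Print the policy as JSON, ready to paste into the IAM console.
    print(json.dumps(build_trust_policy("111122223333", "example-external-id"), indent=2))
```

The role's separate permissions policy (what Cloudbreak may do once it has assumed the role) is not shown here.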
Once your credential is configured, you are ready to create a workload cluster. Cloudbreak provides a set of workload configurations out of the box.
Cloudbreak lets you create your workload cluster in a new VPC or in an existing VPC. A best practice on AWS is to place a protected gateway in front of the cluster, which minimizes the network surface area and gives users a single access point for cluster resources. Cloudbreak can automate the setup of this gateway (powered by Apache Knox) during cluster creation.
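To illustrate why a gateway shrinks the network surface area, the sketch below builds EC2-style security group ingress rules (the `IpPermissions` shape used by the EC2 API) for a node group. It assumes a simple layout: every node accepts traffic from inside the VPC, and only the gateway node additionally accepts TCP on one port from a trusted admin network. The helper name `ingress_rules`, the CIDR values, and the port are hypothetical; 8443 is Apache Knox's default gateway port, but the port your Cloudbreak gateway exposes may differ.

```python
import ipaddress


def ingress_rules(vpc_cidr: str, admin_cidr: str, is_gateway: bool,
                  gateway_port: int = 8443) -> list:
    """Build EC2-style IpPermissions for one node group (illustrative layout)."""
    # Validate the CIDRs up front; ip_network raises ValueError on bad input.
    vpc = ipaddress.ip_network(vpc_cidr)
    admin = ipaddress.ip_network(admin_cidr)
    # All nodes: allow all protocols ("-1"), but only from inside the VPC.
    rules = [{"IpProtocol": "-1", "IpRanges": [{"CidrIp": str(vpc)}]}]
    if is_gateway:
        # Only the gateway node is reachable from outside, on a single port.
        rules.append({
            "IpProtocol": "tcp",
            "FromPort": gateway_port,
            "ToPort": gateway_port,
            "IpRanges": [{"CidrIp": str(admin)}],
        })
    return rules
```

For example, `ingress_rules("10.0.0.0/16", "203.0.113.0/24", is_gateway=True)` returns two rules, while worker groups built with `is_gateway=False` expose nothing outside the VPC.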
You can give your workload clusters access to data in Amazon S3 by specifying an instance profile for the Amazon EC2 instances; during cluster creation you can either create a new instance profile or attach an existing one. Cloudbreak also helps configure the most common storage location settings, such as where to store Apache Ranger audit logs and the default Hive warehouse directory.
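The sketch below shows the two pieces involved, under stated assumptions: an IAM permissions policy that the instance profile's role could carry for one bucket, and a hypothetical S3 layout for the Hive warehouse and Ranger audit logs. `hive.metastore.warehouse.dir` is Hive's warehouse property and `xasecure.audit.destination.hdfs.dir` is the Ranger plugin audit destination property; the bucket name, the per-cluster path layout, and the helper names are illustrative only, and the values Cloudbreak proposes may differ.

```python
def s3_access_policy(bucket: str) -> dict:
    """IAM permissions policy for the instance profile role: list one bucket
    and read, write, and delete objects in it (minimal illustrative scope)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",  # bucket-level action
            },
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",  # object-level actions
            },
        ],
    }


def storage_locations(bucket: str, cluster_name: str) -> dict:
    """Hypothetical per-cluster S3A layout for common storage settings."""
    base = f"s3a://{bucket}/{cluster_name}"
    return {
        # Default Hive warehouse directory.
        "hive.metastore.warehouse.dir": f"{base}/apps/hive/warehouse",
        # Ranger plugin audit destination (HDFS-compatible file system).
        "xasecure.audit.destination.hdfs.dir": f"{base}/ranger/audit",
    }
```

Scoping the policy to a single bucket keeps the instance profile's permissions no broader than the cluster needs.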
Once your cluster is up and running, you can tune its use of AWS resources by scaling the cluster up or down, either manually or automatically.
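Automatic scaling generally comes down to a rule that compares a load metric against thresholds and keeps the node count within bounds. The sketch below is a generic threshold policy written for illustration; it is not Cloudbreak's autoscaling implementation, whose rules are configured in the product itself. The `ScalingPolicy` fields, the idea of using a utilization fraction as the metric, and the helper `desired_node_count` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    min_nodes: int
    max_nodes: int
    scale_up_threshold: float    # e.g. fraction of cluster memory in use
    scale_down_threshold: float
    step: int = 1                # nodes added or removed per decision


def desired_node_count(current: int, utilization: float, policy: ScalingPolicy) -> int:
    """Return the worker count a simple threshold rule would target."""
    if utilization >= policy.scale_up_threshold:
        target = current + policy.step
    elif utilization <= policy.scale_down_threshold:
        target = current - policy.step
    else:
        target = current
    # Clamp to the configured bounds so scaling never overshoots.
    return max(policy.min_nodes, min(policy.max_nodes, target))
```

With `ScalingPolicy(3, 10, 0.8, 0.3, step=2)`, a 5-node cluster at 90% utilization targets 7 nodes, while a 4-node cluster at 10% utilization stops at the 3-node minimum. A production policy would also add a cooldown period so the cluster does not oscillate between sizes.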
Many customers successfully run Hortonworks platforms on AWS to give their users quick access to data when they need it, with the tools they want, for effective data-driven business decisions. Hilton is one such customer: it runs a modern data architecture built on Hortonworks Data Platform and Hortonworks DataFlow, running on a private cloud on AWS.