Getting Started with HDCloud

Introduction

This tutorial will help you quickly spin up a cloud environment in which you can dynamically resize your cluster from one node to hundreds of nodes. HDCloud is well suited to short-lived, on-demand processing, letting you quickly perform heavy computation on large datasets. You control when resources are allocated and de-allocated, so you can add or release capacity as needed.

In this tutorial, we focus on spinning up a Data Science persona environment that is suited to our Apache Spark and Apache Zeppelin cloud-based tutorial series.

Getting Started Videos

For a quick overview of how to get started with HDCloud, check out this short three-part video series:

Part 1 of 3 – Setting up HDCloud Controller

Part 2 of 3 – Setting up a three-node Cluster

Part 3 of 3 – Launching Apache Ambari for operations and Apache Zeppelin for data wrangling and advanced analytics

Environment Setup Details

The detailed steps behind the getting started videos are:

1a. Create an Amazon Web Services (AWS) Account if you don’t have one

1b. Follow this step-by-step doc to Set Up and Launch a Controller on HDCloud

1c. Create a Data Science Cluster (use settings listed below)

Select/specify the following for your cluster:

  • HDP Version: HDP 2.6 or later
  • Cluster Type: “Data Science: Apache Spark 2.1+, Apache Zeppelin 0.6.2+” or later
  • Worker instance count: one or more
  • Remote Access: 0.0.0.0/0
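The Remote Access value above is a CIDR range. As a minimal sketch of what that range covers, the snippet below uses Python's standard `ipaddress` module. The `cluster_settings` dict and its key names are hypothetical labels for the settings listed above, not HDCloud API fields, and `203.0.113.0/24` is a documentation-only example range.

```python
import ipaddress

# Settings from the list above, held in a plain dict.
# Key names are hypothetical labels, not HDCloud API fields.
cluster_settings = {
    "hdp_version": "2.6",
    "worker_instance_count": 1,
    "remote_access": "0.0.0.0/0",
}


def describe_remote_access(cidr: str) -> str:
    """Summarize how many addresses a CIDR range admits."""
    network = ipaddress.ip_network(cidr, strict=True)
    if network.prefixlen == 0:
        # A zero-length prefix matches every address of that IP version.
        return f"{cidr} allows every IPv{network.version} address"
    return f"{cidr} allows {network.num_addresses} address(es)"


print(describe_remote_access(cluster_settings["remote_access"]))
print(describe_remote_access("203.0.113.0/24"))
```

In other words, `0.0.0.0/0` admits connections from any IPv4 address, which is convenient for a short-lived tutorial cluster. A narrower range, such as the second example, limits access to a specific block of addresses.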

Here’s a screenshot with sample settings:

[Screenshot: setting-up-hd-cloud]

Next Steps

Now that you have your HDCloud environment set up, check out one of these cloud-ready tutorials:

  1. Spark in 5 Minutes tutorial analyzing a Silicon Valley film series dataset

  2. A more in-depth tutorial on Spark SQL