June 22, 2017

Top Questions – Next Gen Data Analytics Powered by Cloud webinar from June 7 – Hortonworks and AWS

On June 7, we hosted the Next Gen Data Analytics Powered by Cloud webinar with speakers Jeff Sposetti from Hortonworks and Karthik Krishnan from Amazon Web Services. The webinar provided an overview of how your organization can achieve the benefits of data analytics with the cloud, how to use the AWS Marketplace for ease of deployment, and how to leverage Hortonworks cloud solutions for data science and analytics needs. Jeff and Karthik gave demos to show how easy it is to get started in the cloud. To get access to the slides, go here.

Some great questions came across during the webinar. As promised, here is a brief capture of that Q&A.

  1. What is the difference between worker node and compute node?

A: A cluster, used for storing and processing data, always includes two node types, Master and Worker; Compute nodes can optionally be added.

  • A Master node runs the components for managing the cluster (including Ambari), along with the master components for storing temporary/intermediate data (HDFS) and for processing tasks (YARN).
  • Worker nodes run the components that execute processing tasks (e.g. NodeManager) and store temporary data in HDFS (e.g. DataNode).
  • Compute nodes do just what the name suggests: they are nodes dedicated to compute work. You can optionally include Compute nodes in your cluster (in addition to Worker nodes) to expand your workload processing power independently of storage.
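To make the division of roles concrete, here is a small illustrative sketch in plain Python (not part of any Hortonworks tooling; the component lists are simplified from the answer above):

```python
# Illustrative only: a simplified mapping of node types to the components
# described above (not an exhaustive or official component list).
NODE_COMPONENTS = {
    "master": ["Ambari", "HDFS master services", "YARN master services"],
    "worker": ["NodeManager", "DataNode"],   # processing + HDFS storage
    "compute": ["NodeManager"],              # processing only, no HDFS storage
}

def stores_hdfs_data(node_type: str) -> bool:
    """A node holds HDFS blocks only if it runs a DataNode."""
    return "DataNode" in NODE_COMPONENTS[node_type]

print(stores_hdfs_data("worker"))   # workers store HDFS data
print(stores_hdfs_data("compute"))  # compute nodes add processing power only
```

This is why adding Compute nodes grows processing capacity without growing HDFS storage.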
  2. Can you give me recommendations on how to set up an AWS instance with the Hortonworks platform for my students?

A: We have several recommendations, with videos and tutorials, to help you get started.

Watch this How-To Video. This 3-minute video shows you how to launch Hortonworks Data Cloud in the AWS Marketplace. It specifically focuses on creating a data science cluster using Apache Spark and Apache Zeppelin.

We also have several tutorials to try that will show you how to set up a cluster within about 15 minutes.

Intro to Machine Learning with Apache Spark and Apache Zeppelin. In this tutorial, we will introduce you to Machine Learning with Apache Spark using HDCloud for AWS.

Hands-on Tour of Apache Spark in 5 Minutes. In this tutorial, we use an Apache Zeppelin notebook as our development environment to keep things simple and elegant, using HDCloud for AWS.

Learning Spark SQL with Zeppelin. The first part of this tutorial introduces you to Apache Spark SQL using HDCloud. The second part belongs to our Apache Zeppelin-based lab series, which provides an intuitive and developer-friendly web-based environment for data ingestion, visualization and more.

  3. How easy is it to scale a cluster on AWS? Does this auto scale, auto shutdown and auto resume?

A: You can scale clusters on-demand (add/remove nodes) or via auto-scaling, and terminate clusters when you're done with your processing. This is all available via the UI or the CLI (if you want to automate those operations).
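The idea behind auto-scaling can be pictured with a small conceptual sketch in plain Python. This is not the actual Hortonworks Data Cloud auto-scaler; the thresholds and the utilization metric are made-up illustrations:

```python
def desired_node_count(current_nodes, utilization, min_nodes=3, max_nodes=10):
    """Toy auto-scaling rule: add a node when busy, remove one when idle.

    `utilization` is a 0.0-1.0 fraction of cluster capacity in use
    (a hypothetical metric; real auto-scaling policies differ).
    """
    if utilization > 0.8 and current_nodes < max_nodes:
        return current_nodes + 1   # scale out under heavy load
    if utilization < 0.2 and current_nodes > min_nodes:
        return current_nodes - 1   # scale in when mostly idle
    return current_nodes           # steady state

print(desired_node_count(5, 0.9))  # 6: scales out
print(desired_node_count(5, 0.1))  # 4: scales in
print(desired_node_count(5, 0.5))  # 5: unchanged
```

A real policy would also account for pending YARN applications and cooldown periods, but the shape is the same: a control loop that nudges node count toward the load.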

  4. Is that pricing for the Controller only? What is the cost/hr?

A: There is a cost for the “Controller” and a cost for running the HDP clusters. The cost of the running clusters (“HDP Services”) depends on the type of cluster you set up (for example: Data Science, ETL or Interactive Analytics). Learn more about pricing for the two components here:

Controller Marketplace Listing

HDP Services Marketplace Listing
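As a back-of-the-envelope illustration of how the two billing components combine, total hourly software cost is roughly the Controller rate plus the per-node HDP Services rate times the node count. The rates below are placeholders, not actual marketplace prices:

```python
def hourly_cost(node_count, controller_rate, node_rate):
    """Total hourly software cost = Controller fee + per-node HDP Services fees.

    Rates here are hypothetical placeholders; see the marketplace listings
    for actual pricing (EC2 infrastructure charges are billed separately).
    """
    return controller_rate + node_count * node_rate

# Example with made-up rates: $0.10/hr controller, $0.05/hr per node.
print(hourly_cost(10, 0.10, 0.05))  # 0.6 per hour for a 10-node cluster
```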

  5. Also, can you pause the cluster when it's not in use? May I suspend all the nodes in a cluster for the night and then come back in the morning and restart them all?

A: The clusters are tailored for ephemeral workloads where you spin up, do your processing, and spin down. By leveraging Amazon S3 for your long-term storage, you don't have to pause the cluster; instead, terminate it and recreate it when you need to start processing again. Once you terminate the cluster, the billing meter for cluster resources stops.

  6. How are product updates/upgrades handled for the Controller Service? And how are the HDP Stacks managed?

A: Upgrades of the Controller service are handled by terminating the controller instance and re-instantiating a new instance that points to the previous controller resources (e.g. Amazon RDS). The upgrade is handled automatically on that launch.

  7. Are there plans for adding other security features like Apache Ranger and Apache Atlas in the solution?

A: As a matter of fact, we just announced a Hortonworks Data Cloud 2.0 Technical Preview, which includes “Shared Data Lake Services” for consistent Authentication, Authorization and Audit controls across ephemeral workloads. Read more here:


If you didn't get a chance to watch the webinar, you can check out the replay here:

Next Gen Data Analytics Powered by Cloud

For more information on Hortonworks Data Cloud for AWS, go here.

To get started with a 5-day free trial on the marketplace, go here.


