On June 7, we hosted the Next Gen Data Analytics Powered by Cloud webinar with speakers Jeff Sposetti from Hortonworks and Karthik Krishnan from Amazon Web Services. The webinar provided an overview of how your organization can achieve the benefits of data analytics with the cloud, how to use the AWS Marketplace for ease of deployment, and how to leverage Hortonworks cloud solutions for data science and analytics needs. Jeff and Karthik gave demos to show how easy it is to get started in the cloud. To get access to the slides, go here.
Some great questions came across during the webinar. As promised, here is a brief capture of that Q&A.
A: A cluster, used for storing and processing data, includes two node types: master and worker.
A: We have several recommendations to help you get started with videos and tutorials.
Watch this How-To Video. This 3-minute video shows you how to launch Hortonworks Data Cloud from the AWS Marketplace. It specifically focuses on creating a data science cluster using Apache Spark and Apache Zeppelin.
We also have several tutorials to try that will show you how to set up a cluster within about 15 minutes.
Intro to Machine Learning with Apache Spark and Apache Zeppelin. In this tutorial, we will introduce you to Machine Learning with Apache Spark using HDCloud for AWS.
Learning Spark SQL with Zeppelin. The first part of this tutorial introduces you to Apache Spark SQL using HDCloud. The second part is part of our Apache Zeppelin-based lab series, which provides an intuitive, developer-friendly web-based environment for data ingestion, visualization and more.
A: You can scale clusters on demand (adding or removing nodes) or via auto-scaling, and you can terminate clusters when you're done with your processing. This is all available via the UI or the CLI (if you want to automate those operations).
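The automation mentioned above could look something like the following sketch. Note that the subcommand and flag names here (`list-clusters`, `resize-cluster --nodes`, `terminate-cluster`) are illustrative assumptions, not documented syntax; consult the CLI help on your controller for the exact commands. A dry-run stub is used so the sketch only prints what it would do.

```shell
# Dry-run stub: echo each hdc command instead of executing it.
# Remove this function to run the commands against a real controller.
hdc() { echo "hdc $*"; }

CLUSTER="etl-cluster"   # hypothetical cluster name

# See which clusters are currently running
hdc list-clusters

# Scale out the worker nodes ahead of a heavy job...
# (flag names are assumptions; check your CLI's help output)
hdc resize-cluster --cluster-name "$CLUSTER" --nodes 10

# ...and scale back in once the job finishes
hdc resize-cluster --cluster-name "$CLUSTER" --nodes 3

# Terminate the cluster entirely when processing is done
hdc terminate-cluster --cluster-name "$CLUSTER"
```

Wrapping steps like these in a scheduled script is one way to get the "spin up, process, spin down" workflow without manual UI clicks.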
A: There is a cost for the “Controller” and a cost for running the HDP clusters. The “HDP Services” cost depends on the type of cluster you set up (for example: Data Science, ETL or Interactive Analytics). Learn more about pricing for the two components here:
Controller Marketplace Listing
HDP Services Marketplace Listing
A: The clusters are tailored for ephemeral workloads where you spin-up, do your processing, and spin-down. By leveraging Amazon S3 for your long-term storage, you don’t have to pause the cluster. Instead, terminate and recreate when you need to start processing again. Once you do terminate the cluster though, the billing meter for cluster resources will cease.
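The spin-up, process, terminate pattern described above could be sketched as follows. Everything here is illustrative: the bucket name, cluster name, and the hdc subcommands and flags are assumptions rather than documented syntax, and a dry-run stub prints each command instead of executing it.

```shell
# Dry-run stub: print each hdc command rather than executing it.
hdc() { echo "hdc $*"; }

BUCKET="s3a://example-datalake"   # hypothetical S3 bucket for durable storage

# 1. Spin up a cluster only when there is work to do
#    (subcommand/flag names are assumptions)
hdc create-cluster --cluster-name nightly-etl --cluster-type "ETL"

# 2. Process data; inputs and outputs live in Amazon S3, not on the
#    cluster's local disks, so nothing is lost at termination
hdc run-job --cluster-name nightly-etl \
    --input  "$BUCKET/raw/" \
    --output "$BUCKET/curated/"

# 3. Terminate: billing for cluster resources stops, data persists in S3
hdc terminate-cluster --cluster-name nightly-etl
```

Because long-term state lives in S3, the next run simply recreates a fresh cluster pointed at the same bucket.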
A: Upgrades of the Controller service are handled by terminating the controller instance and re-instantiating a new instance that points to the previous controller resources (e.g. Amazon RDS). The upgrade is handled automatically on that launch.
A: As a matter of fact, we just announced a Hortonworks Data Cloud 2.0 Technical Preview, which includes “Shared Data Lake Services” for consistent Authentication, Authorization and Audit controls across ephemeral workloads. Read more here: https://hortonworks.com/blog/plenty-hortonworks-data-cloud/
If you didn’t get a chance to watch the webinar, you can check out the replay here:
For more information on Hortonworks Data Cloud for AWS, go here.
To get started with a 5-day free trial on the marketplace, go here.