The recent release of Hortonworks Data Platform 2.6 (“HDP 2.6”) includes Apache Spark 2.1, and Hortonworks Data Cloud (“HDCloud”) for AWS gives you a quick way to launch a Spark cluster. Let’s use the HDCloud release to launch a Data Science cluster powered by Spark 2.1 and Zeppelin:
Grab the latest HDCloud release, launch your Cloud Controller, log in, and create a cluster that includes Spark 2.1 by selecting HDP 2.6 (Cloud) and choosing the “Data Science: Apache Spark 2.1, Apache Zeppelin 0.7.0” cluster type.
During cluster creation, you should also select the check box to enable remote access to cluster components. These components include: the HDFS NameNode (NN), YARN Resource Manager (RM), Spark History Server (SHS), and MapReduce Job History Server (JHS).
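Once remote access is enabled, the components above expose web UIs on their standard HDP ports. As a quick sanity check, you can print (and optionally probe) the UI URLs from your workstation. The hostname below is a placeholder, so substitute your cluster’s master node; the ports are the stock HDP 2.x defaults.

```shell
# Default web UI ports for the components exposed above.
# HOST is a placeholder -- substitute your cluster's master hostname.
HOST="master-node.example.com"
for entry in "NameNode:50070" "ResourceManager:8088" \
             "SparkHistoryServer:18080" "JobHistoryServer:19888"; do
  name="${entry%%:*}"   # component name before the colon
  port="${entry##*:}"   # port number after the colon
  echo "${name} UI: http://${HOST}:${port}"
  # Uncomment on a machine with network access to the cluster:
  # curl -s -o /dev/null -w "${name}: HTTP %{http_code}\n" "http://${HOST}:${port}"
done
```

If a UI does not respond, double-check that the remote-access check box was selected and that your security group allows inbound traffic on these ports.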
Pick the default master and worker EC2 instance types.
The username and password are the same ones you used when creating the cluster.
Once you are on the Zeppelin home page, you can run the Zeppelin Tutorial.
SSH into one of the cluster Worker nodes:
ssh -i "vinay-ec2-us-west.pem" firstname.lastname@example.org
Then run the SparkPi example:
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn --deploy-mode client \
  --num-executors 3 --driver-memory 512m \
  --executor-memory 512m --executor-cores 1 \
  examples/jars/spark-examples*.jar 10
You can see the completed job in the Spark History Server UI.
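Besides the web UI, the Spark History Server also exposes a REST API for finished applications, which is handy for scripting. A minimal sketch, assuming the default SHS port of 18080 and a placeholder hostname you would replace with your cluster’s master node:

```shell
# Build the Spark History Server REST endpoint for completed applications.
# SHS_HOST is a placeholder -- substitute your cluster's master hostname.
SHS_HOST="master-node.example.com"
SHS_PORT=18080   # default Spark History Server port
APPS_URL="http://${SHS_HOST}:${SHS_PORT}/api/v1/applications?status=completed"
echo "$APPS_URL"
# On a machine with network access to the cluster, list completed apps as JSON:
# curl -s "$APPS_URL"
```

Each entry in the JSON response includes the application ID, which you can use to drill into stages and executors under the same `/api/v1/applications/<app-id>/...` paths.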
To run more Spark examples in HDCloud, visit the A Lap Around Spark tutorial and try the examples there.
If you have issues or need help with launching Spark 2.1 or trying out HDCloud, please visit https://community.hortonworks.com/spaces/61/operations-track_2.html?type=question. We’d love to hear from you.
Are you interested in learning how other practitioners and customers are getting business value from Spark in the cloud? Join us for DataWorks Summit, June 13–15 in San Jose, and save 25% off your all-access pass. Enter BLOG when you register.