Apache Spark 2.1 was recently released by the community. The main focus of this release was improvements in Structured Streaming and Machine Learning.
Hortonworks Data Cloud (“HDCloud”) for AWS gives you a quick way to launch a Spark cluster in the cloud. With the latest HDCloud Technical Preview (version 1.12 TP, available at http://hortonworks.github.io/hdp-aws/), we have added an option for HDP 2.6 (Technical Preview), which includes a new cluster configuration for Spark 2.1 for Data Science workloads. Let’s use this new HDCloud Technical Preview to launch Spark 2.1 and set up Zeppelin:
Grab the HDCloud Technical Preview, launch your Cloud Controller, log in, and create a cluster that includes Spark 2.1 by selecting HDP 2.6 (Technical Preview) and choosing the Apache Spark 2.1 Cluster Type.
Before we run the Spark PI Example, you’ll want to open access to the Spark History Server UI (which runs on port 18081). By default, HDCloud configures the AWS Security Group to not have access to port 18081, so you will need to open this up from the AWS EC2 console.
From the AWS EC2 console, locate the EC2 instance for the cluster Master node. Click on the Security Group for this instance and edit its Inbound access. For example, I created a rule to allow inbound connections to port 18081.
Note: Opening this (or any) port for Inbound access should be done with a bit of caution. We strongly recommend you take great care in limiting Inbound port access, protocols and client IP addresses to prevent malicious agents from gaining access to your data or resources. The above example of an Inbound port rule uses a very wide-open CIDR (0.0.0.0/0) for illustrative purposes. You should look to provide much more restrictive access.
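If you prefer the command line, the same Inbound rule can be added with the AWS CLI. The security group ID and CIDR below are placeholders; substitute your own values, and keep the CIDR as narrow as your situation allows:

```shell
# Placeholder group ID and CIDR -- replace with your cluster's security group
# and a CIDR restricted to your office/VPN range, not 0.0.0.0/0.
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 18081 \
  --cidr 203.0.113.0/24
```

This requires the AWS CLI configured with credentials for the account that owns the cluster.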
From the cloud controller, browse to the Ambari Web UI for your cluster. Login and navigate to the Spark2 service. You can use the Ambari “Quick Link” to get to the Spark2 History Service UI like so:
The History Server will show the version and build of Spark. In this case, version 2.1.
SSH into one of the cluster Worker nodes:
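For example, using the key pair you registered when creating the cluster (the key path and host below are placeholders; HDCloud instances are typically accessed as the `cloudbreak` user):

```shell
# Replace the key path and IP with your own values.
ssh -i ~/.ssh/my-hdcloud-key.pem cloudbreak@<worker-node-public-ip>
```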
And run the Spark PI example:
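On an HDP 2.6 cluster, the Spark 2 client and examples jar typically live under /usr/hdp/current/spark2-client, so a command along these lines should work (the exact jar file name varies by build, hence the glob):

```shell
# Make sure the Spark 2 client is selected, then submit SparkPi to YARN.
export SPARK_MAJOR_VERSION=2
cd /usr/hdp/current/spark2-client
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode client \
  examples/jars/spark-examples*.jar 10
```

The final argument (10) controls how many partitions of random samples the job generates; the estimate of Pi is printed near the end of the driver output.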
You can see the completed job in the Spark History Server UI.
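For reference, the SparkPi job estimates Pi by Monte Carlo sampling: it throws random points at the unit square and counts how many land inside the quarter circle. A plain-Python sketch of the same computation, no cluster required:

```python
import random

def estimate_pi(num_samples: int, seed: int = 42) -> float:
    """Estimate pi by sampling points in the unit square and
    counting the fraction that fall inside the quarter circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # (quarter-circle area) / (square area) = pi / 4
    return 4.0 * inside / num_samples

print(estimate_pi(1_000_000))  # roughly 3.14
```

SparkPi does the same thing, but distributes the sampling across the partitions you requested on the command line.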
To run more Spark examples in HDCloud, visit the “A Lap Around Spark” blog and try the examples there.
In addition to running Spark jobs from the command line as shown above, you can also run Spark jobs from the Zeppelin UI. Follow the steps below to add and configure Zeppelin on this Spark cluster.
Note: In a future HDCloud Technical Preview, we’ll look to add Zeppelin by default to the Spark 2.1 Cluster Type (so you won’t need the steps below). But for now, you need to manually install & configure Zeppelin into the Spark 2.1 cluster. So here goes…
Use Ambari UI to Add Service
Select Zeppelin Service to add
Accept defaults and go through Ambari add service wizard.
Once Zeppelin is added to the cluster, go to the Zeppelin service in Ambari and use the “Zeppelin UI” Quick Link to launch the Zeppelin UI.
By default Zeppelin comes configured with both Spark1 & Spark2 interpreters.
However, before using the Spark2 interpreter, use Ambari to navigate to the Zeppelin configs and comment out or remove SPARK_HOME under the Advanced Zeppelin Env section. After commenting it out, restart Zeppelin from Ambari.
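In the zeppelin-env template, the change amounts to commenting out a single export (the exact path and line vary by HDP build; this is an illustrative fragment):

```shell
# Before: Zeppelin is pinned to the Spark 1.x client
# export SPARK_HOME=/usr/hdp/current/spark-client

# With the line commented out, Zeppelin resolves the Spark version
# per interpreter, so the Spark2 interpreter can find Spark 2.1.
```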
Once Zeppelin is restarted, visit the Zeppelin UI to create a new note with Spark 2.1, or edit the existing note to run with Spark 2.1.
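A quick way to confirm a note is running against Spark 2.1 is to check the version from a paragraph bound to the Spark2 interpreter (the `%spark2` interpreter name assumes the default HDP interpreter setup):

```
%spark2
sc.version
```

The output should report a 2.1.x version string.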
Adding Security to Zeppelin UI
Note that when you add Zeppelin, its UI does not have authentication enabled. We strongly recommend securing the Zeppelin UI. To add authentication and protect it from unauthenticated access, see the “Adding Authentication” section of the Zeppelin guide.
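Zeppelin’s authentication is backed by Apache Shiro and configured in conf/shiro.ini. A minimal fragment with a single local user might look like the following (the user name and password are placeholders; use real credentials, and prefer an external realm such as LDAP for anything beyond a sandbox):

```
[users]
# placeholder credentials -- replace before use
admin = changeme, admin

[urls]
/api/version = anon
/** = authc
```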
SparkR does not yet work with this technical preview.
It is great to see such rapid progress in the Spark Community and we are excited to get your feedback on the latest Spark 2.1 release.
If you have issues or need help with launching Spark 2.1 or trying out HDCloud, please visit https://community.hortonworks.com/spaces/61/operations-track_2.html?type=question. We’d love to hear from you.