One of the most powerful new features of the Hortonworks 2.5 Sandbox is the ability to run two versions of Spark side by side in the same environment: a Generally Available (GA) Spark 1.6.2 and a Tech Preview (TP) of Spark 2.0. If you would like to learn how to effortlessly run different versions of Spark, check out the A Lap Around Apache Spark tutorial.
NOTE: Zeppelin does not yet support Spark 2.0. This functionality will be coming soon.
Also, a new HBase connector has been added that allows you to ingest HBase datasets straight into a Spark DataFrame. To learn more, see the Spark on HBase tutorial.
With HDP 2.5, Zeppelin notebook security and multi-user support were added. By enabling a Livy REST server and LDAP/AD user authentication, you can now control user access to different notebooks, depending on each user's role and needs. Livy also makes cluster utilization more efficient, with the ability to recycle interpreters after 60 minutes of inactivity.
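To make the role of the Livy REST server a little more concrete, here is a minimal sketch of the kind of request a client sends to open a Spark session on behalf of a specific user. Zeppelin's Livy interpreter handles this for you, so the snippet only illustrates the shape of the call. The host, the port (8998 is Livy's default), the user name `alice`, and the helper `build_session_request` are all assumptions for illustration, and a secured cluster would additionally require authentication that is not shown here.

```python
import json
import urllib.request

# Assumption: Livy is reachable on its default port 8998 on the sandbox host.
LIVY_URL = "http://localhost:8998"


def build_session_request(proxy_user, kind="spark", livy_url=LIVY_URL):
    """Build (but do not send) a POST /sessions request for Livy.

    `proxy_user` is the user the session should be impersonated as;
    the name used below is hypothetical.
    """
    payload = {"kind": kind, "proxyUser": proxy_user}
    return urllib.request.Request(
        url=f"{livy_url}/sessions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_session_request("alice")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually create the session against a running Livy server:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
```

Because each session is created for a named user, the server can keep one user's interpreter separate from another's, which is the mechanism that the multi-user support builds on.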
Given Zeppelin’s General Availability, enterprise readiness, flexibility (30+ interpreters), ease of use, and rich development community, it’s a great time to start exploring how you can leverage Zeppelin notebooks to accelerate data wrangling, analytics, and data science in your business. If you would like to give Zeppelin a try, check out the Learning Spark with Zeppelin tutorial.
A Monte Carlo Simulation with Spark & Zeppelin
With Spark 2.0 TP now available, there are several updates that you should be aware of.
Stay tuned for more blogs, with more details, on each of these topics.
If you would like a more guided introduction, view the following Hadoop Summit Crash Courses:
Don’t have the minimum 8 GB of RAM to allocate to the virtual machine? Looking to try the latest in Hive and Spark on AWS? Try the Hortonworks Cloud Technical Preview, which supports ephemeral workloads for Hive and Spark.