September 22, 2016

Try the Latest Innovations in Apache Spark and Apache Zeppelin with Hortonworks 2.5 Sandbox

With the release of the Hortonworks 2.5 Sandbox, several exciting new features have been added to Apache Spark and Apache Zeppelin.

Apache Spark Updates

One of the most powerful new features of the Hortonworks 2.5 Sandbox is the ability to run two versions of Spark side by side in the same environment: a Generally Available (GA) Spark 1.6.2 and a Technical Preview (TP) of Spark 2.0. To learn how to run and switch between the two versions, check out the A Lap Around Apache Spark tutorial.

NOTE: Zeppelin does not yet support Spark 2.0. Support is coming soon.

A new HBase connector has also been added that lets you load HBase tables directly into a Spark DataFrame. To learn more, see the Spark on HBase tutorial.

Apache Zeppelin Updates

HDP 2.5 adds notebook security and multi-user support to Zeppelin. By enabling the Livy REST server and LDAP/Active Directory for user authentication, you can now control which users can access which notebooks, based on their roles and needs. Livy also improves cluster utilization by recycling interpreters that have been inactive for 60 minutes.
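To make the Livy piece a little more concrete: Livy exposes Spark through a REST API, so a client (such as Zeppelin's Livy interpreter) starts an interactive session with `POST /sessions`, optionally naming a `proxyUser` to run the session as, and then submits code to that session with `POST /sessions/{id}/statements`. The minimal Python sketch below only builds those two requests with the standard library and does not send them. The host `localhost` and Livy's default port 8998 are assumptions (check the Livy configuration on your Sandbox), and the helper function names are hypothetical.

```python
import json
import urllib.request

# Assumption: Livy on localhost at its default port 8998; adjust for your cluster.
LIVY_URL = "http://localhost:8998"


def _post(path, body):
    """Build (but do not send) a JSON POST request against the Livy REST API."""
    return urllib.request.Request(
        LIVY_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_session_request(proxy_user, kind="spark"):
    """POST /sessions: start an interactive session that runs as proxy_user."""
    return _post("/sessions", {"kind": kind, "proxyUser": proxy_user})


def submit_statement_request(session_id, code):
    """POST /sessions/{id}/statements: run a code snippet inside an existing session."""
    return _post(f"/sessions/{session_id}/statements", {"code": code})


req = create_session_request("alice")
print(req.get_method(), req.full_url, req.data.decode("utf-8"))
# POST http://localhost:8998/sessions {"kind": "spark", "proxyUser": "alice"}
```

Passing either request to `urllib.request.urlopen` would actually send it to a running Livy server; the JSON response to the session request includes the new session's `id`, which the statement request needs.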

Given Zeppelin’s General Availability, enterprise readiness, flexibility (30+ interpreters), ease of use, and rich development community, it’s a great time to start exploring how you can use Zeppelin notebooks to accelerate data wrangling, analytics, and data science in your business. If you would like to give Zeppelin a try, check out the Learning Spark with Zeppelin tutorial.

Figure: A Monte Carlo Simulation with Spark & Zeppelin

If you are beyond the basics with Zeppelin and Spark and want to explore other notebooks for inspiration, check out the Zeppelin Notebook Gallery or ZeppelinHub.

What’s New in Spark 2.0

With the Spark 2.0 TP now available, here are several updates you should be aware of:

  • API Unification
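    • In Scala, DataFrame is now simply a type alias for Dataset[Row], that is, a Dataset of Row objects.
    • SparkSession subsumes SQLContext and HiveContext and wraps the underlying SparkContext (still reachable as spark.sparkContext). In other words, spark, the SparkSession provided in the Spark shells, is the new single entry point to all Spark features.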
  • Structured Streaming
    • You can manipulate stream data via DataFrames and Datasets.
    • Incremental processing of live data. Conceptually, it’s useful to think of a stream as an infinite DataFrame to which new rows are continually appended.
  • Performance Improvements
    • Speedups from Tungsten Phase 2 whole-stage code generation.
    • ORC and Parquet file format improvements.
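The "infinite DataFrame" idea behind Structured Streaming can be illustrated without Spark. Spark's Structured Streaming documentation describes a stream as an unbounded input table that new rows are appended to, with the query's result updated incrementally as data arrives. The toy Python model below is only an analogy (the class names are hypothetical and none of this is Spark's API): on each trigger, the query reads only the rows appended since the previous trigger and folds them into running state.

```python
from collections import Counter


class UnboundedTable:
    """Hypothetical stand-in for a stream: an input table that only ever grows."""

    def __init__(self):
        self.rows = []

    def append(self, new_rows):
        self.rows.extend(new_rows)


class IncrementalWordCount:
    """Keeps running state and, on each trigger, reads only rows it has not seen yet."""

    def __init__(self):
        self.counts = Counter()  # aggregation state carried between triggers
        self.offset = 0          # number of input rows already processed

    def trigger(self, table):
        for line in table.rows[self.offset:]:
            self.counts.update(line.split())
        self.offset = len(table.rows)
        return dict(self.counts)


def batch_word_count(rows):
    """The result a one-off batch query over all rows seen so far would give."""
    return dict(Counter(word for line in rows for word in line.split()))


table = UnboundedTable()
query = IncrementalWordCount()

table.append(["spark zeppelin", "spark"])
print(query.trigger(table))  # {'spark': 2, 'zeppelin': 1}

table.append(["spark streaming"])
print(query.trigger(table))  # {'spark': 3, 'zeppelin': 1, 'streaming': 1}
```

The point of the model is that the incremental result after each trigger equals what `batch_word_count` would return over the whole table so far, while each trigger only touches the new rows.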
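Roughly speaking, whole-stage code generation collapses a chain of operators, such as a filter followed by a projection, into one generated function with a single tight loop, instead of passing each row through a separate iterator for every operator. Spark generates Java code for this at runtime; the sketch below is only a Python analogy, using a hypothetical `(kind, expression)` stage format, that generates and compiles one fused loop and compares it with the operator-by-operator pipeline. It calls `eval`/`exec` on trusted, hard-coded strings purely for brevity.

```python
def interpreted_pipeline(stages, rows):
    """Operator-at-a-time style: each stage wraps the previous stage's iterator."""
    it = iter(rows)
    for kind, expr in stages:
        fn = eval(f"lambda x: {expr}")
        it = filter(fn, it) if kind == "filter" else map(fn, it)
    return list(it)


def generate_fused(stages):
    """Emit source for a single loop that applies every stage inline, then compile it."""
    lines = ["def fused(rows):", "    out = []", "    for x in rows:"]
    indent = "        "
    for kind, expr in stages:
        if kind == "filter":
            lines.append(f"{indent}if not ({expr}):")
            lines.append(f"{indent}    continue")
        else:  # "map": rebind x to the projected value
            lines.append(f"{indent}x = {expr}")
    lines.append(f"{indent}out.append(x)")
    lines.append("    return out")
    source = "\n".join(lines)
    namespace = {}
    exec(source, namespace)
    return namespace["fused"], source


STAGES = [("filter", "x % 2 == 0"), ("map", "x * 10"), ("filter", "x > 20")]
fused, source = generate_fused(STAGES)
print(source)
print(fused(range(10)))                         # [40, 60, 80]
print(interpreted_pipeline(STAGES, range(10)))  # [40, 60, 80]
```

Both versions compute the same result; the fused one simply avoids handing each row from one operator object to the next, which is the overhead whole-stage code generation targets.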

Stay tuned for more blog posts with details on each of these topics.

Get Started in 4 Steps

  1. Download the HDP Sandbox as a VM image (VMware or VirtualBox) or as a Docker image.
  2. Set up and start the VM image.
  3. Try a Sandbox tutorial: check out the list of free tutorials, or jump directly into the hands-on Learning Spark with Zeppelin tutorial.
  4. Need more help? Visit the Hortonworks Community Connection (HCC) and interact directly with the community and our development team.

Next Steps

If you want a more guided introduction, view the following Hadoop Summit Crash Courses:

You can also find the latest set of Spark tutorials and Zeppelin tutorials.

Try Hortonworks Cloud

Don’t have the minimum 8 GB of RAM to allocate to the virtual machine? Looking to try the latest in Hive and Spark on AWS? Try the Hortonworks Cloud Technical Preview, which supports ephemeral workloads for Hive and Spark.
