May 04, 2017

Recent Improvements in Apache Zeppelin & Livy Integration

Hortonworks introduced support for the Livy interpreter in Zeppelin with HDP 2.5. At that time, we integrated Livy 0.2 into Zeppelin, which supports four types of Livy interpreters:

  • %livy   
  • %livy.sql
  • %livy.pyspark
  • %livy.sparkr

We have received extensive feedback from our customers and the community about what they liked about Livy and the issues they experienced. In HDP 2.6, we have upgraded Livy to version 0.3 and made a number of improvements in Livy and Zeppelin to deliver a better user experience. Specifically, these enhancements prevent configuration errors by pre-selecting the default mode, improve productivity by letting users restart their own Livy interpreter and by recreating expired Livy sessions automatically, and support functionality provided by Spark 2.0.

In this blog, I will highlight several changes and improvements we have made.

Make yarn-cluster the Default Mode

In HDP 2.5, we allowed users to choose the mode for the Livy interpreter, although we encouraged them to use yarn-cluster mode. However, users reported issues caused by specifying an incorrect mode. So in HDP 2.6, we enforce yarn-cluster mode, which is also the default mode that Livy uses. The property `livy.spark.master` has been removed from the Livy interpreter setting.

Change Livy Interpreter to Scoped Mode

In HDP 2.5, we used Zeppelin’s shared mode for the Livy interpreter. Shared mode requires all notes and users to share the same Livy interpreter instance in Zeppelin, which makes it difficult to add new features to the Livy interpreter. So in HDP 2.6, we changed the Livy interpreter to per-user scoped mode, where each user’s Livy interpreter is a separate instance running in the same JVM. Scoped mode has two benefits:

  • Restarting the Livy interpreter won’t affect other users’ interpreters.
  • Users can change their own settings to some extent. For example, to read Avro data you can set livy.spark.jars.packages to com.databricks:spark-avro_2.11:3.1.0 in the Livy interpreter’s properties and then restart the Livy interpreter. Your next Livy session can then use the Avro library, and other users’ existing Livy sessions are not affected. Livy sessions that other users create afterwards will also load Avro, because everyone still shares the same interpreter setting. There is one exception: if someone else changes the Livy interpreter setting before you start your new Livy session, that change also affects you, and if that person happens to revert your changes, they won’t apply. That is why I say a user can change his or her own setting only “to some extent”. We will improve Zeppelin to allow everyone to own his or her setting in the next release.
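
As a rough illustration of how such a setting could reach Spark, the sketch below assumes that the interpreter forwards every non-empty property prefixed with `livy.spark.` to Livy as Spark configuration, with the leading `livy.` removed, in the `conf` field of Livy’s session-creation request (`POST /sessions`). The helper name `build_session_request` and the endpoint value are hypothetical and only serve the example.

```python
import json


def build_session_request(interpreter_properties, kind="spark"):
    """Translate interpreter properties into a body for Livy's POST /sessions.

    Assumption: properties starting with "livy.spark." are passed through as
    Spark configuration once the "livy." prefix is dropped.
    """
    prefix = "livy."
    conf = {
        key[len(prefix):]: value
        for key, value in interpreter_properties.items()
        if key.startswith("livy.spark.") and value
    }
    return {"kind": kind, "conf": conf}


properties = {
    "zeppelin.livy.url": "http://localhost:8998",  # hypothetical Livy endpoint
    "livy.spark.jars.packages": "com.databricks:spark-avro_2.11:3.1.0",
}
request_body = build_session_request(properties)
print(json.dumps(request_body, indent=2))
```

Because the setting only takes effect when a session is created, sessions that are already running keep the configuration they started with.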

Support for Spark 2.x

In HDP 2.5, the Livy interpreter only supported Spark 1.6. HDP 2.6 supports both Spark 1 and Spark 2, and two Livy servers can be deployed: one for Spark 1.6 and another for Spark 2.x. The Livy servers are components of Spark 1 and Spark 2 respectively.

Display YARN Application URL in Zeppelin

Although the Zeppelin front end displays the most relevant exception messages, there is often a need to check the YARN application log for details. In that case, unless you know the YARN application id of your Livy session, it is hard to find the application in the YARN Resource Manager (RM) UI. In HDP 2.6, Zeppelin displays the YARN application id and a link to its web UI when the property zeppelin.livy.displayAppInfo is set to true.
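
To make the idea concrete, here is a minimal sketch of pulling the application id and UI link out of a Livy session description. It assumes the session JSON carries an `appId` field and an `appInfo` map with a `sparkUiUrl` entry; the function name `describe_yarn_app` and the sample payload are made up for illustration and are not taken from Zeppelin’s code.

```python
def describe_yarn_app(session):
    """Return a one-line summary of the YARN application behind a Livy session.

    `session` is the parsed JSON of a Livy session (for example the response
    of GET /sessions/{id}). Missing fields are reported as "unknown".
    """
    app_id = session.get("appId") or "unknown"
    app_info = session.get("appInfo") or {}
    ui_url = app_info.get("sparkUiUrl") or "unknown"
    return "YARN application id: {}, Spark UI: {}".format(app_id, ui_url)


# Hypothetical session payload; the id and URL are invented for this example.
sample_session = {
    "id": 0,
    "state": "idle",
    "appId": "application_1490000000000_0001",
    "appInfo": {
        "sparkUiUrl": "http://rm.example.com:8088/proxy/application_1490000000000_0001/"
    },
}
print(describe_yarn_app(sample_session))
```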

Recreate Livy Session Automatically

By default, a Livy session expires after one hour. In HDP 2.5, the user had to restart the Livy interpreter after the session expired. In HDP 2.6, the session is recreated automatically and a warning message is displayed in the front end, as shown in the screenshot below.
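
The recovery behaviour can be pictured as “if the session is gone, create a new one, record a warning, and retry once”. The sketch below illustrates that pattern against an in-memory stand-in rather than a real Livy server; every class and method name in it (`FakeLivy`, `AutoRecreatingInterpreter`, and so on) is hypothetical and does not mirror Zeppelin’s actual implementation.

```python
class SessionNotFound(Exception):
    """Raised by the stand-in client when a session id is no longer known."""


class FakeLivy:
    """In-memory stand-in for a Livy server, used only for this sketch."""

    def __init__(self):
        self.live_sessions = set()
        self.next_id = 0

    def create_session(self):
        session_id = self.next_id
        self.next_id += 1
        self.live_sessions.add(session_id)
        return session_id

    def expire(self, session_id):
        self.live_sessions.discard(session_id)

    def run(self, session_id, code):
        if session_id not in self.live_sessions:
            raise SessionNotFound(session_id)
        return "ran {!r} in session {}".format(code, session_id)


class AutoRecreatingInterpreter:
    """Runs code in one session and replaces that session if it has expired."""

    def __init__(self, livy):
        self.livy = livy
        self.session_id = livy.create_session()
        self.warnings = []

    def interpret(self, code):
        try:
            return self.livy.run(self.session_id, code)
        except SessionNotFound:
            # The old session expired: start a new one and retry exactly once.
            self.session_id = self.livy.create_session()
            self.warnings.append("Livy session expired; a new session was created.")
            return self.livy.run(self.session_id, code)


livy = FakeLivy()
interpreter = AutoRecreatingInterpreter(livy)
livy.expire(interpreter.session_id)  # simulate the session timeout
result = interpreter.interpret("sc.version")
print(result)
print(interpreter.warnings)
```

A recreated session is a new Spark application, so variables defined in the expired session are not available in it; that is one reason a warning is worth showing.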

Support for Statement Cancellation

Livy 0.3 supports cancellation of a statement. We have integrated this functionality into Zeppelin as well. Users can now click the cancel button to terminate the running statement.
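
For readers who talk to Livy directly, cancellation is exposed in its REST API as a `POST` to `/sessions/{sessionId}/statements/{statementId}/cancel`. The sketch below only builds such a request without sending it; the helper name `build_cancel_request`, the endpoint, and the ids are assumptions chosen for the example.

```python
from urllib.request import Request


def build_cancel_request(livy_url, session_id, statement_id):
    """Build (but do not send) the request that asks Livy to cancel a statement."""
    url = "{}/sessions/{}/statements/{}/cancel".format(
        livy_url.rstrip("/"), session_id, statement_id
    )
    return Request(url, data=b"", method="POST")


# Hypothetical endpoint and ids, for illustration only.
cancel_request = build_cancel_request("http://localhost:8998/", 3, 7)
print(cancel_request.get_method(), cancel_request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it to a running Livy server.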

Support for Python 3

Livy 0.3 supports Python 3, which is enabled in Zeppelin via %livy.pyspark3. One additional property must be set in the Livy interpreter setting of Zeppelin: livy.spark.yarn.appMasterEnv.PYSPARK3_PYTHON.

Example

livy.spark.yarn.appMasterEnv.PYSPARK3_PYTHON /Users/jzhang/anaconda/bin/python
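
A quick way to check that the setting took effect is to ask the running Python which binary and version it is. The snippet below is a plain-Python sketch of such a check; in Zeppelin it would be the body of a paragraph that starts with a `%livy.pyspark3` line, and the helper name `python_runtime` is hypothetical.

```python
import sys


def python_runtime():
    """Report which Python binary and major version are executing this code."""
    return {"executable": sys.executable, "major": sys.version_info[0]}


runtime = python_runtime()
print(runtime)
```

If the configuration is picked up, the reported major version should be 3 and the executable should correspond to the path set in the property above.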

Known Issues

In HDP 2.6, we took a major step forward in improving the user experience of the Livy interpreter in Zeppelin, but some known issues remain. Below is the list of candidates that we will be working to resolve in the next release. We encourage you to ask questions and create tickets in the Zeppelin community at https://zeppelin.apache.org/

  • matplotlib doesn’t work in the Livy pyspark interpreter
  • Job progress is not available in the front end
  • ZeppelinContext is not available in the Livy interpreter

Comments

  • Matplotlib *does* work with Livy 0.3. The code below works, assuming a conda environment with matplotlib correctly installed is available on all cluster slave nodes:

    %livy.pyspark
    # Python 2 code, as posted.
    import matplotlib
    matplotlib.use('Agg')  # select the non-interactive backend before importing pyplot
    import matplotlib.pyplot as plt
    import numpy as np
    import StringIO

    plt.rcdefaults()

    def show(p):
        # Render the figure to SVG in memory and print it for Zeppelin's %html display.
        img = StringIO.StringIO()
        p.savefig(img, format='svg')
        print "%html " + img.getvalue()

    # Example data
    people = ('Tom', 'Dick', 'Harry', 'Slim', 'Jim')
    y_pos = np.arange(len(people))
    performance = 3 + 10 * np.random.rand(len(people))
    error = np.random.rand(len(people))

    plt.barh(y_pos, performance, xerr=error, align='center', alpha=0.4)
    plt.yticks(y_pos, people)
    plt.xlabel('Performance')
    plt.title('How fast do you want to go today?')

    show(plt)

    Note however that livy 0.3 has an ugly bug whereby all data is sent back encoded using the ISO-8859-1 charset.

  • @Rob Thanks for the comment. The post means that inline matplotlib display, as it works in Jupyter, doesn’t work in Zeppelin.

  • Is there a tutorial on how to use Spark 2 with Livy? When I enter your example:

    %livy
    sc.version

    The result is: 1.6.3
