Hortonworks Sandbox Forum

How to import existing workflow package into hortonworks sandbox?

  • #55876
    Dzianis Sokal
    Participant

    Hi,

    I am not able to import my Oozie package into the Hortonworks Sandbox. Any help is highly appreciated!

    I have the following package on my local machine:

    MyProject.tar.gz
    - MyProject
    -- workflow.xml
    -- lib
    --- myLib.jar
    --- otherLib.jar

    First I tried to import it into the Hortonworks Sandbox via the command line:

    oozie job -verbose -oozie http://sandbox.hortonworks.com:11000/oozie -config /home/hue/job.properties -run

    Where job.properties is

    nameNode=hdfs://sandbox.hortonworks.com:8020
    jobTracker=sandbox.hortonworks.com:8050
    queueName=default
    examplesRoot=examples

    oozie.wf.application.path=${nameNode}/user/${user.name}/MyProject.tar.gz

    and this gives me

    Error E0701: XML schema error, Content is not allowed in prolog.

    I checked that there is no BOM in my workflow file, and I was able to validate it successfully via

    oozie validate workflow.xml

    So I gave up on the command line and tried to import it from the web UI. I can successfully import only workflow.xml. However, I don’t know how to add myLib.jar and otherLib.jar to the classpath. I tried zipping the lib folder and adding the resulting package as the “Workflow resource archive (zip)” during import, but I get the following exception:

    get() returned more than one Node -- it returned 2! Lookup parameters were {'name': 'kill', 'workflow': <Workflow: mc2 - hue>}

    Clicking on “more info” shows:

    /usr/lib/hue/apps/oozie/src/oozie/views/editor.py 205 import_workflow
    /usr/lib/hue/apps/oozie/src/oozie/import_workflow.py 591 import_workflow
    /usr/lib/hue/apps/oozie/src/oozie/import_workflow.py 127 _save_links
    /usr/lib/hue/apps/oozie/src/oozie/import_workflow.py 241 _node_relationships
    /usr/lib/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/db/models/manager.py 132 get
    /usr/lib/hue/build/env/lib/python2.6/site-packages/Django-1.2.3-py2.6.egg/django/db/models/query.py 343 get

    That’s all I have. Any suggestions?

  • #55919
    iandr413
    Moderator

    Hi Dzianis,
    Can you verify that you are able to run the examples provided as part of the sandbox by executing the following:

    su oozie
    cd /usr/share/doc/oozie-4.0.0.2.1.1.0
    oozie job -oozie http://sandbox.hortonworks.com:11000/oozie -config examples/apps/map-reduce/job.properties -run

    More information can be found here: http://oozie.apache.org/docs/3.1.3-incubating/DG_Examples.html

    Once you get a sample running, you should be able to use that as a baseline for getting your Oozie job running.

    Ian

  • #56159
    Dzianis Sokal
    Participant

    Thanks, iandr413. I was able to run the example. Before, I was trying to launch the app from the archive, so I extracted the files from the archive and pointed my job.properties to the folder, and it works! However, the process looks too complicated right now:
    1. I first package my app via mvn
    2. Copy workflow.xml and libs into HDFS
    3. Copy job.properties into the sandbox’s filesystem
    4. Launch from sandbox terminal

    Is there any way I can simplify it? Ideally, it would be great to import my package from the web UI.
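
Steps 2 and 4 above could be wrapped in one small script run from the sandbox terminal. This is only a minimal sketch, not a tested recipe: the function names `run` and `deploy_workflow`, the `DRY_RUN` and `OOZIE_URL` variables, and the HDFS target directory are all hypothetical. It assumes the package has already been built and unpacked locally (step 1), and that `hdfs` and `oozie` are on the PATH.

```shell
#!/usr/bin/env bash
# Sketch: copy an unpacked Oozie workflow directory to HDFS and submit it.
# Set DRY_RUN=1 to print the commands instead of running them.

# Oozie server URL taken from the thread; override via the environment.
OOZIE_URL="${OOZIE_URL:-http://sandbox.hortonworks.com:11000/oozie}"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# deploy_workflow <local_app_dir> <hdfs_app_dir> <job_properties>
deploy_workflow() {
  local local_dir="$1" hdfs_dir="$2" props="$3"
  # Replace any previous copy of the application directory in HDFS.
  run hdfs dfs -rm -r -f "$hdfs_dir" || return 1
  run hdfs dfs -mkdir -p "$hdfs_dir" || return 1
  # Upload workflow.xml and the lib/ folder unpacked, not as a tar.gz.
  run hdfs dfs -put "$local_dir/workflow.xml" "$local_dir/lib" "$hdfs_dir/" || return 1
  # Submit and start the job; job.properties stays on the local filesystem.
  run oozie job -oozie "$OOZIE_URL" -config "$props" -run
}
```

A call might look like `deploy_workflow MyProject /user/hue/MyProject /home/hue/job.properties`, with `oozie.wf.application.path` in job.properties pointing at the same HDFS directory rather than at MyProject.tar.gz. Running it once with `DRY_RUN=1` prints the four commands without touching the cluster.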
