Sqoop using shell is failing



This topic contains 9 replies, has 4 voices, and was last updated by tedr 2 years ago.

  • Creator
  • #28103

    Hi There,

    I have a pig script which includes an “sh” command which executes sqoop. The pig script works fine when executed using “pig -f script.pig” but
    fails when executed as part of an Oozie workflow.

    Versions of each component:

    Hadoop (as part of HDP 1.2)
    Apache Pig version (rexported)
    Oozie client build version:

    We’ve been struggling with this issue and tried a few things (mostly around the classpaths of sqoop/pig/oozie), but none of them really helped.

    We also tried adjusting the log4j properties, with no luck.

    The following exceptions are logged in the Oozie Pig launcher MapReduce job:

    The following exception is thrown by sqoop (sensitive data is obfuscated of course):
    Exception in thread "main" java.lang.NoClassDefFoundError: Could not initialize class org.apache.log4j.LogManager
    at org.apache.log4j.Logger.getLogger(Logger.java:105)
    at org.apache.sqoop.util.LoggingUtils.setDebugLevel(LoggingUtils.java:50)
    at org.apache.sqoop.tool.BaseSqoopTool.applyCommonOptions(BaseSqoopTool.java:739)
    at org.apache.sqoop.tool.ImportTool.applyOptions(ImportTool.java:722)
    at org.apache.sqoop.tool.SqoopTool.parseArguments(SqoopTool.java:433)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:129)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:238)

    Thanks in advance.

Viewing 9 replies - 1 through 9 (of 9 total)


  • Author
  • #28874


    Hi Moty,

    Thanks for letting us know that you found the solution.




    We found the issue!

    It seems to be a bug in the Hadoop MapReduce version we use on HDP:

    Since HDP 1.2 doesn’t include this fix, we had to edit hadoop-env.sh by hand, making the same change the patch makes.

    Thanks all for your help!



    Hi Moty,

    Try this: copy the log4j jar file from /usr/lib/hadoop/lib to /usr/lib/sqoop/lib and then rerun the workflow
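
    A minimal sketch of that copy step, assuming a POSIX shell and write access to the Sqoop lib directory. The helper name copy_log4j_jars is hypothetical, and the log4j*.jar pattern is an assumption about how the jar is named on your nodes. Nothing in this thread confirms that the copy resolves the error.

```shell
# copy_log4j_jars is a hypothetical helper, not part of Hadoop or Sqoop.
# It copies every log4j*.jar from a source lib directory to a destination
# lib directory and fails if there was nothing to copy.
copy_log4j_jars() {
    src_dir="$1"
    dest_dir="$2"
    copied=0
    for jar in "$src_dir"/log4j*.jar; do
        # Skip the unexpanded pattern when nothing matches.
        [ -f "$jar" ] || continue
        # -p keeps the original mode and timestamps.
        cp -p "$jar" "$dest_dir/" || return 1
        printf 'copied %s\n' "${jar##*/}"
        copied=$((copied + 1))
    done
    # Non-zero status when no jar was found, so the caller notices.
    [ "$copied" -gt 0 ]
}
```

    With the directories named in the reply, the call would be: copy_log4j_jars /usr/lib/hadoop/lib /usr/lib/sqoop/lib. Since the Oozie launcher can presumably be scheduled on any task node, the copy would have to be repeated on each of them.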



    Hi Sef,


    Yes, Pig and MapReduce work seamlessly when the pig script makes no Sqoop calls. What’s more, the other shell commands we use in the pig script do work, for example: “hadoop fs -rmr /path/to/delete”

    Any more ideas?


    Seth Lyubich

    Hi Moty,

    Can you please check whether your Pig and MapReduce example jobs work correctly? Some information can be found here.


    Please let me know if this is helpful.



    Hi Seth,


    Yes, we set the property you mentioned as part of the oozie workflow.

    Any more ideas?


    Hi Sasha,

    The classpath settings include the log4j classes, but the issue still occurs, even when we pass the classpath information by hand.

    Any other idea?



    Sasha J

    Check your classpath settings, or pass the correct classpath through the command line.
    This error:
    Exception in thread "main" java.lang.NoClassDefFoundError: Could not initialize class org.apache.log4j.LogManager
    means that the process cannot find the log4j classes, most likely because log4j.jar is not on the classpath.

    Thank you!
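
    One way to run that check is sketched below, assuming a POSIX shell. The helper name find_log4j_on_classpath is hypothetical, and matching on log4j*.jar is an assumption about the jar's file name. Keep in mind that "Could not initialize class" can also be raised when the class was found but its static initializer failed, so a hit here does not rule the error out.

```shell
# find_log4j_on_classpath is a hypothetical helper, not part of Hadoop or Sqoop.
# It prints every entry of a ':'-separated classpath whose file name looks
# like a log4j jar, and returns non-zero when there is none.
# The body runs in a subshell so IFS and the variables do not leak out.
find_log4j_on_classpath() (
    IFS=':'
    status=1
    # $1 is left unquoted on purpose: it is split on ':' and wildcard
    # entries such as /some/lib/* are expanded by the shell.
    for entry in $1; do
        case "${entry##*/}" in
            log4j*.jar)
                printf '%s\n' "$entry"
                status=0
                ;;
        esac
    done
    exit "$status"
)
```

    For example, find_log4j_on_classpath "$(hadoop classpath)" lists the log4j jars Hadoop itself would load. To see what the sh command inside the Oozie launcher actually gets, the same check would have to run against the classpath in that environment.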


    And the following exception is thrown by Pig:

    java.io.IOException: java.io.IOException: sh command 'sqoop import --connect jdbc:mysql://* --driver com.mysql.jdbc.Driver --username * --password * --as-textfile --fields-terminated-by , --enclosed-by "\"" --compress --table "events" --target-dir /user/tmp/sqoop-test.gz --input-null-string "null" --input-null-non-string "" --verbose --columns "id"' failed. Please check output logs for details
    at org.apache.pig.tools.grunt.GruntParser.processShCommand(GruntParser.java:1092)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:175)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
    at org.apache.pig.Main.run(Main.java:475)
    at org.apache.pig.PigRunner.run(PigRunner.java:49)
    at org.apache.oozie.action.hadoop.PigMain.runPigJob(PigMain.java:283)
    at org.apache.oozie.action.hadoop.PigMain.run(PigMain.java:223)
    at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:37)
    at org.apache.oozie.action.hadoop.PigMain.main(PigMain.java:76)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:491)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:365)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
    Caused by: java.io.IOException: sh command 'sqoop import --connect jdbc:mysql://* --driver com.mysql.jdbc.Driver --username * --password * --as-textfile --fields-terminated-by , --enclosed-by "\"" --compress --table "events" --target-dir /user/tmp/sqoop-test.gz --input-null-string "null" --input-null-non-string "" --verbose --columns "id"' failed. Please check output logs for details
    at org.apache.pig.tools.grunt.GruntParser.processShCommand(GruntParser.java:1088)
    ... 23 more
    Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.PigMain], exit code [2]
