Problems with Hive and Oozie

This topic contains 4 replies, has 2 voices, and was last updated by Michael Peterson 9 months, 1 week ago.

  • Creator
    Topic
  • #40972

    Michael Peterson
    Participant

    After a great deal of effort, tweaking and adjusting numerous things, I am able to run Hive actions in Oozie to do at least some things, but my overall action is still failing on the last step. To jump ahead to the punchline, here is the error message:

    LOAD DATA INPATH '/user/root/examples/input-data/yytable2/yytable' INTO TABLE xmas
    2013-10-18 11:25:37,119 INFO hive.ql.parse.ParseDriver: Parse Completed
    2013-10-18 11:25:37,149 ERROR org.apache.hadoop.hive.ql.Driver: FAILED: SemanticException [Error 10028]: Line 3:17 Path is not legal "/user/root/examples/input-data/yytable2/yytable": Move from: hdfs://10.230.138.159:8020/user/root/examples/input-data/yytable2/yytable to: hdfs://michael-hadoop-5.tsh.thomson.com:8020/apps/hive/warehouse/xmas is not valid. Please check that values for params "default.fs.name" and "hive.metastore.warehouse.dir" do not conflict.
    org.apache.hadoop.hive.ql.parse.SemanticException: Line 3:17 Path is not legal "/user/root/examples/input-data/yytable2/yytable": Move from: hdfs://10.230.138.159:8020/user/root/examples/input-data/yytable2/yytable to: hdfs://michael-hadoop-5.tsh.thomson.com:8020/apps/hive/warehouse/xmas is not valid. Please check that values for params "default.fs.name" and "hive.metastore.warehouse.dir" do not conflict.
    at org.apache.hadoop.hive.ql.parse.LoadSemanticAnalyzer.applyConstraints(LoadSemanticAnalyzer.java:156)
    at org.apache.hadoop.hive.ql.parse.LoadSemanticAnalyzer.analyzeInternal(LoadSemanticAnalyzer.java:2

    The background:

    The hive script is:


    DROP TABLE IF EXISTS yytest2;

    CREATE EXTERNAL TABLE yytest2 (x int)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/user/root/examples/input-data/yytable2/yytable';

    INSERT OVERWRITE DIRECTORY '/user/root/examples/output-data/hive2yy' SELECT * FROM yytest2;

    CREATE TABLE IF NOT EXISTS xmas (x int)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

    LOAD DATA INPATH '/user/root/examples/input-data/yytable2/yytable' INTO TABLE xmas;

    When I run this Hive script on my Hadoop system in the Hive shell it runs fine and the last Hive command (to populate the “xmas” table) goes into the Hive warehouse at /apps/hive/warehouse/xmas.
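    For concreteness, "running it in the Hive shell" means something like this (script.sql is the script above):

    $ hive -f script.sql    # completes without errors when run directly on the cluster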

    In order to run this on Oozie I had to make numerous adjustments, including following the steps here: http://stackoverflow.com/a/16361987/871012
    and including the hive-site.xml in my Hive "oozie package" that gets put into HDFS in order to run the Oozie workflow.
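    For reference, my job.properties looks roughly like this; only nameNode and jobTracker are specific to my cluster, while queueName, the system-libpath flag and the application path are the stock values from the Oozie Hive example, so treat them as illustrative:

    # job.properties (values besides nameNode/jobTracker are the Oozie-example defaults)
    nameNode=hdfs://10.230.138.159:8020
    jobTracker=http://10.230.138.159:50300
    queueName=default
    examplesRoot=examples
    oozie.use.system.libpath=true
    oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/hive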

    When the Oozie job finishes it ends in an error state, but the first four commands in the Hive script worked: both tables were created. The final command, however, fails (remember, it works fine when I run it myself in the Hive shell) with the error message at the top of this note.

    Please advise how to get this to run.


The topic ‘Problems with Hive and Oozie’ is closed to new replies.

  • Author
    Replies
  • #41173

    Michael Peterson
    Participant

    Hi Yi,

    Switching my nameNode and jobTracker properties to the FQDN, rather than the IP address, fixed it:
    nameNode=hdfs://michael-hadoop-5.tsh.thomson.com:8020
    jobTracker=http://michael-hadoop-5.tsh.thomson.com:50300

    It’s a little distressing that Hive/Hadoop is so finicky that it can’t use the IP address and the FQDN interchangeably (this VM has only one (virtual) NIC). In any case, thanks for your help.
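    For anyone who hits the same thing, a quick way to sanity-check the hostname/IP setup on the Hadoop box is something like the following (the conf path assumes the stock HDP layout):

    $ hostname -f                                     # should print michael-hadoop-5.tsh.thomson.com
    $ getent hosts michael-hadoop-5.tsh.thomson.com   # should resolve to 10.230.138.159
    $ grep -A1 fs.default.name /etc/hadoop/conf/core-site.xml   # the value should use the FQDN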

    -Michael

    #41156

    Yi Zhang
    Moderator

    Hi Michael,

    The property name "default.fs.name" in the error message looks odd; in Hadoop 1.x the property in core-site.xml should be called fs.default.name.

    Is the IP address 10.230.138.159 the one for michael-hadoop-5.tsh.thomson.com?

    In oozie-site.xml, have you pointed Oozie to the correct Hadoop conf directory?

    In the job.properties file, is the nameNode pointing to the correct FQDN of the name node?
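    For reference, the entries I would expect to see look roughly like this (the conf paths assume a stock HDP layout, so adjust as needed):

    <!-- core-site.xml: the name node URI, which should use the FQDN -->
    <property>
      <name>fs.default.name</name>
      <value>hdfs://michael-hadoop-5.tsh.thomson.com:8020</value>
    </property>

    <!-- oozie-site.xml: points Oozie at the cluster's Hadoop conf directory -->
    <property>
      <name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
      <value>*=/etc/hadoop/conf</value>
    </property>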

    Thanks,
    Yi

    #40974

    Michael Peterson
    Participant

    This forum has no way to preview or delete your own posts. That really is quite frustrating.

    The XML did not post correctly in the previous follow-ups, so here it is again:

    Here’s the workflow.xml that runs the above script:


    <?xml version="1.0" encoding="UTF-8"?>
    <workflow-app xmlns="uri:oozie:workflow:0.2" name="hive-wf">
        <start to="hive-node"/>

        <action name="hive-node">
            <hive xmlns="uri:oozie:hive-action:0.2">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <prepare>
                    <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/hive2"/>
                    <mkdir path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data"/>
                </prepare>
                <job-xml>hive-site.xml</job-xml>
                <configuration>
                    <property>
                        <name>mapred.job.queue.name</name>
                        <value>${queueName}</value>
                    </property>
                </configuration>
                <script>script.sql</script>
            </hive>
            <ok to="end"/>
            <error to="fail"/>
        </action>

        <kill name="fail">
            <message>Failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name="end"/>
    </workflow-app>

    The “job-xml” reference includes the hive-site.xml from /etc/hive/conf with the metastore password added.
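    For completeness, the workflow application directory in HDFS holds workflow.xml, script.sql and hive-site.xml, and the job is submitted in the usual way (the path and Oozie URL below are illustrative; 11000 is just the default Oozie port):

    $ hadoop fs -ls /user/root/examples/apps/hive     # workflow.xml, script.sql, hive-site.xml
    $ oozie job -oozie http://michael-hadoop-5.tsh.thomson.com:11000/oozie -config job.properties -run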

    #40973

    Michael Peterson
    Participant

    Since you have a character limit on posts, this is part two:

    The error says that "default.fs.name" and "hive.metastore.warehouse.dir" conflict. I cannot find "default.fs.name" anywhere on my system. hive.metastore.warehouse.dir was set to '/apps/hive/warehouse', but I noticed there is also a '/user/hive/warehouse' directory in HDFS, so I changed the property to that; it fails with the same error message. Suggestions online say to use metatool to determine what default.fs.name is, but that fails for me:


    $ /usr/lib/hive/bin/metatool -listFSRoot
    Initializing HiveMetaTool..
    HiveMetaTool:Parsing failed. Reason: Unrecognized option: -hiveconf
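    As a workaround, printing the effective values from the Hive CLI at least shows what Hive itself thinks they are, for example:

    $ hive -e "SET fs.default.name; SET hive.metastore.warehouse.dir;"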

    I’m using a CentOS VMWare VM running Hortonworks HDP 1.3
    Oozie version: 3.3.2.1.3.2.0-111
    Hive version: 0.11 (0.11.0.1.3.2.0-111)

    Here’s the workflow.xml that runs the above script (the XML tags were stripped when posting; see the repost in #40974 above for the complete file).

    The “job-xml” reference includes the hive-site.xml from /etc/hive/conf with the metastore password added.
