Home Forums YARN JobHistory server could not load history file from HDFS

This topic contains 3 replies, has 2 voices, and was last updated by  D Blair Elzinga 6 months, 3 weeks ago.

  • Creator
    Topic
  • #49233

    Vojtech Caha
    Participant

    The error message looks like this:
    Could not load history file hdfs://namenodeha:8020/mr-history/tmp/hdfs/job_1392049860497_0005-1392129567754-hdfs-word+count-1392129599308-1-1-SUCCEEDED-default.jhist

    Actually, I know the cause of the problem. The default ownership of the /mr-history files is set with:

    hadoop fs -chown -R $MAPRED_USER:$HDFS_USER /mr-history

    But when a job runs (under $HDFS_USER), its history file is saved to /mr-history/tmp/hdfs as $HDFS_USER:$HDFS_USER and is then not accessible to $MAPRED_USER (the user the JobHistory server runs as). After changing the ownership back, the job file can be loaded again.

    But it happens again with every new job. What is the permanent solution to this? Thank you.
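    For context, one commonly recommended permanent setup (my assumption from general Hadoop cluster-setup guidance, not something confirmed in this thread) is to make the intermediate history directory world-writable with the sticky bit, so files written there by any job user stay reachable by the JobHistory server:

    ```shell
    # Hedged sketch, not confirmed in this thread: open up the intermediate
    # history directory with the sticky bit (1777) so every user's job files
    # remain accessible to the JobHistory server. Paths follow the posts above;
    # "mapred" and "hdfs" stand in for $MAPRED_USER and $HDFS_USER.
    hadoop fs -chown -R mapred:hdfs /mr-history
    hadoop fs -chmod -R 1777 /mr-history/tmp
    ```
    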

Viewing 3 replies - 1 through 3 (of 3 total)


  • Author
    Replies
  • #49861

    D Blair Elzinga
    Participant

    Finally found it – it turns out that the oozie-env.sh file from the distribution had some lines commented out, and during installation they were left that way. This included the OOZIE_BASE_URL components, so the callback URL was empty.
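    For anyone hitting the same thing, the fix amounts to uncommenting those lines in oozie-env.sh. A sketch of what they should look like; the hostname and port values are assumptions (the stock defaults), not taken from this thread:

    ```shell
    # oozie-env.sh — sketch only; actual host/port values are assumptions.
    export OOZIE_HTTP_HOSTNAME=`hostname -f`
    export OOZIE_HTTP_PORT=11000
    export OOZIE_BASE_URL="http://${OOZIE_HTTP_HOSTNAME}:${OOZIE_HTTP_PORT}/oozie"
    ```

    With these set, the job-end notification URL in the log below gets a real host and port instead of the empty "http://:".
    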

    #49826

    D Blair Elzinga
    Participant

    I can also work around the problem by running the job as ‘mapred’ user instead of hue or some other user. I’m hoping to be able to fix that.

    Beyond the permission issue, here is apparently why nothing sees the job-end notification:
    2014-03-07 14:55:11,377 INFO [Thread-62] org.mortbay.log: Job end notification trying http://:/oozie/callback?id=0000019-140305061228920-oozie-oozi-W@EvaluateMessage2&status=SUCCEEDED&

    Notice the empty host and port in “http://:”. Could this be a configuration issue, and if so, what parameter needs to be set?

    #49573

    D Blair Elzinga
    Participant

    I’m having a similar issue. When I try to load the log of a past job I get one of two errors:

    org.apache.hadoop.yarn.webapp.WebAppException: /octopus.svs.usa.hp.com:19888/jobhistory/logs/clownfish.svs.usa.hp.com:45454/container_1394028045311_0004_01_000001/container_1394028045311_0004_01_000001/hdfs: controller for octopus.svs.usa.hp.com:19888 not found
    at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232)

    org.apache.hadoop.yarn.webapp.WebAppException: /v1/history/mapreduce/: controller for v1 not found
    at org.apache.hadoop.yarn.webapp.Router.resolveDefault(Router.java:232)

    The jobs complete fine, but anything that tries to get history on them fails. This includes looking at logs or running the jobs in an oozie workflow. Evidently oozie gets its completion status from the history server, and if the history can’t be read, then oozie thinks that the job is still running…

    I thought there must be something wrong in my mapred-site.xml or yarn-site.xml – but you have evidently gotten it to work temporarily by changing the permissions inside the /mr-history directory? Could you be more specific? Have you solved this?
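    To check whether it is the same ownership problem described in the first post, something like this should show it (a sketch; "mapred" and "hdfs" stand in for $MAPRED_USER and $HDFS_USER, and the path follows the error message above):

    ```shell
    # List the intermediate history files; per the first post, the problem
    # shows up as .jhist files owned hdfs:hdfs that mapred cannot read.
    hdfs dfs -ls /mr-history/tmp/hdfs

    # The temporary workaround from the first post: restore the ownership
    # the JobHistory server expects.
    hadoop fs -chown -R mapred:hdfs /mr-history
    ```
    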
