Unable to run distributed shell on YARN

This topic contains 11 replies, has 5 voices, and was last updated by Shane Jarvie 11 months ago.

  • Creator
    Topic
  • #36571


    Member

    I am trying to run the distributed shell example on a YARN cluster:


    @Test
    public void realClusterTest() throws Exception {
        System.setProperty("HADOOP_USER_NAME", "hdfs");
        String[] args = {
            "--jar", APPMASTER_JAR,
            "--num_containers", "1",
            "--shell_command", "ls",
            "--master_memory", "512",
            "--container_memory", "128"
        };
        LOG.info("Initializing DS Client");
        Client client = new Client(new Configuration());
        boolean initSuccess = client.init(args);
        Assert.assertTrue(initSuccess);
        LOG.info("Running DS Client");
        boolean result = client.run();
        LOG.info("Client run completed. Result=" + result);
        Assert.assertTrue(result);
    }

    but it fails with:


    2013-09-17 11:45:28,338 INFO [main] distributedshell.Client (Client.java:monitorApplication(600)) - Got application report from ASM for, appId=11, clientToAMToken=null, appDiagnostics=Application application_1379338026167_0011 failed 2 times due to AM Container for appattempt_1379338026167_0011_000002 exited with exitCode: 1 due to: Exception from container-launch:
    org.apache.hadoop.util.Shell$ExitCodeException:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:458)
    at org.apache.hadoop.util.Shell.run(Shell.java:373)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:578)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
    ................

    .Failing this attempt.. Failing the application., appMasterHost=N/A, appQueue=default, appMasterRpcPort=0, appStartTime=1379407525237, yarnAppState=FAILED, distributedFinalState=FAILED, appTrackingUrl=ip-10-232-149-222.us-west-2.compute.internal:8088/proxy/application_1379338026167_0011/, appUser=hdfs

    Here is what I see in the server logs:


    2013-09-17 08:45:26,870 WARN nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:launchContainer(213)) - Exception from container-launch with container ID: container_1379338026167_0011_02_000001 and exit code: 1
    org.apache.hadoop.util.Shell$ExitCodeException:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:458)
    at org.apache.hadoop.util.Shell.run(Shell.java:373)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:578)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:258)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:74)

    The question is: how can I get more details to identify what is going wrong?

    PS: we are using HDP 2.0.5.


  • Author
    Replies
  • #46165

    Shane Jarvie
    Member

    Hello,

    I ran into a similar issue when running Java code that accessed HBase. In my case the issue was with the environment, as described below.

    My solution ended up being to provide the location of the HBase configuration files, in a manner such as:

    Configuration conf = new Configuration();
    conf.addResource(new Path("/etc/hbase/conf/hbase-site.xml"));
    HTable table = new HTable(conf, HBASE_TABLE_MAIN);

    You may need to do something similar with the files in the Hadoop conf directory.
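    For the distributed shell test above, a rough sketch could look like this (the /etc/hadoop/conf paths are just an assumption based on a typical HDP layout, not something confirmed in this thread; adjust them for your cluster):

    Configuration conf = new Configuration();
    // Load the cluster's config explicitly instead of relying on whatever
    // happens to be on the test classpath (paths assume a typical HDP install).
    conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
    conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
    conf.addResource(new Path("/etc/hadoop/conf/yarn-site.xml"));
    Client client = new Client(conf);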

    Hope that helps

  • #43430

    Wang Wei
    Participant

    You need to set HADOOP_MAPRED_HOME as an environment variable. If that does not help, set yarn.application.classpath or mapreduce.application.classpath directly.

    If that still does not help, check the AM's container launch script (launch_container.sh).
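    For example, if you prefer to set the classpath from the test code itself rather than in yarn-site.xml, a minimal sketch could be (the value below is just the stock Hadoop 2.x classpath, an assumption; use the paths that match your cluster):

    Configuration conf = new Configuration();
    // Assumption: stock Hadoop 2.x classpath entries; replace with your
    // cluster's actual locations, or set the same value in yarn-site.xml.
    conf.set("yarn.application.classpath",
        "$HADOOP_CONF_DIR,"
        + "$HADOOP_COMMON_HOME/share/hadoop/common/*,"
        + "$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,"
        + "$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,"
        + "$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,"
        + "$HADOOP_YARN_HOME/share/hadoop/yarn/*,"
        + "$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*");
    Client client = new Client(conf);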

  • #42613

    Hi, any updates on this? I am having a similar issue with a job that ran fine under Hadoop 1.2.0.

  • #36864

  • #36859


    Member

    I got to that directory. It contains two files which are probably archived, but I was unable to open them. Below is the content from one of them:

    [Binary content omitted: the file appears to be a compressed aggregated-log TFile; the only readable fragments are the markers data:BCFile.index, data:TFile.index, and data:TFile.meta.]

  • #36850


    Member

    Vinod, at least the MR tasks initiated by Hive run fine.

  • #36796

    runeetv
    Member

    Unfortunately I am not able to get much from the logs.

    A few more questions:
    – The NM did say that logs were aggregated. Can you check the per-node file at /app-logs/hdfs/logs/application_1379338026167_0125/ on HDFS? (A small listing sketch is at the end of this reply.)
    – If you can’t find anything on HDFS, can you also check your local log dir /hadoop/yarn for the specific container?
    – Are basic MR jobs working? For example, you can run the standard MR examples. If those are also failing, it could point to a setup issue.

    We are working on better debugging for these AM crash failures, but I’d like to help you in any way possible.
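    In case it is useful, here is a rough listing sketch (just an illustration, reusing the application ID mentioned above; substitute the ID of the failed run you are looking at):

    // List the per-node aggregated log files for the application on HDFS.
    FileSystem fs = FileSystem.get(new Configuration());
    Path appLogDir = new Path("/app-logs/hdfs/logs/application_1379338026167_0125");
    for (FileStatus status : fs.listStatus(appLogDir)) {
        System.out.println(status.getPath() + " (" + status.getLen() + " bytes)");
    }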

  • #36770


    Member

    Vinod,

    Thanks a lot for your reply.

    Due to this forum's limitations on file uploads and message size, I am putting all the requested info here: http://fusionworks.md/details-for-unable-to-run-distributed-shell-on-yarn/

    Please let me know if more info is needed.

  • #36680

    runeetv
    Member

    Are you able to find anything in the AM container’s logs? You can get to those logs from the ResourceManager’s per-application page.

    Please also share what the NodeManager’s logs are showing before the above exception.

  • #36607


    Member

    Just in case: I see that Hive runs its tasks without problems.

  • #36606


    Member

    Looks like this is a general problem. We tried running Pig against our cluster and it fails with exactly the same exception.

    Here is the corresponding record from the JobHistory UI (the server logs are nearly the same):

    Application ID:   application_1379338026167_0045
    User:             hdfs
    Name:             PigLatin:DefaultJobName
    Application Type: MAPREDUCE
    Queue:            default
    Start Time:       Tue, 17 Sep 2013 11:01:55 GMT
    Finish Time:      Tue, 17 Sep 2013 11:02:11 GMT
    State:            FAILED
    Final Status:     FAILED
