YARN Forum

Unable to run distributed shell on Yarn

  • #36571
    Member

    I am trying to run the distributed shell example on a YARN cluster:


    @Test
    public void realClusterTest() throws Exception {
        System.setProperty("HADOOP_USER_NAME", "hdfs");
        String[] args = {
            "--jar", APPMASTER_JAR,
            "--num_containers", "1",
            "--shell_command", "ls",
            "--master_memory", "512",
            "--container_memory", "128"
        };
        LOG.info("Initializing DS Client");
        Client client = new Client(new Configuration());
        boolean initSuccess = client.init(args);
        Assert.assertTrue(initSuccess);
        LOG.info("Running DS Client");
        boolean result = client.run();
        LOG.info("Client run completed. Result=" + result);
        Assert.assertTrue(result);
    }

    but it fails with:


    2013-09-17 11:45:28,338 INFO [main] distributedshell.Client (Client.java:monitorApplication(600)) - Got application report from ASM for, appId=11, clientToAMToken=null, appDiagnostics=Application application_1379338026167_0011 failed 2 times due to AM Container for appattempt_1379338026167_0011_000002 exited with exitCode: 1 due to: Exception from container-launch:
    org.apache.hadoop.util.Shell$ExitCodeException:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:458)
    at org.apache.hadoop.util.Shell.run(Shell.java:373)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:578)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
    ................

    .Failing this attempt.. Failing the application., appMasterHost=N/A, appQueue=default, appMasterRpcPort=0, appStartTime=1379407525237, yarnAppState=FAILED, distributedFinalState=FAILED, appTrackingUrl=ip-10-232-149-222.us-west-2.compute.internal:8088/proxy/application_1379338026167_0011/, appUser=hdfs

    Here is what I see in the server logs:


    2013-09-17 08:45:26,870 WARN nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:launchContainer(213)) - Exception from container-launch with container ID: container_1379338026167_0011_02_000001 and exit code: 1
    org.apache.hadoop.util.Shell$ExitCodeException:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:458)
    at org.apache.hadoop.util.Shell.run(Shell.java:373)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:578)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:258)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:74)

    The question is: how can I get more details to identify what is going wrong?

    PS: we are using HDP 2.0.5


  • #36606
    Member

    Looks like this is a general problem. We tried running Pig against our cluster, and it fails with exactly the same exception.

    Here is the corresponding record in the JobHistory UI; the server logs are nearly the same.
    application_1379338026167_0045 hdfs PigLatin:DefaultJobName MAPREDUCE default Tue, 17 Sep 2013 11:01:55 GMT Tue, 17 Sep 2013 11:02:11 GMT FAILED FAILED

    #36607
    Member

    Just in case: I see that Hive runs its tasks without problems.

    #36680
    runeetv
    Member

    Are you able to find anything in the AM container’s logs? You can get to those logs from the ResourceManager’s per-application page.

    Please also share what the NodeManager’s logs show before the above exception.

    #36770
    Member

    Vinod,

    Thanks a lot for your reply.

    Because of this forum’s limitations on file uploads and message size, I have put all the requested info here: http://fusionworks.md/details-for-unable-to-run-distributed-shell-on-yarn/

    Please let me know if more info is needed.

    #36796
    runeetv
    Member

    Unfortunately I am not able to get much from the logs.

    A few more questions:
    – The NM did say that logs were aggregated. Can you check the per-node file at /app-logs/hdfs/logs/application_1379338026167_0125/ on HDFS?
    – If you can’t find anything on HDFS, can you also check your local log dir /hadoop/yarn for the specific container?
    – Are basic MR jobs working? For example, you can run the standard MR examples; if those are also failing, it could point to a setup issue.

    We are working on better debugging for these AM crash failures, but I’d like to help in any way possible.

    #36850
    Member

    Vinod, at least the MR tasks initiated by Hive run fine.

    #36859
    Member

    I got to that directory. It contains two files, which are probably archived, but I was unable to open them. Below is the content of one of them:

    [binary dump, not reproducible as text: the file starts with raw bytes d1 d3 68, and the only readable strings are TFile section names (data:BCFile.index, data:TFile.index, data:TFile.meta), indicating a compressed Hadoop TFile container]

    #42613

    Hi, any updates on this? I am having a similar issue with a job that ran fine under Hadoop 1.2.0.

    #43430
    Wang Wei
    Participant

    You need to set HADOOP_MAPRED_HOME in your environment variables. If that does not work, you need to set yarn.application.classpath or mapreduce.application.classpath directly.

    If that still does not work, you need to check the AM’s *container*.sh launch script.
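    For the classpath route, the property goes into yarn-site.xml (or mapred-site.xml for mapreduce.application.classpath). A sketch using the stock Hadoop 2 default entries; the exact paths depend on your installation layout:

```xml
<!-- yarn-site.xml: sketch using the stock Hadoop 2 default entries;
     adjust the variables/paths to your installation layout. -->
<property>
  <name>yarn.application.classpath</name>
  <value>
    $HADOOP_CONF_DIR,
    $HADOOP_COMMON_HOME/share/hadoop/common/*,
    $HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
    $HADOOP_HDFS_HOME/share/hadoop/hdfs/*,
    $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
    $HADOOP_YARN_HOME/share/hadoop/yarn/*,
    $HADOOP_YARN_HOME/share/hadoop/yarn/lib/*
  </value>
</property>
```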

    #46165
    Shane Jarvie
    Member

    Hello,

    I ran into a similar issue when running Java code that accessed HBase. In my case the issue was with the environment, as described below.

    My solution ended up being to point the code at the HBase configuration files, like so:

    Configuration conf = new Configuration();
    conf.addResource(new Path("/etc/hbase/conf/hbase-site.xml"));
    HTable table = new HTable(conf, HBASE_TABLE_MAIN);

    You may need to do something similar with the files in the Hadoop conf directory.

    Hope that helps.

