MapReduce Forum

Terasort fails on HDP 2.0

  • #42587
    Milind Shah

    I have a 10 node cluster, with each node having:
    48G RAM
    12 x 7200 rpm HDDs
    24 cores
    10 GbE

    I have set up my cluster using Ambari as follows:
    Node 1:
    Resource Manager
    Zookeeper Server
    MR client
    History Server
    HDFS client
    Ganglia server
    Ganglia monitor
    YARN client
    Zookeeper client

    Node 2:
    Secondary Namenode
    HDFS client
    YARN Client
    Zookeeper Client
    Ganglia monitor

    Node 3-10:
    Ganglia Monitor
    HDFS Client
    MR2 client
    YARN Client
    Zookeeper Client

    yarn.nodemanager.resource.memory-mb = 24576 (24G)
    yarn.scheduler.minimum-allocation-mb = 700
    yarn.scheduler.maximum-allocation-mb = 2048

    mapreduce.map.memory.mb = 800
    mapreduce.reduce.memory.mb = 1500
    mapreduce.map.java.opts = -Xmx756M
    mapreduce.reduce.java.opts = -Xmx1450M
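
    For reference, this is roughly how those properties look in yarn-site.xml and mapred-site.xml. Ambari manages the actual files, so treat this as a sketch of the values above, not a copy of my configs:

    <!-- yarn-site.xml (approximate, Ambari-managed) -->
    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>24576</value>
    </property>
    <property>
      <name>yarn.scheduler.minimum-allocation-mb</name>
      <value>700</value>
    </property>
    <property>
      <name>yarn.scheduler.maximum-allocation-mb</name>
      <value>2048</value>
    </property>

    <!-- mapred-site.xml (approximate) -->
    <property>
      <name>mapreduce.map.memory.mb</name>
      <value>800</value>
    </property>
    <property>
      <name>mapreduce.reduce.memory.mb</name>
      <value>1500</value>
    </property>
    <property>
      <name>mapreduce.map.java.opts</name>
      <value>-Xmx756M</value>
    </property>
    <property>
      <name>mapreduce.reduce.java.opts</name>
      <value>-Xmx1450M</value>
    </property>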

    I am using dfs.replication = 3, and speculative execution is set to false for both map and reduce.

    On this cluster, I am able to finish a 500G sort: generating 500G of data with teragen takes about 7 minutes, and terasort finishes in about 25 minutes. However, when I run terasort I see the following errors (a rough sketch of how I launch the jobs is below the log):

    2013-10-31 00:17:19,330 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
    2013-10-31 00:21:51,350 WARN [ResponseProcessor for block BP-1724634037-] org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-1724634037- Premature EOF: no length prefix available
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(
    at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$
    2013-10-31 00:21:51,363 ERROR [main] PriviledgedActionException as:hdfs (auth:SIMPLE) All datanodes are bad. Aborting…
    2013-10-31 00:21:51,363 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : All datanodes are bad. Aborting…
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(
    at org.apache.hadoop.hdfs.DFSOutputStream$
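
    As mentioned above, this is roughly how I launch the jobs. The examples jar path and the HDFS directories are from memory, so they may differ slightly on your install:

    # generate 500G of input: teragen writes 100-byte rows, so 5,000,000,000 rows is about 500G
    hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar teragen \
        5000000000 /user/hdfs/teragen-500G

    # sort it, with speculative execution disabled for both map and reduce
    hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar terasort \
        -Dmapreduce.map.speculative=false \
        -Dmapreduce.reduce.speculative=false \
        /user/hdfs/teragen-500G /user/hdfs/terasort-500G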

    I also want to try terasort with dfs.replication = 1. When I tried the 500G sort with replication = 1, I saw data block corruption on every single run. Has anyone seen this problem before?
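
    For reference, a quick way to check for corrupt blocks after a run (the output path is just an example):

    # report files/blocks under the terasort output directory and flag anything corrupt
    hdfs fsck /user/hdfs/terasort-500G -files -blocks -locations | grep -i corrupt

    # or just list the corrupt files directly
    hdfs fsck /user/hdfs/terasort-500G -list-corruptfileblocks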


  • #42729
    Milind Shah

    In addition to this, I also see exceptions like the following:

    13/11/01 16:35:02 INFO mapreduce.Job: Task Id : attempt_1383346449765_0003_r_000113_0, Status : FAILED
    Container launch failed for container_1383346449765_0003_01_002119 : Call From scale-67/ to scale-65:45454 failed on socket timeout exception: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/ remote=scale-65/]; For more details see:
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(
    at java.lang.reflect.Constructor.newInstance(
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(
    at $Proxy30.startContainers(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(
    at java.util.concurrent.ThreadPoolExecutor$
    Caused by: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/ remote=scale-65/]
    at org.apache.hadoop.ipc.Client$Connection$
    at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(
    at org.apache.hadoop.ipc.Client$
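
    The failing call here is the AM asking the NodeManager on scale-65 to start a container (port 45454 is, I believe, the NodeManager address Ambari configures). I have been sanity-checking that the NodeManagers are up and reachable roughly like this (hostnames are from my cluster):

    # is the NodeManager port on scale-65 reachable from the node running the AM?
    nc -vz scale-65 45454

    # are all NodeManagers registered and healthy from the ResourceManager's point of view?
    yarn node -list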
