Terasort fails on HDP2.0



    Milind Shah
    Member

I have a 10-node cluster, with each node having:
48 GB RAM
12 x 7200 RPM HDDs
24 cores
10 GbE

I have set up my cluster using Ambari as follows:
Node 1:
NameNode
ResourceManager
ZooKeeper Server
MapReduce2 Client
History Server
HDFS Client
Ganglia Server
Ganglia Monitor
YARN Client
ZooKeeper Client

Node 2:
Secondary NameNode
HDFS Client
NodeManager
YARN Client
ZooKeeper Client
Ganglia Monitor

Nodes 3-10:
DataNode
Ganglia Monitor
HDFS Client
MapReduce2 Client
NodeManager
YARN Client
ZooKeeper Client

My YARN memory settings are:
yarn.nodemanager.resource.memory-mb = 24576 (24 GB)
yarn.scheduler.minimum-allocation-mb = 700
yarn.scheduler.maximum-allocation-mb = 2048

and my MapReduce settings are:

    mapreduce.map.memory.mb = 800
    mapreduce.reduce.memory.mb = 1500
    mapreduce.map.java.opts = -Xmx756M
    mapreduce.reduce.java.opts = -Xmx1450M
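
For reference, these would typically end up in yarn-site.xml and mapred-site.xml, roughly as below (a minimal sketch of only the properties listed above, as Ambari writes them):

<!-- yarn-site.xml -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>24576</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>700</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>2048</value>
</property>

<!-- mapred-site.xml -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>800</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>1500</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx756M</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx1450M</value>
</property>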

I am using dfs.replication = 3, and speculative execution is disabled for both map and reduce.
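
In property form that corresponds to (hdfs-site.xml and mapred-site.xml; these are the standard Hadoop 2 property names):

dfs.replication = 3
mapreduce.map.speculative = false
mapreduce.reduce.speculative = false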

On this cluster I am able to finish a 500G sort: teragen takes about 7 minutes to generate the 500G of input, and terasort finishes in about 25 minutes. When I run terasort, however, I see the following errors:

    2013-10-31 00:17:19,330 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
    2013-10-31 00:21:51,350 WARN [ResponseProcessor for block BP-1724634037-10.10.99.61-1383196888348:blk_1073756099_15333] org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-1724634037-10.10.99.61-1383196888348:blk_1073756099_15333
    java.io.EOFException: Premature EOF: no length prefix available
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1492)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:116)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:721)
    2013-10-31 00:21:51,363 ERROR [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) cause:java.io.IOException: All datanodes 10.10.99.69:50010 are bad. Aborting…
    2013-10-31 00:21:51,363 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: All datanodes 10.10.99.69:50010 are bad. Aborting…
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1008)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:823)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:475)
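
For context, the jobs are run along these lines (a sketch: the examples-jar path and HDFS paths are assumptions for illustration; 500G corresponds to 5,000,000,000 100-byte teragen rows):

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar teragen 5000000000 /user/hdfs/tera-in
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar terasort /user/hdfs/tera-in /user/hdfs/tera-out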

I also want to try terasort with dfs.replication = 1, but when I ran the 500G sort with replication = 1, I saw data block corruption on every single run. Has anyone seen this problem before?
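
After such a run, hdfs fsck should show what HDFS itself considers corrupt (standard fsck options; the output path is an assumption):

hdfs fsck /user/hdfs/tera-out -list-corruptfileblocks
hdfs fsck /user/hdfs/tera-out -files -blocks -locations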


Reply:

    Milind Shah
    Member

In addition to this, I also see exceptions like the following:

    13/11/01 16:35:02 INFO mapreduce.Job: Task Id : attempt_1383346449765_0003_r_000113_0, Status : FAILED
    Container launch failed for container_1383346449765_0003_01_002119 : java.net.SocketTimeoutException: Call From scale-67/10.10.99.67 to scale-65:45454 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.10.99.67:47238 remote=scale-65/10.10.99.65:45454]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:749)
    at org.apache.hadoop.ipc.Client.call(Client.java:1351)
    at org.apache.hadoop.ipc.Client.call(Client.java:1300)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
    at $Proxy30.startContainers(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
    at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:151)
    at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
    Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.10.99.67:47238 remote=scale-65/10.10.99.65:45454]
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
    at java.io.FilterInputStream.read(FilterInputStream.java:116)
    at java.io.FilterInputStream.read(FilterInputStream.java:116)
    at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:457)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    at java.io.DataInputStream.readInt(DataInputStream.java:370)
    at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:995)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:891)
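
One quick sanity check when container launches time out like this is whether the NodeManager IPC port on the remote node is reachable at all (a sketch; the host and port come from the trace above):

# from scale-67, probe the NodeManager address that timed out
nc -vz scale-65 45454
# and list the nodes the ResourceManager currently considers live
yarn node -list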
