
MapReduce Forum

Terasort fails on HDP2.0

  • #42587
    Milind Shah
    Member

    I have a 10 node cluster, with each node having:
    48G RAM
    12 x 7200 rpm HDDs
    24 cores
    10 GbE

    I have set up my cluster using Ambari as follows:
    Node 1:
    NameNode
    Resource Manager
    Zookeeper Server
    MR client
    History Server
    HDFS client
    Ganglia server
    Ganglia monitor
    YARN client
    Zookeeper client

    Node 2:
    Secondary Namenode
    HDFS client
    NodeManager
    YARN Client
    Zookeeper Client
    Ganglia monitor

    Node 3-10:
    Datanode
    Ganglia Monitor
    HDFS Client
    MR2 client
    NodeManager
    YARN Client
    Zookeeper Client

    My YARN settings are:
    yarn.nodemanager.resource.memory-mb = 24576 (24G)
    yarn.scheduler.minimum-allocation-mb = 700
    yarn.scheduler.maximum-allocation-mb = 2048

    and my MapReduce settings are:

    mapreduce.map.memory.mb = 800
    mapreduce.reduce.memory.mb = 1500
    mapreduce.map.java.opts = -Xmx756M
    mapreduce.reduce.java.opts = -Xmx1450M
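
    (For context, a minimal sketch of how a run with these settings might look on the command line. The examples jar path and the 5,000,000,000-row count are my assumptions, not taken from this post; teragen writes 100-byte rows, so 5 billion rows is roughly 500G. Note also that the CapacityScheduler typically rounds container requests up to a multiple of yarn.scheduler.minimum-allocation-mb, so with a 700 MB minimum an 800 MB map request is actually granted 1400 MB.)

    # Assumed HDP 2.0 jar location; adjust for your install
    EXAMPLES_JAR=/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar

    # teragen: 5,000,000,000 rows x 100 bytes = ~500G (row count is an assumption)
    hadoop jar $EXAMPLES_JAR teragen \
        -Dmapreduce.map.memory.mb=800 \
        -Dmapreduce.map.java.opts=-Xmx756m \
        5000000000 /user/hdfs/tera-in

    hadoop jar $EXAMPLES_JAR terasort \
        -Dmapreduce.map.memory.mb=800 \
        -Dmapreduce.reduce.memory.mb=1500 \
        -Dmapreduce.map.java.opts=-Xmx756m \
        -Dmapreduce.reduce.java.opts=-Xmx1450m \
        /user/hdfs/tera-in /user/hdfs/tera-out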

    I am using dfs.replication = 3, and speculative execution is disabled for both map and reduce.

    On this cluster I am able to finish a 500G sort: generating the 500G of data with teragen takes about 7 minutes, and terasort finishes in about 25 minutes. But when terasort runs, I see the following errors:

    2013-10-31 00:17:19,330 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
    2013-10-31 00:21:51,350 WARN [ResponseProcessor for block BP-1724634037-10.10.99.61-1383196888348:blk_1073756099_15333] org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-1724634037-10.10.99.61-1383196888348:blk_1073756099_15333
    java.io.EOFException: Premature EOF: no length prefix available
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1492)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:116)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:721)
    2013-10-31 00:21:51,363 ERROR [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) cause:java.io.IOException: All datanodes 10.10.99.69:50010 are bad. Aborting…
    2013-10-31 00:21:51,363 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: All datanodes 10.10.99.69:50010 are bad. Aborting…
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1008)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:823)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:475)
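
    (As an aside: "All datanodes ... are bad" during a write-heavy job like terasort is often a symptom of DataNodes running out of transfer threads or file descriptors rather than of failed disks. A minimal diagnostic sketch; the log path below is the usual HDP default and is an assumption:)

    # Open-file limit for the user the DataNode runs as
    su - hdfs -c 'ulimit -n'

    # Transfer-thread ceiling (dfs.datanode.max.transfer.threads in hdfs-site.xml);
    # the 4096 default can be low for a 12-disk node under terasort
    hdfs getconf -confKey dfs.datanode.max.transfer.threads

    # Look for the matching error on the DataNode side
    grep -i 'xceiver\|exceeds the limit' /var/log/hadoop/hdfs/hadoop-hdfs-datanode-*.log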

    I also want to try terasort with dfs.replication = 1. When I tried the 500G sort with replication = 1, I saw data block corruption on every single run. Has anyone seen this problem before? (See the fsck sketch below.)
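
    (A hedged way to confirm what was actually corrupted after a replication=1 run; the output path is a placeholder. With replication=1 there is no second copy to recover from, so any write failure in the pipeline, like the EOF errors above, leaves a permanently corrupt block:)

    # List corrupt blocks and the files they belong to
    hdfs fsck /user/hdfs/tera-out -list-corruptfileblocks

    # Full per-file block report with locations
    hdfs fsck /user/hdfs/tera-out -files -blocks -locations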

    Replies
  • #42729
    Milind Shah
    Member

    In addition to this, I also see exceptions like the following:

    13/11/01 16:35:02 INFO mapreduce.Job: Task Id : attempt_1383346449765_0003_r_000113_0, Status : FAILED
    Container launch failed for container_1383346449765_0003_01_002119 : java.net.SocketTimeoutException: Call From scale-67/10.10.99.67 to scale-65:45454 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.10.99.67:47238 remote=scale-65/10.10.99.65:45454]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:749)
    at org.apache.hadoop.ipc.Client.call(Client.java:1351)
    at org.apache.hadoop.ipc.Client.call(Client.java:1300)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
    at $Proxy30.startContainers(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
    at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:151)
    at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
    Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.10.99.67:47238 remote=scale-65/10.10.99.65:45454]
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
    at java.io.FilterInputStream.read(FilterInputStream.java:116)
    at java.io.FilterInputStream.read(FilterInputStream.java:116)
    at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:457)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    at java.io.DataInputStream.readInt(DataInputStream.java:370)
    at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:995)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:891)
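
    (The failure here is the ApplicationMaster timing out while launching a container on the NodeManager at scale-65:45454, not an HDFS problem. A minimal connectivity check, using the hostnames from the trace; the NodeManager log path is the usual HDP default and is an assumption. Long NodeManager GC pauses under memory pressure are a common cause of this 60-second IPC timeout:)

    # Can this host reach the NodeManager's container-management port?
    nc -zv scale-65 45454

    # Is the NodeManager listening, and which process owns the port?
    ssh scale-65 'netstat -tlnp | grep 45454'

    # Look for pauses or errors around the timeout window
    ssh scale-65 'tail -n 200 /var/log/hadoop-yarn/yarn/yarn-yarn-nodemanager-*.log'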
