HDFS Forum

Spill failed / HDP 1.2.2

  • #19254
    petri koski
    Member

    Hello! I got this one:

    2013-03-28 09:14:57,486 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
    2013-03-28 09:14:57,486 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName mapred for UID 498 from the native implementation
    2013-03-28 09:14:57,488 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:root cause:java.io.IOException: Spill failed
    2013-03-28 09:14:57,490 WARN org.apache.hadoop.mapred.Child: Error running child
    java.io.IOException: Spill failed
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1292)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:699)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
    Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/spill606.out
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
    at org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:121)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1392)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:853)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1344)
    2013-03-28 09:14:57,499 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task

    I checked in Ambari that the node where the task was running has enough space on its hard disk. Hadoop was installed using Ambari, and the cluster was working fine until I got this “Spill failed”. My task was loading over 3 million web pages and had been running for 6-7 hours when the “Spill failed” hit. Before that I got a “Reduce input limit” error, which I worked around by setting the limit to -1, and now this “Spill failed” .. Any help would be nice!

    Happy Easter! (Seems that I am spending my Easter with bunnies and Hadoop .. :/ )
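    For reference, the DiskErrorException in that stack trace (“Could not find any valid local directory”) is thrown by LocalDirAllocator when none of the directories configured in mapred.local.dir has enough free space for the next spill file, so what matters is the free space on those specific mounts, not the node’s overall disk usage. A quick way to check is to run df against each configured directory; the path below is only an illustration, the real ones come from the mapred.local.dir property in mapred-site.xml:

    # hypothetical path - substitute the directories listed in
    # the mapred.local.dir property of your mapred-site.xml
    df -h /hadoop/mapred/local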


  • #19258
    petri koski
    Member

    Ok, I got a clue:

    My mapred.local.dir on the one node that fails is mounted on a disk with only 88 GB free; the other nodes have over 300 GB. How do I add another directory to mapred.local.dir in the MapReduce configs? I tried to add a “custom config” via Ambari, but no luck .. it complains that the local dirs I gave are not valid. Can I edit mapred-site.xml directly, or is there some special way to add a new mapred.local.dir to the configs?
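    For reference, mapred.local.dir accepts a comma-separated list of directories, and spill files are spread across all of them. A minimal sketch of the entry as it would appear in mapred-site.xml, assuming hypothetical mount points /hadoop/mapred/local and /mnt/disk2/mapred/local; each directory must already exist and be writable by the mapred user, which may be why Ambari rejected the custom config:

    <!-- hypothetical paths; use mounts that actually exist on the node -->
    <property>
      <name>mapred.local.dir</name>
      <value>/hadoop/mapred/local,/mnt/disk2/mapred/local</value>
    </property>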

    #19291
    Larry Liu
    Moderator

    Hi, Petri

    Can you please let me know what entry you tried to put into the custom config via Ambari?

    To answer your question: Ambari doesn’t honor changes made directly in mapred-site.xml, so the change has to be made through Ambari.

    Larry

    #19292
    Larry Liu
    Moderator

    Hi, Petri

    You could also try rebalancing HDFS before you make any changes to the configuration.

    The command is:
    hadoop balancer

    Run it as the hdfs user.

    Larry
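
    For reference, the balancer moves HDFS blocks between DataNodes until each node’s DFS usage is within a given threshold (in percent) of the cluster average; on nodes where the DFS data dirs and mapred.local.dir share the same disks, evening out DFS usage also frees room for spill files. A sketch of the invocation, run as the hdfs user; the 10% threshold is just an example, not a recommendation:

    # run as the hdfs user; -threshold is the allowed deviation
    # from average cluster utilization, in percent
    sudo -u hdfs hadoop balancer -threshold 10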

The topic ‘Spill failed / HDP 1.2.2’ is closed to new replies.
