
MapReduce Forum

Java Heap Space Error

  • #44883
    gehhrald
    Member

    Hi,

    MapReduce jobs work for me when I use a small set of data. Currently I am trying to run about 30k files through a MapReduce job that computes an MD5 hash of each file, but I am hitting a Java heap space error. I have read solutions online and changed my HADOOP_HEAPSIZE to 4096 MB using the Ambari HDFS config, but I am still facing this error. My MapReduce code also has the line conf.set("mapred.map.child.java.opts", "-Xmx2048m"). Does anyone know of a solution for this?
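
    A note on those two settings: HADOOP_HEAPSIZE (hadoop-env.sh) sizes the heap of the Hadoop daemons such as the NameNode, DataNode, JobTracker and TaskTracker; it does not affect the map/reduce task JVMs. The map-task heap comes from mapred.map.child.java.opts (falling back to mapred.child.java.opts). A common pitfall is calling conf.set() after the Job object has been constructed, because Job copies the Configuration at construction time, so later changes are ignored. A minimal driver sketch (the class name and path arguments are illustrative, not from this thread):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class Md5JobDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Set the task heap BEFORE constructing the Job: Job copies the
            // Configuration, so conf.set() calls made afterwards are ignored.
            conf.set("mapred.map.child.java.opts", "-Xmx2048m");

            Job job = new Job(conf, "md5-hash"); // MR1-era constructor
            job.setJarByClass(Md5JobDriver.class);
            // set the mapper, reducer and input format here as in the original job
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }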


    13/12/05 12:14:38 INFO input.FileInputFormat: Total input paths to process : 31373
    13/12/05 12:15:39 INFO mapred.JobClient: Running job: job_201312051205_0001
    13/12/05 12:15:40 INFO mapred.JobClient: map 0% reduce 0%
    13/12/05 12:15:57 INFO mapred.JobClient: Task Id : attempt_201312051205_0001_m_000003_0, Status : FAILED
    Error: Java heap space
    13/12/05 12:15:58 INFO mapred.JobClient: Task Id : attempt_201312051205_0001_m_000001_0, Status : FAILED
    Error: Java heap space
    13/12/05 12:15:58 INFO mapred.JobClient: Task Id : attempt_201312051205_0001_m_000002_0, Status : FAILED
    Error: Java heap space
    13/12/05 12:15:59 INFO mapred.JobClient: Task Id : attempt_201312051205_0001_m_000000_0, Status : FAILED
    Error: Java heap space
    13/12/05 12:16:13 INFO mapred.JobClient: Task Id : attempt_201312051205_0001_m_000000_1, Status : FAILED
    Error: Java heap space
    13/12/05 12:16:21 INFO mapred.JobClient: Task Id : attempt_201312051205_0001_m_000001_2, Status : FAILED
    Error: Java heap space
    13/12/05 12:16:25 INFO mapred.JobClient: Job complete: job_201312051205_0001
    13/12/05 12:16:25 INFO mapred.JobClient: Counters: 20
    13/12/05 12:16:25 INFO mapred.JobClient: Job Counters
    13/12/05 12:16:25 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=205473
    13/12/05 12:16:25 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
    13/12/05 12:16:25 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
    13/12/05 12:16:25 INFO mapred.JobClient: Launched map tasks=28
    13/12/05 12:16:25 INFO mapred.JobClient: Data-local map tasks=28
    13/12/05 12:16:25 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
    13/12/05 12:16:25 INFO mapred.JobClient: Failed map tasks=1
    13/12/05 12:16:25 INFO mapred.JobClient: File Output Format Counters
    13/12/05 12:16:25 INFO mapred.JobClient: Bytes Written=500
    13/12/05 12:16:25 INFO mapred.JobClient: FileSystemCounters
    13/12/05 12:16:25 INFO mapred.JobClient: HDFS_BYTES_READ=332069066
    13/12/05 12:16:25 INFO mapred.JobClient: FILE_BYTES_WRITTEN=635223
    13/12/05 12:16:25 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=500
    13/12/05 12:16:25 INFO mapred.JobClient: File Input Format Counters
    13/12/05 12:16:25 INFO mapred.JobClient: Bytes Read=332067653
    13/12/05 12:16:25 INFO mapred.JobClient: Map-Reduce Framework
    13/12/05 12:16:25 INFO mapred.JobClient: Map input records=11
    13/12/05 12:16:25 INFO mapred.JobClient: SPLIT_RAW_BYTES=1413
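
    Separately from the JVM settings, the repeated "Error: Java heap space" on each map attempt is what you would see if each map task buffers an entire input file in memory before hashing it; heap use then grows with file size no matter what -Xmx is. MessageDigest can instead be fed in fixed-size chunks so memory stays constant. A hedged sketch, assuming each input record carries one HDFS path (the thread does not show the actual mapper or input format):

    import java.io.IOException;
    import java.io.InputStream;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class StreamingMd5Mapper
            extends Mapper<LongWritable, Text, Text, Text> {

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            Path file = new Path(value.toString());
            FileSystem fs = file.getFileSystem(context.getConfiguration());

            MessageDigest md5;
            try {
                md5 = MessageDigest.getInstance("MD5");
            } catch (NoSuchAlgorithmException e) {
                throw new IOException(e);
            }

            // Stream the file through the digest in 64 KB chunks so heap
            // use stays constant regardless of file size.
            byte[] buf = new byte[64 * 1024];
            InputStream in = fs.open(file);
            try {
                int n;
                while ((n = in.read(buf)) > 0) {
                    md5.update(buf, 0, n);
                }
            } finally {
                in.close();
            }

            StringBuilder hex = new StringBuilder();
            for (byte b : md5.digest()) {
                hex.append(String.format("%02x", b));
            }
            context.write(value, new Text(hex.toString()));
        }
    }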

  • #45291
    Koelli Mungee
    Moderator

    Hi gehhrald,

    Can you check the job configuration file (job.xml) to see what is actually being passed to the child mapper/reducer processes? (One way to check from inside the job is sketched below.)

    Regards,
    Koelli
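
    For that check, two places show what the child JVMs actually received: the job.xml linked from the job's page in the JobTracker web UI (port 50030 by default on MR1), and the task logs themselves. As a sketch, a mapper's setup() can print the effective values along with the heap ceiling the JVM really started with (the class name is hypothetical):

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class HeapCheckMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void setup(Context context) {
            // These lines land in the task's stdout log, visible in the JobTracker UI.
            System.out.println("mapred.map.child.java.opts = "
                    + context.getConfiguration().get("mapred.map.child.java.opts"));
            System.out.println("mapred.child.java.opts = "
                    + context.getConfiguration().get("mapred.child.java.opts"));
            // maxMemory() reflects the -Xmx the task JVM was actually launched with.
            System.out.println("max heap = "
                    + Runtime.getRuntime().maxMemory() / (1024 * 1024) + " MB");
        }
    }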

