Hive / HCatalog Forum

MapJoinMemoryExhaustionException on local job

  • #44100

    Hi, I am getting the following error when running a query that gets converted to a local map join, on HDP 2.0 installed with Ambari:

    ERROR mr.MapredLocalTask ( – Hive Runtime Error: Map local work exhausted memory
    org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionException: 2013-11-20 07:30:48 Processing rows: 1700000 Hashtable size: 1699999 Memory usage: 965243784 percentage: 0.906
    at org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionHandler.checkMemoryStatus(


    At the beginning of the query execution Hive shows the following message:
    Starting to launch local task to process map join; maximum memory = 1065484288

    I haven't found the options I need to set to increase the maximum memory allotted to this process. Could anyone tell me what they are?
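    For reference, the "maximum memory" figure reported at startup is the local task JVM's maximum heap (about 1 GB here, consistent with roughly -Xmx1024m), and the 0.906 in the error is just above 0.90, the default value of hive.mapjoin.localtask.max.memory.usage, the threshold at which the hash-table build aborts. A minimal sketch of the knobs usually involved, assuming an HDP 2.0 / Hive 0.12-era setup:

        -- fraction of the local task heap at which the hash-table build aborts (default 0.90)
        set hive.mapjoin.localtask.max.memory.usage=0.95;

        -- or avoid the local in-memory map join entirely
        set hive.auto.convert.join=false;

    The heap size itself is not a hive.* property; on an Ambari-managed cluster it is typically raised via HADOOP_HEAPSIZE in hive-env.sh (or HADOOP_CLIENT_OPTS), since the local task is spawned from the client JVM's settings.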


  • Author
  • #44428

    From what I can see, Hive does not seem to take the parameter hive.mapjoin.smalltable.filesize=25000000 into account when I set it in .hiverc or manually on the hive command line.

    I also tried setting that parameter to a very low value, such as 5, and Hive still tries to convert the join to a map join for a table that is several hundred megabytes in size.

    Any ideas about what might be happening? Is this a known issue?
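    One possible explanation, assuming the Hive 0.11+ behavior shipped with HDP 2.0: when hive.auto.convert.join.noconditionaltask is enabled (the default), the conversion decision is driven by hive.auto.convert.join.noconditionaltask.size rather than by hive.mapjoin.smalltable.filesize, which would make the latter look ignored. Printing the effective session values is a quick check:

        set hive.mapjoin.smalltable.filesize;
        set hive.auto.convert.join.noconditionaltask;
        set hive.auto.convert.join.noconditionaltask.size;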

    Yi Zhang

    Hi Juan,

    Can you try lowering these settings? One defaults to 1000000000 and the other to 0.5 (see the sketch below).

    Also check the mapper size and the NodeManager memory settings.
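    The property names themselves were stripped from this reply; given the quoted defaults (1000000000 and 0.5) and the 150000000 value reported to work below, one plausible but unconfirmed reading is:

        -- combined small-table size limit for unconditional map-join conversion
        -- (Ambari-managed HDP clusters often default this to 1000000000)
        set hive.auto.convert.join.noconditionaltask.size=150000000;

        -- hash-table memory fraction when the map join is followed by a group by
        -- (default 0.55, close to the quoted 0.5)
        set hive.mapjoin.followby.gby.localtask.max.memory.usage=0.55;

    Lowering the size limit makes Hive fall back to additional MapReduce stages instead of building one big local hash table, which matches the "finishes successfully but with more steps" outcome reported below.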



    Hi Yi,

    I tried setting the property to a lower value (150000000) and finally found one that lets the query finish successfully, although with more steps.

    Thanks for your help.

    Prabhu Ramakrishnan


    I am using HDP 2.0 in production with 100s of data nodes. While running a simple join query I am getting the error below, related to insufficient memory. I tried the settings suggested above, but they had no impact; the map joins still fail with the same error:

    2014-07-28 10:43:43     Starting to launch local task to process map join;      maximum memory = 1065484288
    2014-07-28 10:43:46     Processing rows:        200000  Hashtable size: 199999  Memory usage:   87561240        percentage:     0.082
    2014-07-28 10:43:47     Processing rows:        300000  Hashtable size: 299999  Memory usage:   128557528       percentage:     0.121
    2014-07-28 10:43:48     Processing rows:        400000  Hashtable size: 399999  Memory usage:   173836496       percentage:     0.163
    2014-07-28 10:44:03     Processing rows:        2100000 Hashtable size: 2099999 Memory usage:   886909320       percentage:     0.832
    2014-07-28 10:44:07     Processing rows:        2200000 Hashtable size: 2199999 Memory usage:   915933544       percentage:     0.936
    Execution failed with exit status: 3
    Obtaining error information
    Task failed!
    Task ID:
    FAILED: Execution Error, return code 3 from
    Error from: /tmp/prabhunkl/hive.log
    2014-07-28 10:44:08,289 ERROR mr.MapredLocalTask ( - Hive Runtime Error: Map local work exhausted memory
    org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionException: 2014-07-28 10:44:08	Processing rows:	2400000	Hashtable size:	2399999	Memory usage:	997667888	percentage:	0.936
    	at org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionHandler.checkMemoryStatus(
    	at org.apache.hadoop.hive.ql.exec.HashTableSinkOperator.processOp(
    	at org.apache.hadoop.hive.ql.exec.Operator.process(
    	at org.apache.hadoop.hive.ql.exec.Operator.forward(
    	at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(
    	at org.apache.hadoop.hive.ql.exec.Operator.process(
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(
    	at java.lang.reflect.Method.invoke(
    	at org.apache.hadoop.util.RunJar.main(
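    For reference, the reported percentage is simply memory usage divided by the maximum: 997667888 / 1065484288 ≈ 0.936, just above the default 0.90 threshold at which the local task gives up.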
    Any recommendations?
    I appreciate your help.

    Hi Prabhu. I would suggest you use the setting shown in the sketch below before running your query, to disable local in-memory map joins and force the join to be done as a distributed Map-Reduce phase. After running your query you should set the value back to true.
    Note that this just circumvents the actual issue and is not a real fix, but it works fine if you just need the query to run, without regard to performance.
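    The exact setting was stripped from the post, but the description (an on/off switch, restored to true afterwards) matches hive.auto.convert.join. A minimal sketch of the sequence under that assumption:

        -- force a common (shuffle) join instead of a local in-memory map join
        set hive.auto.convert.join=false;

        -- ... run the problematic query here ...

        -- restore automatic map-join conversion afterwards
        set hive.auto.convert.join=true;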

    Prabhu Ramakrishnan

    I just overcame this issue by reducing the number of rows I select to join with the other table. Instead of joining all the possible data within a single query, I split the data into small subsets and achieved the same result in multiple steps.

    The total execution time is far less than that of the single consolidated query.
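    A rough illustration of that split approach (the tables fact and dim and the date column dt are hypothetical, not from this thread):

        -- step 1: join the first slice of the large table
        CREATE TABLE joined AS
        SELECT f.*, d.descr
        FROM fact f JOIN dim d ON (f.k = d.k)
        WHERE f.dt < '2014-07-01';

        -- step 2: append the remaining slice in a second pass
        INSERT INTO TABLE joined
        SELECT f.*, d.descr
        FROM fact f JOIN dim d ON (f.k = d.k)
        WHERE f.dt >= '2014-07-01';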
