Hive / HCatalog: MapJoinMemoryExhaustionException on local job

This topic contains 6 replies, has 3 voices, and was last updated by Prabhu Ramakrishnan 3 days, 15 hours ago.

  • Creator
    Topic
  • #44100

    Hi, I am getting the following error when running a query that gets converted to a local map join, on HDP 2.0 installed with Ambari:

    ERROR mr.MapredLocalTask (MapredLocalTask.java:executeFromChildJVM(323)) – Hive Runtime Error: Map local work exhausted memory
    org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionException: 2013-11-20 07:30:48 Processing rows: 1700000 Hashtable size: 1699999 Memory usage: 965243784 percentage: 0.906
    at org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionHandler.checkMemoryStatus(MapJoinMemoryExhaustionHandler.java:91)

    ….

    At the beginning of the query execution Hive shows the following message:
    Starting to launch local task to process map join; maximum memory = 1065484288

    I haven’t found which options I need to set to increase the maximum memory allotted for this process. Could anyone tell me?


  • Author
    Replies
  • #58025

    Prabhu Ramakrishnan
    Participant

    I just overcame this issue by reducing the number of rows I select to join with the other table. Instead of joining all the possible data within a single query, I split the data into small subsets and achieved the same result in multiple steps.

    The total execution time is far less than that of the single consolidated query.
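
    In rough outline it looked something like this (the table, column and date values below are only illustrative, not my actual query):

    -- join the small table against one slice of the big table at a time
    create table joined_q1 as
    select e.*, d.attr
    from big_events e join small_dim d on (e.key = d.key)
    where e.dt between '2014-01-01' and '2014-03-31';

    create table joined_q2 as
    select e.*, d.attr
    from big_events e join small_dim d on (e.key = d.key)
    where e.dt between '2014-04-01' and '2014-06-30';

    -- then stitch the subsets back together
    create table joined_all as
    select * from (
      select * from joined_q1
      union all
      select * from joined_q2
    ) unioned;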

    #58015

    Hi Prabhu. I would suggest you use:
    set hive.auto.convert.join=false;
    before running your query to disable local in-memory joins and force the join to be executed as a distributed MapReduce stage. After running your query, you should set the value back to true with:
    set hive.auto.convert.join=true;
    Note that this only circumvents the actual issue rather than fixing it, but it works fine if you just need the query to run, regardless of performance.
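
    Put together, a session would look roughly like this (the query is only a placeholder for the real join):

    set hive.auto.convert.join=false;

    -- the join now runs as a regular shuffle (reduce-side) join
    select a.*, b.some_col
    from big_table a
    join small_table b on (a.id = b.id);

    set hive.auto.convert.join=true;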

    #57917

    Prabhu Ramakrishnan
    Participant

    Hi,

    I am using HDP 2.0 in production with 100 data nodes. While running a simple join query I am getting the below error related to insufficient memory. I tried to

    set hive.auto.convert.join.noconditionaltask.size=150000000;

    but it had no impact on the map join; I am still getting the same error:

    2014-07-28 10:43:43     Starting to launch local task to process map join;      maximum memory = 1065484288
    2014-07-28 10:43:46     Processing rows:        200000  Hashtable size: 199999  Memory usage:   87561240        percentage:     0.082
    2014-07-28 10:43:47     Processing rows:        300000  Hashtable size: 299999  Memory usage:   128557528       percentage:     0.121
    2014-07-28 10:43:48     Processing rows:        400000  Hashtable size: 399999  Memory usage:   173836496       percentage:     0.163
    2014-07-28 10:44:03     Processing rows:        2100000 Hashtable size: 2099999 Memory usage:   886909320       percentage:     0.832
    2014-07-28 10:44:07     Processing rows:        2200000 Hashtable size: 2199999 Memory usage:   915933544       percentage:     0.936
    Execution failed with exit status: 3
    Obtaining error information
    
    Task failed!
    Task ID:
      Stage-5
    
    Logs:
    
    /tmp/prabhunkl/hive.log
    FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
    
    Error from /tmp/prabhunkl/hive.log:
    2014-07-28 10:44:08,289 ERROR mr.MapredLocalTask (MapredLocalTask.java:executeFromChildJVM(323)) - Hive Runtime Error: Map local work exhausted memory
    org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionException: 2014-07-28 10:44:08	Processing rows:	2400000	Hashtable size:	2399999	Memory usage:	997667888	percentage:	0.936
    	at org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionHandler.checkMemoryStatus(MapJoinMemoryExhaustionHandler.java:91)
    	at org.apache.hadoop.hive.ql.exec.HashTableSinkOperator.processOp(HashTableSinkOperator.java:249)
    	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
    	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:842)
    	at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91)
    	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:504)
    	at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.startForward(MapredLocalTask.java:363)
    	at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.executeFromChildJVM(MapredLocalTask.java:314)
    	at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:722)
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    	at java.lang.reflect.Method.invoke(Method.java:597)
    	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
    
    
    Any recommendations?
    
    I appreciate your help.
    
    Thanks,
    Prabhu.
    
    #44847

    Hi Yi,

    I tried setting the property hive.auto.convert.join.noconditionaltask.size to a lower value (150000000) and finally found a value that lets the query finish successfully, although with more steps.
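
    For reference, the property is set per session like this:

    set hive.auto.convert.join.noconditionaltask.size=150000000;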

    Thanks for your help.

    #44561

    Yi Zhang
    Moderator

    Hi Juan,

    Can you try these settings:
    hive.auto.convert.join.noconditionaltask=true/false
    hive.auto.convert.join.noconditionaltask.size, default 1000000000
    hive.map.aggr.hash.percentmemory, default 0.5

    Also check the mapper sizes and the NodeManager memory settings.
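
    For example, from within the Hive session (the values shown are just the defaults listed above; tune them down to match your cluster):

    set hive.auto.convert.join.noconditionaltask=true;
    set hive.auto.convert.join.noconditionaltask.size=1000000000;
    set hive.map.aggr.hash.percentmemory=0.5;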

    Thanks,
    Yi

    #44428

    From what I see, Hive does not seem to take the parameter hive.mapjoin.smalltable.filesize=25000000 into account when I set it in .hiverc or manually on the Hive command line.

    I also tried setting that parameter to a very low value, such as 5, and Hive still tries to convert a join against a table of several hundred megabytes into a map join.

    Any ideas what might be happening? Is this a known issue?
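
    For reference, this is how I am checking and overriding the parameter inside the session; running set with only the property name prints the value Hive is actually using:

    set hive.mapjoin.smalltable.filesize;
    set hive.mapjoin.smalltable.filesize=5;
    set hive.mapjoin.smalltable.filesize;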
