GC Error/OOM w/ Hive Query

This topic contains 3 replies, has 3 voices, and was last updated by Carter Shanklin 5 months, 3 weeks ago.

  • Creator
    Topic
  • #46160

    Nick Martin
    Participant

    Hi all,

    I have two tables:

    tbl1: 81m rows
    tbl2: 4m rows

tbl1 is partitioned on one column; tbl2 is not partitioned.

    I’m attempting the following query:

SELECT
  tbl1.col_pk,
  tbl2.col1,
  tbl2.col2,
  SUM(tbl1.col4),
  SUM(tbl1.col5),
  SUM(tbl1.col4 + tbl1.col5)
FROM tbl2
JOIN tbl1 ON (tbl1.col_pk = tbl2.col_pk)
WHERE tbl1.partitioned_col IN ('2011','2012','2013')
GROUP BY
  tbl1.col_pk,
  tbl2.col1,
  tbl2.col2;

    I get this error:

    OutOfMemoryError: GC overhead limit exceeded

So I followed the suggestion at the end of the error output (Currently hive.map.aggr.hash.percentmemory is set to 0.5. Try setting it to a lower value, i.e. 'set hive.map.aggr.hash.percentmemory = 0.25;') through several iterations, eventually getting hive.map.aggr.hash.percentmemory down to something like 0.0165, and it still failed.
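For reference, a minimal sketch of those session-level settings in the Hive CLI (the hive.map.aggr line is an extra knob not mentioned in the error message; disabling map-side hash aggregation entirely is a common way to rule it out):

-- lower the fraction of map task heap reserved for hash aggregation
set hive.map.aggr.hash.percentmemory = 0.25;
-- or disable map-side aggregation altogether to rule it out
set hive.map.aggr = false;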

I did some searching and found some convoluted recommendations of what to try next: some mentioned upping my heap size, some mentioned rewriting my query, etc. I upped my Hadoop maximum Java heap size to 4096mb, re-ran, and got the same results.

    Currently, some relevant settings are:

    NameNode Heap Size: 4096mb
    DataNode maximum Java heap size: 4096mb
    Hadoop maximum Java heap size: 4096mb
    Java Options for MapReduce tasks: 768mb

I have 16 map slots and 8 reduce slots available (5-node cluster: 4 data nodes and one name node).
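As a rough sanity check on those numbers: 16 map slots and 8 reduce slots across 4 data nodes suggests up to 4 maps and 2 reduces per node, so at 768mb per task JVM each node can commit roughly 6 x 768mb, about 4.5gb of task heap; any heap increase has to fit within physical RAM. A per-query override would look like this, assuming the MRv1 property name (on YARN/MRv2 it is split into mapreduce.map.java.opts and mapreduce.reduce.java.opts):

-- raise the heap for every task JVM this query spawns (MRv1 property name)
set mapred.child.java.opts = -Xmx2048m;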

    Thanks in advance for the help,
    Nick



  • Author
    Replies
  • #48385

    Carter Shanklin
    Participant

What options did you set? My guess is your OOMs happened in the reducers. 768mb is a really small amount of memory; make sure you increased heap space for reduces as well as maps.
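A sketch of that split per session, again assuming MRv1 property names (the YARN/MRv2 equivalents are mapreduce.map.java.opts and mapreduce.reduce.java.opts):

-- give map and reduce task JVMs separately sized heaps (MRv1 names)
set mapred.map.child.java.opts = -Xmx1024m;
set mapred.reduce.child.java.opts = -Xmx2048m;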

    #48205

    Nick Martin
    Participant

I increased the mapred task JVM heap by 2x and am still seeing the same results.

    #46763

    Yi Zhang
    Moderator

    Hi Nick,

If it is the task that is hitting OOM, try increasing the mapred task JVM heap.

Also, since this query is mainly SUM aggregation, I suggest giving an ORC table a try.

    http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.0.2/ds_Hive/orcfile.html
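A minimal sketch of converting tbl1 to ORC; the column types here are assumed for illustration, and the dynamic-partition settings are needed so the copy can repartition on insert:

-- hypothetical column types; adjust to the real schema of tbl1
CREATE TABLE tbl1_orc (
  col_pk INT,
  col4 DOUBLE,
  col5 DOUBLE
)
PARTITIONED BY (partitioned_col STRING)
STORED AS ORC;

-- allow dynamic partitioning for the one-time copy
set hive.exec.dynamic.partition = true;
set hive.exec.dynamic.partition.mode = nonstrict;

-- the dynamic partition column must come last in the SELECT
INSERT OVERWRITE TABLE tbl1_orc PARTITION (partitioned_col)
SELECT col_pk, col4, col5, partitioned_col
FROM tbl1;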

    Thanks,
    Yi
