YARN Forum

Help for YARN configuration

  • #52649
    Michel Sumbul


    I have 6 nodes, each with 128 GB of RAM, 32 cores, 9 disks and a 10 Gb VIC.
    I would like to get the best performance I can out of this hardware. To measure it I used DFSIO, TeraSort, wordcount, etc.

    I installed Hortonworks with Ambari, and the default configuration of YARN and MRv2 is the following:

    yarn.nodemanager.resource.memory-mb: 79872
    yarn.nodemanager.vmem-pmem-ratio: 2.1
    yarn.scheduler.minimum-allocation-mb: 6144
    yarn.scheduler.maximum-allocation-mb: 79872

    mapreduce.map.memory.mb: 6144
    mapreduce.reduce.memory.mb: 6144
    yarn.app.mapreduce.am.resource.mb: 6144
    mapreduce.job.reduce.slowstart.completedmaps: 0.05
    mapreduce.map.java.opts: -Xmx4915m
    mapreduce.reduce.java.opts: -Xmx4915m
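As a rough aid for reading these defaults, the arithmetic behind them can be sketched in a few lines of Python. This is only a back-of-the-envelope calculation using the values listed above; the function and variable names are made up for illustration, and it assumes every container is the same size and that memory is the only limit on how many containers fit on a node.

```python
# Values copied from the default configuration listed above (all in MB).
NODE_MEMORY_MB = 79872     # yarn.nodemanager.resource.memory-mb
CONTAINER_MB = 6144        # mapreduce.map.memory.mb / mapreduce.reduce.memory.mb
VMEM_PMEM_RATIO = 2.1      # yarn.nodemanager.vmem-pmem-ratio
NODES = 6                  # cluster size from the post


def containers_per_node(node_memory_mb: int, container_mb: int) -> int:
    """Upper bound on same-sized containers that fit in one node's YARN memory."""
    return node_memory_mb // container_mb


per_node = containers_per_node(NODE_MEMORY_MB, CONTAINER_MB)
cluster_total = per_node * NODES

# Virtual memory allowed per container = physical size * vmem-pmem ratio.
vmem_limit_mb = CONTAINER_MB * VMEM_PMEM_RATIO

print(f"containers per node:    {per_node}")
print(f"containers in cluster:  {cluster_total}")
print(f"vmem limit per container: {vmem_limit_mb:.0f} MB")
```

Under those assumptions the defaults allow 13 containers per node (78 across the 6 nodes, one of which is taken by the application master), which is well below the 32 cores available on each node.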

    The rest should be at their default values (I think).

    I found the results of my tests really bad (TeraSort: 85 minutes). Can you give me some tips on configuring YARN and MRv2 for better performance?

    I ran some tests with the following configuration, but the results are still not really good (TeraSort: 69 minutes).

    yarn.nodemanager.resource.memory-mb: 79872
    yarn.nodemanager.vmem-pmem-ratio: 2.1
    yarn.scheduler.minimum-allocation-mb: 2048
    yarn.scheduler.maximum-allocation-mb: 79872

    mapreduce.map.memory.mb: 2048
    mapreduce.reduce.memory.mb: 2048
    yarn.app.mapreduce.am.resource.mb: 2048
    mapreduce.job.reduce.slowstart.completedmaps: 0.9
    mapreduce.map.java.opts: -Xmx735m
    mapreduce.reduce.java.opts: -Xmx735m
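The two configurations in this post differ not only in container size but also in how much of each container the JVM heap is allowed to use. A small Python sketch of that comparison follows, using only the numbers quoted in the post; the dictionary layout and the `summarize` helper are hypothetical names chosen for illustration, and the container count again assumes memory is the only limiting resource.

```python
NODE_MEMORY_MB = 79872  # yarn.nodemanager.resource.memory-mb (same in both runs)
CORES_PER_NODE = 32     # from the hardware description in the post

# Container size and -Xmx for each run, as listed in the post (MB).
CONFIGS = {
    "default (85 min)": {"container_mb": 6144, "xmx_mb": 4915},
    "modified (69 min)": {"container_mb": 2048, "xmx_mb": 735},
}


def summarize(container_mb: int, xmx_mb: int) -> tuple[int, float]:
    """Return (containers that fit per node, heap as a fraction of the container)."""
    slots = NODE_MEMORY_MB // container_mb
    heap_fraction = xmx_mb / container_mb
    return slots, heap_fraction


for name, cfg in CONFIGS.items():
    slots, frac = summarize(**cfg)
    print(
        f"{name}: {slots} containers/node, "
        f"heap = {frac:.0%} of container, "
        f"{slots / CORES_PER_NODE:.2f} containers per core"
    )
```

By this arithmetic the second run fits 39 containers per node, more than the 32 cores, while its heap is only about 36% of each 2048 MB container (versus about 80% with the defaults). Whether either of these explains the TeraSort times is not something the numbers alone can show.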

    Thanks in advance,
