Help for YARN configuration


This topic contains 0 replies, has 1 voice, and was last updated by Michel Sumbul 1 year, 2 months ago.

  • Creator
  • #52649

    Michel Sumbul


    I have 6 nodes, each with 128 GB of RAM, 32 cores, 9 disks and a 10 Gb VIC.
    I would like to get the best performance I can out of this hardware. To measure it I used DFSIO, TeraSort, WordCount, etc.

    I installed Hortonworks with Ambari, and the default configuration of YARN and MRv2 is the following:

    yarn.nodemanager.resource.memory-mb: 79872
    yarn.nodemanager.vmem-pmem-ratio: 2.1
    yarn.scheduler.minimum-allocation-mb: 6144
    yarn.scheduler.maximum-allocation-mb: 79872 6144
    mapreduce.reduce.memory.mb: 6144 6144
    mapreduce.job.reduce.slowstart.completedmaps: 0.05 -Xmx4915m -Xmx4915m

    The rest should be at their default values (I think).

    The results of my tests were really bad (TeraSort: 85 minutes). Can you give me some tips on configuring YARN and MRv2 for better performance?

    I ran some tests with the following configuration, but the results are still not very good (TeraSort: 69 minutes).

    yarn.nodemanager.resource.memory-mb: 79872
    yarn.nodemanager.vmem-pmem-ratio: 2.1
    yarn.scheduler.minimum-allocation-mb: 2048
    yarn.scheduler.maximum-allocation-mb: 79872 2048
    mapreduce.reduce.memory.mb: 2048 2048
    mapreduce.job.reduce.slowstart.completedmaps: 0.9 -Xmx735m -Xmx735m
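
As a rough sanity check on the two configurations, the number of containers that fit on a node can be estimated by dividing the NodeManager memory by the container size. The sketch below is only an estimate under stated assumptions: every container is requested at the minimum allocation size, all 6 nodes run a NodeManager, and the ApplicationMaster container is ignored. The function name `containers_per_node` is made up for illustration.

```python
# Rough container-count estimate for the two configurations quoted above.
NODE_MEMORY_MB = 79872   # yarn.nodemanager.resource.memory-mb (same in both setups)
NODES = 6                # assumption: every node runs a NodeManager
CORES_PER_NODE = 32


def containers_per_node(node_memory_mb: int, container_mb: int) -> int:
    """Number of containers of the given size that fit in NodeManager memory."""
    return node_memory_mb // container_mb


if __name__ == "__main__":
    # Container sizes are the yarn.scheduler.minimum-allocation-mb values from the post.
    for label, container_mb in (("default", 6144), ("second attempt", 2048)):
        per_node = containers_per_node(NODE_MEMORY_MB, container_mb)
        print(
            f"{label}: {per_node} containers/node, "
            f"{per_node * NODES} cluster-wide, "
            f"{per_node / CORES_PER_NODE:.2f} containers per core"
        )
```

Under these assumptions the default setup allows fewer concurrent containers per node than there are cores, while the 2048 MB setup allows slightly more than one per core. Whether that is the actual bottleneck would still need to be confirmed against disk and network utilisation during the run.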

    Thanks in advance,

