YARN availability



This topic contains 1 reply, has 2 voices, and was last updated by tedr 1 year, 10 months ago.

  • Creator
  • #24678

    Tobias Herb

    Hey all!

    I'm a student experimenting with YARN, and I have a couple of questions about its “availability mechanisms”. Maybe one of you can help me and give me some hints…

    I wrote a simple demo application (comparable to the “Distributed Shell” example) and tested it on a single node setup. That all worked great!
    Now I want to investigate the “Failure Tolerance / Availability” topic. For that I examined several scenarios, but YARN did not behave as I expected:

    (1) Start the YARN app (1 Client, 1 ApplicationMaster (AM) and 1 Worker/Task associated with the AM). Kill the ResourceManager process (to simulate a crashed RM node) while the AM/Worker is running. I would expect the RM to be relaunched in a newly allocated container (if so, which component is responsible for that relaunch: ZooKeeper?). Is that assumption wrong? How must the AM react in order to reconnect to a newly launched RM? Or is the complete system down after an RM crash?

    (2) If I kill the AM process, I expect the ApplicationsManager (ASM) to restart the AM in a newly allocated container and to execute all tasks that haven't been executed so far.

    (3) (Liveness protocols) If I set RM_AM_EXPIRY_INTERVAL_MS to 2 min and let the AM freeze (via a sleep call) for 3 min, nothing happens. I would also expect the ASM to restart the AM in this case. But the job finishes without problems and without any log message.

    (4) As I understand it, liveness management for the worker tasks is handled completely by the AM. If a worker node crashes, must the AM itself restart all crashed tasks?
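To make the expectation in scenario (3) concrete, here is a minimal toy model of heartbeat-based expiry. It is only a sketch, not YARN's actual implementation: the class `LivenessMonitor` and its methods are hypothetical names made up for illustration. The one assumption it encodes is that an expiry fires only when no heartbeat arrives within the interval, so a sleeping main thread would go unnoticed if heartbeats are sent from a separate thread. Whether that is what happens in the experiment above is not confirmed here.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy heartbeat-expiry tracker (hypothetical; not part of the YARN API). */
class LivenessMonitor {
    private final long expiryIntervalMs;
    private final Map<String, Long> lastHeartbeatMs = new HashMap<>();

    LivenessMonitor(long expiryIntervalMs) {
        this.expiryIntervalMs = expiryIntervalMs;
    }

    /** Record a heartbeat from the given participant at time nowMs. */
    void heartbeat(String id, long nowMs) {
        lastHeartbeatMs.put(id, nowMs);
    }

    /** A participant is expired if it never reported or has been silent for longer than the interval. */
    boolean isExpired(String id, long nowMs) {
        Long last = lastHeartbeatMs.get(id);
        return last == null || nowMs - last > expiryIntervalMs;
    }

    public static void main(String[] args) {
        long twoMinutes = 2 * 60 * 1000L;
        long threeMinutes = 3 * 60 * 1000L;

        // Case A: the main thread "sleeps" for 3 min, but a separate
        // heartbeat thread (simulated here) keeps reporting every second.
        LivenessMonitor a = new LivenessMonitor(twoMinutes);
        for (long t = 0; t <= threeMinutes; t += 1000L) {
            a.heartbeat("am-1", t);
        }
        System.out.println("heartbeats continue, expired = " + a.isExpired("am-1", threeMinutes));

        // Case B: the whole process is frozen, so the last heartbeat was at t = 0.
        LivenessMonitor b = new LivenessMonitor(twoMinutes);
        b.heartbeat("am-1", 0L);
        System.out.println("heartbeats stop, expired = " + b.isExpired("am-1", threeMinutes));
    }
}
```

In this model only case B counts as a failure. One way to test scenario (3) along these lines would be to suspend the whole AM process rather than sleeping in one thread, and then compare the outcome.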

    I base these behavioral assumptions on the design document “Architecture of Next Generation Apache Hadoop MapReduce Framework”. But maybe my way of exploring this is completely wrong…

    It would be great if someone could give me a few hints!

    Thanks in advance,

Viewing 1 replies (of 1 total)


  • Author
  • #25368


    Hi Tobi,

    Thanks for trying HDP 2.0. We’re looking into these issues and will get back to you as soon as we have a definitive answer.

