This is the first post in a series that explores recent innovations in the Hadoop ecosystem that are included in HDP 2.2. In this post, we introduce themes to set context for deeper discussion in subsequent blogs.
HDP 2.2 represents another major step forward for Enterprise Hadoop. With thousands of enhancements across all elements of the platform, spanning data access, security, governance, rolling upgrades and more, HDP 2.2 makes it even easier for our customers to incorporate HDP as a core component of a Modern Data Architecture (MDA).
HDP 2.2 brings substantial innovations in YARN that enable Hadoop to attract new engines to the data platform, as organizations increasingly want to store their data efficiently in a single repository and interact with it simultaneously in multiple ways.
Thematically, YARN in HDP 2.2 encompasses the following tracks:
Apache Hadoop MapReduce has always served as a powerful framework for simple but scalable and fault-tolerant data processing. With YARN, MapReduce went through a second incarnation – MRv2 – that is more scalable and much more performant, all without requiring major changes to users' applications. So, for example, if you are moving from Hadoop 1.x to Hadoop 2.x with YARN, you have a clear incentive to upgrade to the latest stack even without immediate plans to leverage other programming abstractions.
Over the course of the last year, we also saw the rise of Apache Tez as the next chapter in data processing on Hadoop. Apache Tez is a YARN framework that enables multiple user-level APIs and programming models such as MapReduce, Hive, Pig and others to execute data-processing jobs natively on YARN, taking advantage of all the common infrastructure functionality YARN exposes. Tez is by nature both a natural successor to the MapReduce framework and a radical departure in how user applications translate to system requirements and in the way resources get utilized through YARN.
MapReduce and Tez on YARN go a long way toward unified data processing on Hadoop clusters through custom-written applications. But increasingly, enterprises have existing applications and services, distributed or not, that they want to run on already-deployed Hadoop clusters in order to leverage the ever-growing, large-scale accumulation of data-sets in the Hadoop Distributed File System (HDFS). We set out to address these emerging use-cases through two parallel but related efforts.
Running long-lived services on YARN is fundamentally similar to running short-lived applications, apart from differences in (1) resource allocation, (2) fault tolerance, (3) log handling and (4) security. In HDP 2.2, YARN addresses all of these differences so that users can interact with services much as they interact with regular short-lived applications.
YARN-896 is the Apache JIRA that tracks the core efforts related to supporting long running services in YARN. We will cover services in detail in an upcoming post.
Running diverse analytical processing workloads on a common data platform enables organizations to interact with data in a single repository and consolidate infrastructure investments. In HDP 2.2, YARN enables new workloads to run on the common data platform by delivering needed capabilities across workload scheduling, isolation and monitoring frameworks to support these workloads.
Up until HDP 2.1, YARN managed only Memory (RAM) as a resource in a distributed cluster. With HDP 2.2, YARN manages both Memory and CPU resources. Applications can now request allocations based on the RAM and CPU needs of each application container. YARN then schedules the application containers on nodes that satisfy both the RAM and CPU requirements, and enforces that a container does not exceed the resources allocated to it.
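The placement decision described above can be illustrated with a small sketch. This is not YARN's actual CapacityScheduler, just a toy model (all names are illustrative) showing how a container is placed only on a node that satisfies both the memory and the CPU ask:

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    free_mem_mb: int   # unallocated memory on this node
    free_vcores: int   # unallocated virtual cores

def place_container(nodes, mem_mb, vcores):
    """Return the first node that satisfies BOTH the RAM and the CPU ask,
    deducting the allocation from that node; None if no node fits."""
    for node in nodes:
        if node.free_mem_mb >= mem_mb and node.free_vcores >= vcores:
            node.free_mem_mb -= mem_mb
            node.free_vcores -= vcores
            return node.name
    return None

cluster = [Node("n1", 4096, 2), Node("n2", 8192, 8)]
print(place_container(cluster, 6144, 4))  # -> n2 (n1 lacks both RAM and cores)
print(place_container(cluster, 4096, 1))  # -> n1
print(place_container(cluster, 4096, 3))  # -> None (no node has both left)
```

The key point is that a node with enough RAM but too few vcores (or vice versa) is skipped, which is exactly what multi-resource scheduling adds over memory-only scheduling.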
For workloads that require strict constraints on the physical node location that application containers are executed on, YARN now supports Node Labels. With labels, cluster nodes can be reserved to run specific applications, allowing for isolation of applications to nodes with specific software or hardware characteristics.
These core workload-scheduling enhancements enable the platform to support both diverse and specific workloads. To expand the ecosystem of applications integrated with YARN, HDP 2.2 introduces support for clients to submit, monitor and terminate applications over REST APIs. This enables clients to manage applications on YARN without having to embed the YARN Java client libraries.
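To make the REST path concrete, here is a sketch of what an application-submission payload might look like. The application id, name and command below are hypothetical, and the exact field names and endpoints should be checked against the ResourceManager REST API documentation for your HDP release; this example only constructs the JSON body, it does not contact a cluster:

```python
import json

# Hypothetical values: an application id is normally obtained first from
# the ResourceManager's "new application" REST endpoint.
submission = {
    "application-id": "application_1414530900704_0001",
    "application-name": "rest-demo",
    "application-type": "YARN",
    "am-container-spec": {
        "commands": {"command": "sleep 60"},
    },
    # Memory and CPU asks mirror the multi-resource scheduling above.
    "resource": {"memory": 1024, "vCores": 1},
}

body = json.dumps(submission, indent=2)
print(body)
# The submission itself would then be an HTTP POST of this body (with a
# JSON content type) to the ResourceManager's cluster applications endpoint.
```

Any HTTP client in any language can produce this payload, which is precisely why the REST API removes the need to embed the YARN Java libraries.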
Before HDP 2.1, YARN’s central master, the ResourceManager, was a potential single point of failure in a YARN cluster, which is not acceptable in an enterprise production environment. We worked with the YARN community on various umbrella efforts to plug this gap, with the ultimate goal of making RM restart or fail-over completely transparent to end-users, with zero or minimal impact on running applications.
To this end, as part of HDP 2.1, we delivered our Phase 1 work towards resilience of YARN applications across ResourceManager restart. This mainly involved preserving application-queue state across restarts: the RM persists that state into a persistent store and reloads it upon restart, eliminating the need for users to resubmit their applications; the RM restarts them automatically, albeit from scratch.
However, the missing piece in Phase 1 was that users still lost any work in progress. As the next step towards reliable operations, HDP 2.2 delivers our Phase 2 work on resilience of YARN applications across ResourceManager restart. This effort makes YARN reconstruct the entire running state of the cluster, together with the corresponding applications, as it was before the restart. The end result is that applications lose no running or completed work due to an RM crash-reboot event, and no restart of applications or containers is required: the ApplicationMasters simply re-sync with the newly started ResourceManager.
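The difference between the two phases can be sketched in miniature. Real YARN persists this state through its state store (typically backed by ZooKeeper or HDFS); the function and field names below are purely illustrative:

```python
state_store = {}   # stands in for the RM state store; survives RM restarts

def submit(app_id):
    """Record a submitted application and its (toy) running state."""
    state_store[app_id] = {"status": "RUNNING", "containers": ["c1", "c2"]}

def rm_restart_phase1():
    """Phase 1: recover which apps were submitted, but restart them
    from scratch, so in-flight containers are lost."""
    return {app: {"status": "RESTARTED", "containers": []}
            for app in state_store}

def rm_restart_phase2():
    """Phase 2: reconstruct the full running state; ApplicationMasters
    re-sync and running containers survive the restart."""
    return {app: dict(info) for app, info in state_store.items()}

submit("app_0001")
print(rm_restart_phase1())  # resubmission avoided, but work in progress lost
print(rm_restart_phase2())  # running containers preserved across the restart
```

In both phases the user never resubmits; the Phase 2 improvement is that the recovered state includes the live containers, not just the application queue.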
As part of HDP 2.1, we also added the ability to fail over the ResourceManager from one instance to another, usually running on a different physical machine, for high availability. Failover involves leader election, transfer of resource-management authority to the newly elected leader, and redirection of clients to that leader.
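A minimal sketch of the failover flow, under the simplifying assumption that "elect a leader" just means "pick the first healthy instance" (real YARN uses a ZooKeeper-based elector, and the host names here are made up):

```python
class ResourceManager:
    def __init__(self, host):
        self.host = host
        self.alive = True

def elect_active(rms):
    """Toy leader election: the first healthy RM becomes active."""
    for rm in rms:
        if rm.alive:
            return rm
    raise RuntimeError("no healthy ResourceManager available")

rm1 = ResourceManager("rm1.example.com")
rm2 = ResourceManager("rm2.example.com")

assert elect_active([rm1, rm2]).host == "rm1.example.com"
rm1.alive = False   # simulate failure of the active RM
# Clients retry and are redirected to the newly elected leader:
assert elect_active([rm1, rm2]).host == "rm2.example.com"
```

Combined with the state-store recovery above, this is what lets a standby take over resource-management authority without losing the cluster's application state.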
Wiring it all together, starting with HDP 2.2 we support rolling upgrades of a YARN cluster! As part of this work, we have decoupled the upgrade of frameworks such as MapReduce and Tez from the platform, so that specific sites or users can upgrade the platform first, verify that existing workloads are unaffected, and then upgrade the application frameworks to the next release.
HDP 2.2 also ships a few community-developed features as Technical Preview. The prominent features are:
Please note that these features are still maturing and are released as Technical Previews for evaluation only.
With HDP 2.2, YARN expands and strengthens support for the enterprise analytical workloads that can run within the data platform. Enterprises can now run these workloads and rely on their applications being resilient to platform upgrades and failure scenarios.
In the coming weeks, we will be covering each of the above tracks in much more detail. Stay tuned!