Management of Application Dependencies in YARN
This post is authored by Omkar Vinit Joshi with Vinod Kumar Vavilapalli and is the 8th post in the multi-part blog series on Apache Hadoop YARN – a general-purpose, distributed, application management framework that supersedes the classic Apache Hadoop MapReduce framework for processing data in Hadoop clusters. Other posts in this series:
- Introducing Apache Hadoop YARN
- Apache Hadoop YARN – Background and an Overview
- Apache Hadoop YARN – Concepts and Applications
- Apache Hadoop YARN – ResourceManager
- Apache Hadoop YARN – NodeManager
- Running existing applications on Hadoop 2 YARN
- Stabilizing YARN APIs for Apache Hadoop 2
In YARN, applications perform their work by running containers, which today map to processes on the underlying operating system. More often than not, containers have dependencies on files for execution. These files are either required at startup or during runtime – once or many times. For example, to launch a simple Java program as a container, we need the jar file and potentially more jars as dependencies. Instead of forcing every application to either access (mostly just read) these files remotely every time or manage the files themselves, YARN gives applications the ability to localize these files.
At the time of starting a container, an ApplicationMaster (AM) can specify all the files that a container will require and thus should be localized. Once specified, YARN takes care of the localization by itself and hides all the complications involved in securely copying, managing and later deleting these files.
In the remainder of this post, we’ll explain the basic concepts about this functionality.
Here are some definitions to begin with:
Localization. Localization is the process of copying/downloading remote resources onto the local file-system. Instead of always accessing a resource remotely, it is copied to the local machine, where it can then be accessed locally.
LocalResource. LocalResource represents a file/library required to run a container. The NodeManager is responsible for localizing the resource prior to launching the container. For each LocalResource, Applications can specify:
- URL: Remote location from where a LocalResource has to be downloaded
- Size: Size in bytes of the LocalResource
- Timestamp: Last-modification time of the resource on the remote file-system, captured before container-start
- LocalResourceType: Specifies the type of a resource localized by the NodeManager – FILE, ARCHIVE and PATTERN
- Pattern: the pattern that should be used to extract entries from the archive (only used when type is PATTERN).
- LocalResourceVisibility: Specifies the visibility of a resource localized by the NodeManager. The visibility can be one of PUBLIC, PRIVATE and APPLICATION
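Taken together, these fields amount to a small record. The following is a JDK-only sketch of that specification – all class and field names here are illustrative (the real interface is org.apache.hadoop.yarn.api.records.LocalResource):

```java
import java.util.HashMap;
import java.util.Map;

public class LocalResourceSpec {
    public enum Type { FILE, ARCHIVE, PATTERN }
    public enum Visibility { PUBLIC, PRIVATE, APPLICATION }

    public final String url;        // remote location to download from
    public final long size;         // size in bytes
    public final long timestamp;    // last-modification time on the remote file-system
    public final Type type;
    public final Visibility visibility;
    public final String pattern;    // only consulted when type == PATTERN

    public LocalResourceSpec(String url, long size, long timestamp,
                             Type type, Visibility visibility, String pattern) {
        this.url = url; this.size = size; this.timestamp = timestamp;
        this.type = type; this.visibility = visibility; this.pattern = pattern;
    }

    public static void main(String[] args) {
        // An AM-side map from the name a container will see to the resource spec,
        // analogous in spirit to ContainerLaunchContext.setLocalResources(...).
        Map<String, LocalResourceSpec> resources = new HashMap<>();
        resources.put("app.jar", new LocalResourceSpec(
                "hdfs://nn:8020/apps/app.jar", 1048576L, 1385000000000L,
                Type.FILE, Visibility.APPLICATION, null));
        System.out.println(resources.keySet());
    }
}
```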
What files can a container request for localization? Any kind of file that is meant to be read-only by the containers. Typical examples of LocalResources include:
- Libraries required for starting the container such as a jar file
- Configuration files required to configure the container once started (remote service URLs, application default configs, etc.)
- A static dictionary file.
The following are some examples of bad candidates for LocalResources:
- Shared files that external components may update in the future, where current containers wish to track these changes,
- Files that applications themselves directly want to update, or
- Files through which an application plans to share updated information with external services.
Other related definitions
- ResourceLocalizationService: As previously described in the post about NodeManager, ResourceLocalizationService is the service inside NodeManager that is responsible for securely downloading and organizing various file resources needed by containers. It tries its best to distribute the files across all the available disks, enforces access control restrictions of the downloaded files and puts appropriate usage limits on them.
- DeletionService: A service that runs inside the NodeManager and deletes local paths as and when instructed to do so.
- Localizer: The actual thread or process that does Localization. There are two types of Localizers – PublicLocalizer for PUBLIC resources and ContainerLocalizers for PRIVATE and APPLICATION resources.
- LocalCache: The NodeManager maintains and manages several local-caches of all the files downloaded. Resources are uniquely identified based on the remote URL originally used while copying the file.
As mentioned above, the NodeManager tracks the last-modification timestamp of each LocalResource before container-start. Before downloading, the NodeManager checks that the files haven’t changed in the interim. This provides a consistent view of the LocalResources – an application can use the very same file-contents for the entire time it runs, without worrying about data-corruption issues due to concurrent writers to the same file.
Once the file is copied from its remote location to one of the NodeManager’s local disks, it loses any connection to the original file other than the URL (used while copying). Any future modifications to the remote file are NOT tracked and hence if an external system has to update the remote resource – it should be done via versioning. YARN will fail containers that depend on modified remote resources to prevent inconsistencies.
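The consistency check above can be sketched with plain JDK file APIs. In this sketch a local file stands in for the remote one, and `unchangedSince` is a hypothetical name, not the NodeManager's actual method:

```java
import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.FileTime;

public class TimestampCheck {
    /**
     * Returns true only if the resource's current last-modified time equals
     * the timestamp recorded when the resource was specified. A mismatch
     * means the file changed in the interim, so localization must fail.
     */
    public static boolean unchangedSince(Path resource, long recordedMillis) throws IOException {
        return Files.getLastModifiedTime(resource).toMillis() == recordedMillis;
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("resource", ".jar");
        long recorded = Files.getLastModifiedTime(f).toMillis();
        System.out.println("unchanged: " + unchangedSince(f, recorded));

        // Simulate a concurrent writer updating the remote file after the
        // timestamp was recorded: the check now fails.
        Files.setLastModifiedTime(f, FileTime.fromMillis(recorded + 5000));
        System.out.println("unchanged: " + unchangedSince(f, recorded));
        Files.delete(f);
    }
}
```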
Note that ApplicationMaster specifies the resource time-stamps to a NodeManager while starting any container on that node. Similarly, for the container running the ApplicationMaster itself, the client has to populate the time-stamps for all the resources that ApplicationMaster needs.
In case of a MapReduce application, the MapReduce JobClient determines the modification-timestamps of the resources needed by MapReduce ApplicationMaster. The ApplicationMaster itself then sets the timestamps for the resources needed by the MapReduce tasks.
Each LocalResource can be of one of the following types:
- FILE. A regular file, either textual or binary
- ARCHIVE. An archive, which is automatically unarchived by the NodeManager. As of now, NodeManager recognizes jars, tars, tar.gz files and .zip files.
- PATTERN. A hybrid of the ARCHIVE and FILE types. The original file is retained, and at the same time (only) part of the file is unarchived on the local file-system during localization. Both the original file and the extracted files are put in the same directory. Which contents have to be extracted from the ARCHIVE and which don’t is determined by the pattern field in the LocalResource specification. Currently, only jar files are supported under the PATTERN type; all others are treated as a regular ARCHIVE.
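The PATTERN behavior can be illustrated with the JDK’s zip support (jars are zip files): entries whose names match the pattern are extracted, while the archive itself is retained alongside them. This is only a sketch of the idea; `extractMatching` and `demo` are illustrative names, not YARN APIs:

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;
import java.util.regex.Pattern;
import java.util.zip.*;

public class PatternExtract {
    /** Extract only the entries whose names match the pattern; the archive itself is kept. */
    public static List<String> extractMatching(Path archive, Path destDir, String regex)
            throws IOException {
        Pattern p = Pattern.compile(regex);
        List<String> extracted = new ArrayList<>();
        try (ZipFile zf = new ZipFile(archive.toFile())) {
            for (Enumeration<? extends ZipEntry> e = zf.entries(); e.hasMoreElements(); ) {
                ZipEntry entry = e.nextElement();
                if (!entry.isDirectory() && p.matcher(entry.getName()).matches()) {
                    Path out = destDir.resolve(entry.getName());
                    Files.createDirectories(out.getParent());
                    try (InputStream in = zf.getInputStream(entry)) {
                        Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
                    }
                    extracted.add(entry.getName());
                }
            }
        }
        return extracted;
    }

    /** Builds a tiny archive, then extracts only the .class entries from it. */
    public static List<String> demo() throws IOException {
        Path dir = Files.createTempDirectory("local-cache");
        Path jar = dir.resolve("app.jar");
        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(jar))) {
            for (String name : new String[] {"classes/A.class", "scripts/run.sh"}) {
                zos.putNextEntry(new ZipEntry(name));
                zos.write(name.getBytes());
                zos.closeEntry();
            }
        }
        List<String> got = extractMatching(jar, dir, ".*\\.class");
        Collections.sort(got);
        return got;    // the original app.jar still exists next to the extracted entries
    }
}
```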
LocalResources can be of three types depending upon their specified LocalResourceVisibility, i.e., depending on how visible/accessible they are on the original storage/file-system.
All the LocalResources (remote URLs) that are marked PUBLIC are accessible to containers of any user. Typically, PUBLIC resources are those that can be accessed by anyone on the remote file-system and, following the same ACLs, are copied into the public LocalCache. If, in the future, a container belonging to this or any other application (of any user) requests the same LocalResource, it is served from the LocalCache and thus not copied/downloaded again, provided it hasn’t been evicted from the LocalCache by then. All files in the public cache are owned by the “yarn-user” (the user the NodeManager runs as) with world-readable permissions, so that they can be shared by the containers of all users running on that node.
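The URL-keyed lookup into the public cache can be sketched as follows. Class and method names here are illustrative (in YARN, the actual copying is done by the PublicLocalizer inside the ResourceLocalizationService); the point is that the remote URL is the cache key, so only the first request triggers a download:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PublicLocalCache {
    // remote URL -> local path in the public cache
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private int downloads = 0;

    /** Returns the local path, downloading only on the first request for a given URL. */
    public String getOrLocalize(String remoteUrl) {
        return cache.computeIfAbsent(remoteUrl, url -> {
            downloads++;  // stand-in for the actual secure copy to a local disk
            return "/local/filecache/" + Integer.toHexString(url.hashCode());
        });
    }

    public int downloadCount() { return downloads; }
}
```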
LocalResources that are marked PRIVATE are shared among all applications of the same user on the node. These LocalResources are copied into the specific user’s (the application submitter’s, i.e. the user who started the container) private cache. These files are accessible to all containers belonging to different applications, as long as they were all started by the same user. The files on the local file-system are owned by that user and are not accessible by any other user. As with the public LocalCache, even the application submitter doesn’t have write permissions – the user cannot modify these files once localized. This avoids accidental writes to these files by one container harming other containers – all containers expect them to be in the same state as originally specified (mirroring the original timestamp and/or version number).
All the resources that are marked APPLICATION scoped are shared only amongst containers of the same application on the node. They are copied into the application-specific LocalCache, which is owned by the user who started the container (the application submitter). All these files are owned by the user with read-only permissions.
Notes on LocalResource Visibilities
Note that the ApplicationMaster specifies this resource visibility to a NodeManager while starting the container – the NodeManager itself doesn’t make any decisions about classifying resources. Similarly, for the container running the ApplicationMaster itself, the client has to specify visibilities for all the resources that the ApplicationMaster needs.
In case of a MapReduce application, the MapReduce JobClient decides the resource-type which the corresponding ApplicationMaster then forwards to a NodeManager.
Life-time of the LocalResources
As already mentioned, different types of LocalResources have different life-cycles:
- PUBLIC LocalResources are not deleted once the container or application finishes. They are only deleted when there is a pressure on each local-directory for disk capacity. The threshold for local files is dictated by the configuration property yarn.nodemanager.localizer.cache.target-size-mb described below.
- PRIVATE LocalResources also follow the same life-cycle as PUBLIC resources. In future, we wish to track separate thresholds for different users.
- APPLICATION scoped LocalResources are deleted immediately after the application finishes.
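The size-triggered cleanup for PUBLIC and PRIVATE resources can be sketched as a sweep that evicts unreferenced entries until the cache fits under the target (playing the role of yarn.nodemanager.localizer.cache.target-size-mb). The names below are illustrative, and least-recently-used is one plausible ordering, not necessarily the exact policy YARN applies:

```java
import java.util.*;

public class CacheCleaner {
    static final class Entry {
        final String url; final long bytes; final long lastUsed; final int refCount;
        Entry(String url, long bytes, long lastUsed, int refCount) {
            this.url = url; this.bytes = bytes; this.lastUsed = lastUsed; this.refCount = refCount;
        }
    }

    /** Evict least-recently-used entries with no live references until total size <= targetBytes. */
    static List<String> evict(List<Entry> entries, long targetBytes) {
        long total = entries.stream().mapToLong(e -> e.bytes).sum();
        List<Entry> candidates = new ArrayList<>(entries);
        candidates.sort(Comparator.comparingLong(e -> e.lastUsed));  // oldest access first
        List<String> evicted = new ArrayList<>();
        for (Entry e : candidates) {
            if (total <= targetBytes) break;   // under the target: stop evicting
            if (e.refCount > 0) continue;      // still in use by a running container: keep
            evicted.add(e.url);
            total -= e.bytes;
        }
        return evicted;
    }
}
```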
One thing of note is that for any given application, we may have multiple ApplicationAttempts, and each attempt may start zero or more containers on a given NodeManager. When the first container belonging to an ApplicationAttempt starts, ResourceLocalizationService localizes files for that application as requested in the container’s launch context. If future containers request more such resources, then they all will be localized. If one ApplicationAttempt finishes/fails and another is started, ResourceLocalizationService doesn’t do anything with respect to the previously localized resources. However, when the application eventually finishes, the ResourceManager communicates that information to the NodeManagers, which in turn clear the application LocalCache. In summary, APPLICATION LocalResources are truly application-scoped and not ApplicationAttempt-scoped.
That ends our coverage of the basic concepts that application writers will need to know about LocalResources. LocalResources are a very useful feature that application writers can exploit to declare their startup and runtime dependencies. In the next post, we’ll delve deep into how the localization process itself actually takes place in the NodeManager.