Enforcing application lifetime SLAs on YARN

This blog post was published on Hortonworks.com before the merger with Cloudera. Some links, resources, or references may no longer be accurate.

Introduction

Lifetime is the overall time an application spends in YARN. It is measured from the application's start time to its finish time, and includes both the actual run time and the time spent waiting for resource allocation. Both users and administrators of a YARN cluster may occasionally need to enforce lifetime Service Level Agreements (SLAs) that limit how long specific applications are allowed to run.

Users and administrators have different reasons for limiting application lifetime. For example, a user might schedule a daily cron job that returns statistics for an application run over a specific N-minute window. Assume that the job runs at fixed times on different datasets and takes about an hour to complete. If the job does not finish within that estimated duration, its output might no longer be useful, and the user would otherwise have to monitor the application in order to reclaim its resources after the run. Restricting the lifetime of the application removes the need for that monitoring.

An administrator might need to restrict the application lifetime for a particular leaf queue. This is especially helpful in organizations where queues are shared across many departments. In such scenarios, restricting the lifetime of applications submitted to a leaf queue keeps resources available to users from different departments who want to submit their own jobs.

YARN addresses the requirements of both users and administrators by enabling them to configure the lifetime for applications. This feature is available in Apache Hadoop starting with Hadoop 2.9. Hortonworks Data Platform (HDP) includes this feature starting with HDP 3.0.

How can admins enforce application lifetime SLAs?

YARN allows admins to set the lifetime of applications at the leaf-queue level in capacity-scheduler.xml. The following queue properties control the application lifetime for a leaf queue.

 

| Property | Description |
| --- | --- |
| yarn.scheduler.capacity.<queue-path>.maximum-application-lifetime | Maximum lifetime, in seconds, of an application submitted to the queue. A value less than or equal to zero disables the limit. A positive value is a hard time limit for all applications in the queue: any application submitted to the queue is killed once it exceeds the configured lifetime. Users can also specify a lifetime per application in the application submission context, but a user-specified lifetime that exceeds the queue maximum is overridden by the maximum. This is a point-in-time configuration. Note: setting the value too low causes applications to be killed sooner than intended. Applies only to leaf queues. |
| yarn.scheduler.capacity.<queue-path>.default-application-lifetime | Default lifetime, in seconds, of an application submitted to the queue. A value less than or equal to zero disables the default. If the user submits an application without a lifetime value, this value is used. This is a point-in-time configuration. Note: the default lifetime can't exceed the maximum lifetime. Applies only to leaf queues. |
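The interaction between a user-requested lifetime, the queue default, and the queue maximum can be sketched in a few lines of Java. This is only an illustration of the rules described in the table, not the scheduler's actual code; the class name `QueueLifetimePolicy`, the method `effectiveLifetime`, and the choice of -1 to mean "no limit" are hypothetical, and the real scheduler may treat edge cases differently.

```java
// Illustrative model of the rules in the table above; not the scheduler's code.
final class QueueLifetimePolicy {
    private final long maxLifetimeSecs;      // maximum-application-lifetime; <= 0 means disabled
    private final long defaultLifetimeSecs;  // default-application-lifetime; <= 0 means disabled

    QueueLifetimePolicy(long maxLifetimeSecs, long defaultLifetimeSecs) {
        // The table notes that the default lifetime can't exceed the maximum.
        if (maxLifetimeSecs > 0 && defaultLifetimeSecs > maxLifetimeSecs) {
            throw new IllegalArgumentException("default lifetime exceeds maximum lifetime");
        }
        this.maxLifetimeSecs = maxLifetimeSecs;
        this.defaultLifetimeSecs = defaultLifetimeSecs;
    }

    /** Returns the lifetime to enforce in seconds, or -1 if no limit applies. */
    long effectiveLifetime(long requestedSecs) {
        // Fall back to the queue default when the user did not request a lifetime.
        long lifetime = requestedSecs > 0 ? requestedSecs : defaultLifetimeSecs;
        // A positive queue maximum is a hard cap on whatever was chosen above.
        if (maxLifetimeSecs > 0 && lifetime > maxLifetimeSecs) {
            lifetime = maxLifetimeSecs;
        }
        return lifetime > 0 ? lifetime : -1;
    }
}
```

For a queue with a one-hour maximum and a ten-minute default, `new QueueLifetimePolicy(3600, 600).effectiveLifetime(7200)` yields 3600 because the request is capped, while `effectiveLifetime(0)` yields the 600-second default.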

How can users enforce application lifetime SLAs?

Users can set the lifetime of an application at job submission, or update the lifetime after submission. This ensures that the application doesn't run longer than the configured application lifetime SLA.

Set app lifetime using Java API

During application submission, the user can set the lifetime in the ApplicationSubmissionContext, i.e. ApplicationSubmissionContext#setApplicationTimeouts(Map<ApplicationTimeoutType, Long> applicationTimeouts). As of today, YARN supports only one timeout type, LIFETIME, with the corresponding timeout value given in seconds.

Sample Code:
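A minimal sketch of building the timeout map that setApplicationTimeouts expects. So that the snippet compiles without Hadoop on the classpath, it declares a stand-in `ApplicationTimeoutType` enum; a real client would import `org.apache.hadoop.yarn.api.records.ApplicationTimeoutType` instead. The helper class `LifetimeTimeouts` and its method `lifetimeOf` are hypothetical names used only for illustration.

```java
import java.util.Collections;
import java.util.Map;

// Stand-in so the snippet compiles without Hadoop; a real client would use
// org.apache.hadoop.yarn.api.records.ApplicationTimeoutType instead.
enum ApplicationTimeoutType { LIFETIME }

final class LifetimeTimeouts {
    /** Builds the map passed to setApplicationTimeouts; the value is in seconds. */
    static Map<ApplicationTimeoutType, Long> lifetimeOf(long seconds) {
        return Collections.singletonMap(ApplicationTimeoutType.LIFETIME, seconds);
    }
}

// With the real Hadoop classes, the map is applied before the application is submitted:
//   ApplicationSubmissionContext appContext = ...;  // obtained from the YARN client
//   appContext.setApplicationTimeouts(LifetimeTimeouts.lifetimeOf(3600L));
```

Here `lifetimeOf(3600L)` requests a lifetime of one hour for the application.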

The LIFETIME timeout type is imposed on the overall lifetime of the application. It covers the actual run time plus non-run time. Non-run time includes, for example, the time the scheduler takes to allocate containers and the time taken to store the application in the RMStateStore.

CLI/REST Interfaces

CLI

YARN provides a CLI to update the lifetime of an application. Users can either extend or reduce the lifetime value.

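Syntax:

    yarn application -appId <ApplicationId> -updateLifetime <Timeout>

The CLI command below updates the lifetime of an application counted from NOW. The timeout value is in seconds.

Example:

    yarn application -appId application_1465246237936_0001 -updateLifetime 300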

In the above example, the lifetime of application application_1465246237936_0001 is updated to 300 seconds from NOW. If the current time is 10:00 AM, the application times out at 10:05 AM.
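When the update has to be triggered from code rather than from a shell, the command can be assembled and handed to a ProcessBuilder. The following is a minimal sketch, assuming the yarn binary is on the PATH and that the CLI accepts the -appId and -updateLifetime options; the class `UpdateLifetimeCli` and its methods are hypothetical helpers. It also reproduces the "from NOW" arithmetic of the example.

```java
import java.time.LocalTime;
import java.util.List;

final class UpdateLifetimeCli {
    /** Argument list for: yarn application -appId <id> -updateLifetime <seconds>. */
    static List<String> command(String appId, long lifetimeSecs) {
        return List.of("yarn", "application", "-appId", appId,
                "-updateLifetime", Long.toString(lifetimeSecs));
    }

    /** The new lifetime counts from NOW, so the expiry is now + lifetime. */
    static LocalTime expiry(LocalTime now, long lifetimeSecs) {
        return now.plusSeconds(lifetimeSecs);
    }

    public static void main(String[] args) throws Exception {
        List<String> cmd = command("application_1465246237936_0001", 300);
        System.out.println(String.join(" ", cmd));
        // 10:00 AM plus 300 seconds gives 10:05 AM.
        System.out.println("expires at " + expiry(LocalTime.of(10, 0), 300));
        // To actually run the command (requires the yarn binary on the PATH):
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```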

REST

Update Lifetime of an Application

Updates the timeout of an application for a given timeout type.

URI:

http://rm-http-address:port/ws/v1/cluster/apps/{appid}/timeout

HTTP Operations Supported:

PUT

Elements of the timeout object

| Item | Data Type | Description |
| --- | --- | --- |
| type | string | Timeout type. Valid values are the members of the ApplicationTimeoutType enum. LIFETIME is currently the only valid value. |
| expiryTime | string | Time at which the application will expire, in ISO8601 yyyy-MM-dd'T'HH:mm:ss.SSSZ format. |
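The expiryTime element has to be rendered in the pattern given in the table. A minimal Java sketch of formatting an expiry time and wrapping it in a request body follows. The class `TimeoutUpdateBody` and its methods are hypothetical, and the outer "timeout" key of the JSON body is an assumption based on the name of the timeout object.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

final class TimeoutUpdateBody {
    // Pattern from the table above; 'Z' renders the zone offset as +HHMM.
    static final DateTimeFormatter EXPIRY_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSZ");

    /** Formats an expiry time the way the expiryTime element expects. */
    static String expiryTime(ZonedDateTime expiry) {
        return EXPIRY_FORMAT.format(expiry);
    }

    /** JSON body for the PUT request; the "timeout" wrapper key is an assumption. */
    static String lifetimeBody(ZonedDateTime expiry) {
        return "{\"timeout\":{\"type\":\"LIFETIME\",\"expiryTime\":\""
                + expiryTime(expiry) + "\"}}";
    }
}
```

For example, `TimeoutUpdateBody.lifetimeBody(ZonedDateTime.now().plusMinutes(5))` produces a body that asks for the application to expire five minutes from now.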

HTTP Request:

Response Header:

Response Body:

Get Lifetime of an Application

URL:

http://rm-http-address:port/ws/v1/cluster/apps/{appid}/timeouts/{type}

HTTP Operations Supported:

GET

Elements of the timeout (Application Timeout) object

| Item | Data Type | Description |
| --- | --- | --- |
| type | string | Timeout type. Valid values are the members of the ApplicationTimeoutType enum. LIFETIME is currently the only valid value. |
| expiryTime | string | Time at which the application will expire, in ISO8601 yyyy-MM-dd'T'HH:mm:ss.SSSZ format. |
| remainingTimeInSeconds | long | Remaining time for the configured application timeout. -1 indicates that the application is not configured with a timeout. Zero (0) indicates that the application has expired with the configured timeout type. |
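A client reading this resource mainly needs to build the URL and interpret remainingTimeInSeconds. The sketch below shows both steps in Java; the class `TimeoutStatus`, its methods, and the returned message strings are hypothetical and chosen only for illustration.

```java
final class TimeoutStatus {
    /** URL of the per-type timeout resource, following the template above. */
    static String timeoutUrl(String rmAddress, String appId, String type) {
        return "http://" + rmAddress + "/ws/v1/cluster/apps/" + appId + "/timeouts/" + type;
    }

    /** Interprets remainingTimeInSeconds as described in the table above. */
    static String describe(long remainingTimeInSeconds) {
        if (remainingTimeInSeconds == -1) {
            return "no timeout configured";
        }
        if (remainingTimeInSeconds == 0) {
            return "expired";
        }
        if (remainingTimeInSeconds < 0) {
            // Only -1 is documented as a negative value; reject anything else.
            throw new IllegalArgumentException("unexpected value: " + remainingTimeInSeconds);
        }
        return remainingTimeInSeconds + " seconds remaining";
    }
}
```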

HTTP Request:

Response Header:

Response Body:

Conclusion

YARN enforces application lifetime SLAs by providing configurations and APIs. This feature can be used to automatically clean up applications and release their resources. It was implemented as part of YARN-3813.

Acknowledgements

We would like to thank all those who contributed patches to the Application Timeout feature: Akhil PB and Miklos Szegedi (besides the authors of this post). Thanks also to Jian He, Vinod Vavilapalli, Sunil Govindan, Wangda Tan, Varun Vasudev, and Nijel SF for their help with design and reviews!

Rohith Sharma KS
Staff Engineer