Apache Hadoop YARN is well known as the general resource-management platform for big-data applications such as MapReduce, Hive/Tez, and Spark. It abstracts away the complicated cluster resource management and scheduling from higher-level applications, enabling them to focus solely on their own application-specific logic.
In addition to big-data apps, another broad class of workloads we see today is long-running services such as HBase, Hive/LLAP, and container-based (e.g., Docker) services. Over the past year, the YARN community has been working hard to build first-class support for long-running services on YARN.
This feature, which we call the YARN service framework, was merged to trunk in November 2017. In total, it comprises 108 commits with 33,539 lines of code changes, and it is expected to become available in the Apache Hadoop 3.1 release. This effort primarily includes the following:
The YARN service framework also goes hand in hand with a few other features in YARN:
The bulk of the complexity of managing a service on YARN is hidden from users. Users deal only with a JSON specification to deploy and manage services running on YARN, through CLIs or REST APIs. Below is an example JSON specification for deploying an httpd container on YARN. Users can simply post this JSON spec through the REST API or using the CLI, and the system handles the rest: launching and monitoring the containers, plus any other actions required to keep the application running, such as auto-restarting a container if it fails. For example:
1. To launch a service, run the command below with the supplied JSON:
yarn app -launch my-httpd /path/to/httpd.json
{ "name": "httpd-service", "lifetime": "3600", "components": [ { "name": "httpd", "number_of_containers": 2, "artifact": { "id": "centos/httpd-24-centos7:latest", "type": "DOCKER" }, "launch_command": "/usr/bin/run-httpd", "resource": { "cpus": 1, "memory": "1024" } }] } |
2. To get the status of app
yarn app -status my-httpd
3. To flex the number of containers to 3:
yarn app -flex my-httpd -component httpd 3
4. To stop the service:
yarn app -stop my-httpd
5. To restart the stopped service:
yarn app -start my-httpd
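The same operations can also be driven through the YARN services REST API instead of the CLI. The sketch below is a rough illustration, not an exact reference: it assumes the REST endpoint exposed by the ResourceManager on an unsecured cluster (hypothetical host rm-host, default port 8088, path app/v1/services), that the service name comes from the "name" field in the JSON spec, and that the exact paths match the YARN Services API documentation for your Hadoop version.

# Launch the service by POSTing the same JSON spec (assumed endpoint)
curl -X POST -H "Content-Type: application/json" http://rm-host:8088/app/v1/services -d @/path/to/httpd.json

# Get the status of the service
curl http://rm-host:8088/app/v1/services/httpd-service

# Flex the httpd component to 3 containers
curl -X PUT -H "Content-Type: application/json" http://rm-host:8088/app/v1/services/httpd-service/components/httpd -d '{"number_of_containers": 3}'

# Stop the service
curl -X PUT -H "Content-Type: application/json" http://rm-host:8088/app/v1/services/httpd-service -d '{"state": "STOPPED"}'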
The diagram below illustrates, under the hood, the main components involved in a full-fledged YARN cluster that supports long-running services.
A typical workflow is:
What is the YARN service framework good at?
This blog post gives a high-level overview of how Apache Hadoop YARN supports container-based services. In the next blog post, we'll go into more detail on how simple it is to run more complex services such as Hive LLAP on YARN using this framework. Stay tuned!
Comments
Can YARN do smaller slices than 1 CPU? If not, it isn't a contender in this space. With Spark and others running fine on Kubernetes, gaining Kerberos support, and Kubernetes having much more momentum, I think it is safe to say the Hadoop ecosystem is breaking up.