Apache Ambari 2.0 User Views introduce two functional tools to help you understand and optimize your cluster resources to get the best performance in a multitenant Hadoop environment.
The Tez View gives you visibility into all the jobs on your cluster, allowing you to quickly identify which jobs consume the most resources and which are the best candidates to optimize.
With the Tez View you can quickly spot Hive or Pig jobs that are taking the longest, writing the most data, or consuming the most CPU. Once you’ve identified these big jobs, the Tez View lets you drill in to see exactly how each job is running and helps you identify ways to optimize it.
Important job running slow? You need to drill down and see what’s happening in the job. The Tez View lets you see exactly how the job is executed and the resources it uses at every step of the way.
One common performance bottleneck in SQL is doing a reduce-side join when you could do a map-side join instead. A reduce-side join requires large amounts of data to move over the network and lots of temporary data to be written. With a map-side join, only small amounts of data move over the network and SQL processing happens in place. Map-side joins can be more than 10 times faster than reduce-side joins, so you want to use them whenever you can, even if it means making a few special configurations for that big job.
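To make the difference concrete, here is a minimal Python sketch of the idea behind a map-side join. It is not Hive's implementation: the function name, the row layout, and the sample data are all hypothetical. The small table is loaded into an in-memory hash table, and the large table is streamed past it once, so nothing from the large table needs to be shuffled or written out as temporary data.

```python
from collections import defaultdict


def map_side_join(fact_rows, dim_rows, key):
    """Inner-join two row lists by hashing the small (dimension) table.

    In a real cluster every mapper would hold its own copy of `lookup`,
    so the large table is read once and never moved across the network.
    """
    lookup = defaultdict(list)
    for dim in dim_rows:
        lookup[dim[key]].append(dim)

    for fact in fact_rows:
        # .get() avoids creating empty entries for unmatched keys.
        for dim in lookup.get(fact[key], []):
            yield {**dim, **fact}


# Hypothetical sample data, for illustration only.
sales = [
    {"store_id": 1, "amount": 20},
    {"store_id": 2, "amount": 35},
    {"store_id": 1, "amount": 5},
    {"store_id": 9, "amount": 50},  # no matching store, dropped by the join
]
stores = [
    {"store_id": 1, "city": "Austin"},
    {"store_id": 2, "city": "Boston"},
]

joined = list(map_side_join(sales, stores, "store_id"))
```

The approach only works while the small table fits in memory, which is why the size of the container matters so much for this optimization.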
With the Tez View you can spot this problem easily and correct it all within Ambari. Let’s look at an example.
Using the Tez View we quickly spot a shuffle join, which we want to avoid if possible. Hive tries to convert joins to map-side joins automatically, but this is constrained by the size of a Tez container. If you have some extremely large dimension tables, it may make sense to use custom settings for the job and increase both the container size and the variable that controls Hive’s map-side join threshold (hive.auto.convert.join.noconditionaltask.size; see Hive’s Join Optimization page for more info). When we do that the plan looks quite different:
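As a rough illustration of what such per-job settings might look like, the Python sketch below generates Hive `SET` statements for a larger container and a matching map-join threshold. The property names are Hive's own; the helper functions are hypothetical, and the default of giving the small tables one third of the container is an assumed rule of thumb, not a Hive default. Treat the numbers as a starting point to test against your own workload.

```python
def map_join_settings(container_mb, small_table_fraction=1 / 3):
    """Return Hive session settings that raise the map-join threshold.

    container_mb:         Tez container size in megabytes.
    small_table_fraction: share of the container reserved for the
                          in-memory small tables (an assumption, tune it).
    """
    if container_mb <= 0:
        raise ValueError("container_mb must be positive")
    if not 0 < small_table_fraction < 1:
        raise ValueError("small_table_fraction must be between 0 and 1")

    # The threshold property is expressed in bytes, the container in MB.
    threshold_bytes = int(container_mb * 1024 * 1024 * small_table_fraction)
    return {
        "hive.tez.container.size": str(container_mb),
        "hive.auto.convert.join": "true",
        "hive.auto.convert.join.noconditionaltask": "true",
        "hive.auto.convert.join.noconditionaltask.size": str(threshold_bytes),
    }


def as_set_statements(settings):
    """Render the settings as statements to run before the query."""
    return [f"SET {name}={value};" for name, value in settings.items()]


statements = as_set_statements(map_join_settings(container_mb=8192))
for statement in statements:
    print(statement)
```

Running the statements at the top of the job's script keeps the larger container scoped to that one job instead of changing the cluster-wide defaults.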
Why does this help? A map join minimizes the need to write massive amounts of throwaway temporary data. In this case, less than 1% as much temporary data is written after the switch.
It’s not uncommon for a conversion to map join to accelerate a large job 10x or more.
Another common source of waste is a query that tries to join two fact tables together. Queries like this should be optimized either by enabling Hive’s Cost-Based Optimizer or by manually changing the join order. The Tez View makes it easy to find these big queries and fix them.
The YARN Capacity Scheduler allows Hadoop to be shared among multiple independent tenants while providing guaranteed capacity and predictable SLAs. The Capacity Scheduler divides resources through use of YARN queues, which are sized based on the relative allocations given to various tenants.
Until now, configuring queues has required hand-editing XML files, so the process was error-prone and it was difficult to see how the Capacity Scheduler was dividing resources overall. Configuring a queue also comes with a lot of rules: the capacities of all queues at a given level must add up to 100%, max capacity cannot be less than capacity, removing a queue requires a ResourceManager restart, and the ACL syntax for job submission and queue administration must be formatted exactly right. Follow all that? No? Install the View!
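To give a feel for what checking these rules involves, here is a small Python sketch that validates two of them: sibling capacities adding up to 100% and max capacity not falling below capacity. The nested-dict layout and the function name are hypothetical, chosen only for this example; this is not how the View or YARN stores queue definitions.

```python
def validate_queues(queue, path="root"):
    """Return a list of rule violations found in a queue tree.

    `queue` uses a hypothetical layout for this sketch:
        {"capacity": 60, "max_capacity": 100, "children": {name: queue}}
    A missing "max_capacity" is treated as 100 (an assumption).
    """
    problems = []

    if queue.get("max_capacity", 100) < queue["capacity"]:
        problems.append(f"{path}: max capacity is less than capacity")

    children = queue.get("children", {})
    if children:
        # Queues at the same level must use up all of the parent's capacity.
        total = sum(child["capacity"] for child in children.values())
        if abs(total - 100) > 1e-6:
            problems.append(f"{path}: child capacities sum to {total}, not 100")
        for name, child in children.items():
            problems.extend(validate_queues(child, f"{path}.{name}"))

    return problems


# Hypothetical, deliberately broken configuration.
queues = {
    "capacity": 100,
    "max_capacity": 100,
    "children": {
        "engineering": {"capacity": 60, "max_capacity": 100},
        "marketing": {"capacity": 30, "max_capacity": 20},
    },
}

problems = validate_queues(queues)
for problem in problems:
    print(problem)
```

Even this toy version has to walk the whole tree to catch mistakes, which hints at why hand-edited XML was so easy to get wrong.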
The Capacity Scheduler View solves these problems by providing a simple UI that lets you create and modify YARN queues and see their distribution at a glance. The UI enforces configuration rules, highlights invalid conditions and hides the complex syntax of setting ACLs. The View is also smart enough to know if a disruptive ResourceManager restart is needed or if you can simply refresh the configuration with no downtime.
For instance, here we see that 60% of cluster resources are dedicated to Engineering and that, within Engineering, QE gets the majority of resources. Despite this, Development has a max capacity of 100%, meaning that if QE is not using its resources, Development is free to take advantage of them.
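The arithmetic behind this example can be sketched in a few lines of Python. Only the 60% figure for Engineering and the 100% max capacity for Development come from the example; the 70/30 split between QE and Development is a hypothetical stand-in for "the majority", and the ceiling assumes Engineering itself cannot grow beyond its 60%.

```python
def absolute_share(parent_absolute_pct, queue_pct):
    """Convert a queue's percentage of its parent into a percentage of the cluster."""
    return parent_absolute_pct * queue_pct / 100


# From the example: Engineering is guaranteed 60% of the cluster.
engineering = 60

# Hypothetical split inside Engineering (the text only says QE gets the majority).
qe_guaranteed = absolute_share(engineering, 70)
dev_guaranteed = absolute_share(engineering, 30)

# Development's max capacity of 100% lets it grow into all of Engineering's
# share when QE is idle (assuming Engineering is capped at its own 60%).
dev_ceiling = absolute_share(engineering, 100)

print(f"QE guaranteed:          {qe_guaranteed}% of the cluster")
print(f"Development guaranteed: {dev_guaranteed}% of the cluster")
print(f"Development ceiling:    {dev_ceiling}% of the cluster")
```

Under these assumed numbers, Development is guaranteed 18% of the cluster but can borrow up to 60% when QE is idle.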
With the Capacity Scheduler View you can easily:
Try Ambari User View Technical Preview!
Ambari User Views are designed to provide capabilities that assist with the operational aspects of data application development and workload management. All the new Ambari Views have been pre-installed in the newly updated Hortonworks Sandbox, so just download it and you’re ready to go. Want to try these on an existing cluster? To download and configure the Ambari User Views Technical Preview, use this document. If you have questions or feedback on the User Views, please post them to the Ambari User View Forum.
| Tech Preview User Views | Description |
|---|---|
| Hive | Hive View allows the user to write and execute SQL queries on the cluster. It shows the history of all Hive queries executed on the cluster, whether run from Hive View or another source such as JDBC/ODBC or the CLI. It also provides a graphical view of the query execution plan, which helps the user debug the query for correctness and tune its performance. It integrates the Tez View, which allows the user to debug any Tez job, including monitoring the progress of a job (whether from Hive or Pig) while it is running. This View contribution can be found here. |
| Pig | Pig View is similar to the Hive View. It allows writing and running a Pig script. It supports saving scripts, and loading and using existing UDFs in scripts. This View contribution can be found here. |
| Capacity Scheduler | Capacity Scheduler View helps a Hadoop operator set up YARN workload management easily to enable multi-tenant and multi-workload processing. This View provisions cluster resources by creating and managing YARN queues. This View contribution can be found here. |
| Files | Files View allows the user to manage, browse and upload files and folders in HDFS. This View contribution can be found here. |