Advanced Execution Visualization of Spark jobs
Author: Zoltán Zvara, Márton Balassi, András Garzó, Hungarian Academy of Sciences in collaboration with Ericsson
Understanding the physical plan of a big data application is often crucial for tracking down bottlenecks and faulty behavior. Although Apache Spark offers a useful Web UI component for monitoring jobs and understanding their logical plan, it lacks a tool for understanding the physical plan produced by the task scheduler and for monitoring execution at a very low level, including the communication triggered by RDDs and remote block requests. We propose a tool that allows users to monitor job executions in real time, and later to replay and examine them, on any cluster currently supported by Spark.
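Spark already persists scheduler-level events as JSON when `spark.eventLog.enabled` is set, which is one natural data source for a replay tool of this kind. Below is a minimal sketch of reconstructing per-stage task timings from event-log records; the records here are hand-written, simplified stand-ins that mirror the field names of Spark's event-log schema, not output captured from a real cluster:

```python
import json
from collections import defaultdict

def stage_task_durations(event_lines):
    """Aggregate task run times (ms) per stage from Spark event-log JSON lines.

    Expects one JSON object per line; only SparkListenerTaskEnd events are
    consumed. Field names follow Spark's event-log schema, simplified.
    """
    durations = defaultdict(list)
    for line in event_lines:
        event = json.loads(line)
        if event.get("Event") != "SparkListenerTaskEnd":
            continue
        info = event["Task Info"]
        durations[event["Stage ID"]].append(
            info["Finish Time"] - info["Launch Time"])
    return dict(durations)

# Synthetic events standing in for a real event log.
log = [
    json.dumps({"Event": "SparkListenerTaskEnd", "Stage ID": 0,
                "Task Info": {"Launch Time": 100, "Finish Time": 150}}),
    json.dumps({"Event": "SparkListenerTaskEnd", "Stage ID": 0,
                "Task Info": {"Launch Time": 110, "Finish Time": 180}}),
    json.dumps({"Event": "SparkListenerJobEnd", "Job ID": 0}),
]
print(stage_task_durations(log))  # {0: [50, 70]}
```

Skewed task-duration distributions within a stage are exactly the kind of physical-plan symptom that a logical-plan view hides, which is why task-level granularity matters here.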
Our execution-visualizer implementation offers the following benefits to end users:
After this talk you will know more about:
In our talk proposal we stated our intention to extend the tool to support other frameworks in the Hadoop ecosystem. Since then, we have started implementing the data generator on top of Flink’s REST API.
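As a sketch of what such a Flink-based data generator could consume, the snippet below extracts per-vertex execution status from a job-detail response like the one served by Flink's monitoring REST API. The payload is a hand-written sample, and the exact endpoint paths and field names should be verified against the Flink version in use:

```python
import json

def vertex_statuses(job_detail_json):
    """Map each task vertex of a Flink job to its execution status.

    `job_detail_json` is assumed to be the body of a job-detail request
    to Flink's monitoring REST API (field names assumed, not verified
    against a specific Flink release).
    """
    job = json.loads(job_detail_json)
    return {v["name"]: v["status"] for v in job.get("vertices", [])}

# Hand-written sample response, not captured from a live cluster.
sample = json.dumps({
    "jid": "ab1234",
    "state": "RUNNING",
    "vertices": [
        {"name": "Source: Custom Source", "status": "RUNNING"},
        {"name": "Map -> Sink", "status": "SCHEDULED"},
    ],
})
print(vertex_statuses(sample))
```

Polling such an endpoint periodically would give the visualizer the same kind of task-level timeline for Flink jobs that the Spark event stream provides.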
Check out the features of a previous version of the tool in this video:
Register for the Hadoop Summit in Dublin here.