Making Apache Spark YARN Ready
Spark on YARN
Hadoop 2 and its YARN-based architecture have ushered in a new wave of innovation in and around Hadoop. One technology benefiting from this maturation is Apache Spark. Spark is a powerful engine for building and executing iterative algorithms for advanced analytics such as clustering and classification of datasets.
In early May, we made Spark available as a Technology Preview download for use with Hortonworks Data Platform 2.1, and in June we announced our broader “YARN Ready” program, aimed at growing the number of data processing solutions that take advantage of YARN as the architectural center of Hadoop.
Today, we announce certification of Apache Spark as YARN Ready. This certification ensures that memory- and CPU-intensive Spark-based applications can co-exist within a single Hadoop cluster with all the other workloads you have deployed. Together, Spark and YARN allow you to use a single cluster with a single set of data for multiple purposes rather than siloing your Spark workloads in a separate cluster.
This means you can deploy interactive SQL query applications with Hive and low-latency applications using HBase alongside your iterative machine learning workloads deployed using Spark. As such, you eliminate the need for a separate system or a separate set of resources for your data science work.
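As a rough illustration of what sharing one cluster looks like in practice, the sketch below assembles a spark-submit command that caps a Spark job's executors, memory, and cores and sends it to its own YARN queue, leaving the rest of the cluster to Hive and HBase. The queue name, JAR path, class name, and resource figures are hypothetical placeholders, not values from the certification, and the exact flags can vary by Spark version.

```python
import shlex


def build_spark_submit(app_jar, main_class, queue,
                       num_executors, executor_memory, executor_cores):
    """Build a spark-submit argument list for a resource-capped job on YARN.

    All argument values are supplied by the caller; nothing here is
    prescribed by the YARN Ready certification.
    """
    return [
        "spark-submit",
        "--master", "yarn",                      # let YARN schedule the job
        "--deploy-mode", "cluster",              # run the driver inside the cluster
        "--queue", queue,                        # YARN queue reserved for this workload
        "--num-executors", str(num_executors),   # cap on executor containers
        "--executor-memory", executor_memory,    # memory per executor
        "--executor-cores", str(executor_cores), # CPU cores per executor
        "--class", main_class,
        app_jar,
    ]


# Hypothetical values, for illustration only.
cmd = build_spark_submit(
    app_jar="example-app.jar",
    main_class="com.example.ClusteringJob",
    queue="datascience",
    num_executors=4,
    executor_memory="4g",
    executor_cores=2,
)

# Print the command as a single shell-safe line.
print(shlex.join(cmd))
```

Because the job requests a fixed amount of memory and CPU from YARN, the remaining capacity stays available to the other workloads on the same cluster.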
Certifying Spark as “YARN Ready” assures end users who are deploying data lakes that their YARN-based applications, including Spark applications, work cooperatively with predictable performance.
The Hortonworks Commitment
Hortonworks’ tech preview of Apache Spark is part of a larger initiative that will bring the best of heterogeneous, tiered storage and resource-based models of computing together with the broader Hadoop community, starting at the core of HDFS and working outward and upward. Several efforts prepare Spark for use within a broader Enterprise Hadoop platform: certifying Spark as YARN Ready; integrating Spark with Ambari so it’s easily provisioned, managed, and monitored; and integrating Spark with XA Secure (our recent security-related acquisition) for centralized authentication and audit. These are just some of them.
Our focus remains on delivering a fast, safe, scalable, and manageable data platform on a consistent footprint that includes HDFS, YARN, Tez, Ambari, Knox, Falcon, and Spark, to name just a few of the critical components of Enterprise Hadoop. Working within this comprehensive set of components, we make Apache Spark “enterprise ready” so that our customers can confidently adopt it.