Making Apache Spark YARN Ready

Spark on YARN as an element of Enterprise Hadoop


Hadoop 2 and its YARN-based architecture have ushered in a new wave of innovation in and around Hadoop. One technology benefiting from this maturation is Apache Spark, a powerful engine for building and executing iterative algorithms for advanced analytics, such as clustering and classification of datasets.

In early May, we made Spark available as a Technology Preview download for use with Hortonworks Data Platform 2.1, and in June we announced our broader “YARN Ready” program aimed at accelerating the number of data processing solutions that take advantage of YARN as the architectural center of Hadoop.

Today, we announce certification of Apache Spark as YARN Ready. This certification ensures that memory- and CPU-intensive Spark-based applications can coexist within a single Hadoop cluster with all the other workloads you have deployed. Together, Spark and YARN let you use a single cluster with a single set of data for multiple purposes rather than silo your Spark workloads into a separate cluster.

This means you can deploy interactive SQL query applications with Hive and low-latency applications using HBase alongside your iterative machine learning workloads deployed using Spark. As a result, you no longer need a separate system or a separate set of resources for your data science work.
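As a rough illustration of how a Spark job asks YARN for a bounded share of a shared cluster, the sketch below assembles a spark-submit command line. The helper name build_spark_submit, the jar, the class name, and the queue name are hypothetical, and the flag spellings follow the current spark-submit documentation, so they may differ in the Spark release available with the Technology Preview.

```python
from typing import List


def build_spark_submit(app_jar: str, main_class: str, queue: str = "default",
                       num_executors: int = 4, executor_memory: str = "2g",
                       executor_cores: int = 2) -> List[str]:
    """Return a spark-submit argument list that runs the app on YARN.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    return [
        "spark-submit",
        "--master", "yarn",                     # let YARN schedule the job
        "--deploy-mode", "cluster",             # driver also runs inside the cluster
        "--queue", queue,                       # YARN queue shared with other workloads
        "--num-executors", str(num_executors),  # number of executor containers
        "--executor-memory", executor_memory,   # memory requested per executor
        "--executor-cores", str(executor_cores),  # CPU cores requested per executor
        "--class", main_class,
        app_jar,
    ]


# Example values are placeholders, not names from this post.
cmd = build_spark_submit("analytics.jar", "com.example.KMeansJob",
                         queue="datascience")
print(" ".join(cmd))
```

Because the memory and core requests are explicit, YARN can place the Spark executors next to Hive or HBase workloads without any of them exceeding its allocation.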

Certifying Spark as “YARN Ready” assures end users who are deploying data lakes that their YARN-based applications, including Spark applications, work cooperatively with predictable performance.

The Hortonworks Commitment

Hortonworks’ tech preview of Apache Spark is part of a larger initiative that will bring the best of heterogeneous, tiered storage, and resource-based models of computing together with the broader Hadoop community starting at the core of HDFS and working outward and upward. Certifying Spark as YARN Ready, integrating Spark with Ambari so it’s easily provisioned, managed, and monitored, and integrating Spark with XA Secure (our recent security-related acquisition) for centralized authentication and audit are just some of the efforts that prepare Spark for use within a broader Enterprise Hadoop platform.

Our focus remains on delivering a fast, safe, scalable, and manageable data platform on a consistent footprint that includes HDFS, YARN, Tez, Ambari, Knox, Falcon, and Spark, to name just a few of the critical components of enterprise Hadoop. Working within this comprehensive set of components, we make Apache Spark “enterprise ready” so that our customers can confidently adopt it.


June 26, 2014 at 9:24 am

Spark 1.0 or 0.9? Great news either way.
