Making Apache Spark YARN Ready

Spark on YARN as an element of Enterprise Hadoop


Hadoop 2 and its YARN-based architecture have ushered in a new wave of innovation in and around Hadoop. One technology benefiting from this maturation is Apache Spark. Spark is a unique and powerful engine for building and executing iterative algorithms for advanced analytics, such as clustering and classification of datasets.

In early May, we made Spark available as a Technology Preview download for use with Hortonworks Data Platform 2.1, and in June we announced our broader "YARN Ready" program, aimed at increasing the number of data processing solutions that take advantage of YARN as the architectural center of Hadoop.

Today, we are announcing the certification of Apache Spark as YARN Ready. This certification ensures that memory- and CPU-intensive Spark applications can coexist within a single Hadoop cluster alongside all the other workloads you have deployed. As a result, you can use a single cluster with a single set of data for multiple purposes rather than siloing your Spark workloads in a separate cluster.

This means you can deploy interactive SQL query applications with Hive and low-latency applications with HBase alongside your iterative machine learning workloads running on Spark. You no longer need a separate system or a separate set of resources for your data science work.

Certifying Spark as "YARN Ready" assures end users who are deploying data lakes that their YARN-based applications, including Spark applications, will work cooperatively and with predictable performance.

The Hortonworks Commitment

Hortonworks' tech preview of Apache Spark is part of a larger initiative, carried out with the broader Hadoop community, to bring together the best of heterogeneous, tiered storage and resource-based models of computing, starting at the core of HDFS and working outward and upward. Certifying Spark as YARN Ready, integrating Spark with Ambari so that it is easily provisioned, managed, and monitored, and integrating Spark with XA Secure (our recent security-related acquisition) for centralized authentication and audit are just some of the efforts that prepare Spark for use within a broader Enterprise Hadoop platform.

Our focus remains on delivering a fast, safe, scalable, and manageable data platform on a consistent footprint that includes HDFS, YARN, Tez, Ambari, Knox, Falcon, and Spark, to name just a few of the critical components of Enterprise Hadoop. By working within this comprehensive set of components, we make Apache Spark "enterprise ready" so that our customers can adopt it with confidence.


