Announcing HDP 2.1 Tech Preview Component: Apache Spark

Lighting Up Machine Learning & Data Science

Hadoop 2 and its YARN-based architecture have increased interest in running new engines on Hadoop. One such workload is in-memory computing for machine learning and data science use cases. Apache Spark has emerged as an attractive option for this type of processing, and today we announce the availability of our HDP 2.1 Tech Preview Component of Apache Spark. This is a key addition to the platform and brings another YARN-supported workload to HDP.

Interest in Apache Spark has increased markedly among data scientists and enthusiasts as they explore new ways to perform their unique and complex tasks in Hadoop. Our customers are investigating this technology because Spark allows key staff to implement iterative algorithms for advanced analytics, such as clustering and classification of datasets, simply and effectively. It provides three key value points to developers:

  • in-memory compute for iterative workloads,
  • a simplified programming model in Scala,
  • and machine learning libraries to simplify programming.

The Hortonworks Commitment

Hortonworks’ tech preview of Apache Spark is part of a larger initiative that will bring the best of heterogeneous, tiered storage, and resource-based models of computing together with the broader Hadoop community starting at the core of HDFS and working outward from there.

Our focus remains on delivering a fast, safe, scalable, and manageable platform on a consistent footprint that includes HDFS, YARN, Tez, Ambari, Knox, and Falcon to name just a few of the critical components of enterprise Hadoop.  We are working within this comprehensive set of components and hope to make Apache Spark “enterprise ready” so that our customers can confidently adopt it.

Availability

We have already completed baseline work to integrate Spark on YARN with HDP, and we want to make this available so that we can work with customers to identify the use cases that are best suited to this technology. Spark has the potential to meet the complex requirements of data science and machine learning use cases. We encourage you to download our Tech Preview today and engage with us so we can build out this key technology together.

Evaluate Spark on HDP Here


Comments

July 16, 2014 at 4:08 pm

The Spark 0.9.1-based tech preview is available now and provides step-by-step instructions for evaluating Spark: http://hortonworks.com/wp-content/uploads/2014/05/SparkTechnicalPreview.pdf

Very shortly we will provide an update of this tech preview with the latest version of Spark.

Abel Coronado | July 9, 2014 at 8:51 pm

Hi!

We want to deploy Spark 1.0.0 in an HDP cluster. Is it possible to have a step-by-step guide?
