Hortonworks Accelerates Spark at Scale for the Enterprise

SANTA CLARA, Calif. — Dec. 9, 2015 — Hortonworks, Inc. (NASDAQ: HDP) today announced upcoming advancements to Hortonworks Data Platform (HDP™) that build on the in-memory analytic capabilities of Apache Spark. HDP will include Apache Spark 1.5.2, with support for Spark SQL and Spark Streaming. Hortonworks' commitment to Spark is focused on helping customers accelerate data science, maintain seamless data access, drive innovation at the core and, ultimately, scale for the enterprise.

“We continue to see customers across all industries derive real value from using Spark with Hortonworks Data Platform,” said Tim Hall, vice president of product management at Hortonworks. “Our customers rely on us to guide them on their Spark journey, and our ability to scale Spark against massive data-sets is remarkable. With the inclusion of Spark 1.5.2 on HDP, customers can get new Spark capabilities and maximize its value for the enterprise.”

“Webtrends is working with Hortonworks to take Spark, Hive and Hadoop and execute these jobs in parallel,” said Peter Crossley, director of architecture, Webtrends. “This capability is critical because it allows us to combine the power of Big Data with the speed and flexibility of an ad hoc system. This means marketers will be able to ask any question of their unlimited data, no matter how structured it may be.”

Accelerating Apache Spark for Enterprise Scale

Hortonworks is providing customers with the easiest path to adopting Spark alongside Hadoop, enabling innovation at scale. Customers can deploy modern, Spark-based applications alongside Hadoop workloads in a consistent, predictable and reliable way. To meet the requirements of enterprise customers, Hortonworks is focusing its Spark work on three main areas:

Data Science Acceleration

  • Improving data science productivity by enhancing Apache Zeppelin (currently available as a technical preview) and by contributing additional Spark algorithms and packages that ease the development of key solutions. One example is Project Magellan, an open source geospatial analytics library that builds on Spark to support geospatial queries and solve hard problems involving geospatial data at scale.

Seamless Data Access

  • Hortonworks is improving Spark's integration with YARN, HDFS, Hive, HBase and ORC, because customers run Spark on YARN alongside many other popular data access engines. Specifically, Hortonworks is working to further optimize data access through the new Data Source API, which will allow Spark SQL users to take full advantage of the following capabilities:
  • Instantiating ORC files as tables
  • Column pruning
  • Language integrated queries
  • Predicate pushdown
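The Data Source API lets Spark SQL hand a data source the columns a query needs and the filters it applies, so the source can avoid reading data that cannot contribute to the result. The toy sketch below illustrates just two of the capabilities above, column pruning and predicate pushdown, in plain Python. It is not Spark, the ORC format or the Data Source API. It assumes a hypothetical columnar file split into stripes, each carrying per-column min/max statistics (ORC keeps comparable statistics). The names `Stripe` and `scan` are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Stripe:
    """Hypothetical stripe of a columnar file: column name -> list of values."""
    columns: dict[str, list[Any]]
    # Per-column (min, max), computed once when the stripe is built,
    # loosely modeled on ORC-style stripe statistics.
    stats: dict[str, tuple[Any, Any]] = field(init=False)

    def __post_init__(self) -> None:
        self.stats = {name: (min(vals), max(vals)) for name, vals in self.columns.items()}


def scan(stripes: list[Stripe], needed_columns: list[str],
         filter_column: str, lower_bound: Any) -> tuple[list[dict[str, Any]], int]:
    """Return rows where filter_column > lower_bound, plus how many stripes were read."""
    rows: list[dict[str, Any]] = []
    stripes_read = 0
    for stripe in stripes:
        # Predicate pushdown: if even the stripe's max value cannot pass
        # the filter, skip the whole stripe without touching its rows.
        _, col_max = stripe.stats[filter_column]
        if col_max <= lower_bound:
            continue
        stripes_read += 1
        for i, key in enumerate(stripe.columns[filter_column]):
            if key > lower_bound:
                # Column pruning: materialize only the requested columns;
                # other columns in the stripe are never accessed.
                rows.append({c: stripe.columns[c][i] for c in needed_columns})
    return rows, stripes_read


stripes = [
    Stripe({"id": [1, 2, 3], "country": ["US", "DE", "FR"], "amount": [10, 20, 30]}),
    Stripe({"id": [4, 5, 6], "country": ["US", "JP", "BR"], "amount": [40, 50, 60]}),
]
rows, read = scan(stripes, ["country"], "id", 4)
print(rows, read)  # [{'country': 'JP'}, {'country': 'BR'}] 1
```

In the usage example, the first stripe is skipped entirely because its largest `id` is 3, and the `amount` column is never read. In Spark SQL, how much a real source can skip depends on the source and its configuration.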

Innovation at the Core

  • Enhancing Spark’s enterprise security, governance, operations and overall readiness for real-world production deployment.

Fostering Community Innovation

Hortonworks has launched Hortonworks Community Connection (HCC), a new online collaboration destination where developers, DevOps teams, customers and partners can get answers to questions, collaborate on technical articles and share code examples from GitHub. This new online community is an extension of Hortonworks' open source roots and underscores its commitment to engaging with the community and fostering community innovation and knowledge. More information about Project Magellan can be found within HCC.

Hortonworks Community Connection currently hosts thousands of technical articles and FAQs on Hadoop, Spark and other big data technologies contributed by Hortonworks engineers and other technical experts. Registration for HCC is now open to the public. Join the HCC community at:


For more about Hadoop and Spark, visit:

Additional information about HDP and Spark can be found here.

About Hortonworks

Hortonworks is the leader in accelerating business transformations with Open Enterprise Hadoop by developing, distributing and supporting an enterprise-scale data platform built entirely on open source technology including Apache™ Hadoop®. Our team comprises the largest contingent of builders and architects within the Hadoop ecosystem who represent and lead the broader enterprise requirements within these communities.

The Hortonworks Data Platform provides an open platform that deeply integrates with existing IT investments and upon which enterprises can build and deploy Hadoop-based applications.

Hortonworks has deep relationships with the key strategic data center partners that enable our customers to unlock the broadest opportunities from Hadoop.

For more information, visit:

Join us at the Apache Hadoop 10-year anniversary party, to be held at Hadoop Summit Europe and North America in 2016.

Hortonworks, HDP and HDF are registered trademarks or trademarks of Hortonworks, Inc. and its subsidiaries in the United States and other jurisdictions.


For more information:

Michelle Lazzar

(408) 884-9861