Hortonworks is unleashing the power of Apache™ Spark for enterprise scale, unifying the capabilities of open enterprise Apache Hadoop® and the in-memory analytic capabilities of Apache Spark to maximize organizational value.
Spark is Better as Part of the Platform
Spark is certified as YARN-ready and is part of Hortonworks Data Platform. Memory- and CPU-intensive Spark-based applications can coexist with other workloads deployed in a YARN-enabled cluster. Spark has first-class support for external data sources and can run directly on the cluster under YARN, which is where enterprises want to perform their data analysis. This approach avoids the need to create and manage dedicated Spark clusters and allows for more efficient resource use within a single cluster.
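As a rough illustration of running Spark on a shared YARN cluster rather than a dedicated one, the sketch below assembles a spark-submit command line in Python. The flags (--master yarn, --deploy-mode, --queue, --num-executors, --executor-memory, --executor-cores) are standard spark-submit options; the helper name build_spark_submit, the queue name "analytics", the file name analytics_job.py and the resource values are assumptions made up for this example, not settings recommended by Hortonworks.

```python
import shlex


def build_spark_submit(app_path, queue="default", num_executors=4,
                       executor_memory="4g", executor_cores=2, app_args=()):
    """Return the argv list for submitting a Spark application to YARN.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    cmd = [
        "spark-submit",
        "--master", "yarn",             # let YARN schedule the application
        "--deploy-mode", "cluster",     # driver runs inside the cluster
        "--queue", queue,               # YARN queue shared with other workloads
        "--num-executors", str(num_executors),
        "--executor-memory", executor_memory,
        "--executor-cores", str(executor_cores),
        app_path,
    ]
    cmd.extend(app_args)                # arguments passed through to the job
    return cmd


if __name__ == "__main__":
    # Illustrative values only; size executors to fit the cluster's YARN limits.
    argv = build_spark_submit("analytics_job.py", queue="analytics",
                              num_executors=8)
    print(" ".join(shlex.quote(part) for part in argv))
```

Capping executors, memory and cores per application, and submitting to a named queue, is one way to let YARN arbitrate resources between Spark and the other workloads on the same cluster.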
Spark Requires Enterprise-Grade Security and Governance
As part of the HDP platform, Spark has access to the same governance, security and management policies as other components of the HDP stack. Spark is one of the fastest-moving projects in the Big Data ecosystem and its libraries remain at different levels of maturity. Hortonworks investigates, validates, certifies and then supports each of the components in the Spark project. This approach is key to the way we add value for our customers.
Notebooks Make Spark and Data Science Easier to Consume & Share
Web-based notebooks bring data ingestion, exploration, visualization, sharing and collaboration capabilities to Hadoop and Spark. Hortonworks is making a substantial investment in Apache Zeppelin and we plan to make Zeppelin ready for production use by adding security, stability, R support and ease of use.
By delivering Apache Spark and Hadoop as a unified platform, we combine Spark-driven agile analytic workflows with the vast data sets and economics of Hadoop. With Hortonworks, enterprises can deploy Apache Spark with the industry’s best security, governance, and operations capabilities.
WHAT IS HORTONWORKS' FOCUS ON SPARK?
With the release of Spark 1.6, Hortonworks commits to helping customers accelerate data science, maintain seamless data access, and drive innovation at the core.
Spark, as part of open enterprise Hadoop, empowers organizations to scale Spark for enterprise value.
Data Science Acceleration
Improving data science productivity by enhancing Apache Zeppelin and by contributing additional Spark algorithms and packages to ease the development of key solutions.
For example, Project Magellan is an open source library for geospatial analytics in Apache Spark. It facilitates geospatial queries and builds upon Spark to solve hard problems dealing with geospatial data at scale.
Seamless Data Access
Spark SQL provides SQL and DataFrame APIs to access structured data, while Spark Streaming enables developers to easily build scalable, high-throughput, fault-tolerant stream-processing applications for live data streams.
Hortonworks has been improving Spark’s integration with YARN, HDFS, Hive, HBase and ORC. Specifically, we believe that we can further optimize data access via the new Data Source API.
Innovate at the Core
Enable RDD sharing with the HDFS Memory Tier
Contribute additional machine learning algorithms
Enhance Spark’s enterprise security, governance, operations, and readiness
To learn more about all the exciting Spark innovation, listen to our recent webinar, Spark at Scale with Hadoop.
Magellan: Geospatial Analytics on Spark
Geospatial data is pervasive—in mobile devices, sensors, logs, and wearables. This data’s spatial context is an important variable in many predictive analytics applications. To benefit from spatial context in a predictive analytics application, we need to be able to parse geospatial datasets at scale, join them with target datasets that contain point in space information,…
A completely open web-based notebook that enables interactive data analytics
Apache Zeppelin is a new and incubating multi-purpose web-based notebook which brings data ingestion, data exploration, visualization, sharing and collaboration features to Hadoop and Spark.
Chris Eidler, VP Solutions R&D - CDI, Hewlett Packard Enterprise
Hewlett Packard Enterprise and Hortonworks have a strategic partnership to help organizations realize their modern data architecture. As industry leaders we are able to offer customers scalable, secure and easily deployable solutions for Apache Hadoop to solve the most challenging data storage and processing requirements…
Using Spark in the Cloud with Hortonworks & Microsoft
Today organizations produce more data than ever and are continuously looking for solutions that allow them to gain deep insight into their business and monetize the data collected from multiple sources. Apache Spark helps you improve your business insights by providing a highly-scalable and interactive environment for analyzing data. Microsoft has worked with Hortonworks to…
Built-in Cloud Security for Big Data Workloads – Live Demo & Flex-Support Options
Today enterprises are moving their data lakes to the cloud to help them execute faster, increase productivity, and drive innovation while leveraging the scale and flexibility of the cloud. However, such gains come with robust Authentication, Authorization and Audit (“AAA”) requirements needed for these workloads. In this interactive webinar: Learn how to get consistent security…
Hortonworks Data Cloud provides a quick and easy on-ramp for users looking to combine the agility of Amazon Web Services ("AWS") with the data processing power of the Hortonworks Data Platform. We are pleased to announce not ONE but TWO new releases available for Hortonworks Data Cloud. Read on to learn more. Hortonworks Data Cloud…
Run Apache Spark 2.1 & Apache Zeppelin in Hortonworks Data Cloud
Apache Spark 2.1 improves Structured Streaming and machine learning. Structured Streaming: Kafka 0.10 support, metrics and stability improvements. Machine Learning: SparkR improvements, including new ML algorithms for LDA, random forests, GMM, etc. The recent release of Hortonworks Data Platform 2.6 (“HDP 2.6”) includes Apache Spark 2.1. And Hortonworks Data Cloud (“HDCloud”) for AWS gives…
What’s New for Apache Spark & Apache Zeppelin in HDP 2.6?
The value of any data is proportional to the insights derived from it. With the Data Lake Architecture, all of the enterprise data is made available in one place. The key to driving insights from the Data Lake is Apache Spark & Apache Zeppelin. Both are key tools to drive Predictive Analytics and Machine Learning.…
Time is running out to secure your spot at DataWorks Summit/Hadoop Summit. With over 170 sessions featuring top organizations using open source technologies to leverage their data and drive predictive analytics, distributed deep-learning, and artificial intelligence initiatives, you don’t want to miss the industry’s premier event. Join us June 13 – 15 in San Jose and save…
Introducing Row/Column Level Access Control for Apache Spark
The latest version of Hortonworks Data Platform (HDP) introduced a number of significant enhancements for our customers. For instance, HDP 2.6.0 now supports both Apache Spark™ 2.1 and Apache Hive™ 2.1 (LLAP™) as GA. Often customers store their data in Hive and analyze that data using both Hive and SparkSQL. An important requirement in this scenario…
Applied Healthcare Informatics: A Healthcare Data Ecosystem Constructed on HDP and Utilizing HDF
This is a guest blog post by Charles Boicey, Chief Innovation Officer at Clearsense. Clearsense was born out of a passion for helping healthcare organizations realize the promise of their data and its ability to help them make better, faster clinical decisions—to meet the challenges of value-based care, drive research, improve patient care, and ultimately…
We are thrilled to announce that Hortonworks Data Platform (HDP) version 2.6 is now available, both on-premises and in the cloud. For the first time, we are also making this available on IBM Power Systems in addition to the x86 chipset. During 2016, we have seen many of Hortonworks’ customers deploy more and…
Top 6 Reasons to Use Apache Hadoop, Apache Spark and Apache Hive in the Cloud | Hortonworks
We at Hortonworks have spent countless hours working with customers as they use Apache Hadoop, Spark and Hive in the cloud, to help them better leverage the cloud platforms they use for these data processing workloads. In the interest of community and sharing, we wanted to share some of the “top reasons” we’ve heard. Enjoy! Cloud…
Enterprise Data Warehouse — Past, Present and Future
Syncsort and Hortonworks working together to drive the success of a modern EDW solution
Enterprise Data Warehouse has become a standard component of the corporate data architecture. In the past 15 years, a variety of product offerings were introduced into the market for building EDWs, operational data stores and real-time data warehouses. The differences is the…
Apache Spark 2.0 was released yesterday in the community. This is a long-awaited release that delivers several key features. We are really excited about this release and sincerely thank the Apache Software Foundation and Apache Spark communities for making this release possible. The most notable improvements in this release are in the areas of API,…
Apache, Hadoop, Falcon, Atlas, Tez, Sqoop, Flume, Kafka, Pig, Hive, HBase, Accumulo, Storm, Solr, Spark, Ranger, Knox, Ambari, ZooKeeper, Oozie, Phoenix, NiFi, HAWQ, Zeppelin, Slider, Mahout, MapReduce, HDFS, YARN, Metron and the Hadoop elephant and Apache project logos are either registered trademarks or trademarks of the Apache Software Foundation in the United States or other countries.