In this guest blog, Oliver Chiu, Microsoft’s product marketing manager for Hadoop/Big Data and Data Warehousing, explains how customers can benefit from deploying Apache Spark and HDP on Azure HDInsight for their enterprise and mission-critical big data jobs.
On July 10, Microsoft announced the public preview availability of Apache Spark for Azure HDInsight.
Azure HDInsight is Microsoft’s managed Hadoop-as-a-service offering. It takes the Hortonworks Data Platform (HDP) and architects it for the cloud. Customers get the benefits of Big Data without needing to procure hardware, install/tune, or maintain their own Hadoop clusters. By bringing Apache Spark to Azure HDInsight, we make Spark more easily accessible with the same benefits. HDInsight eliminates much of the heavy lifting associated with deploying, managing and executing tasks on Spark, thus raising the bar on what it means to process big data in the cloud.
For customers, we have seen three specific scenarios in which Spark has been able to change the game:
As more and more data is collected from a variety of sources, enterprises are eager to get deep analytics about their business. With the release of Spark for HDInsight, analysts and BI professionals can analyze large volumes of unstructured data and build reports with their BI tool of choice or with open source notebooks (e.g., Zeppelin or Jupyter).
Beyond batch and interactive queries, Spark is also ideal for building real-time solutions that address challenges such as fraud detection, click stream analysis, financial alerts, and telemetry from connected sensors and devices (IoT). The Spark Streaming APIs can be used to write complex algorithms expressed with streaming functions like join and window. This makes Spark unique in its ability to handle both batch/interactive queries and streaming workloads using the same common execution model.
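To make the idea of a window function concrete, here is a minimal sketch in plain Python of a sliding-window count applied to a fraud-detection style alert. It does not use the Spark Streaming API; the function names (`sliding_window_counts`, `flag_bursts`), the event format, and the threshold are all assumptions chosen for illustration.

```python
from collections import Counter, deque


def sliding_window_counts(events, window_seconds):
    """Yield (timestamp, counts) after each event.

    `events` is an iterable of (timestamp_seconds, key) pairs in time order
    (a hypothetical format). `counts` covers only the events that arrived
    within the last `window_seconds` seconds.
    """
    window = deque()
    counts = Counter()
    for ts, key in events:
        window.append((ts, key))
        counts[key] += 1
        # Evict events that have fallen out of the window.
        while window and window[0][0] <= ts - window_seconds:
            _, old_key = window.popleft()
            counts[old_key] -= 1
            if counts[old_key] == 0:
                del counts[old_key]
        yield ts, dict(counts)


def flag_bursts(events, window_seconds, threshold):
    """Return (timestamp, key, count) whenever a key reaches the threshold."""
    alerts = []
    for ts, counts in sliding_window_counts(events, window_seconds):
        for key, n in counts.items():
            if n >= threshold:
                alerts.append((ts, key, n))
    return alerts


if __name__ == "__main__":
    # Hypothetical card transactions: (seconds since start, card id).
    transactions = [
        (0, "card-A"),
        (1, "card-B"),
        (2, "card-A"),
        (3, "card-A"),
        (20, "card-A"),
    ]
    print(flag_bursts(transactions, window_seconds=10, threshold=3))
```

In a streaming engine the same logic runs continuously over incoming data rather than over a finished list, and the engine handles distribution and fault tolerance; the sketch only shows what a window computes.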
As part of Spark, customers will also have access to Spark MLlib, a scalable machine learning library consisting of common learning algorithms and utilities, including classification, regression, clustering, collaborative filtering, and dimensionality reduction, as well as underlying optimization primitives. This will allow customers to incorporate predictive analytics capabilities into their applications. For customers who want to build more machine learning solutions, Azure Machine Learning is also an ideal option, thanks to its easy-to-use experience and its ability to deploy an ML model in minutes as a fully managed web service.
Spark, as an open source project in the Apache ecosystem, has been gaining in popularity, with many different offerings that support it. Microsoft has worked with Hortonworks to make a big bet on Spark, aiming to give users the best experience by putting the end user first, hardening Spark for mission-critical applications, and making Spark easy to deploy.
Deploy Spark On-Premises with Hortonworks Data Platform