Not a day passes without someone tweeting or re-tweeting a blog on the virtues of Apache Spark.
At a Memorial Day BBQ, an old friend proclaimed: “Spark is the new rub, just as Java was two decades ago. It’s a developers’ delight.”
As a distributed data processing and computing platform, Spark offers much of what developers desire, and much more. To the ETL application developer it offers expressive APIs for transforming data; to the data scientist it offers machine learning algorithms through its MLlib component; and to the data analyst it offers SQL capabilities for inquiry.
In this blog, I summarize how you can get started and embark on a quick journey to Learn, Try, and Do Spark on HDP with a set of tutorials.
In local mode, running on a single node such as the HDP Sandbox, you can get started with a set of tutorials put together by my colleague Saptak Sen.
Our commitment to Apache Spark is to ensure it is YARN-enabled and enterprise-ready, with security, governance, and operations, allowing deep integration with Hadoop and other YARN-enabled workloads in the enterprise, all running on the same Hadoop cluster and all accessing the same dataset.
We continue with that steadfast strategy. Last month, we released a technical preview of Apache Spark 1.3.1 on HDP 2.2. Shortly, we’ll follow with a 1.3.1 GA.