Authors: Vinay Shukla (Hortonworks) and Huzefa Hakim (IBM)
On June 13th, 2017, Hortonworks and IBM announced the extension of our partnership. A key part of this partnership is the collaboration on IBM’s Data Science Experience. The collaboration benefits both sides: it brings a production-ready, full-cycle data science experience to HDP customers, and it gives DSX customers access to information stored in HDP data lakes, backed by an enterprise-grade compute grid.
For most businesses, data is a key competitive differentiator, and data science is increasingly how they put that data to full use. Data science remains a complex undertaking. Data scientists are asked to excel in multiple demanding disciplines, from data engineering and statistics to the business domain. The challenge is substantial enough on small data; at the scale of big data, it becomes much harder.
At Hortonworks we are excited about IBM’s Data Science Experience (DSX), which supports the complete data science lifecycle. DSX lets data scientists bring familiar tools such as Jupyter and RStudio, wrangle data, create complex machine learning models, and deploy those models to production.
From Small Data, Small Learning to Big Data, Big Learning
Many machine learning models work well with down-sampling and small data, but a growing class of problems needs big data for better predictions. Deep learning, for example, is more effective with big data. Combining the big data and big compute provided by platforms such as HDP with Data Science Experience will unleash big learning: it will make data science more accessible and scalable, and it will let organizations leverage all of their enterprise data to make more accurate predictions.
Easier Data Science on Big Data
An increasing number of Hortonworks customers are moving to data science. Our customers leverage HDP to deliver machine learning use cases ranging from churn prediction and predictive maintenance to optimizing product placement and store layout.
Until now, there was no unifying tool for the complete data science lifecycle. Data science on big data meant struggling with data movement, Kerberos, and feature engineering across a plethora of tools, along with notebook setup and ad hoc collaboration, and with no standard tool for deploying machine learning to production. Data science is also a fast-moving field, so much of the struggle is simply keeping up with the latest advances.
DSX addresses the entire data science lifecycle. It provides a choice of notebooks, collaboration features, tutorials, and deployment of machine learning to production for Spark, R, Python and other ML languages.
Our customers will now be able to leverage the compute provided by YARN to make more accurate predictions using all of the data stored in their enterprise data lake.
DSX already includes RStudio and Jupyter. IBM and Hortonworks are working together to include Apache Zeppelin within DSX, which will give data scientists more choices.
Apache Zeppelin will continue to be a part of HDP. We plan to increase our investments in Zeppelin and data science to offer a more robust and feature-rich platform, both to current customers who already use Zeppelin with HDP and to prospects who are considering this combination.
How to Get Started
To get started, talk to your Hortonworks or IBM account team about DSX and experience it firsthand. In the near future, we plan to improve DSX’s integration with HDP and with the related security and governance capabilities provided by Apache Atlas and Ranger. These capabilities will be delivered as part of the first technical preview of DSX on HDP. The technical preview will give customers and prospects a way to evaluate DSX on HDP in a non-production environment. In the future, we will offer support for DSX in a production HDP environment.
The golden age of data science is coming. DSX on HDP offers the industry’s leading platform for data science on big data. We want to thank our customers and IBM for making this possible. We are very excited about the future and look forward to working with our customers and IBM to deliver the best data science experience on big data.
Please visit the following page to learn more about DSX and HDP: