Data Science @ Scale

HDP and IBM Data Science Experience

Data Science is a Game Changer for Businesses

Data Science is an interdisciplinary field that combines machine learning, statistics, advanced analysis, and programming. It is a new form of art that draws out hidden insights and puts data to work in the cognitive era.

IBM Data Science Experience (DSX) is an enterprise platform for data scientists and data engineers. It offers out-of-the-box open-source and commercial data science tools including RStudio, Apache Spark, Jupyter, and Zeppelin notebooks. DSX supports the entire data science lifecycle from data preparation and ETL to model development and deployment. With DSX, companies can build predictive and machine learning models using their favorite tools, technologies, and libraries, while leveraging the scale, security and governance of the HDP platform.

Data Science Lifecycle


Access to community

DSX provides a social environment where data scientists can research and share articles, data sets, notebooks, and tutorials. Data scientists and analysts can come up to speed by taking courses in R, Python, or Scala, copying content into a Jupyter or Zeppelin notebook, or working in an embedded RStudio environment.

  • Find tutorials and datasets
  • Connect with data scientists and ask questions
  • Research articles and papers
  • Fork and share projects
Blog: Certification of IBM Data Science Experience (DSX) on HDP is a Win-Win for Customers

Use familiar open source tools and libraries

With DSX, data scientists can create new Jupyter or Zeppelin notebooks in R, Python, or Scala, or import an existing notebook. DSX includes popular open source libraries such as PySpark, matplotlib, and Spark ML, along with machine learning and deep learning APIs. Data scientists can use DSX to tell a compelling story with open source visualization libraries like Brunel and PixieDust, and can install other open source libraries of their choice.

  • Code in Scala, Python, R, Apache Spark and SQL
  • Visualize and share code using Zeppelin & Jupyter Notebooks
  • Leverage RStudio IDE and Shiny
  • Use your favorite libraries including Scikit-learn, XGBoost, Spark MLlib, TensorFlow, Caffe, Keras and MXNet
Webinar: From Data Science to Enterprise Data Science @ Scale

Operationalize models with one click

With DSX, administrators can deploy models with one click and monitor all runtime environments and services.

  • Data Shaping Pipeline UI
  • Auto-data preparation & modeling
  • Advanced Visualizations
  • Model management & deployment
  • Documented Model APIs
Solution Brief: Data Science Machine Learning

Scale and enterprise security

The combination of HDP and DSX empowers enterprises to run data science at scale by leveraging all the data in the data lake, as well as deploying enterprise-grade security, governance, and operations.

  • Data Science at Scale - Run Spark Jobs on HDP Cluster
  • Secure Hadoop Support using Apache Ranger
  • Support for attribute-based access control (ABAC) using Apache Ranger
Blog: An Exciting Data Science Experience on HDP