August 20, 2014

0xdata H2O Tutorial on Hortonworks Sandbox

The key to monetizing Big Data is not only the ability to capture and process information quickly but also the ability to analyze that data to derive meaningful insights. Big Data can be complex, and it often takes a programmer's expertise to write focused, targeted queries.

0xdata, a provider of open source machine learning and predictive analytics for Big Data, makes it easier for statisticians to manipulate and extract data with its H2O prediction engine. H2O reduces the amount of programming needed to model the data and supports the complete end-to-end analytical workflow.

0xdata is Certified on the Hortonworks Data Platform (HDP) and is YARN Ready

0xdata is a Hortonworks Technology Partner. It was recently certified on HDP 2.1 and is YARN Ready, which means H2O can be deployed directly on YARN. H2O is part of a Modern Data Architecture and can query data in existing databases as well as in Hadoop.


H2O Tutorial on the Hortonworks Sandbox

Both Hortonworks and H2O are open source, and the integrated solution is easy to deploy and use. To demonstrate this, 0xdata has created a tutorial and video, "Predictive Analytics on H2O and Hortonworks Sandbox", to help you streamline your initial setup of H2O on the Hortonworks Sandbox.

For those new to it, the Hortonworks Sandbox is a personal, portable Hadoop environment that comes with many tutorials. Hortonworks’ partners have also created tutorials demonstrating their applications and connectivity to the Sandbox. Visit the tutorial section to view all the tutorials available from partners and the community.

H2O and Hadoop

H2O is a statistical analysis engine that uses the Hadoop Distributed File System (HDFS) as its storage platform and provides a user-friendly interface for easy querying. Users interact with H2O through a graphical interface that uses standard R statistical analysis syntax, while machine learning algorithms run behind the scenes. Because of its in-memory distributed key-value store, H2O can process data faster and at a larger scale than other predictive analytics solutions. It can be deployed standalone, on YARN, or on MapReduce.
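To make the idea of an in-memory distributed key-value store more concrete, here is a minimal toy sketch in Python. It is not H2O's implementation or API: the class `ToyDistributedKV`, its methods, and the node names are all hypothetical, and the "nodes" are plain dictionaries in one process. It only illustrates the general pattern of hashing each key to a home node and letting each node compute over the values it holds locally before the partial results are combined.

```python
import hashlib


class ToyDistributedKV:
    """Toy in-memory key-value store that spreads keys across named nodes.

    Hypothetical illustration only; this is not how H2O is implemented.
    """

    def __init__(self, node_names):
        if not node_names:
            raise ValueError("at least one node is required")
        self.node_names = list(node_names)
        # One in-memory dict per node; a real cluster would keep
        # these in separate processes on separate machines.
        self.shards = {name: {} for name in self.node_names}

    def home_node(self, key):
        # Stable hash, so the same key always maps to the same node.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.node_names[int(digest, 16) % len(self.node_names)]

    def put(self, key, value):
        self.shards[self.home_node(key)][key] = value

    def get(self, key):
        return self.shards[self.home_node(key)][key]

    def map_reduce(self, map_fn, reduce_fn, initial):
        # Each node folds over its local values only. The partial
        # results are then combined with the same reduce_fn, so
        # reduce_fn must be associative and initial must be its identity.
        partials = []
        for name in self.node_names:
            acc = initial
            for value in self.shards[name].values():
                acc = reduce_fn(acc, map_fn(value))
            partials.append(acc)
        total = initial
        for partial in partials:
            total = reduce_fn(total, partial)
        return total
```

For example, after storing numeric values under keys such as `"row-0"`, `"row-1"`, and so on, calling `store.map_reduce(lambda v: v, lambda a, b: a + b, 0.0)` sums them by first summing within each node and then adding the per-node sums. Keeping the data in memory and moving the computation to where the data lives is the general reason such designs can be fast.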


Who is using H2O?

Users of H2O and 0xdata include Netflix, Rushcard, Trulia, and Vendavo, all of which apply machine learning to their big datasets. Applications range from financial services and fraud detection to helping salespeople sell more profitably. It’s all about fast, interactive, real-time predictive analytics.

For more information:



Sai Chaithanya Pallaprolu says:

Hello, I am working on Hortonworks and I want to run XGBoost in parallel (big data).
I am wondering whether I can call H2O XGBoost from PySpark.
Please help me.

