Hortonworks Sandbox

Version 2.1

The easiest way to get started with Enterprise Hadoop

Sandbox is a personal, portable Hadoop environment that comes with a dozen interactive Hadoop tutorials. Sandbox includes many of the most exciting developments from the latest HDP distribution, packaged up in a virtual environment that you can get up and running in 15 minutes!

Learn Hadoop
Sandbox comes with a dozen hands-on tutorials that will guide you through the basics of Hadoop. The tutorials are built on the experience gained from training thousands of people in our Hortonworks University training classes.

Build a Proof of Concept
The Sandbox includes the Hortonworks Data Platform in an easy-to-use form. You can add your own datasets and connect the Sandbox to your existing tools and applications. This lets you prove out your use of Hadoop and plan the integration points for your first Hadoop project.

Test New Functionality
You can test new functionality with the Sandbox before you put it into production. Simply, easily and safely.

What's New in Sandbox 2.1

  • Introducing Apache Tez for the fastest Hive ever! Apache Tez reimagines the original MapReduce for interactive query capabilities, to meet the needs of users of the most widely used data access engine for Hadoop: Apache Hive.
  • Vectorized Query: Through a deep engineering partnership with, and contributions from, Microsoft, Apache Hive can take advantage of vectorized query execution and accelerate computations on in-memory data by up to 100x.
  • Stream Processing with Apache Storm: Apache Storm is a distributed real-time computation system for processing fast, large streams of data. Storm adds reliable real-time data processing capabilities to HDP 2.1. Storm in Hadoop helps capture new business opportunities with low-latency dashboards, security alerts, and operational enhancements.
  • Data Governance with Apache Falcon: Apache Falcon is a framework for simplifying data management and pipeline processing in Apache Hadoop®. It enables users to automate the movement and processing of datasets for ingest, pipelines, disaster recovery, and data retention use cases. Instead of hard-coding complex dataset and pipeline processing logic, users can now rely on Apache Falcon for these functions.
  • Operations with Apache Ambari: HDP 2.1 includes the very latest version of Apache Ambari, which now supports Apache Storm, Apache Falcon, and Apache Tez, and provides extensibility and rolling restarts, as well as other significant operational improvements.
  • Search with Apache Solr: Apache Solr introduces high-performance indexing and sub-second search times over billions of documents. Apache Solr provides powerful full-text search, hit highlighting, faceted search, near real-time indexing, dynamic clustering, database integration, management of rich documents (e.g., Word, PDF), and geospatial search.

Technical Specifications

Component           Version
Apache Hadoop       2.4.0
Apache Hive         0.13.0
Apache HBase        0.96.1
Apache Pig          0.12.1
Apache Storm        0.9.1
Apache Solr         4.8
Apache Falcon       0.5
Apache Sqoop        1.4.5
Apache Flume        1.4.0
Apache Oozie        4.0.0
Apache Ambari       1.5.1
Apache Mahout       0.9.0
Apache ZooKeeper    3.4.5
Apache Knox         0.4

For the list of patches applied to the component versions, please refer to the Release Notes.

Download & Install

Sandbox is provided as a self-contained virtual machine. No data center, no cloud service and no internet connection needed!

Installation Steps

  1. Install a virtualization environment (3 options)
  2. Download and import the respective Sandbox image

Latest Releases of HDP Sandbox:

HDP 2.1 on Sandbox is available in the following variants:

HDP 2.1 Sandbox
on VMware Fusion or Player

Additional Releases of HDP Sandbox:

HDP 2.2 Preview Sandbox (New)

Try out the latest features and functionality coming in HDP 2.2
Documentation: As above

HDP 1.3 Sandbox

For legacy testing. Some partner tutorials are also written for 1.3.

System Requirements

  • Now runs on 32-bit and 64-bit operating systems (Windows XP, Windows 7, Windows 8, and Mac OS X)
  • Minimum 4 GB RAM; 8 GB required to run Ambari and HBase
  • Virtualization enabled in the BIOS
  • Browser: Chrome 25+, IE 9+, or Safari 6+ recommended (except IE 10, on which Sandbox will not run)
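The memory figures in the list above can be read as a simple two-threshold rule. The following is a minimal Python sketch of that rule only; the function name `sandbox_ram_check` and the labels it returns are illustrative assumptions, not part of any Hortonworks tooling, and the sketch does not detect the host's RAM for you.

```python
# Thresholds come from the System Requirements list; everything else
# (names, return labels) is hypothetical and for illustration only.
MIN_RAM_GB = 4    # minimum RAM to run the Sandbox
FULL_RAM_GB = 8   # RAM required to also run Ambari and HBase


def sandbox_ram_check(total_ram_gb: float) -> str:
    """Classify a host's total RAM against the stated requirements."""
    if total_ram_gb < MIN_RAM_GB:
        return "insufficient"   # below the 4 GB minimum
    if total_ram_gb < FULL_RAM_GB:
        return "basic"          # Sandbox runs, but not Ambari and HBase
    return "full"               # enough for Ambari and HBase as well


if __name__ == "__main__":
    for ram in (2, 4, 6, 8, 16):
        print(f"{ram} GB host RAM -> {sandbox_ram_check(ram)}")
```

Pass in the total RAM reported by your operating system; how much of it you then allocate to the virtual machine is a separate setting in your virtualization environment.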

Known Issues:

The Hortonworks Sandbox is built on the Hortonworks Data Platform, with the following exclusions and known issues:

  1. Third-party tools and downloads (like Talend) are not included.
  2. Data sets that Safari automatically uncompresses from the .gz extension to the .tsv extension may not fully import. To avoid this when using Safari on a Mac, open Preferences, select General, and uncheck "Open "safe" files after downloading".
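If you are unsure whether a downloaded data set is still compressed, or you prefer to uncompress it yourself rather than rely on the browser, the check can be sketched in Python using only the standard library. This is a minimal sketch, not a Hortonworks utility: the helper names `is_gzip` and `gunzip_to_tsv` and the sample file names are assumptions made for illustration.

```python
import gzip
import shutil
import tempfile
from pathlib import Path

# Every gzip stream begins with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def is_gzip(path) -> bool:
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def gunzip_to_tsv(src, dest) -> int:
    """Uncompress src into dest and return the number of lines written."""
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    with open(dest, "rb") as f:
        return sum(1 for _ in f)


if __name__ == "__main__":
    # Build a tiny hypothetical data set so the demo is self-contained.
    workdir = Path(tempfile.mkdtemp())
    archive = workdir / "sample.tsv.gz"
    with gzip.open(archive, "wb") as f:
        f.write(b"id\tname\n1\talpha\n2\tbeta\n")

    print("still compressed:", is_gzip(archive))
    rows = gunzip_to_tsv(archive, workdir / "sample.tsv")
    print("lines written:", rows)
```

Comparing the returned line count with the number of rows you expect is a quick way to spot a data set that was only partially uncompressed.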

Look here for documentation on the Hortonworks Data Platform.

Having Issues?

If you have issues with the download or use of the Sandbox, please visit the Hortonworks Sandbox Forum.