
The legacy Hortonworks Forum is now closed. You can view a read-only version of the former site. The site will be taken offline on January 31, 2016.


Hortonworks Sandbox Tutorials


    I did the first 3 tutorials and they were easy to follow, although a bit rudimentary. Any suggestions for other Hadoop recipes or learning activities to try in the Sandbox? I guess some of the other Hortonworks developer training resources would be useful if they were downloadable.

    I cheated on some of the tutorials: I copied and pasted code fragments from the tutorial pane into the Sandbox panels rather than typing them manually. Even less typing required.


    Sasha J

    Try looking through the book “Hadoop: The Definitive Guide”.
    It has plenty of samples and exercises.

    Thank you!



    I am familiar with that book, but it kind of starts in the wrong place: it doesn’t get to Pig and Hive until the middle chapters. What I was looking for were additional data sets and some “challenges” to motivate using Pig and Hive inside the Sandbox to derive value from the data.

    The Definitive Guide mentions data from the National Climatic Data Center (NCDC), but it is not clear to me how to get a subset of this data into the system using the File Browser tab of the Sandbox, and then work with it on small problems to learn more about how to use Pig and Hive, for example.

    I will poke around the internet to see if something on a smaller scale might be usable in the Sandbox.

    Cheryle Custer

    Hi James,

    Thank you for downloading and using the Sandbox.

    We will be releasing new tutorials on a regular basis. I’d like to hear what kinds of tutorials and content you would be interested in seeing.

    Regarding your question about getting subsets of data into the Sandbox, there isn’t really a limit on size; you are constrained only by your laptop and your connection speed. Since this is a single-node environment, large data sets may take time to process.

    If you are looking for other interesting data sets, please visit Infochimps or Pew Research; they have free data sets that you can access.


    I found a small tutorial that provides a word count Pig example. The data is small but relevant to the example, and word counting is another common “Hello World” Hadoop example. I was able to use the Sandbox to execute it.
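    For anyone curious what the word count example actually computes, here is a minimal Python sketch of the same map, group, and sum steps that a Pig word count script typically expresses with TOKENIZE, GROUP, and COUNT. It runs locally on a list of strings, not on the Sandbox or Hadoop; the function names are hypothetical, and the tokenizer here splits on whitespace only, which may differ from how Pig's TOKENIZE treats delimiters.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every whitespace-separated token (case-sensitive)."""
    for line in lines:
        for word in line.split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted counts by word; Hadoop performs this between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Hypothetical helper chaining the three phases together."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["the quick brown fox", "the lazy dog", "The end"]
    for word, n in sorted(word_count(sample).items()):
        print(word, n)
```

    Because the sketch is case-sensitive, “The” and “the” are counted separately; lowercasing each line in map_phase would merge them.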

    I have been learning R recently and have seen some webinars and reference material on R and Hadoop. It would be great if this could somehow be brought under the Sandbox umbrella as well. I have seen Cloudera referenced in this context, but not Hortonworks/HDP.

    Thanks for the Pew and Infochimps references. Perhaps someone can suggest some analysis exercises based on data from these sets.

    Adding HBase into the mix as well would be desirable.

    Sasha J

    You can check the following links; they have a bunch of samples:

    You can also search Google for “hadoop tutorial” to find many more.
    Thank you!

    Nancy Jean

    Hi, what is the CPU specification for installing the Sandbox? Is it 64-bit or 32-bit?


    Based on the other threads here in the forums, I think you want 64-bit, both for the CPU and the OS.

The forum ‘Hortonworks Sandbox’ is closed to new topics and replies.
