Hortonworks Sandbox Forum

Hortonworks Sandbox Tutorials

  • #14352

    I did the first 3 tutorials and they were easy to follow, although a bit rudimentary. Any suggestions of other Hadoop recipes or learning activity references to try in the sandbox? I guess some of the other Hortonworks developer training resources would be useful if they were downloadable.

    I cheated on some of the tutorials — I copied and pasted code fragments into the Sandbox panels from the tutorial pane rather than typing them manually. Even less typing required.

    Jim

  • #14365
    Sasha J
    Moderator

    James,
    try looking through the book “Hadoop: The Definitive Guide”.
    It has plenty of samples and exercises.

    Thank you!

    Sasha

    #14378

    I am familiar with that book, but it kind of starts in the wrong place: it doesn’t get to Pig and Hive until the middle chapters. What I was looking for were additional data sets and some “challenges” to motivate using Pig and Hive inside the sandbox to derive value from the data.

    The Definitive Guide mentions data from the National Climatic Data Center (NCDC, http://www.ncdc.noaa.gov/), but it is not clear to me how to get a subset of this data into the system using the file browser tab of the sandbox, and then work with it on small problems to gain more awareness of how to use Pig and Hive, for example.
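
    One way to get a sandbox-sized subset is to trim a larger file on your own machine before uploading it through the file browser. The following is only a minimal sketch, assuming you have already downloaded a plain-text, line-oriented data file; the function name write_subset and the file names in the usage comment are hypothetical.

```python
from itertools import islice


def write_subset(src_path, dst_path, max_lines=1000):
    """Copy at most max_lines lines from src_path to dst_path.

    Returns the number of lines actually copied.
    """
    copied = 0
    # errors="replace" keeps the copy going if the file has odd bytes
    with open(src_path, "r", encoding="utf-8", errors="replace") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in islice(src, max_lines):
            dst.write(line)
            copied += 1
    return copied


# Hypothetical usage (file names are placeholders, not real data sets):
# write_subset("weather_full.txt", "weather_small.txt", max_lines=5000)
```

    The smaller output file can then be uploaded like any other file. Taking the first lines is the simplest choice; it is not a random sample, so the subset may not be representative of the whole data set.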

    I will poke around the internets to see if something of a smaller scale might be possible for use in the sandbox.

    #14442
    Cheryle Custer
    Moderator

    Hi James,

    Thank you for downloading and using the Sandbox.

    We will be releasing new tutorials on a regular basis. I’d be interested in hearing about the kinds of tutorials and content that you would be interested in seeing.

    On your question about getting subsets of data into the Sandbox: there isn’t really a limit on size; you are constrained only by your laptop and connection speed. Since this is a single-node environment, large data sets may take time to process.

    If you are looking for other interesting data sets, please visit InfoChimps.com or Pew Research (http://www.pewsocialtrends.org/category/datasets/). They have free data sets that you can access.

    #14505

    I found a little tutorial at http://salsahpc.indiana.edu/ScienceCloud/pig_word_count_tutorial.htm that provides a small Pig word count example. The data is small but relevant to the example, and word counting is another common Hello World Hadoop example. I was able to use the sandbox to execute this.
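
    For anyone who wants to sanity-check the numbers a word count job produces, here is the same computation in plain Python. This is not the tutorial's Pig script, just a rough sketch that assumes words are separated by whitespace and that counting is case-sensitive; the function name word_count and the sample lines are made up for illustration.

```python
from collections import Counter


def word_count(lines):
    """Count whitespace-separated words across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        # split() with no arguments splits on any run of whitespace
        counts.update(line.split())
    return counts


# Made-up sample input, standing in for the lines of an input file
sample = ["hello hadoop", "hello pig"]
for word, n in sorted(word_count(sample).items()):
    print(word, n)
```

    Running it on a small file and comparing against the sandbox output is a quick way to confirm that a script tokenizes the text the way you expect.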

    I have been learning R recently and have seen some webinar and reference material on R and Hadoop. It would be great if this could somehow be brought under the sandbox umbrella as well. I have seen Cloudera referenced here but not Hortonworks/HDP.

    Thanks for the Pew and Infochimps references. Perhaps someone can suggest some analysis exercises based on data from these sets.

    Adding HBase into the mix as well would be desirable.

    #14513
    Sasha J
    Moderator

    Jim,
    You can check the following links; they have a bunch of samples:

    http://developer.yahoo.com/hadoop/tutorial/
    http://hadoop.apache.org/docs/r0.20.2/mapred_tutorial.html

    You can also google “hadoop tutorial” to find many more links.
    Thank you!
    Sasha

    #16062
    Nancy Jean
    Member

    Hi, what is the CPU specification needed to install the Sandbox? Is it 64-bit or 32-bit?

    #16111

    Based on the other threads here in the forums, I think you want 64-bit, both for the CPU and the OS.
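
    If you are not sure what your machine is, and you happen to have Python installed, something like the following gives a rough indication. It is only a sketch: it reports what the operating system and the Python interpreter say about themselves, so a 32-bit OS on a 64-bit CPU may still show up as 32-bit. The helper looks_64bit and its list of machine names are my own assumptions and not exhaustive.

```python
import platform
import sys

# Common machine-type strings for 64-bit systems (assumed list, not exhaustive)
SIXTY_FOUR_BIT_NAMES = {"x86_64", "amd64", "arm64", "aarch64"}


def looks_64bit(machine_name):
    """Return True if the machine-type string is a known 64-bit name."""
    return machine_name.lower() in SIXTY_FOUR_BIT_NAMES


# platform.machine() returns the machine type, e.g. 'x86_64' or 'AMD64'
print("machine:", platform.machine())
print("known 64-bit machine name:", looks_64bit(platform.machine()))
# sys.maxsize exceeds 2**32 only when the Python interpreter is a 64-bit build
print("64-bit Python interpreter:", sys.maxsize > 2**32)
```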
