Get Started with Hadoop

Kick-start your journey in Hadoop with these resources

Ready to Work with Hadoop?

We’ve brought together a collection of resources of particular interest to developers, analysts, and system administrators.

Learn how to collect and process data and build applications with Hadoop.
Learn how to explore, query and deliver insights with Hadoop.
Learn how to provision, manage and monitor Hadoop.

The easiest way to get started with Enterprise Hadoop

Sandbox is a personal, portable Hadoop environment that comes with a dozen interactive Hadoop tutorials. Sandbox includes many of the most exciting developments from the latest HDP distribution, packaged up in a virtual environment that you can get up and running in 15 minutes!

Learn Hadoop
Sandbox comes with a dozen hands-on tutorials that guide you through the basics of Hadoop. The tutorials build on the experience gained from training thousands of people in our Hortonworks University Training classes.

Build a Proof of Concept
The Sandbox includes the Hortonworks Data Platform in an easy-to-use form. You can add your own datasets and connect it to your existing tools and applications. This lets you prove out your use of Hadoop and plan the integration points for your first Hadoop project.

Test New Functionality
You can test new functionality with the Sandbox simply, easily and safely before you put it into production.

Work your way through the tutorials

Sandbox comes with a series of in-depth tutorials that provide an easy, hands-on introduction to many of the common use cases of Hadoop.

  1. Hello World – An overview of Hadoop with HCatalog, Hive and Pig
  2. How To Process Data with Apache Pig
  3. How to Process Data with Apache Hive
  4. How to Use HCatalog, Pig & Hive Commands
  5. More…

Training & Certification

Hortonworks offers public and private Hadoop training for business users, Java developers, Windows teams, data analysts, data scientists and administrators. Courses are designed by the leaders and committers of Apache Hadoop, and students work through real-world, scenario-based projects.

Hortonworks University “Self-Paced” Learning Library is an on-demand learning library that can be accessed with a Hortonworks University account. Learners can view lessons anywhere, at any time, and complete lessons at their own pace.

Students who successfully complete a Hortonworks training course can sit for the respective Hortonworks certification exam. Hortonworks certification identifies you as an expert in the Apache Hadoop ecosystem.

Contribute

Apache Hadoop has a vibrant community of contributors developing and extending the codebase, along with developers, data scientists and administrators building apps on top of Hadoop. Join the Apache mailing lists for Hadoop to monitor the progress of JIRA tickets, submit bugs and contribute code.

An ideal way to get started. Sandbox is a self-contained virtual machine with Apache Hadoop pre-configured alongside a set of hands-on, step-by-step Hadoop tutorials.

Have Questions?

Connect with the Hortonworks experts and other Hadoop users in the Forums.

Find a Meetup near you

There are many Hadoop user groups across the world focused on learning, using and evolving Hadoop. Meet them here.

Sign up for the newsletter

If you drop us your email address below, then we’ll drop you a line every few weeks with the latest information from Hortonworks and the Hadoop ecosystem.

Upcoming Webinars!

Operationalize your Data Lake with Consistent Data Governance: Hortonworks Technical Workshop
Thursday, July 2, 2015
1:00 PM Eastern / 12:00 PM Central / 10:00 AM Pacific

More Webinars »

Comprehensively Secure your Big Data Environment – with Hortonworks and Vormetric
Thursday, July 16, 2015
1:00 PM Eastern / 10:00 AM Pacific

Try these Tutorials

Hortonworks Data Platform
The Hortonworks Data Platform is a 100% open source distribution of Apache Hadoop that is truly enterprise grade, having been built, tested and hardened with enterprise rigor.
Get started with Sandbox
Hortonworks Sandbox is a self-contained virtual machine with Apache Hadoop pre-configured alongside a set of hands-on, step-by-step Hadoop tutorials.
Modern Data Architecture
Tackle the challenges of big data. Hadoop integrates with existing EDW, RDBMS and MPP systems to deliver lower-cost, higher-capacity infrastructure.