Get Started today with Hortonworks Sandbox

Hortonworks Sandbox is a personal, portable environment for Apache Hadoop® and its ecosystem that comes with dozens of interactive tutorials and the most exciting developments from the Apache community. Get up and running in 15 minutes!

Download Sandbox

Overview

If you are new to the Hortonworks Sandbox and to using Apache open source tools to build modern data applications, we suggest you start with the following tutorials.

Sandbox Basics

Getting Started with HDP®

Begin your Apache Hadoop journey with this tutorial, aimed at users with limited experience using the Sandbox.
Explore the Sandbox in virtual machine and cloud environments and learn to navigate the Apache Ambari user interface.

This tutorial includes a section that describes the key concepts, followed by a series of tutorials in which you move data into HDFS, explore the data with SQL in Apache Hive, transform it with Apache Pig or Apache Spark, and finally generate a report with Apache Zeppelin.

Getting Started with HDP

Hands-on Tour of Apache Spark in 5 minutes

This tutorial provides a quick introduction to Spark by creating an RDD for Wikipedia data inside an Apache Zeppelin notebook.

After you have gone through this tutorial you can find additional Spark tutorials here:

Apache Spark in 5 minutes
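
As a rough illustration of what creating and querying an RDD involves, here is a minimal PySpark sketch. It is not the tutorial's notebook: the sample lines and the `keyword_filter`, `count_mentions` and `main` names are hypothetical, and it assumes the `pyspark` package is installed. Inside a Zeppelin notebook a SparkContext is normally already available as `sc`, so the `SparkContext.getOrCreate()` call would not be needed there.

```python
def keyword_filter(keyword):
    """Return a predicate that is True for lines containing `keyword` (case-insensitive)."""
    needle = keyword.lower()
    return lambda line: needle in line.lower()


def count_mentions(sc, lines, keyword):
    """Build an RDD from `lines` and count the lines that mention `keyword`."""
    rdd = sc.parallelize(lines)                     # distribute the local list as an RDD
    matches = rdd.filter(keyword_filter(keyword))   # lazy transformation, nothing runs yet
    return matches.count()                          # action: triggers the computation


# Hypothetical sample data standing in for a Wikipedia extract.
SAMPLE_LINES = [
    "Apache Spark is a cluster computing framework.",
    "Hadoop stores data in HDFS.",
    "Spark can run on Hadoop YARN.",
]


def main():
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    print(count_mentions(sc, SAMPLE_LINES, "spark"))
    sc.stop()
```

Calling `main()` on a machine with Spark available runs the filter and count as a Spark job. The split between the lazy `filter` transformation and the `count` action is the main idea to take away.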

IoT Realtime Event Processing

Apache Hadoop is often used to process data at rest: unstructured data, new data types, or data at scale. However, you can also process data in motion, and this tutorial introduces you to tools such as Apache NiFi, Apache Kafka, Apache Storm and Apache HBase.

IoT Realtime Event Processing

Learning the ropes of Apache NiFi

NiFi provides the data acquisition, simple event processing, transport and delivery mechanism designed to accommodate the diverse dataflows generated by a world of connected people, systems, and things. In this tutorial you will be introduced to how Apache NiFi connects and conducts streaming transportation data.

Apache NiFi

Try More Tutorials

You can find additional tutorials here:

What's New in Hortonworks Data Platform 2.5

For Data Workers

  • Explore the latest APIs: Hortonworks' new distribution strategy delivers the rapid innovations from the Apache™ Hadoop® community to you. HDP now supports multiple versions of Apache Hive (1.2 and 2.1) and Apache Spark (1.6 and 2.0) in the same cluster.

  • Interactive SQL Speed: Interactive query with Apache Hive LLAP. LLAP enables sub-second SQL analytics on Hadoop by intelligently caching data in memory with persistent servers that instantly process SQL queries.

  • Remote access to Apache Phoenix: Apache Phoenix now ships a new Query Server, which allows greater access and a wider choice of development languages for working with data stored in Apache HBase.

For Hadoop operators

  • Advanced Visualization Dashboarding: Ambari 2.4 provides integrated log search and access capabilities. This enables operators to search, browse and filter their cluster operational logs for easier management. In addition, the integration of Grafana with Ambari brings the most important metrics front and center.

  • Integration of Comprehensive Security and Trusted Governance: The Apache Ranger and Apache Atlas integration allows enterprises to implement dynamic classification-based security policies. Using Ranger, administrators can define security policies based on Atlas metadata tags or attributes and apply these policies in real time.

  • Streamlined Operations for Apache HBase: Streamlined backup and restore capabilities have been added to Apache HBase, allowing operators to perform incremental backups. HBase operations have been simplified with improved HBase metrics in Ambari and a set of pre-built dashboards.
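
The classification-based security item above can be pictured with a small toy model. The sketch below is only a conceptual illustration in plain Python: it does not use the Apache Ranger or Apache Atlas APIs, and `TagPolicy`, `Resource` and `is_allowed` are hypothetical names. It assumes the simplest possible rule, in which a policy grants one action to one group on every resource that carries a given tag.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TagPolicy:
    """Allow `action` on any resource carrying `tag` for members of `group`."""
    tag: str
    group: str
    action: str


@dataclass
class Resource:
    """A data asset (for example a table) with its metadata tags."""
    name: str
    tags: set = field(default_factory=set)


def is_allowed(policies, resource, user_groups, action):
    """Return True if any policy matches the resource's tags, the user's groups and the action."""
    return any(
        p.tag in resource.tags and p.group in user_groups and p.action == action
        for p in policies
    )


# Hypothetical example: the policy is written once against the tag, not the table.
policies = [TagPolicy(tag="PII", group="compliance", action="read")]
customers = Resource("customers")

before = is_allowed(policies, customers, {"compliance"}, "read")
customers.tags.add("PII")  # classifying the table changes the decision
after = is_allowed(policies, customers, {"compliance"}, "read")
```

The design point is that the policy never names the table. Tagging a resource is enough to bring it under an existing policy, which is what makes the approach dynamic. Real deployments are likely to have richer policy semantics than this single allow rule.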

For Data Scientists

  • Simplifies Development: Apache Zeppelin provides a secure and collaborative web-based notebook for interactive data ingestion, exploration, and visualization for Apache Spark, Apache Hive and Apache Phoenix.

  • Seamless Data Access: Improved Apache Spark access to Apache Hive and Apache HBase. The Spark-HBase connector leverages the Data Source API (SPARK-3247) introduced in Spark 1.2.0.

  • Apache Spark 2.0: The most notable improvements in Apache Spark 2.0 are in the areas of API, Performance, Structured Streaming and SparkR. Achieve higher performance through a new Dataset API, which is an extension of the DataFrame API and also supports compile-time type checking.

Hortonworks Sandbox in the cloud

Explore cloud vendors that can help you get started with Hadoop with minimum system requirements.

Hortonworks Sandbox on a VM

No data center, no cloud service and no internet connection needed! You have full control of the environment. Easily extend it with additional components or try the various Hortonworks technical previews. It is always updated with the latest edition.

Try Hortonworks Sandbox on Azure

Azure provides an easy way to get started with Hadoop with minimum system requirements. This is a great solution if your personal machine doesn’t meet the minimum system requirements to run the Sandbox locally.