Ready to Get Started?
Hortonworks Sandbox is a personal, portable Apache Hadoop® environment that comes with dozens of interactive tutorials on Hadoop and its ecosystem, along with the most exciting developments from the latest HDP distribution. Get up and running in 15 minutes!
If you are new to the Hortonworks Sandbox and to using Apache open source tools to build modern data applications, we suggest you start with the following tutorials.
Begin your Apache Hadoop journey with this tutorial, aimed at users with limited experience using the Sandbox.
Explore the Sandbox in virtual machine and cloud environments and learn to navigate the Apache Ambari user interface.
This tutorial opens with a section on key concepts, followed by a series of tutorials in which you move data into HDFS, explore the data with SQL in Apache Hive, transform it with Apache Pig or Apache Spark, and finally generate a report with your choice of Microsoft Excel, Apache Zeppelin, or Zoomdata.
This tutorial provides a quick introduction to Spark by creating an RDD of Wikipedia data inside an Apache Zeppelin notebook.
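To give a rough idea of what creating an RDD looks like, here is a minimal PySpark sketch. It is not taken from the tutorial itself: the `tokenize` and `top_words` helpers and the `sample_lines` data are hypothetical, and the sample list only stands in for lines that would normally be read from a Wikipedia data file.

```python
import re


def tokenize(line):
    """Lowercase a line of text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", line.lower())


def top_words(sc, lines, n=3):
    """Build an RDD from `lines` and return the n most frequent words.

    `sc` is an existing SparkContext; this helper is illustrative only.
    """
    rdd = sc.parallelize(lines)  # create the RDD from a local Python list
    counts = (
        rdd.flatMap(tokenize)               # one record per word
        .map(lambda word: (word, 1))        # pair each word with a count of 1
        .reduceByKey(lambda a, b: a + b)    # sum the counts per word
    )
    # Most frequent words first; ties are broken alphabetically.
    return counts.takeOrdered(n, key=lambda kv: (-kv[1], kv[0]))


# Hypothetical stand-in for lines loaded from a Wikipedia data file.
sample_lines = [
    "Apache Spark is a cluster computing framework",
    "Apache Hadoop is a framework for distributed storage",
]
```

In a Zeppelin `%pyspark` paragraph the SparkContext is already available as `sc`, so calling `top_words(sc, sample_lines)` should be enough to try the sketch.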
After you have gone through this tutorial you can find additional Spark tutorials here:
Apache Hadoop is often used to process unstructured data, new data types or data at scale at rest. However, you can also process data-in-motion and this tutorial will introduce you to tools like Apache Storm, Apache Kafka and Apache HBase.
This tutorial introduces you to consuming real-time Twitter data and performing some basic sentiment analysis. You will use Apache NiFi to connect to Twitter and route its streaming data, and then persist the data into Apache Solr and Apache Hive.
You can find additional tutorials here:
Speed up Spark Streaming
Experience performance gains of up to 10 times for applications that store large datasets, such as for state management, through a revamped Spark Streaming state tracking API.
Seamless Data Access
Achieve higher performance with Spark 1.6 through a new Dataset API, which extends the DataFrame API and also supports compile-time type checking.
Dynamic Executor Allocation
Utilize cluster resources efficiently through Dynamic Executor Allocation, which automatically expands and shrinks resources based on utilization.
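As an illustration, dynamic allocation is typically switched on through Spark configuration properties. The sketch below assembles those properties into `spark-submit` flags. The helper names, the executor bounds, and the `my_app.py` script name are assumptions made for the example, not values recommended by this page.

```python
def dynamic_allocation_conf(min_executors=1, max_executors=10):
    """Return Spark properties that enable dynamic executor allocation."""
    if min_executors > max_executors:
        raise ValueError("min_executors must not exceed max_executors")
    return {
        "spark.dynamicAllocation.enabled": "true",
        # The external shuffle service keeps shuffle files available
        # after an idle executor has been released.
        "spark.shuffle.service.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
    }


def to_submit_args(conf):
    """Turn a property dict into a flat list of --conf arguments."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


# Example bounds (2 to 20 executors) and script name are placeholders.
command = ["spark-submit"] + to_submit_args(dynamic_allocation_conf(2, 20)) + ["my_app.py"]
print(" ".join(command))
```

The same properties can instead be placed in `spark-defaults.conf` so that they apply to every job on the cluster.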
More Flexible Upgrades
Ambari 2.2 provides Hadoop operators a faster way to upgrade their clusters by automating both maintenance and feature releases, while the cluster is down.
Simplified Security Operations
Service configurations for Ranger provide a continuation of the new user experience. In addition, optional storage of Kerberos credentials and customizable security settings simplify administration and provide additional security.
Improved Troubleshooting
Ambari 2.2 makes troubleshooting easier and faster, with a customizable time zone for metric widget graphs and the ability to export metrics, so you can identify and respond to problems quickly.
No data center, no cloud service, and no internet connection needed! Full control of the environment. Easily extend with additional components or try the various Hortonworks technical previews. Always updated with the latest edition.
Azure provides an easy way to get started with Hadoop with minimal system requirements. This is a great solution if your personal machine doesn’t meet the minimum system requirements to run the Sandbox locally.