Tutorials for Hadoop with HDP 2.1: Hive, Tez, Falcon, Knox, Storm

Give HDP 2.1 a test drive with the Technical Preview VM

If you’re excited to get started with the new features in Hortonworks Data Platform 2.1, we’ve included four tutorials for you to try out – Sandbox-style.

You can download the HDP 2.1 Technical Preview here, and then get stuck into these great tutorials.

Interactive Query with Apache Hive and Apache Tez

OK, so you’re not going to get huge performance out of a one-node VM, but you can try out Hive on Tez and see the performance gains versus MapReduce. You can also try out features such as Vectorized Query and the host of new SQL features. Get supercharged here.

Defining and Processing Data Pipelines with Apache Falcon

Sometimes, it’s not all about speed. Sometimes you want assurance and governance over data movement across the cluster. In this tutorial, we simulate moving a dataset from one cluster to another and cleanse the data along the way. Define your pipeline here.

Processing Stream data in near real-time with Apache Storm

But then who am I kidding? Of course it’s all about speed. In this case, speed of response to incoming stream data. This tutorial sets up Apache Storm to read and react to incoming sentences. Process your streams here.

Secure your Hadoop infrastructure with Apache Knox

With data flying around in all directions, it’s probably worth taking a look at Apache Knox to provide perimeter security for your cluster – even if it is just one node. Batten down the hatches here.

We hope you have some fun testing out the new features of HDP 2.1 with these tutorials, and that they provide the inspiration for your own production work. If you have any comments, let us know below, or in the forums. And if you’d like a Hortonworks elephant, be sure to add your own tutorial over here.

 


