Announcing Stinger Phase 3 Technical Preview

Open Innovation: Going Beyond Batch in Hadoop

As an early Christmas present, we’ve made a technical preview of Stinger Phase 3 available. Though it is a preview in name, the release marks a significant milestone in the transformation of Hadoop from a batch-oriented system into a data platform capable of interactive data processing at scale, and it delivers on the aims of the Stinger Initiative.

Apache Tez and SQL: Interactive Query-IN-Hadoop

Tez is a low-level runtime engine not aimed directly at data analysts or data scientists. Frameworks need to be built on top of Tez to expose it to a broad audience… enter SQL and interactive query in Hadoop.

Stinger Phase 3 Preview combines the Tez execution engine with Apache Hive, Hadoop’s native SQL engine. Now, anyone who uses SQL tools in Hadoop can enjoy truly interactive data query and analysis.

We have already seen Apache Pig move to adopt Tez, and we will soon see others like Cascading do the same, unlocking many forms of interactive data processing natively in Hadoop. Tez is the technology that takes Hadoop beyond batch and into interactive, and we’re excited to see it available in a way that is easy to use and accessible to any SQL user.

Stinger Phase 3 Preview Major Improvements

The major improvements found in this release include:

  • Choose either the Tez execution engine for interactive SQL in Hadoop or the proven Map/Reduce framework for batch SQL processing.
  • Substantial improvements to the Vectorized Query Engine, developed in collaboration between Microsoft and Hortonworks, which speed up SQL processing by an order of magnitude or more.
  • A sneak peek at expanded Hive SQL coverage, including subqueries for IN / NOT IN and HAVING clauses.
  • Plus more than 500 other improvements covering both Hive and Tez.
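
To make the first few items concrete, here is a minimal Python sketch that assembles a short HiveQL script: two session settings followed by one query. It assumes the Apache Hive property names `hive.execution.engine` (with values `tez` or `mr`) and `hive.vectorized.execution.enabled`; check the Preview’s installation instructions for the settings that apply to your build. The function name and the sample query are hypothetical, and the table and column names simply follow TPC-DS naming for illustration.

```python
def hive_session_script(engine: str, vectorized: bool, query: str) -> str:
    """Return a HiveQL script: session settings followed by one query."""
    # Assumption: "tez" and "mr" are the accepted execution engine names.
    if engine not in ("tez", "mr"):
        raise ValueError(f"unknown execution engine: {engine!r}")
    lines = [
        f"SET hive.execution.engine={engine};",
        f"SET hive.vectorized.execution.enabled={'true' if vectorized else 'false'};",
        # Normalize the query so it ends with exactly one semicolon.
        query.strip().rstrip(";") + ";",
    ]
    return "\n".join(lines)


# Hypothetical query using an IN subquery, one of the newly covered SQL forms.
SAMPLE_QUERY = """
SELECT ss_item_sk, ss_quantity
FROM store_sales
WHERE ss_item_sk IN (SELECT i_item_sk FROM item WHERE i_category = 'Books')
"""

if __name__ == "__main__":
    print(hive_session_script("tez", True, SAMPLE_QUERY))
```

Passing `"mr"` instead of `"tez"` produces the same script against the Map/Reduce framework, which is one way to compare the two engines on an identical query.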

An Open Hive Testbench

We wanted to make it easier for people to get a feel for the difference Tez makes, so we’ve created an Open Hive Testbench project on GitHub with the same test suite we use internally to test Hive on Tez. The Testbench includes a data generator and 50 sample queries derived from the TPC-DS benchmark. The data generator lets you generate any amount of data you want, from gigabytes to terabytes, so you can get a feel for Tez at small scales or large.

Try The Preview In 3 Easy Steps

  • Step 1: Get an HDP 2 cluster.
    • Option 1: If you want to truly experience the power of Tez, we suggest a larger dataset (200GB or more) on a cluster of at least 4 physical nodes.
    • Option 2: If simple is what you need, install an HDP 2 Sandbox and try it out there. This option doesn’t give you large scale but does let you experience interactive query in Hadoop. If you go this route, increase the memory size of your Sandbox to 4GB or more.
  • Step 2: Download the Stinger Phase 3 Preview package, which includes Tez 0.2 and a new version of Hive designed to work with Tez.
  • Step 3: Follow the installation instructions to deploy the Preview to your cluster. Once you’re done, give Hive a try through the Hive CLI, through Beeline, or through HiveServer2.
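
Once the Preview is deployed, a first query through Beeline might be launched along these lines. This is a sketch under stated assumptions: it relies on Beeline’s `-u` (JDBC URL), `-n` (user name) and `-e` (query) options and on HiveServer2’s usual default port of 10000. The host name, user name and helper function are hypothetical placeholders, so substitute the values from your own cluster.

```python
import shlex
from typing import List


def beeline_command(host: str, query: str, port: int = 10000,
                    user: str = "hive") -> List[str]:
    """Build the argument list for running one query through Beeline."""
    # Assumption: HiveServer2 listens on `port` and serves the "default" database.
    url = f"jdbc:hive2://{host}:{port}/default"
    return ["beeline", "-u", url, "-n", user, "-e", query]


if __name__ == "__main__":
    # Hypothetical host name; print a shell-quoted command line to copy and run.
    cmd = beeline_command("sandbox.example.com", "SHOW TABLES;")
    print(shlex.join(cmd))
```

Returning a list rather than a single string keeps the query as one argument, so it can be handed to `subprocess.run` without extra shell quoting.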

Please Provide Your Feedback

Want to share your experience or discuss a problem? Join the Hortonworks Hive Forum and tell us how Hive and Tez are working for you. Stay up to date with continued progress on Stinger at our Labs page.

