Speed, Scale and SQL: The Stinger Initiative, Apache Hive 12 & Apache Tez

Community innovation delivering interactive query for Hadoop

I recently sat down with Owen O’Malley and Carter Shanklin to discuss the dramatic improvements the Stinger Initiative delivered in Apache Hive 0.12, which is well on its way to being 100x faster than pre-Stinger versions of Hive. That means interactive queries on petabytes of data.

Owen is one of the original architects of Apache Hadoop, and Carter is the Hortonworks product manager focused on Hive. Together, they explain the speed, scale and SQL semantics delivered in Apache Hive 0.12, which is included in Hortonworks Data Platform 2.0. You can also find a technical preview of Hive 0.13 on our Labs page.

There’s also a little bit of Apache Hadoop YARN woven in.

Highlights include:

  • Basic definitions for Apache Hive, Apache Tez, the ORCFile format, predicate pushdown, vectorization and the Stinger Initiative
  • Discussion of new features in Hive 0.12
  • Addition of the VARCHAR and DATE data types
  • Preview of Hive 0.13 and phase three of Stinger

Visit our Stinger Initiative labs page to learn more.


