Speed, Scale and SQL: The Stinger Initiative, Apache Hive 12 & Apache Tez
I recently sat down with Owen O’Malley and Carter Shanklin to discuss the dramatic improvements delivered by the Stinger Initiative to version 0.12 of Apache Hive, which is well on its way to being 100x faster than pre-Stinger versions of Hive. That means interactive queries on petabytes of data.
Owen is one of the original architects of Apache Hadoop and Carter is the Hortonworks product manager focused on Hive. Together, they explain the speed, scale and SQL semantics delivered in Apache Hive v0.12, which is included in Hortonworks Data Platform v2.0. You can also find a technical preview of Hive 13 on our Labs page.
There’s also a little bit of Apache Hadoop YARN woven in. The conversation covers:
- Basic definitions for Apache Hive, Apache Tez, the ORCFile format, predicate pushdown, vectorization and the Stinger Initiative
- Discussion of new features in Hive 12
- Addition of the VARCHAR and DATE data types
- Preview of Hive 13 and phase three of Stinger
Visit our Stinger Initiative labs page to learn more.
Try it with Sandbox
Hortonworks Sandbox is a self-contained virtual machine with Apache Hadoop pre-configured alongside a set of hands-on, step-by-step Hadoop tutorials.
The Stinger Initiative is a broad, community-based effort to drive the future of Apache Hive, delivering 100x performance improvements at petabyte scale with familiar SQL semantics.