Although it is a preview in name, the release marks a significant milestone in the transformation of Hadoop from a batch-oriented system into a data platform capable of interactive data processing at scale, and in delivering on the aims of the Stinger Initiative.
Tez is a low-level runtime engine that is not aimed directly at data analysts or data scientists. Frameworks need to be built on top of Tez to expose it to a broad audience. Enter SQL and interactive query in Hadoop.
Stinger Phase 3 Preview combines the Tez execution engine with Apache Hive, Hadoop’s native SQL engine. Now, anyone who uses SQL tools in Hadoop can enjoy truly interactive data query and analysis.
We have already seen Apache Pig move to adopt Tez, and we will soon see others like Cascading do the same, unlocking many forms of interactive data processing natively in Hadoop. Tez is the technology that takes Hadoop beyond batch and into interactive, and we’re excited to see it available in a way that is easy to use and accessible to any SQL user.
The major improvements found in this release include:
We wanted to make it easier for people to get a feel for the difference Tez makes, so we’ve created an Open Hive Testbench project on GitHub with the same test suite we use internally to test Hive on Tez. The testbench includes a data generator and 50 sample queries derived from the TPC-DS benchmark. The data generator lets you produce any amount of data you want, from gigabytes to terabytes, so you can get a feel for Tez at small scales or large.
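One way to see the difference for yourself is to run the same query file once per execution engine and compare the timings. The Python sketch below only builds the two `hive` command lines; it does not run them. It assumes the `hive.execution.engine` property used by released Hive versions (0.13 and later) to switch between `mr` and `tez`; the preview build may use a different setting, so check its documentation. The query file name and database name are hypothetical placeholders.

```python
import shlex

# Assumption: engine names accepted by hive.execution.engine in Hive 0.13+.
ENGINES = ("mr", "tez")


def hive_command(query_file, engine, database=None):
    """Build the argv list for running one query file under a given engine."""
    if engine not in ENGINES:
        raise ValueError(f"unknown engine: {engine!r}")
    # --hiveconf sets a configuration property for this CLI session only.
    argv = ["hive", "--hiveconf", f"hive.execution.engine={engine}"]
    if database is not None:
        argv += ["--database", database]
    # -f tells the Hive CLI to execute the statements in the given file.
    argv += ["-f", query_file]
    return argv


def comparison_commands(query_file, database=None):
    """Return one command per engine so the same query can be timed on both."""
    return {engine: hive_command(query_file, engine, database) for engine in ENGINES}


if __name__ == "__main__":
    # "query12.sql" and "tpcds_demo" are hypothetical names for illustration.
    for engine, argv in comparison_commands("query12.sql", "tpcds_demo").items():
        print(engine, "->", shlex.join(argv))
```

Each printed line can be pasted into a shell and prefixed with `time`, or passed to `subprocess.run` on a machine where the Hive CLI is installed.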
Want to share your experience or discuss a problem? Join the Hortonworks Hive Forum and tell us how Hive on Tez is working for you. Stay up to date with continued progress on Stinger at our Labs page.