Since its inception and its graduation to a Top-Level Project (TLP) of the Apache Software Foundation (ASF) in September 2010, Apache Hive has been steadily improving in speed, scale, and SQL semantics to meet enterprise requirements for both interactive and batch queries at Hadoop scale.
It has become a de facto standard for SQL queries over petabytes of data stored in Hadoop. It is a compliant SQL engine that gives developers a comprehensive and familiar set of SQL semantics for Apache Hadoop. And it continues to evolve, with the Hive community’s concerted commitment to innovate for the enterprise.
The first Stinger initiative delivered Hive 0.13 with substantial gains in speed and interactive SQL capabilities. Last week we announced the Stinger.next project, which not only promises sub-second speed, scale, and SQL:2011 analytics but also drives Apache Hive’s future toward meeting enterprises’ data analytics needs and delivering Enterprise SQL-in-Hadoop at petabyte scale.
As part of the data access layer in the enterprise’s blueprint, Hive 0.13 runs on Apache Tez and Apache Hadoop YARN, the architectural center, and provides timely batch and interactive SQL query access to petabytes of data stored in HDFS.
At Hadoop Summit San Jose 2014, a number of Apache Hive contributors, committers, customers, and practitioners shared their deep technical knowledge and best practices, along with the potential and promise of Apache Hive.
There is more valuable content here, but below we have selected a few notable sessions that speak to the speed and scale of Apache Hive for interactive and batch queries:
|Session|Video|Slides|
|---|---|---|
|Adding ACID Transactions, Inserts, Updates, and Deletes in Apache Hive|Video|Slides|
|Making Hive Suitable for Analytics Workloads| |Slides|
|Hive on Apache Tez: Benchmarked at Yahoo! Scale|Video|Slides|
|Hive + Tez: A Performance Deep Dive|Video|Slides|
|De-Bugging Hive with Hadoop-in-the-Cloud|Video|Slides|
|Cost-based query optimization in Hive|Video|Slides|
|A Perfect Hive Query For A Perfect Meeting|Video| |
|Hivemall: Scalable Machine Learning Library for Apache Hive|Video|Slides|
Many will peruse the curated content above for Hive 0.13 and realize its promise of speed and scale. Many will follow Apache Hive’s progressive evolution through the phased Stinger.next initiative, which will achieve Enterprise SQL at Hadoop scale. And many will conclude that one SQL engine, delivered through Stinger.next, is better than two.
In short, many will “want one SQL engine, one tool, not two.”