September 10, 2014

Hadoop Summit Curated Content: Apache Hive

Speed, Scale, and SQL Semantics

Since graduating as a Top-Level Project (TLP) of the Apache Software Foundation (ASF) in September 2010, Apache Hive has steadily improved in speed, scale, and SQL semantics to meet enterprise requirements for both interactive and batch queries at Hadoop scale.

It has become a de facto standard for SQL queries over petabytes of data stored in Hadoop. As a compliant SQL engine, it gives developers a comprehensive and familiar set of SQL semantics for Apache Hadoop. And it continues to evolve, with the Hive community’s concerted commitment to innovate for the enterprise.

Hive Strives as SQL-in-Hadoop

The first Stinger initiative delivered Hive 0.13 with substantial gains in speed and interactive SQL capabilities. Last week we announced a project that not only promises sub-second speed, scale, and SQL:2011 analytics, but also drives Apache Hive’s future toward meeting enterprises’ data analytics needs and delivering Enterprise SQL-in-Hadoop at petabyte scale.

Hive on Tez

As part of the data access layer in the enterprise blueprint, Hive 0.13 runs on Apache Tez and Apache Hadoop YARN, the architectural center, providing timely batch and interactive SQL query access to petabytes of data stored in HDFS.


Apache Hive Curated Content

At Hadoop Summit San Jose 2014, a number of Apache Hive contributors, committers, customers, and practitioners shared their deep technical knowledge and best practices, along with the potential and promise of Apache Hive.

There is more invaluable content here, but below we have selected a few notable sessions that speak to the speed and scale of Apache Hive for interactive and batch queries:

Session Title | Watch | View
Adding ACID Transactions, Inserts, Updates, and Deletes in Apache Hive | Video | Slides
Making Hive Suitable for Analytics Workloads | - | Slides
Hive on Apache Tez: Benchmarked at Yahoo! Scale | Video | Slides
Hive + Tez: A Performance Deep Dive | Video | Slides
De-Bugging Hive with Hadoop-in-the-Cloud | Video | Slides
Cost-based query optimization in Hive | Video | Slides
A Perfect Hive Query For A Perfect Meeting | Video | -
Hivemall: Scalable Machine Learning Library for Apache Hive | Video | Slides


Many will peruse the curated content above for Hive 0.13 and realize its promise of speed and scale. Many will follow Apache Hive’s progressive evolution through its phased initiative to achieve Enterprise SQL at Hadoop scale. And many will conclude that one SQL engine, delivered through that initiative, is better than two.

In short, many will “want one SQL engine, one tool, not two.”
