November 04, 2014

Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next

Last week Hortonworks presented the second of our eight Discover HDP 2.2 webinars. Alan Gates and Raj Bains discussed the Stinger.next initiative and new Apache Hive features for speed, scale and SQL that are included in Hortonworks Data Platform 2.2.

After an overview of HDP 2.2, Alan discussed what the Apache community accomplished with the original Stinger initiative and how that momentum continues in Stinger.next.

Alan and Raj then discussed details on three areas of innovation currently underway in the Apache Hive project:

  • For SQL – transactions with ACID semantics
  • For Speed – the cost based optimizer
  • For Scale – dynamic query optimization

Here is the complete recording of the webinar.

Here is the presentation deck.

Attend our next Discover HDP 2.2 webinar this coming Thursday, November 6 at 10am Pacific Time: Apache Falcon for Hadoop Data Governance

We’re grateful to the many participants who joined this webinar and asked excellent questions. Here’s the complete Q & A from the webinar:

Question: For INSERTS/UPDATES/DELETES in Hive, will I need to go through HCatalog?
Answer: So far, those have not been integrated with HCatalog. We do plan to integrate that functionality with HCatalog, Apache Pig and other components in a later phase.

Question: Are INSERTS/UPDATES/DELETES supported in the Hortonworks Sandbox?
Answer: Those work in the current version of Hortonworks Sandbox, but they require some additional setup. The next version of Sandbox will include better support for those functions, with examples so you can walk through them.

Question: Is there a future for HIVE-HBASE integration (for example, HQL becoming HBase scans with no MapReduce at all)?
Answer: Yes. The Hive community is starting to work on that. This blog talks about those plans in more detail: HBase and Hive – Better Together

Question: Will Apache Spark be completely integrated with R?
Answer: The Spark community would need to drive that forward and Hortonworks plans to support that.

Question: What impact will the Hive innovation have on Tez or MapReduce?
Answer: We plan continued investment to develop Apache Tez as the best execution engine for applications like Pig and Hive. In particular, we will be working to make sure Tez works well with the LLAP work we plan to do in Hive as part of Stinger.next. While we plan continued support for MapReduce, we do not foresee extensive new development happening there.

Question: What would be the benefit of moving machine learning to Hive when we have Mahout? Would machine learning in Hive be different?
Answer: The goal isn’t to do machine learning in Hive, since there are better tools for that, such as Apache Mahout and Apache Spark. The goal is actually to make sure that Hive integrates well with those tools.

Question: Is the cost based optimizer on Apache Tez only, or will it also be available with MapReduce?
Answer: The CBO does not depend on the execution engine. However, we do most of the testing with Tez, so there may be cases where it will benefit Tez more than it does MapReduce.
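
To make the last answer a little more concrete, here is a toy sketch in Python of cost-based join ordering. It is not Hive's optimizer and was not shown in the webinar: the table names, row counts, selectivities and function names are all hypothetical, and the cost model (total rows in the intermediate results of a left-deep plan) is a deliberate simplification. What it tries to show is that the plan is picked from table statistics alone, with nothing in the decision that refers to whether Tez or MapReduce will eventually run it.

```python
from itertools import permutations

# Hypothetical row-count statistics (illustrative values, not real data).
ROW_COUNTS = {"orders": 1_000_000, "customers": 50_000, "regions": 10}

# Hypothetical join selectivities: the fraction of the cross product
# that survives each join predicate. Pairs not listed have no predicate.
SELECTIVITY = {
    frozenset({"orders", "customers"}): 1 / 50_000,
    frozenset({"customers", "regions"}): 1 / 10,
}


def join_selectivity(joined, new_table):
    """Combine the selectivities of all predicates linking new_table
    to the tables already joined (1.0 means a plain cross product)."""
    sel = 1.0
    for table in joined:
        sel *= SELECTIVITY.get(frozenset({table, new_table}), 1.0)
    return sel


def plan_cost(order):
    """Estimated cost of a left-deep plan: the sum of the estimated
    row counts of every intermediate join result."""
    rows = ROW_COUNTS[order[0]]
    joined = [order[0]]
    cost = 0.0
    for table in order[1:]:
        rows = rows * ROW_COUNTS[table] * join_selectivity(joined, table)
        cost += rows
        joined.append(table)
    return cost


def cheapest_plan(tables):
    """Enumerate every join order and keep the one with the lowest cost.
    Only statistics are consulted; the execution engine never appears."""
    return min(permutations(tables), key=plan_cost)


if __name__ == "__main__":
    best = cheapest_plan(["orders", "customers", "regions"])
    print("chosen join order:", " -> ".join(best))
    print("estimated cost:", plan_cost(best))
```

With these made-up statistics the sketch joins the two small tables first and brings in the large orders table last, which keeps the intermediate results small. Exhaustive enumeration is only practical for a handful of tables; it is used here purely to keep the example short.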
