Last week Hortonworks presented the second of our eight Discover HDP 2.2 webinars. Alan Gates and Raj Bains discussed the Stinger.next initiative and new Apache Hive features for speed, scale and SQL that are included in Hortonworks Data Platform 2.2.
After an overview of HDP 2.2, Alan discussed what the Apache community accomplished with the original Stinger initiative and how that momentum continues in Stinger.next.
Alan and Raj then discussed details on three areas of innovation currently underway in the Apache Hive project: speed, scale and SQL.
Here is the complete recording of the webinar.
Here is the presentation deck.
Attend our next Discover HDP 2.2 webinar this coming Thursday, November 6 at 10am Pacific Time: Apache Falcon for Hadoop Data Governance
We’re grateful to the many participants who joined this webinar and asked excellent questions. Here’s the complete Q & A from the webinar:
|For INSERTS/UPDATES/DELETES in Hive, will I need to go through HCatalog?||So far, those have not been integrated with HCatalog. We do plan to integrate that functionality with HCatalog, Apache Pig and other components in a later phase.|
|Are INSERTS/UPDATES/DELETES supported in the Hortonworks Sandbox?||Those work in the current version of Hortonworks Sandbox, but they require some additional setup. The next version of Sandbox will include better support for those functions, with examples so you can walk through them.|
|Is there a future for HIVE-HBASE integration (for example, HQL becoming HBase scans with no MapReduce at all)?||Yes. The Hive community is starting to work on that. This blog talks about those plans in more detail: HBase and Hive – Better Together|
|Will Apache Spark be completely integrated with R?||The Spark community would need to drive that forward, and Hortonworks plans to support that effort.|
|What impact will the Hive innovation have on Tez or MapReduce?||We plan continued investment to develop Apache Tez as the best execution engine for applications like Pig and Hive. In particular, we will be working to make sure Tez works well with the LLAP work we plan to do in Hive as part of Stinger.next. While we plan continued support for MapReduce, we do not foresee extensive new development happening there.|
|What would be the benefit of moving machine learning to Hive when we have Mahout? Would machine learning in Hive be different?||The goal isn’t to do machine learning in Hive, since there are better tools for that, such as Apache Mahout and Apache Spark. The goal is actually to make sure that Hive integrates well with those tools.|
|Is the cost-based optimizer (CBO) on Apache Tez only, or will it also be available with MapReduce?||The CBO does not depend on the execution engine. However, we do most of the testing with Tez, so there may be cases where it will benefit Tez more than it does MapReduce.|