Announcing Apache Hive 0.12: Stinger Phase Two… DELIVERED
Stinger is not a product. Stinger is a broad, community-based initiative to bring interactive query at petabyte scale to Hadoop. And today, as representatives of this open, community-led effort, we are very proud to announce delivery of Apache Hive 0.12, which represents the critical second phase of this project!
Only five months in the making, Apache Hive 0.12 comprises over 420 closed JIRA tickets contributed by ten companies, with nearly 150 thousand lines of code! This work is perfectly representative of our approach… it is a substantial release with major contributions from a wide group of talented engineers from Microsoft, Facebook, Yahoo and others.
Delivery of SQL-IN-Hadoop Marches On
The Stinger Initiative was announced in February and, as promised, we have seen consistent, regular delivery of new features and improvements as outlined in the Stinger plan. There are three roadmap vectors for Stinger: Speed, Scale and SQL. Each phase of the initiative advances on all three goals, and this release provides a significant increase in SQL semantics, adding the VARCHAR and DATE datatypes and improving the performance of ORDER BY and GROUP BY. Several features to optimize queries have also been added.
We also contributed numerous “under the hood” improvements, such as refactoring code and making it easier to build on top of Hive, which gets rid of some of the technical debt. This helps us deliver further optimizations in the long term, especially for the upcoming Apache Tez integration.
A complete list of the notable improvements included in the release is available here, and you can expect an updated performance benchmark soon!
If you check out the release notes, be prepared to scroll for quite some time, as they cover over 420 JIRA tickets. A lot of people have been involved and, as you can see from the chart below, Hive is a wildly active community. The chart counts the number of emails sent per month to the Hive developer mailing list. The momentum is building and the community is definitely engaged.
Many people need to be thanked, most of them listed here: Alan Gates, Aleksey Gorshkov, Anandha L Ranganathan, Arup Malakar, Ashutosh Chauhan, Azrael, Bing Li, Brock Noland, Caofangkun, Chaoyu Tang, Chris Drome, Chu Tong, Daniel Dai, Deepesh Khandelwal, Dheeraj Kumar Singh, Dilip Joseph, Edward Capriolo, Eli Reisman, Eugene Koifman, Gabriel Reid, Gopal V, Gunther Hagleitner, Guo Hongjie, Hari Sankar Sivarama Subramaniyan, Harish Butani, Ido Hadanny, Ivan A. Veselovsky, Jarek Jarcec Cecho, Jason Dere, Johnny Zhang, Jon Hartlaub, Kevin Wilfong, Laljo John Pullokkaran, Lefty Leverenz, Mark Grover, Mark Wagner, Matthew Weaver, Mikhail Bautin, Morgan Phillips, Namit Jain, Navis, Owen O’Malley, Prasad Mujumdar, Prasanth J, Rob Weltman, Robert Roland, Roshan Naik, Samuel Yuan, Sarvesh Sakalanaga, Sean Busbey, Sergey Shelukhin, Shreepadma Venugopalan, Shuaishuai Nie, Sushanth Sowmyan, Swarnim Kulkarni, Teddy Choi, Thejas M Nair, Vikram Dixit K, Xiu, Xuefu Zhang and Yin Huai.
Try it with Sandbox
Hortonworks Sandbox is a self-contained virtual machine with Apache Hadoop pre-configured alongside a set of hands-on, step-by-step Hadoop tutorials.