In February, we announced the Stinger Initiative, which outlined an approach to bringing interactive SQL query into Hadoop. Simply put, our choice was to double down on Hive and extend it so that it could address human-time use cases (i.e., queries in the 5-30 second range). So, with input and participation from the broader community, we established two fairly audacious goals: a 100X performance improvement and SQL compatibility.
As representatives of this open, community-led effort, we are very proud to announce the first release of the new and improved Apache Hive, version 0.11. This substantial release embodies the work of a wide group of people from Microsoft, Facebook, Yahoo, SAP, and others. Together we addressed 386 JIRA tickets, of which 28 were new features and 276 were bug fixes. FIFTY-FIVE developers were involved in this release, and I would like to thank every one of them. See below for a full list.
As promised, we delivered phase 1 of the Stinger Initiative in late spring. This release is another proof point that the open community can innovate at a rate unequaled by any proprietary vendor. As part of phase 1, we promised windowing functions, new data types, the Optimized Row Columnar (ORC) file format, and base optimizations to the Hive query engine, and the community has delivered these key features.
Hadoop 2.0, and specifically YARN, turns Hadoop from a single-application system into a multi-application operating system. The next generation of Apache Hive, built on YARN, becomes part of the platform itself and can be managed by YARN, so that use cases beyond interactive query can be addressed on the same cluster. The result is a multi-application data system. In this new world, Hive is a first-class citizen alongside a variety of other workloads within a cluster, and resources can be managed at a finer granularity.
Ultimately, this leads to further performance enhancements for Hive. With the inclusion of Tez, we will be able to demonstrate even more significant improvements as service startup times are eliminated and a newly optimized execution chain is delivered within core Hadoop. The near future is exciting!
This release represents significant enhancements to Hive that will improve direct SQL interaction with Hive and light up the hundreds of applications that already rely on Hive as the de facto SQL interface for Hadoop. If you are one of the many software companies already using Hive, we hope you test out this new release and are happy with the results. We look forward to supporting it in HDP 1.3 in the very near future. 😉
Thanks to the 55 developers who contributed their time and effort to this release: Alan Gates, Amareshwari Sriramadasu, Andrew Chalfant, Arup Malakar, Ashish Singh, Ashish Vaidya, Ashutosh Chauhan, Bennie Schut, Bhushan Mandhani, Billie Rinaldi, Brock Noland, Carl Steinbach, Chen Chun, Chris Drome, Dilip Joseph, Edward Capriolo, Gang Tim Liu, Gopal V, Gunther Hagleitner, Harish Butani, Ivan Gorbachev, Jarek Jarcec Cecho, Jean Xu, Jingwei Lu, Johnny Zhang, Jonathan Chang, Kevin Wilfong, Lars Francke, Li Yang, Mark Grover, Mayank Garg, Mikhail Bautin, Namit Jain, Navis, Nick Collins, Owen O’Malley, Pamela Vagata, Prajakta Kalmegh, Prasad Mujumdar, Roshan Naik, Sam Tunnicliffe, Samuel Yuan, Sean Busbey, Shreepadma Venugopalan, Sushanth Sowmyan, Teddy Choi, Thejas M Nair, Thiruvel Thirumoolan, Travis Crawford, Vikram Dixit K, Vinod Kumar Vavilapalli, Wonho Kim, Xiao Jiang, Zhenxiao Luo.