In this post, Hortonworks Intern Jie Li talks about his work this summer on performance analysis and optimization of Apache Pig. Jie is a PhD candidate in the Department of Computer Science at Duke University. His research interests are in the area of database systems and big data computing. He is currently working with Associate Professor Shivnath Babu.
Pig Performance Analysis and Optimization
I am proud that I was among the first several interns at Hortonworks, one of the leaders in the Hadoop community. In this post, I want to summarize my project on Pig performance and also share my experience this summer.
I began working on Pig one year ago, when my classmates in CPS216 and I developed the TPC-H benchmark for Pig, in order to compare the performance of Pig and Hive. TPC-H (specified here) consists of a set of complex queries and is the well-known benchmark for the traditional data warehouse.…