Recap of the August Pig Hackathon at Hortonworks

The August Pig Hackathon brought Pig users from Hortonworks, Yahoo, Cloudera, Visa, Kaiser Permanente, and LinkedIn to Hortonworks HQ in Sunnyvale, CA, to talk about and work on Apache Pig.

Hackers hacking away at the August 2012 Pig Hackathon at Hortonworks in Sunnyvale, CA

Jonathan Coveney and Bill Graham from Twitter walked newer Pig users through how Pig translates a Pig Latin script into MapReduce jobs and went over how to read the output of EXPLAIN. For those interested, Hortonworks founder Alan Gates covers this in Chapter 1 of Programming Pig.

Thejas Nair walked through how to contribute patches to Pig and how to work with committers to get those patches committed. You can learn more about this on the Pig Wiki.

The group talked through the proposal for a new EvalFunc interface that would make it much easier to write user-defined functions (UDFs) for Pig. Part of what makes Pig so powerful is its extensibility, and making UDFs easier to write would make Pig a better tool. The discussion is available in JIRA ticket PIG-2421 if you want to collaborate on improving Pig’s EvalFuncs.

Alan Gates presented some thoughts on building a generic DAG (directed acyclic graph) execution and optimization engine that could be used by Pig and Hive and that would take advantage of new features in Hadoop 2.0. This would reduce duplication between the projects as well as allow users to share UDFs between them. We covered using Pig and Hive together and via HCatalog in previous posts.

You don’t have to be a Pig expert to attend a Pig meetup; all levels of proficiency are welcome. Committers love to meet new users who appreciate their work. One attendee said, “There were many Pig committers at the meetup. The Twitter and Hortonworks people were very helpful.”

To find out about more Pig meetups, join the Pig User Group on Meetup. We can’t wait to see you there!


Comments

August 30, 2012 at 12:03 pm

Would love to attend once I graduate :)
