Recap of the August Pig Hackathon at Hortonworks

The August Pig Hackathon brought Pig users from Hortonworks, Yahoo, Cloudera, Visa, Kaiser Permanente, and LinkedIn to Hortonworks HQ in Sunnyvale, CA to talk about and work on Apache Pig.

Hackers hacking away at the August 2012 Pig Hackathon at Hortonworks in Sunnyvale, CA

Jonathan Coveney and Bill Graham from Twitter walked newer Pig users through how Pig translates a Pig Latin script into MapReduce jobs and how to read the output of the EXPLAIN command. For those interested, Hortonworks founder Alan Gates covers this in Chapter 1 of Programming Pig.

Thejas Nair walked through how to contribute patches to Pig and how to work with committers to get the patches in. You can learn more about this on the Pig Wiki.

The group talked through the proposal for a new EvalFunc interface that would make it much easier to write user-defined functions (UDFs) for Pig. Part of what makes Pig so powerful is its extensibility, and making UDFs easier to write would make Pig a better tool. The discussion is available in JIRA ticket PIG-2421 if you want to collaborate on improving Pig’s eval funcs.

Alan Gates presented some thoughts on building a generic DAG (directed acyclic graph) execution and optimization engine that could be used by both Pig and Hive and that would take advantage of new features in Hadoop 2.0. This would reduce duplication between the projects and allow users to share UDFs between them. We covered using Pig and Hive together, and via HCatalog, in previous posts.

You don’t have to be a Pig expert to attend a Pig meetup: all levels of proficiency are welcome. Committers love to meet new users who appreciate their work. One attendee said, “There were many Pig committers at the meetup. The Twitter and Hortonworks people were very helpful.”

To find out about more Pig meetups, join the Pig User Group on Meetup. We can’t wait to see you there!

Comments

August 30, 2012 at 12:03 pm

Would love to attend once I graduate :)
