Twitter Analytics Presents Hadoop and Pig at UC Berkeley

Twitter Analytics presented their distributed infrastructure, including Hadoop and Pig, at a UC Berkeley iSchool special course called INFO 290: Analyzing Big Data with Twitter. Twitter is a major contributor to many Apache projects. The course was oversubscribed and a great success, as students got to learn from practicing data scientists who use Hadoop on truly massive datasets. The entire lecture series is available here.

Bill Graham @billgraham, a Data Systems Engineer at Twitter Analytics and Apache Pig committer, presented an Introduction to Hadoop. His slides are available here. His presentation gives a comprehensive introduction to Apache Hadoop including its history, motivation, practice and operation.

Jonathan Coveney @jco, a Data Systems Engineer at Twitter Analytics and Apache Pig committer, presented Pig at Twitter. Slides for this presentation are available here. His presentation gives a comprehensive explanation of Apache Pig’s philosophy, use and intricacies. It is one of the most thorough introductions to Pig I’ve seen and will serve as excellent documentation for beginners and intermediate Pig users alike.

Hats off to Twitter for their contribution to Apache open source and education. More Pig talks and papers are available on the Pig Confluence here.


