Posts by Julien Ledem:


Community Blog: Transitive Closure in Apache Pig

This was originally published on my blog; I’m re-posting it here on request from the fine people at Hortonworks.

1. Introduction

This a follow-up on my previous post about implementing PageRank in Pig using embedding. I also talked about this in a presentation to the Pig user group.

One of the best features of embedding is how it simplifies writing UDFs and using them right away in the same script without superfluous declarations. Computing a transitive closure is a good example of an algorithm requiring an iteration, a few simple UDFs and an end condition to decide when to stop iterating. Embedding is available in Pig 0.9. Knowledge of both Pig and Python is required to follow. Examples are available on github.

Read More

Community Blog: PageRank Implementation in Pig

In this post I’m going to give a very simple example of how to use Pig; embedded in Python to implement the PageRank; algorithm. It goes in a little more details on the same example given in the presentation I gave at the Pig user meetup. On the same topic, Daniel published a nice K-Means implementation using the same embedding feature. This was originally published on my blog; I’m re-posting it here on request from the fine people at Hortonworks.

1. What is Pig embedded

Pig 0.9 supports running Python scripts with Pig statements embedded. This is simply run using the pig command:

Read More