
The Hortonworks Blog

More from Sam Shah

If Pig is the “duct tape for big data”, then DataFu is the WD-40. Or something. No, seriously: DataFu is a collection of Pig UDFs for data analysis on Hadoop. It includes routines for common statistics tasks (e.g., median, variance), PageRank, set operations, and bag operations. It’s helpful to understand the history of the library. […]