The Hortonworks Blog

More from Daniel Dai

The Apache community released Apache Pig 0.15.0 last week. Although there are many new features in Apache Pig 0.15.0, we would like to highlight two major improvements: Pig on Tez enhancements, and using Hive UDFs inside Pig. Below are some details about these important features. For the complete list of features, improvements, and bug fixes, please […]

With YARN as its architectural center, Apache Hadoop continues to attract new engines to run within the data platform, as organizations want to efficiently store their data in a single repository and interact with it simultaneously in different ways. Apache Tez supports YARN-based, high-performance batch and interactive data processing applications in Hadoop that need […]

The Apache Pig community released Pig 0.13 earlier this month. Pig uses a simple scripting language to perform complex transformations on data stored in Apache Hadoop. The Pig community has been working diligently to prepare Pig to take advantage of the DAG processing capabilities in Apache Tez. We also improved usability and performance. This blog […]

Today we are proud to announce the general availability of Apache Pig 0.12! If you are a Pig user and you’ve been yearning to use additional languages, for more data validation tools, for more expressions, operators and data types, then read on. Version 0.12 includes all of those additions, and now Pig runs on Windows […]

We are pleased to announce that Apache Pig 0.10.1 was recently released. This is primarily a maintenance release focused on stability and bug fixes. In fact, Pig 0.10.1 includes 42 new JIRA fixes since the Pig 0.10.0 release. Some of the notable changes include: Source code-only distribution: In the download section for Pig 0.10.1, you […]

Another important milestone for Apache Pig was reached this week with the release of Pig 0.10. The purpose of this blog is to summarize the new features in Pig 0.10. Boolean Data Type: Pig 0.10 introduces the boolean data type as a first-class Pig data type. Users can use the keyword “boolean” anywhere where a data type […]

I ran across an interesting problem in my attempt to implement random forest using Apache Pig. In random forest, each tree is trained using a bootstrap sample. That is, sample N cases at random out of a dataset of size N, with replacement. For example, here is the input data: (0, 1, 2, 3, 4, […]

* Special note: the code discussed in this blog is available here * A common complaint about Pig is the lack of control flow statements: if/else, while loop, for loop, etc. And now Pig has a response for it: Pig embedding. You can now write a Python program and embed Pig scripts inside of it, […]

This is the first of three blogs that will highlight the new features in Pig 0.9. When I first started to use Pig, the one thing that I hated the most was that I needed to write 4 lines of code to get a simple count: A = load ‘student.txt’ as (name, student, gpa); B […]