New Features in Apache Pig 0.10
Another important milestone for Apache Pig was reached this week with the release of Pig 0.10. This post summarizes the new features in that release.
Boolean Data Type
Pig 0.10 introduces boolean as a first-class Pig data type. Users can use the keyword “boolean” anywhere a data type is expected, such as in a load-as clause or a type cast.
Here are some sample use cases:
a = load 'input' as (a0:boolean, a1:tuple(a10:boolean, a11:int), a2);
b = foreach a generate a0, a1, (boolean)a2;
c = group b by a2; -- group by a boolean field
When loading boolean data using PigStorage, Pig expects the text “true” (case-insensitive) for a true value and “false” (case-insensitive) for a false value; any other value maps to null. When storing boolean data using PigStorage, a true value is written as the text “true” and a false value as the text “false”.…
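The load and store rules above can be sketched as a small round trip. This is a minimal illustration, not taken from the release notes: the input file 'flags', its three sample rows, and the output directory 'flags_out' are all hypothetical.

```pig
-- Hypothetical tab-delimited input file 'flags', for illustration only:
--   1    TRUE
--   2    false
--   3    yes
a = load 'flags' using PigStorage() as (id:int, flag:boolean);
-- Per the rules above: row 1 loads as true, row 2 as false,
-- and row 3 as null, because "yes" is neither "true" nor "false".
store a into 'flags_out' using PigStorage();
-- Stored text: "true" for row 1 and "false" for row 2.
-- PigStorage writes a null as an empty field, so row 3 has an empty second column.
```

Because unrecognized text silently becomes null, it may be worth checking for unexpected nulls after loading data whose boolean columns use other encodings such as "yes"/"no" or 1/0.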