Today we are proud to announce the general availability of Apache Pig 0.12!
If you are a Pig user who has been yearning for additional UDF languages, better data validation tools, and more expressions, operators and data types, then read on. Pig 0.12 includes all of these additions, and Pig now runs on Windows without Cygwin.
This release was a great team effort over the past six months, with more than 30 engineers from Twitter, Yahoo, LinkedIn, Netflix, Microsoft, IBM, Salesforce, Mortar Data, Cloudera and several others (including Hortonworks, of course). Between Pig 0.11 and Pig 0.12, we resolved 305 Jira issues.
Improvements in Apache Pig 0.12
ASSERT operator
The new ASSERT operator can be used for data validation. For example, the following script fails if any value of a0 is negative:
a = load 'something' as (a0:int, a1:int);
assert a by a0 >= 0, 'a0 must not be negative';
UDFs in non-JVM languages
Users can now write UDFs in languages that have no JVM implementation. In particular, this release adds support for CPython UDFs, so users can write Python UDFs that rely on CPython C extensions, which is not possible with Jython.
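As a rough sketch, a CPython UDF file might look like the following. It assumes the streaming_python conventions from the Pig documentation, namely the `pig_util` module and its `outputSchema` decorator, which Pig supplies when it runs the script. The file name `pyudfs.py`, the function `crc32` and the fallback decorator are hypothetical, added only so the file can also be imported and tested outside Pig. `zlib`, which CPython implements as a C extension module, stands in for heavier extensions such as NumPy.

```python
# pyudfs.py (hypothetical file name): a sketch of a streaming_python UDF.
import zlib  # implemented as a C extension module in CPython

try:
    # Assumption: provided by Pig's streaming_python runtime.
    from pig_util import outputSchema
except ImportError:
    # Hypothetical fallback so the file works outside Pig: it records the
    # declared schema on the function and otherwise leaves it unchanged.
    def outputSchema(schema):
        def decorator(func):
            func.output_schema = schema
            return func
        return decorator


@outputSchema('checksum:long')
def crc32(text):
    """Return the unsigned CRC-32 of a chararray, or None for a null input."""
    if text is None:
        return None  # propagate Pig nulls, which arrive as None
    # Mask to 32 bits so the result is unsigned on every Python version.
    return zlib.crc32(text.encode('utf-8')) & 0xffffffff
```

In a Pig script, the file could then be registered with `REGISTER 'pyudfs.py' USING streaming_python AS pyudfs;` and the function called as `pyudfs.crc32(name)`, where the relation and field names are again hypothetical.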
Rewrite of AvroStorage
We completely rewrote AvroStorage, which is now one of Pig's built-in functions. It uses the latest version of Avro, is significantly faster, and includes many bug fixes.
IN operator
Previously, Pig had no IN operator. To mimic one, users had to chain several OR conditions, as in this example:
a = LOAD '1.txt' USING PigStorage(',') AS (i:int);
b = FILTER a BY (i == 1) OR (i == 22) OR (i == 333) OR (i == 4444) OR (i == 55555);
Now this type of expression can be rewritten more compactly using the IN operator:
a = LOAD '1.txt' USING PigStorage(',') AS (i:int);
b = FILTER a BY i IN (1, 22, 333, 4444, 55555);
CASE expression
Previously, Pig had no CASE expression. To mimic one, users often nested bincond (?:) operators, which became unreadable with multiple levels of nesting.
Here’s an example of the type of CASE expression that Pig now supports:
bar = FOREACH foo GENERATE (
  CASE i % 3
    WHEN 0 THEN '3n'
    WHEN 1 THEN '3n+1'
    ELSE '3n+2'
  END
);
BigInteger/BigDecimal data types
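Some applications require arithmetic with a high degree of precision. For these cases, Pig now provides the biginteger and bigdecimal data types, backed by Java's BigInteger and BigDecimal, which avoid the overflow of long and the rounding errors of float and double.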
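To illustrate why this matters, the sketch below uses Python's `decimal` module and arbitrary-size integers as stand-ins for Java's BigDecimal and BigInteger; it is not Pig code. It contrasts binary floating point with decimal arithmetic, and emulates the 64-bit wraparound of a Java `long` (which Pig's long type inherits) with a hypothetical helper, `to_int64`.

```python
from decimal import Decimal

LONG_MAX = 2**63 - 1  # largest value of a 64-bit signed long


def to_int64(n):
    """Hypothetical helper: wrap an integer to signed 64 bits, like Java long overflow."""
    n &= (1 << 64) - 1
    return n - (1 << 64) if n >= (1 << 63) else n


# Binary floating point (like a double) cannot represent 0.1 or 0.2 exactly.
float_sum = 0.1 + 0.2
# Decimal arithmetic (analogous to BigDecimal) keeps the exact decimal value.
decimal_sum = Decimal('0.1') + Decimal('0.2')

# A 64-bit long silently wraps on overflow; an arbitrary-precision integer does not.
long_result = to_int64(LONG_MAX + 1)
big_result = LONG_MAX + 1

print(float_sum == 0.3)               # False
print(decimal_sum == Decimal('0.3'))  # True
print(long_result)                    # -9223372036854775808
print(big_result)                     # 9223372036854775808
```

Constructing the decimals from strings rather than floats matters: `Decimal(0.1)` would faithfully copy the float's binary rounding error.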
Support for Microsoft Windows™
Pig 0.12 includes the changes that enable Apache Pig to run on Windows without Cygwin.
Parquet support
Pig now wraps ParquetLoader and ParquetStorer as built-in functions, so users can load and store Parquet data easily.