October 22, 2013

Announcing Apache Pig 0.12…The Community Breeds a More Powerful Pig

Today we are proud to announce the general availability of Apache Pig 0.12!

If you are a Pig user and have been yearning for additional UDF languages, more data validation tools, or more expressions, operators, and data types, then read on. Version 0.12 includes all of these additions, and Pig now runs on Windows without Cygwin.

This was a great team effort over the past six months, with over 30 engineers from Twitter, Yahoo, LinkedIn, Netflix, Microsoft, IBM, Salesforce, Mortar Data, Cloudera, and several other companies (including Hortonworks, of course). Between Pig 0.11 and Pig 0.12, we resolved 305 JIRA issues.

Improvements in Apache Pig 0.12

Assert operator

The new assert operator can be used for data validation. For example, the following script fails if any value of a0 is negative:

a = load 'something' as (a0:int, a1:int);
-- fail with the given message if any row has a0 < 0
assert a by a0 >= 0, 'a0 cannot be negative';

Streaming UDF

Users can now write UDFs in languages that have no JVM implementation. In this release we implemented CPython UDFs, so users can write Python UDFs that rely on CPython extension modules, which cannot be used from Jython.
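A minimal sketch of how a CPython UDF might be wired into a script with the streaming_python mechanism. It assumes a hypothetical file myudfs.py that defines a function to_upper (declared with the @outputSchema decorator from Pig's pig_util module) and a hypothetical input file words.txt; neither comes from this release.

```pig
-- Register a CPython script. streaming_python runs it in an external
-- CPython process rather than in Jython inside the JVM, so the script
-- may import C extension modules.
-- 'myudfs.py' and its to_upper function are hypothetical.
REGISTER 'myudfs.py' USING streaming_python AS myudfs;

words = LOAD 'words.txt' AS (word:chararray);
upper = FOREACH words GENERATE myudfs.to_upper(word);
DUMP upper;
```

Because the UDF runs under CPython, myudfs.py could, for instance, import NumPy, which Jython cannot load.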

Rewrite of AvroStorage

We completely revamped AvroStorage. It is now one of Pig's built-in functions, uses the latest version of Avro, is significantly faster, and includes many bug fixes.
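A short sketch of loading and storing Avro data with the built-in function. The paths and the age field are hypothetical, and we assume the files' embedded Avro schema supplies the field names on load.

```pig
-- AvroStorage is now built in, so no piggybank jar has to be registered.
-- Paths and the 'age' field are hypothetical.
users  = LOAD 'input/users' USING AvroStorage();
adults = FILTER users BY age >= 18;
STORE adults INTO 'output/adults' USING AvroStorage();
```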

IN operator

Previously, Pig had no IN operator. To mimic one, users had to chain several OR conditions, as in this example:

a = LOAD '1.txt' USING PigStorage(',') AS (i:int);
b = FILTER a BY 
   (i == 1) OR
   (i == 22) OR
   (i == 333) OR
   (i == 4444) OR
   (i == 55555);

Now the same filter can be written more compactly with the IN operator:

a = LOAD '1.txt' USING PigStorage(',') AS (i:int);
b = FILTER a BY i IN (1,22,333,4444,55555);

CASE expression

Previously, Pig had no support for a CASE statement. To mimic one, users often wrote nested bincond (?:) operators, which became unreadable with multiple levels of nesting.

Here’s an example of the type of CASE expression that Pig now supports:

b = FOREACH a GENERATE (
  CASE i % 3
     WHEN 0 THEN '3n'
     WHEN 1 THEN '3n+1'
     ELSE '3n+2'
  END
);
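For comparison, here is a sketch of the same classification written the pre-0.12 way with nested bincond operators. It assumes a relation a with an int field i, as loaded in the IN examples.

```pig
-- Pre-0.12 equivalent: each bincond must be parenthesized, and every
-- extra branch adds another level of nesting.
b = FOREACH a GENERATE
      ((i % 3 == 0) ? '3n' : ((i % 3 == 1) ? '3n+1' : '3n+2'));
```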

BigInteger/BigDecimal data types

Some applications require calculations with a high degree of precision. For these cases, Pig now provides the biginteger and bigdecimal data types, which are backed by Java's BigInteger and BigDecimal classes.
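A sketch of declaring the new types in a load schema. The file accounts.csv and its fields are hypothetical, and we assume the built-in SUM accepts bigdecimal input in this release.

```pig
-- Hypothetical input: account IDs that may exceed the range of long,
-- and balances that should be summed without double rounding error.
accounts = LOAD 'accounts.csv' USING PigStorage(',')
           AS (id:biginteger, balance:bigdecimal);

-- Total all balances as a single bigdecimal value.
grouped = GROUP accounts ALL;
total   = FOREACH grouped GENERATE SUM(accounts.balance) AS total_balance;
DUMP total;
```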

Support for Microsoft Windows™

Changes that enable Apache Pig to run on Windows without Cygwin have now been committed to the trunk.

Parquet Support

Pig now wraps ParquetLoader and ParquetStorer as built-in functions, which makes loading and storing Parquet data straightforward.
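A minimal load/store round trip, assuming the Parquet Pig bundle is on Pig's classpath. The paths and the status field are hypothetical.

```pig
-- ParquetLoader takes the schema from the Parquet file metadata.
-- Paths and the 'status' field are hypothetical.
events = LOAD 'input/events' USING ParquetLoader();
errors = FILTER events BY status == 'ERROR';
STORE errors INTO 'output/errors' USING ParquetStorer();
```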




  • Is the assert command supported in Pig (HDP 2.0)? I get the following syntax error:

    2013-11-24 17:53:53,368 [main] ERROR - ERROR 1000: Error during parsing. Encountered " "assert "" at line 2, column 1.
    Was expecting one of:

    "cat" ...

  • Could we have a loader that reads from a string?

    a = LOAD '(1,2), (2,3)' USING StringStorage(',') AS (i:int, j:int);

    You could then use this for unit tests or as helper relations to generate whatever else you need.
