Pig Forum

How to pivot values?

  • #40934
    Pawel Kowalski
    Participant

    Hi,
    Is there any built-in function or UDF that can do something like a pivot?
    Example:
    I have :
    (a,b,c), (a,b,d), (a,b,e),(k,l,m), (k,l,n)
    I'd like to have:
    (a,b,(c,d,e)), (k,l,(m,n))

    Any idea?
    Regards,
    Pawel

  • #40967
    Jianyong Dai
    Moderator

    Pig does not have such a built-in UDF. You will need to implement one.

    #41320
    abdelrahman
    Moderator

    Hi Pawel,

    Do you mean a pivot function?

    Here is a Stack Overflow post that addresses this question:

    http://stackoverflow.com/questions/11213567/pivot-table-with-apache-pig/13752135#13752135

    Thanks
    -Rahman

    #42573
    Pawel Kowalski
    Participant

    I think I've found what I wanted.

    Let's say that the file 'pivot_data.txt' looks like this:
    1|R1C1|R1C2|R1C3
    1|R2C1|R2C2|R2C3
    1|R3C1|R3C2|R3C3

    Where column 1 is a common ID for all rows.

    array = LOAD 'pivot_data.txt' USING PigStorage('|')
        AS (ID1:chararray, Col1:chararray, Col2:chararray, Col3:chararray);
    B = FOREACH array GENERATE ID1, Col1;
    C = GROUP B BY ID1;
    D = FOREACH C GENERATE $0, $1.$1;
    STORE D INTO 'result.txt';
    -- now we load the stored result as plain text
    E = LOAD 'result.txt' USING TextLoader() AS (row:chararray);
    -- this is the result of DUMP E (the ID and the bag are tab-separated):
    -- (1	{(R1C1),(R2C1),(R3C1)})
    -- now I need the exact format of the new string
    F = FOREACH E GENERATE
        REPLACE(
            REPLACE(
                REPLACE(
                    REPLACE(
                        REPLACE($0, '\\(|\\)', ''),  -- drop the inner parentheses
                        '\\{', '\\('),               -- opening brace -> parenthesis
                    '\\}', '\\)'),                   -- closing brace -> parenthesis
                ',', '.'),                           -- commas between values -> dots
            '\t', ',');                              -- tab after the ID -> comma
    -- and I receive:
    -- (1,(R1C1.R2C1.R3C1))

    and this is what I needed.

    Hope it will save somebody lots of time…
    Pawel
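
For comparison, the grouping asked about in the first post can probably be done without storing and reloading the result. The following is only a sketch under some assumptions: it assumes Pig 0.11 or later, where BagToTuple and BagToString should be available as built-in functions, and the file name 'triples.txt' and the field names k1, k2 and v are hypothetical, not taken from the thread.

```pig
-- Hypothetical input: one comma-separated triple per line, e.g.
--   a,b,c
--   a,b,d
--   k,l,m
A = LOAD 'triples.txt' USING PigStorage(',')
    AS (k1:chararray, k2:chararray, v:chararray);

-- Collect all rows that share the first two fields.
G = GROUP A BY (k1, k2);

-- Variant 1: keep the third field as a bag, e.g. (a,b,{(c),(d)}).
R_bag = FOREACH G GENERATE FLATTEN(group) AS (k1, k2), A.v AS vals;

-- Variant 2: turn the bag into a tuple, e.g. (a,b,(c,d)).
-- Assumes the BagToTuple built-in is present in your Pig version.
R_tuple = FOREACH G GENERATE FLATTEN(group) AS (k1, k2),
                             BagToTuple(A.v) AS vals;

-- Variant 3: join the values into one dot-separated string,
-- similar to the REPLACE chain above.
R_str = FOREACH G GENERATE FLATTEN(group) AS (k1, k2),
                           BagToString(A.v, '.') AS vals;

DUMP R_tuple;
```

Bags are unordered, so the order of the values inside each group is not guaranteed. If the order matters, a nested ORDER inside the FOREACH block is one option.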
