Left inner join in Pig

This topic contains 1 reply, has 2 voices, and was last updated by Alan Gates 10 months, 3 weeks ago.

  • Creator
    Topic
  • #40558

    Dan Sadler
    Member

    Hi,
    I have two files I am loading into Pig. File A contains details on students, and File B contains the list of students who are in detention.
    I would like to get the list of students that are in File A but not in File B. Filtering on individual names works for a handful, but File B contains more than 200 names.
    A left inner join would work but I cannot seem to find any information and don’t think Pig supports this.
    Any help to solve this would be great!
    Thanks


  • Author
    Replies
  • #40572

    Alan Gates
    Participant

    I’m not familiar with “left inner” joins, but an anti-join will do what you want. Maybe they’re the same thing.

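    -- Assumes both relations expose a studentname field; declare it with an AS schema on each load if needed.
    A = load 'file A';
    B = load 'file B';
    C = cogroup A by studentname, B by studentname;
    -- Keep only the groups that have no matching records in B.
    D = filter C by COUNT(B) == 0;
    E = foreach D generate flatten(A);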

    This will do an anti-join. It cogroups everything in A and B on the same key, filters out any groups that have records in B, and then flattens the bag of A records so that you again have individual records.
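
    The same anti-join can probably also be written as a left outer join followed by a null check on B's key. The sketch below is not from the original reply. The schemas are hypothetical, since the thread never says which fields the files contain (grade is a made-up column), and Pig needs the schema of B declared for a left outer join.

```pig
-- Hypothetical schemas: the thread does not list the actual fields.
A = LOAD 'file A' AS (studentname:chararray, grade:int);
B = LOAD 'file B' AS (studentname:chararray);

-- Keep every record of A; B's fields are null where no match exists.
C = JOIN A BY studentname LEFT OUTER, B BY studentname;

-- Rows with a null B key are the students missing from B.
D = FILTER C BY B::studentname IS NULL;

-- Project back to A's columns only.
E = FOREACH D GENERATE A::studentname AS studentname, A::grade AS grade;
```

    Either form should return the students in A that do not appear in B. The cogroup version avoids having to name A's columns again in the final projection.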
