Pig Forum

How to create ordered values?

  • #40776
    Pawel Kowalski

    I have logs like this:
    14:22:09,835| ERROR | 81258617B015E6BE9KOONREB35886539.ec3www04 | au2 | 400
    14:22:09,844| DEBUG | 81258617B015E6BE9KOONREB35886539.ec3www04 | au1 | 0
    14:26:03,438| INFO | 81258617B015E6BE9C9383EB35889313.ec3www04 | au1 | 1000
    14:26:03,444| DEBUG | 5A3BB0439CB862F719866262A0868AC3.ec3www04 | au1 | 1000
    14:59:31,054| INFO | 81258617B015E6BE9C9383EB35889313.ec3www04 | au1 | 1090
    14:59:45,518| INFO | 81258617B015E6BE9C9383EB35889313.ec3www04 | au2 | 1100
    15:01:29,583| INFO | 81258617B015E6BE9C9383EB35889313.ec3www04 | au2 | 1701
    15:01:30,449| DEBUG | 5A3BB0439CB862F719866262A0868AC3.ec3www04 | au1 | 1010

    and the fields are date, log_level, session_id, user_id, event_code.
    I need to group events by session_id and user_id and present a list of event_codes ordered by date.

    file = LOAD 'logs.log' using PigStorage('|')
    AS (date : chararray, log_level : chararray, session_id : chararray, user_id : chararray, event_code : int);
    session = group events by (session_id, user_id);

    And what next? How do I sort it?

    Please help.


  • #40778
    Jianyong Dai

    result = foreach session {ordered = order file by date; generate ordered;};
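
    A minimal end-to-end sketch of that approach (assuming the pipe-delimited file from the question, one consistent alias, and an illustrative projection in GENERATE; the bag visible inside the nested FOREACH takes the name of the relation that was grouped):

    file = LOAD 'logs.log' USING PigStorage('|')
        AS (date:chararray, log_level:chararray, session_id:chararray, user_id:chararray, event_code:int);
    -- GROUP the same alias that was loaded; the inner bag is then also called 'file'
    session = GROUP file BY (session_id, user_id);
    result = FOREACH session {
        -- sort this group's records by timestamp, then keep only the event codes
        ordered = ORDER file BY date;
        GENERATE group.session_id, group.user_id, ordered.event_code;
    };
    DUMP result;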

    Pawel Kowalski

    2013-10-17 11:18:31,071 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse:
    expression is not a project expression: (Name: ScalarExpression) Type: null Uid: null)
    Details at logfile: /hadoop/mapred/taskTracker/hue/jobcache/job_201310161143_0042/attempt_201310161143_0042_m_000000_0/work/pig_1382033897387.log

    and line 6 is: {ordered = order file by date;

    any idea?
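
    For what it's worth, that error usually appears when the alias ordered inside the nested block is not a bag of the relation being iterated: the bag inside FOREACH session is named after whatever relation was grouped, so ordering a different alias makes Pig treat it as a scalar. The script in the first post loads the data as file but groups events, so (assuming the real script matches the pasted one) making the GROUP operand match the LOAD alias may be worth a try:

    -- group the alias that was actually loaded; the inner bag is then also named 'file'
    session = GROUP file BY (session_id, user_id);

    With that change the nested block sees a bag called file, which is what "order file by date" expects.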
