String split and concat

This topic contains 3 replies, has 3 voices, and was last updated by Carlo Wayne 11 months, 2 weeks ago.

    #41103

    Dan Sadler
    Member

    Hi,

    I am trying to remove a column from a string contained within a file. However, the string can be of any length.
    The file can contain input such as:
    test.a.b.c.f.e.r.g.h,ttt,gggg,hhhh,ffff
    Fred.lll.ooo.ppp.1ss.d,j,h,g
    Gary.j.l,g,h,

    What I am trying to do is remove the first column between the full stops, i.e. the part of the first comma-separated field that comes before the first full stop.
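    To pin down the intended result, here is a minimal sketch in plain Python (not Pig) of the transformation as read from the question: take the first comma-separated field and drop everything up to and including its first full stop. The function name drop_first_dot_field and the empty-string result for a field without a full stop are assumptions for illustration. Splitting with a limit, as split(".", 1) does here, is the same idea as passing a limit to Pig's STRSPLIT, e.g. STRSPLIT(editData, '\\.', 2), which should return at most two fields; treat that Pig call as an untested suggestion.

```python
# Sample lines copied from the question.
SAMPLE_LINES = [
    "test.a.b.c.f.e.r.g.h,ttt,gggg,hhhh,ffff",
    "Fred.lll.ooo.ppp.1ss.d,j,h,g",
    "Gary.j.l,g,h,",
]


def drop_first_dot_field(line: str) -> str:
    """Keep the first comma-separated field, minus its first dot-delimited column."""
    # Equivalent of `$0` after loading with PigStorage(','): the first comma field.
    first_field = line.split(",", 1)[0]
    # Split at the first '.' only, giving at most two pieces: (head, rest).
    parts = first_field.split(".", 1)
    # Assumption: a field with no '.' has nothing after its first column, so return "".
    return parts[1] if len(parts) == 2 else ""


if __name__ == "__main__":
    for line in SAMPLE_LINES:
        print(drop_first_dot_field(line))
    # prints:
    # a.b.c.f.e.r.g.h
    # lll.ooo.ppp.1ss.d
    # j.l
```

    Only one split is performed per field, so the length of the string and the number of full stops in it do not matter.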

    This is what I am currently doing:
    LoadData = load 'ReadData.csv' using PigStorage(','); -- Splits the data using commas
    f = foreach LoadData generate $0 as editData;
    splt = foreach f generate FLATTEN(STRSPLIT(editData, '\\.'));
    -- What I am trying to do now is put values $1 to the end in a 'variable'
    D = FOREACH splt GENERATE $1 .. as name;

    -- so then I can do a replace on the whole string
    B = foreach D generate REPLACE(name, ',', '.');
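    The plan in the script above (split on every full stop, keep pieces $1 onward, then turn the result back into one dot-separated string) can be sketched in plain Python as follows. This is not Pig code, and the function name is hypothetical. It shows that when the remaining pieces are joined with '.' directly, no separate comma-to-full-stop replace is needed.

```python
def drop_first_by_split_and_join(field: str) -> str:
    """Split on every '.', drop the first piece, and join the rest back with '.'."""
    pieces = field.split(".")     # all dot-delimited pieces, like a full STRSPLIT
    remaining = pieces[1:]        # keep pieces 1..end, like projecting $1 ..
    # Joining with '.' restores the full stops directly, so no separate
    # comma-to-full-stop replace step is needed afterwards.
    return ".".join(remaining)


if __name__ == "__main__":
    print(drop_first_by_split_and_join("Fred.lll.ooo.ppp.1ss.d"))  # lll.ooo.ppp.1ss.d
```

    The slice pieces[1:] works for any number of pieces, which is the part that is awkward when the split result has to be projected as fixed columns.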

    If anybody can advise anything, please let me know.

    #54131

    Carlo Wayne
    Participant

    This post is old, but the output for these strings still seems to be wrong and the thread has never been updated. I am trying the same thing and have similar issues to Dan. Any idea whether they have released an update with a fix yet? Thanks

    Regards,
    Carlo @ eatmywords.com

    #41300

    Dan Sadler
    Member

    Hi Jianyong,
    The script runs, but it does not give the desired output.
    I would like to be able to access name. Essentially, I want to flatten it here, D = FOREACH splt GENERATE $1 .. name; so that values $1 to the end are accessible. I don't want the first value.
    I would then like to do a replace using B = foreach D generate REPLACE(name, ',', '.'); from the previous code.
    What I am stuck on is removing the first column of a split and then concatenating ALL the remaining values together.

    Thanks

    #41169

    Jianyong Dai
    Participant

    Hi Dan,
    Does the script work for you? Or are you asking for a simpler way to do that?
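    One simpler route is to skip the split entirely and delete the first column with a single regular-expression substitution. The sketch below is plain Python; the pattern name and function name are hypothetical. Because Pig's REPLACE takes a regular expression as its second argument, the analogous one-step Pig expression would presumably be REPLACE(editData, '^[^.]*\\.', ''); that Pig line is an untested assumption.

```python
import re

# Everything from the start of the string up to and including the first '.'.
FIRST_DOT_COLUMN = re.compile(r"^[^.]*\.")


def drop_first_by_regex(field: str) -> str:
    """Remove the first dot-delimited column in a single regex substitution."""
    # The ^ anchor means at most one match; if there is no '.', the field is unchanged.
    return FIRST_DOT_COLUMN.sub("", field)


if __name__ == "__main__":
    print(drop_first_by_regex("test.a.b.c.f.e.r.g.h"))  # a.b.c.f.e.r.g.h
```

    Note the design trade-off: a field with no full stop is returned unchanged, because the pattern simply does not match.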
