Pig Forum

String split and concat

  • #41103
    Dan Sadler


    I am trying to remove a column from a string contained in a file. The string can be of any length.
    The file can contain input such as….

    What I am trying to do is remove the first of the columns separated by full stops.

    This is what I am currently doing:
    LoadData = load 'ReadData.csv' using PigStorage(','); -- Splits the data using commas
    f = foreach LoadData generate $0 as editData;
    splt = foreach f generate FLATTEN(STRSPLIT(editData, '\\.'));
    -- What I am trying to do now is put values $1 to end in a 'variable'
    D = FOREACH splt GENERATE $1 .. as name;

    -- so then I can do a replace on the whole string
    B = foreach D generate REPLACE(name, ',', '.');

    If anybody can advise, please let me know.


  • #41169
    Jianyong Dai

    Hi, Dan,
    Does the script work for you? Are you asking for a simpler way to do that?

    Dan Sadler

    Hi Jianyong,
    The script will run, but it does not give the desired output.
    I would like to be able to access name. Essentially, I want to flatten it here: D = FOREACH splt GENERATE $1 .. name; so that values 1 to the end are accessible. I don't want the first value.
    I would then like to do a replace using B = foreach D generate REPLACE(name, ',', '.'); from the previous code.
    What I am stuck on is removing the first column of the split and then concatenating all the remaining values together.
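
    One possible way to get the result described above, offered as a sketch rather than an answer from the thread, is to skip the split-and-rejoin step and strip the first segment with a single REPLACE, which treats its second argument as a Java regular expression. The aliases are reused from the script above. The cast to chararray and the pattern itself are assumptions: the sketch assumes the dotted string is the first comma-separated field and that the "first column" ends at the first full stop.

```pig
-- Load the file; $0 is the first comma-separated field, as in the original script.
LoadData = LOAD 'ReadData.csv' USING PigStorage(',');

-- Without a schema the field is a bytearray, so cast it before using string functions.
f = FOREACH LoadData GENERATE (chararray)$0 AS editData;

-- '^[^.]*[.]' matches the leading run of non-dot characters plus the full stop
-- that follows it; replacing that match with '' leaves the rest of the string intact.
B = FOREACH f GENERATE REPLACE(editData, '^[^.]*[.]', '') AS name;

DUMP B;
```

    With this pattern a value of the form a.b.c.d should come out as b.c.d, and a value containing no full stop is left unchanged. Because the remaining full stops are never removed, no concatenation step is needed afterwards.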


    Carlo Wayne

    This post is old, but the output for these strings still seems to be invalid and the thread was never updated. I am trying the same thing and have the same issue as Dan. Does anyone know whether an update with a fix has been released? Thanks.

    Carlo @ eatmywords.com
