Loading a Fixed Width File


This topic contains 2 replies, has 2 voices, and was last updated by Carter Shanklin 5 months ago.

  • #43477

    Seth Ramey
    Member

    I’m trying to figure out how to load the file below. I’ve heard that the best way involves streaming, but I don’t know how to actually do this.

    Rows that begin with “01” denote a new record. There can be a variable number of rows between “01” rows, but we are only interested in a handful of them (those starting with “02”, “03” or “23”, for example). From those rows we may need to grab, say, 10 characters beginning at position 7, while another row may require 7 characters at position 140.
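    A minimal C# sketch of the iteration described above, assuming the rows have already been read as strings in file order: every “01” row opens a new record, rows with a wanted prefix are attached to the current record, and fields are cut by zero-based position and length. The names `FixedWidthRecords`, `SplitIntoRecords` and `Slice` are hypothetical, chosen only for illustration.

```csharp
using System.Collections.Generic;

// Hypothetical helpers: group fixed-width rows into records that each start
// with an "01" row, keeping only the row types we care about.
public static class FixedWidthRecords
{
    // Returns `length` characters starting at zero-based `position`,
    // or null if the row is too short to contain the whole field.
    public static string Slice(string row, int position, int length)
    {
        if (row == null || position < 0 || position + length > row.Length)
            return null;
        return row.Substring(position, length);
    }

    // Walks the rows in file order. Every "01" row starts a new record;
    // rows whose two-character prefix is in `wanted` join the current record.
    // Rows before the first "01" row, and unwanted rows, are skipped.
    public static List<List<string>> SplitIntoRecords(
        IEnumerable<string> rows, ISet<string> wanted)
    {
        var records = new List<List<string>>();
        List<string> current = null;
        foreach (var row in rows)
        {
            if (row == null || row.Length < 2) continue; // blank or truncated line
            string prefix = row.Substring(0, 2);
            if (prefix == "01")
            {
                current = new List<string> { row };
                records.Add(current);
            }
            else if (current != null && wanted.Contains(prefix))
            {
                current.Add(row);
            }
        }
        return records;
    }
}
```

    With this, `Slice(row, 7, 10)` returns the 10 characters starting at position 7, and `Slice(row, 140, 7)` returns null instead of throwing when a row is shorter than 147 characters.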

    My question is: how would we iterate through this with the query tools in Pig or Hive, or with some other method?

    I’ve uploaded a picture of a sample of the file below:

    http://www.flickr.com/photos/108407475@N04/



  • #43631

    Carter Shanklin
    Participant

    Seth,

    Pig might work, but I’m not an expert. As for Hive, no, don’t use it for this use case.

    Streaming refers to Hadoop streaming, which lets you use your .NET application with MapReduce. This requires Hadoop on Windows. See http://www.nuget.org/packages/Microsoft.Hadoop.MapReduce/ for details. Unfortunately, that’s all I know about it.
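    As an untested sketch of what a streaming mapper for this file might look like: Hadoop streaming passes each input line to the program on standard input and treats each output line as a key and a value separated by a tab. The hypothetical `WellMapper` below uses plain standard input and output, not the Microsoft.Hadoop.MapReduce package’s own API. It tags every wanted row with the 9-character API number from its “01” row (position 2, length 9, matching `pl_apinumber` in the field table posted in this thread), so a reducer would receive all rows of one well together. The choice of key and the set of wanted prefixes are assumptions. It also assumes each mapper sees whole records: rows whose “01” row falls in a previous input split are dropped here and would need separate handling.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical Hadoop streaming mapper logic (plain stdin/stdout, not an SDK API).
// Each output line is "key<TAB>value", which Hadoop streaming splits at the first tab.
public static class WellMapper
{
    // Row types passed through besides the "01" header row (an assumed set).
    static readonly HashSet<string> Wanted = new HashSet<string> { "02", "03", "13", "23" };

    // Emits "<API number>\t<row>" for every "01" row and every wanted row after it.
    // The API number is the 9 characters at zero-based position 2 of the "01" row.
    public static void Run(TextReader input, TextWriter output)
    {
        string currentKey = null;
        string row;
        while ((row = input.ReadLine()) != null)
        {
            if (row.Length < 2) continue; // blank or truncated line
            string prefix = row.Substring(0, 2);
            if (prefix == "01")
            {
                // Start of a new record: remember its key (null if the row is too short).
                currentKey = row.Length >= 11 ? row.Substring(2, 9) : null;
            }
            else if (!Wanted.Contains(prefix))
            {
                continue; // a row type we don't need
            }
            // Rows seen before any usable "01" row have no key and are dropped.
            if (currentKey != null)
            {
                output.WriteLine(currentKey + "\t" + row);
            }
        }
    }
}
```

    A console application’s `Main` would simply call `WellMapper.Run(Console.In, Console.Out)`, and the compiled executable would then be given to Hadoop streaming as the mapper.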

    #43487

    Seth Ramey
    Member

    If it helps, here are the fields we are interested in, which I’m currently parsing with a small custom C# program. For example, to obtain the well district, I look at the row that starts with 01, go to character 14, and take the next 2 characters. Note that positions are zero-based:

    // One fixed-width field: the two-character prefix of the row it lives on,
    // its zero-based start position, and its length in characters.
    public struct PositionLength
    {
        public int position, length;
        public string rowbeginswith;

        public PositionLength(string row, int pos, int len)
        {
            rowbeginswith = row;
            position = pos;
            length = len;
        }
    }

    // Field table (declared as static members inside the parser's class).

    // "01" rows: start of a new well record
    static PositionLength pl_district = new PositionLength("01", 14, 2);
    static PositionLength pl_apinumber = new PositionLength("01", 2, 9);
    static PositionLength pl_total_depth = new PositionLength("01", 28, 5);
    static PositionLength pl_isplugged = new PositionLength("01", 90, 1);

    // "02" rows
    static PositionLength pl_oilgas = new PositionLength("02", 2, 1);

    // "03" rows
    static PositionLength pl_wellcompletiondate = new PositionLength("03", 39, 8);

    // "13" rows: latitude and longitude
    static PositionLength pl_lat = new PositionLength("13", 132, 9);
    static PositionLength pl_lng = new PositionLength("13", 142, 9);

    // "23" rows
    static PositionLength pl_oilnumber = new PositionLength("23", 50, 6);
    static PositionLength pl_gasnumber = new PositionLength("23", 56, 6);
    static PositionLength pl_operator = new PositionLength("23", 11, 6);
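    Given the rows of one record (for instance, the rows grouped under a single “01” row), a table like this can drive a generic extractor that flattens the record into one tab-separated line, which is much easier to load with other tools. This is a sketch under stated assumptions: `RecordFlattener` is a hypothetical name, each field is read from the first row carrying its prefix, padding spaces are trimmed, and a missing or too-short row yields an empty column. The struct is repeated so the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same shape as the PositionLength struct above.
public struct PositionLength
{
    public int position, length;
    public string rowbeginswith;

    public PositionLength(string row, int pos, int len)
    {
        rowbeginswith = row;
        position = pos;
        length = len;
    }
}

public static class RecordFlattener
{
    // Reads one field from the first row of the record that starts with the
    // field's prefix; returns "" if no such row exists or that row is too short.
    public static string Read(IList<string> recordRows, PositionLength pl)
    {
        foreach (var row in recordRows)
        {
            if (!row.StartsWith(pl.rowbeginswith, StringComparison.Ordinal)) continue;
            if (pl.position + pl.length > row.Length) return "";
            return row.Substring(pl.position, pl.length).Trim(); // drop padding (assumed)
        }
        return "";
    }

    // Produces one tab-separated line for the record, columns in the given field order.
    public static string Flatten(IList<string> recordRows, IEnumerable<PositionLength> fields)
    {
        return string.Join("\t", fields.Select(f => Read(recordRows, f)));
    }
}
```

    Passing, say, `new[] { pl_apinumber, pl_district, pl_lat, pl_lng }` for each record would give one line per well with those four columns in that order.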
