Any tool to put cleansed data in HDFS?

This topic contains 3 replies, has 3 voices, and was last updated by Seth Lyubich 11 months, 4 weeks ago.

    dinesh karem
    Member

    Likewise, we use Pig to extract data. Is there any way we can cleanse the data before putting it into HDFS? Is there a tool within Apache Hadoop for this?

Viewing 3 replies - 1 through 3 (of 3 total)

    Seth Lyubich
    Keymaster

    Hi Dinesh,

    Here is some information on Talend that you can take a look at:

    http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.2.3.1/bk_dataintegration/content/ch_talend-intro.html

    Can you please take a look and let us know if this is helpful?

    Thanks,
    Seth

    dinesh karem
    Member

    I don't mean just getting structured data, which we can already do using HCatalog. Are there other transforms we can apply, such as eliminating duplicates or trimming the data, so that we end up with clean data in HDFS?
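As a rough illustration of the two transforms mentioned (trimming and de-duplication), here is a minimal local sketch in plain Python. It assumes tab-separated text records and runs on an in-memory list rather than inside Hadoop; the function name `cleanse` and the sample records are hypothetical, not part of any Hadoop, Pig, or Talend API.

```python
"""Hypothetical local sketch: trim fields, drop blank records, remove duplicates.

Assumption (not from the thread): each record is one tab-separated text line.
"""
from typing import Iterable, List


def cleanse(lines: Iterable[str], delimiter: str = "\t") -> List[str]:
    """Trim every field, skip all-empty records, and keep only the first copy of each record."""
    seen = set()
    cleaned: List[str] = []
    for line in lines:
        # Drop the line ending, then trim whitespace from each field separately
        fields = [field.strip() for field in line.rstrip("\r\n").split(delimiter)]
        # Skip records whose fields are all empty after trimming
        if not any(fields):
            continue
        record = delimiter.join(fields)
        # De-duplicate on the trimmed record, preserving first-seen order
        if record not in seen:
            seen.add(record)
            cleaned.append(record)
    return cleaned


if __name__ == "__main__":
    raw = [
        "  alice \t 42\n",
        "alice\t42\n",   # duplicate of the first record once trimmed
        "\t  \n",        # blank record, dropped
        "bob\t 7 \n",
    ]
    for row in cleanse(raw):
        print(row)
```

Trimming happens before the duplicate check, so records that differ only in surrounding whitespace collapse into one. In Pig itself the same idea maps onto `FOREACH ... GENERATE` with the built-in `TRIM` function, followed by `FILTER` and `DISTINCT`; note that Pig's `DISTINCT` removes duplicate tuples but does not preserve input order.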

    Robert
    Participant

    Hi Dinesh,
    As far as cleansing goes, what exactly do you mean? Do you simply mean putting the data in a structured format so you can use it within Hive?

    Regards,
    Robert
