How to import XML

This topic contains 3 replies, has 3 voices, and was last updated by  Robert 12 months ago.

  • Creator
    Topic
  • #32908


    Member

    I would like to import an XML document into HCatalog. Based on the format, I would expect multiple tables to be created. Is there a good way to import directly into HCatalog, or should I break the file into something like multiple CSV files and then import them into HCatalog?
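
For illustration, the "multiple CSV files" route mentioned above could look roughly like the following. This is only a minimal sketch in Python using the standard library; the sample document, the element and attribute names (`orders`, `order`, `item`, `sku`, `qty`), and the helper names `xml_to_tables` and `to_csv` are all assumptions made up for the example, not anything from the original post. It flattens one nested XML document into two tables and carries the parent id into the child table so the two can be joined later.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document; the real XML layout is not shown in the thread.
SAMPLE_XML = """
<orders>
  <order id="1" customer="alice">
    <item sku="A-100" qty="2"/>
    <item sku="B-200" qty="1"/>
  </order>
  <order id="2" customer="bob">
    <item sku="A-100" qty="5"/>
  </order>
</orders>
"""


def xml_to_tables(xml_text):
    """Split a nested XML document into two flat tables (lists of rows)."""
    root = ET.fromstring(xml_text.strip())
    orders, items = [], []
    for order in root.findall("order"):
        order_id = order.get("id")
        orders.append([order_id, order.get("customer")])
        for item in order.findall("item"):
            # Copy the parent id into each child row so the tables can be joined.
            items.append([order_id, item.get("sku"), item.get("qty")])
    return {"orders": orders, "items": items}


def to_csv(rows):
    """Render rows as comma-separated text without a header line."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


tables = xml_to_tables(SAMPLE_XML)
orders_csv = to_csv(tables["orders"])
items_csv = to_csv(tables["items"])
```

Each resulting CSV string could then be written to its own file and loaded into its own table. The header line is left out on purpose here, on the assumption that the target is a plain delimited text table where a header would otherwise be read as a data row.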

Viewing 3 replies - 1 through 3 (of 3 total)

  • Author
    Replies
  • #33596

    Robert
    Participant

    Hi Member,
    Yes, you can use Pig to massage the data, and once it's in a structure Hive can use, you can query the data via Hive SQL.

    Regards,
    Robert

    #33003


    Member

    Thank you for your recommendations! I’m a complete newbie so forgive my ignorance. Are you saying that I can use Pig to load and store the data into HCatalog and then once it’s in there I can use Hive to query it?

    #32976

    abdelrahman
    Moderator

    Hi,

    The HCatalog interfaces support Pig and MR (Hive) for both unstructured and structured data. Since it's XML, I recommend using the HCatalog Pig interfaces, called “load and store” (HCatLoader and HCatStorer). Here is more info about the interfaces:

    http://hive.apache.org/docs/hcat_r0.5.0/index.html#Interfaces

    You may also structure the XML as a table and use HCatalog's MapReduce interfaces (HCatInputFormat, HCatOutputFormat). Hope this helps.

    Thanks

    -Abdelrahman
