Analyse XML file in Hadoop (Forum: Hive / HCatalog)

This topic contains 3 replies, has 4 voices, and was last updated by  Josh Spiegel 1 week, 3 days ago.

  • Creator
    Topic
  • #28745

    Anupam Gupta
    Participant

    Hi, I have uploaded an XML file to HDFS. Now I want to know how I can analyse/view the XML file in Hadoop. I am new to Hadoop, so please help.

    Thanks,
    Agupta

Viewing 3 replies - 1 through 3 (of 3 total)


  • Author
    Replies
  • #64640

    Josh Spiegel
    Participant

    Oracle XML Extensions for Hive can be used to create a Hive table over XML.

    https://docs.oracle.com/cd/E54130_01/doc.26/e54142/oxh_hive.htm#BDCUG691

    #64499

    David Novogrodsky
    Participant

    I am having a similar problem.

    I created a Hive table using one column. Each row contains one XML record. Here is the script I used to create this first table:
    CREATE EXTERNAL TABLE xml_event_table (
    xmlevent string)
    STORED AS TEXTFILE
    LOCATION '/user/cloudera/vector/events';

    Here is part of a sample XML event:
    <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event"><System><Provider Name="Microsoft-Windows-Security-Auditing" Guid="54849625-5478-4994-a5ba-3e3b0328c30d"></Provider> <EventID Qualifiers="">4672</EventID> <Version>0</Version>…</Event>

    I want to create a view that contains the EventID, but the XPath expressions are not working correctly:
    CREATE VIEW xpath_xml_event_view01(event_id, computer, user_id)
    AS SELECT
    xpath_string(xmlevent, 'Event/System/EventID'),
    xpath_string(xmlevent, '/Event[1]/System[1]/Computer'),
    xpath_string(xmlevent, '/Event[1]/System[1]/EventID')
    FROM xml_event_table;
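
One possible cause, offered as an untested guess: the sample Event element declares a default namespace (the xmlns attribute), and a namespace-aware XPath evaluator does not match unprefixed steps such as Event/System/EventID against namespaced elements. A common workaround is to match on local-name() instead. The sketch below reuses xml_event_table and xmlevent from the post above; the view name xpath_xml_event_view02 is hypothetical.

```sql
-- Sketch only: assumes the default namespace on <Event> is what keeps the
-- plain paths from matching. local-name() compares element names while
-- ignoring their namespace.
CREATE VIEW xpath_xml_event_view02 (event_id, computer)
AS SELECT
  xpath_string(xmlevent,
    '/*[local-name()="Event"]/*[local-name()="System"]/*[local-name()="EventID"]'),
  xpath_string(xmlevent,
    '/*[local-name()="Event"]/*[local-name()="System"]/*[local-name()="Computer"]')
FROM xml_event_table;
```

If the view still returns empty strings, checking a single row with a plain SELECT before wrapping it in a view may help narrow down whether the path or the data is at fault.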

    #28939

    Carter Shanklin
    Participant

    Agupta,

    Hive provides a number of XPath UDFs you can use.

    See https://cwiki.apache.org/confluence/display/Hive/LanguageManual+XPathUDF

    The usual approach is to load the XML into a Hive table with a string column, one XML document per row. So you might have a DDL like CREATE TABLE xmlfiles (id int, xmlfile string);

    Then you can use any of the UDFs against the XML data.
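
As a rough illustration of that approach, the sketch below queries the xmlfiles table with three of the XPath UDFs listed on the wiki page. The document shape (a book element with title and author children) and the column aliases are hypothetical, chosen only to show the call pattern.

```sql
-- Table from the DDL above: one XML document per row.
CREATE TABLE xmlfiles (id int, xmlfile string);

-- Hypothetical document shape:
--   <book><title>...</title><author>...</author><author>...</author></book>
SELECT
  id,
  xpath_string(xmlfile, '/book/title')      AS title,        -- text of the first match
  xpath(xmlfile, '/book/author/text()')     AS authors,      -- array<string> of all matches
  xpath_int(xmlfile, 'count(/book/author)') AS author_count  -- numeric XPath result as int
FROM xmlfiles;
```

xpath_string returns a single string, xpath returns an array of strings, and xpath_int returns an integer, so the choice of UDF depends on what the expression is expected to produce.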
