Hive table using org.apache.hadoop.hive.contrib.serde2.RegexSerDe


This topic contains 2 replies, has 3 voices, and was last updated by  Abhishek Basak 8 months ago.

  • #46659

    Veerabahu
    Participant

    I have HDP2 installed and some weblogs that I would like to analyze. I successfully created a Hive table pointing to the weblogs, using the DDL below.

    I can run select * from the table, which returns all the data.

    But I am not able to run count(*) or group by: the MapReduce job fails with ClassNotFoundException: org.apache.hadoop.hive.contrib.serde2.RegexSerDe
    I used this class when creating the table, so is this class not accessible to MapReduce jobs?
    If so, what do I need to do?

    Please advise.

    CREATE TABLE table_name(
    col1 string,
    col2 string,
    col3 string,
    col4 string,
    col5 string,
    col6 string,
    col7 string,
    col8 string,
    col9 string,
    col10 string,
    col11 string)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
    WITH SERDEPROPERTIES (
    "input.regex" = "([^ ]*) ([^ ]*) (.) (.{28}) (.{3}[^\"]*.) ([0-9]*) ([0-9]*) (.[^\%]*.) (.[^\"]*.) (.[^\"]*.) ([0-9]*)",
    "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s %10$s %11$s"
    )
    LOCATION '/user/xyz/weblogs/';

    Thanks

Viewing 2 replies - 1 through 2 (of 2 total)


  • #64302

    Abhishek Basak
    Participant

    Hi Yi,

    I also faced the same issue, and this solution fixed it: 'org.apache.hadoop.hive.serde2.RegexSerDe' had to be used as the SERDE instead of 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'. Thanks for posting the solution.

    #46753

    Yi Zhang
    Moderator

    Hi Sundari,

    For HDP 2.0, please try the SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' instead of 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe', and let us know how it works.

    Thanks,
    Yi
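
A minimal sketch of the change suggested in this reply, assuming the table from the original post already exists. The built-in org.apache.hadoop.hive.serde2.RegexSerDe ships with Hive itself, so MapReduce tasks should be able to load it without extra jars, whereas the contrib class lives in the separate hive-contrib jar. The built-in class only deserializes and does not use output.format.string. The jar path in the commented alternative is a placeholder, not a real location.

```sql
-- Option 1: point the existing table at the built-in SerDe.
-- This changes table metadata only; the files under LOCATION are not rewritten.
-- The existing SERDEPROPERTIES (including input.regex) should carry over;
-- DESCRIBE FORMATTED table_name can be used to confirm.
ALTER TABLE table_name
  SET SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe';

-- Option 2 (alternative): keep the contrib SerDe and make its jar available
-- to the job for the current session. The path below is a placeholder.
-- ADD JAR /path/to/hive-contrib.jar;

-- A query that launches a MapReduce job, to check that the class now resolves.
SELECT COUNT(*) FROM table_name;
```

New tables can use the same class name directly in the ROW FORMAT SERDE clause of CREATE TABLE.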
