Hive / HCatalog Forum

Hive table using org.apache.hadoop.hive.contrib.serde2.RegexSerDe

  • #46659

    I have HDP 2 installed.
    I have weblogs that I would like to analyze, so I created a Hive table pointing at the weblog data using the DDL below.

    A `select * from` the table works and returns all the data.

    But I am not able to do count(*) or group by: the MapReduce job fails with ClassNotFoundException: org.apache.hadoop.hive.contrib.serde2.RegexSerDe.
    I used this class while creating the table, so is this class not accessible to MapReduce jobs?
    If so, what do I need to do?

    Please advise.

    CREATE TABLE table_name(
    col1 string,
    col2 string,
    col3 string,
    col4 string,
    col5 string,
    col6 string,
    col7 string,
    col8 string,
    col9 string,
    col10 string,
    col11 string)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
    WITH SERDEPROPERTIES (
    "input.regex" = "([^ ]*) ([^ ]*) (.) (.{28}) (.{3}[^\"]*.) ([0-9]*) ([0-9]*) (.[^\%]*.) (.[^\"]*.) (.[^\"]*.) ([0-9]*)",
    "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s %10$s %11$s"
    )
    LOCATION '/user/xyz/weblogs/';



  • Author
  • #46753
    Yi Zhang

    Hi Sundari,

    For HDP 2.0, try the built-in SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' instead of 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe', and let us know how that works.


    Abhishek Basak

    Hi Yi,

    I also faced the same issue, and this solution fixed it: 'org.apache.hadoop.hive.serde2.RegexSerDe' had to be used as the SERDE instead of 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'. Thanks for posting the solution.
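    For reference, the corrected DDL would look roughly like the sketch below. The table name, column names, location, and regex are the placeholders from the original post; note that the built-in RegexSerDe ships with Hive itself (so MapReduce tasks can always load it) and only reads "input.regex" — unlike the contrib serde, it does not use "output.format.string".

    ```sql
    -- Sketch of the fix, using the built-in serde with the same
    -- placeholder schema and regex as the original post.
    CREATE TABLE table_name(
      col1 string, col2 string, col3 string, col4 string,
      col5 string, col6 string, col7 string, col8 string,
      col9 string, col10 string, col11 string)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
    WITH SERDEPROPERTIES (
      "input.regex" = "([^ ]*) ([^ ]*) (.) (.{28}) (.{3}[^\"]*.) ([0-9]*) ([0-9]*) (.[^\%]*.) (.[^\"]*.) (.[^\"]*.) ([0-9]*)"
    )
    LOCATION '/user/xyz/weblogs/';
    ```

    After recreating the table this way, aggregate queries such as `SELECT count(*) FROM table_name;` should run without the ClassNotFoundException, since no contrib jar needs to be shipped to the MapReduce tasks.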

