Hive / HCatalog Forum

Hive + HCatalog + Sequence File

  • #53526
    Tanzir
    Participant

    Hello everyone,
    I have been using Hive for a while, but so far I have only used text files as the storage format. Now I want to switch to sequence files (or maybe Avro later). I also want to leverage HCatalog here.

    So let me explain:

    -> Sqoop imports a table into HDFS
    -> Then an MR job processes those files and creates the partitioned data files for multiple Hive tables
    -> Then I load those files into Hive

    So far it's working with the text file format.

    Suppose, I have this Hive schema for one of the tables:

    create external table if not exists table1 (
    ID bigint,
    batch_number bigint,
    name string
    )
    PARTITIONED BY (year INT, month INT, day INT)
    CLUSTERED BY (batch_number) into 20 buckets
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/user/tempuser/hive/table1';

    —–
    After the MR job has processed the data, I load it into the Hive table with something like:
    ALTER TABLE TABLE1 ADD IF NOT EXISTS PARTITION(year = 2013, month = 2, day = 8) LOCATION '/user/tempuser/hive/table1/2013/02/08/'

    Now I want to change the file format from text to sequence file. I have already changed the MR part, which now writes its output in sequence file format. What do I need to change in order to load that output into the Hive table again?

    I have created a table with hcat with the following schema:
    —–
    create external table if not exists table1 (
    ID bigint,
    batch_number bigint,
    name string
    )
    PARTITIONED BY (year string, month string, day string)
    CLUSTERED BY (batch_number) into 20 buckets
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    STORED AS SEQUENCEFILE
    LOCATION '/user/tempuser/hive/table1';
    —-
    After creating the table, I added the partitions, which didn't throw any error. But when I tried to query the table, it threw this exception:
    "Caused by: java.lang.RuntimeException: java.io.IOException: WritableName can't load class: com.abc.def.ghi.TextArrayWritable"

    So I'm definitely missing something here. Do I still need to implement a Hive SerDe even if I use HCatalog?

    I have checked the Hortonworks examples and tutorials, but unfortunately they didn't help with what I'm trying to achieve, and so far I haven't found well-written documentation on integrating Hive with HCatalog, Sqoop, and the sequence file or Avro formats.

    Any information, or links to good documentation or tutorials, would be highly appreciated.

    Thanks.
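
    A note on the WritableName error quoted above: that message is raised when the SequenceFile reader cannot load the key or value class named in the file header. This suggests that the jar containing com.abc.def.ghi.TextArrayWritable is not on Hive's classpath. Below is a minimal sketch of registering that jar for the current Hive session. The jar path is a hypothetical placeholder and does not come from the post.

```sql
-- Hypothetical path: replace with the jar that actually contains
-- com.abc.def.ghi.TextArrayWritable (the class named in the exception).
ADD JAR /path/to/custom-writables.jar;

-- Check that the jar is registered for this session.
LIST JARS;

-- Retry a small query against the sequence-file table.
SELECT * FROM table1 LIMIT 10;
```

    Making the class loadable may not be enough on its own. A table declared with ROW FORMAT DELIMITED uses Hive's default SerDe, which, as far as I know, expects each record's value to be a Text or BytesWritable. A custom array Writable would then likely still need either a custom SerDe or a change to the MR job so that it writes tab-delimited Text values.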
