Hive + HCatalog + Sequence File

This topic contains 0 replies, has 1 voice, and was last updated by  Tanzir 4 months, 1 week ago.


    Tanzir
    Participant

    Hello everyone,
    I have been using Hive for a while, but all that time I used text files as the file format. Now I want to switch to sequence files (or maybe Avro later). I also want to leverage HCatalog here.

    So let me explain:

    -> Sqoop imports a table into HDFS
    -> Then an MR job processes those files and creates the partitioned data files for multiple Hive tables
    -> Then I load those files into Hive

    So far it's working with the text file format.

    Suppose, I have this Hive schema for one of the tables:

    create external table if not exists table1 (
    ID bigint,
    batch_number bigint,
    name string
    )
    PARTITIONED BY (year INT, month INT, day INT)
    CLUSTERED BY (batch_number) into 20 buckets
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/user/tempuser/hive/table1';

    -----
    After the MR job has processed the data, I load it into the Hive table with something like:
    ALTER TABLE TABLE1 ADD IF NOT EXISTS PARTITION(year = 2013, month = 2, day = 8) LOCATION '/user/tempuser/hive/table1/2013/02/08/';

    Now I want to change the file format from text to sequence file, and I have already changed the MR part, which now outputs data in sequence file format. What changes do I need to make in order to load these files into the Hive table again?

    I have created a table with hcat with the following schema:
    -----
    create external table if not exists table1 (
    ID bigint,
    batch_number bigint,
    name string
    )
    PARTITIONED BY (year string, month string, day string)
    CLUSTERED BY (batch_number) into 20 buckets
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    STORED AS SEQUENCEFILE
    LOCATION '/user/tempuser/hive/table1';
    -----
    After creating the table, I loaded the files, which didn't throw any error. But when I tried to query the table, it threw an exception:
    "Caused by: java.lang.RuntimeException: java.io.IOException: WritableName can't load class: com.abc.def.ghi.TextArrayWritable"

    So I'm definitely missing something here. Do I still need to implement a Hive SerDe even if I use HCatalog?
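
    For context, "WritableName can't load class" is the message Hadoop raises when the key or value class named in a sequence file's header cannot be found on the reader's classpath. A minimal sketch of making the class visible to a Hive session follows. The jar path and file name are hypothetical placeholders, and this only addresses class loading; whether the delimited row format can interpret a custom Writable value is a separate question.

    ```sql
    -- Hypothetical jar path: point this at the jar that actually contains
    -- com.abc.def.ghi.TextArrayWritable (the class named in the exception).
    ADD JAR /path/to/custom-writables.jar;

    -- Confirm the jar is registered for this session.
    LIST JARS;

    -- Retry a small query against the table.
    SELECT * FROM table1 LIMIT 10;
    ```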

    I have checked Hortonworks' examples/tutorials, but unfortunately they didn't help with what I'm trying to achieve here, and so far I haven't found well-written documentation on integrating Hive with HCatalog, Sqoop, and the Sequence/Avro file formats.

    Any information or links to good documentation/tutorials would be highly appreciated.

    Thanks.
