Hive string cast exception with Avro


This topic contains 4 replies, has 4 voices, and was last updated by  Larry Liu 2 years, 3 months ago.

  • Creator
  • #18432

    Our engineering team is currently hitting an issue using Avro in our Hive installation and is seeing an exception similar to the following when running a fairly simple query:

    java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.avro.util.Utf8

    The table definition in Hive is the following:

    hive> describe daily_12;
    uid string from deserializer
    userdatabydataprovider map<string,struct<dataproviderid:int,userid:string,info:map,events:map<string,array<struct<timestamp:bigint,attributes:map>>>,lastaccesstime:bigint>> from deserializer
    audiencedata array<struct> from deserializer

    And the query having the issue is similar to the following:

    select t.uid, t.userdatabydataprovider['1'].userid, t.userdatabydataprovider['1'].info, t.userdatabydataprovider['1'].events, audiencedata from daily_12 t limit 5;

    Supposedly, the same code/query works in a different HDP cluster (I need to confirm this), and I’m trying to determine whether this is a cluster/Hive issue. We are currently running Hive in our cluster while the other cluster is running Hive… I’m looking for other significant differences.

    I’m not a true Java/Avro guy (I know enough to be dangerous), but what I’ve found so far is that other people have had issues with plain strings where Avro expects Utf8. I plan to sit with the engineers to look at the Avro schema(s) being used. I haven’t found anything specific to Hive, but I have seen some discussions related to Pig where people hit a similar error when using “” instead of “avro.util.Utf8” for the String property in schema fields.

    Any other tips/advice for troubleshooting this issue would be greatly appreciated.

Viewing 4 replies - 1 through 4 (of 4 total)


  • Author
  • #23172

    Larry Liu

    Another quick question, Bobby: what version of HDP are you using?


    Larry Liu

    Hi, Bobby,

    Can you please give more background on how you are using Avro? How did you install Avro, and what steps did you follow?

    I am reading the book Hadoop: The Definitive Guide. It has a section on Avro that I hope will be helpful.



    Niels Basjes

    I’m running into similar problems with Pig.
    What I’ve found so far is that when writing an Avro file using Java, you can specify the class to be used for the string type as an argument to the Avro compiler.

    As far as I can tell, this option causes the actual Avro file to be different, and a file written with “String” (instead of the default Utf8) is not fully supported by Pig. I would not be surprised if Hive has similar problems.
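
To illustrate the failure mode described above, here is a minimal sketch of how a reader that casts decoded values to one concrete string class breaks when the data yields `java.lang.String` instead. It makes no claim about what Hive or Pig actually do internally. `Utf8Like` is a hypothetical stand-in for `org.apache.avro.util.Utf8` so the example runs without Avro on the classpath, and `brittleRead` and `tolerantRead` are illustrative names only.

```java
import java.nio.charset.StandardCharsets;

public class StringCastDemo {

    /** Hypothetical stand-in for org.apache.avro.util.Utf8 (NOT the real class). */
    static final class Utf8Like implements CharSequence {
        private final byte[] bytes;

        Utf8Like(String s) {
            this.bytes = s.getBytes(StandardCharsets.UTF_8);
        }

        @Override public int length() { return toString().length(); }
        @Override public char charAt(int i) { return toString().charAt(i); }
        @Override public CharSequence subSequence(int from, int to) {
            return toString().subSequence(from, to);
        }
        @Override public String toString() {
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    /** Brittle: assumes the decoder always hands back the Utf8-style class. */
    static String brittleRead(Object decoded) {
        return ((Utf8Like) decoded).toString();
    }

    /** Tolerant: accepts any CharSequence, so both string representations work. */
    static String tolerantRead(Object decoded) {
        return ((CharSequence) decoded).toString();
    }

    public static void main(String[] args) {
        Object asUtf8 = new Utf8Like("user-1");   // what the reader expects
        Object asString = "user-1";               // what a "String"-typed file yields

        System.out.println(tolerantRead(asUtf8));
        System.out.println(tolerantRead(asString));
        System.out.println(brittleRead(asUtf8));

        try {
            brittleRead(asString);
        } catch (ClassCastException e) {
            // Same shape of error as in the original post: String cannot be cast
            System.out.println("Caught: " + e.getClass().getSimpleName());
        }
    }
}
```

In real Avro schemas, the Java string class for a field is recorded in the `avro.java.string` property of the string type, so checking the writer schema for that property is one way to see whether the two clusters are reading differently written files.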



    Hi Bobby,
    Maybe you can try fetching just one row to see whether the problem can be isolated to the data. If a single row consistently works fine, then a specific row or set of rows may be causing the issue.

