
Hive / HCatalog Forum

Hive string cast exception with Avro

  • #18432

    Our engineering team is currently hitting an issue using Avro in our Hive installation. We see an exception similar to the following when running a fairly simple query:

    java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.avro.util.Utf8

    The table definition in Hive is the following:

    hive> describe daily_12;
    uid string from deserializer
    userdatabydataprovider map<string,struct<dataproviderid:int,userid:string,info:map,events:map<string,array<struct<timestamp:bigint,attributes:map>>>,lastaccesstime:bigint>> from deserializer
    audiencedata array<struct> from deserializer

    And the query having the issue is similar to the following:

    select t.uid, t.userdatabydataprovider['1'].userid, t.userdatabydataprovider['1'].info, t.userdatabydataprovider['1'].events, audiencedata from daily_12 t limit 5;

    Supposedly, the same code/query works in a different HDP cluster (I need to confirm this), and I'm trying to determine whether this is a cluster/Hive issue. We are currently running Hive in our cluster while the other cluster is running Hive… I'm looking for other significant differences.

    I'm not a true Java/Avro guy (I know enough to be dangerous), but what I've found so far is that other people have had issues with strings where Avro expected Utf8. I plan to sit with the engineers to look at the Avro schema(s) being used. I haven't found anything specific to Hive, but I have seen some discussions related to Pig where people hit a similar error when using "" instead of "avro.util.Utf8" for the string property in schema fields.
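    For what it's worth, the string-type hint mentioned in those Pig discussions is usually expressed as a property on the string type in the Avro schema. A minimal sketch (the record and field names here are hypothetical, not from our actual schema):

    ```json
    {
      "type": "record",
      "name": "UserEvent",
      "fields": [
        {
          "name": "userid",
          "type": {"type": "string", "avro.java.string": "String"}
        }
      ]
    }
    ```

    With "avro.java.string": "String", the Java Avro runtime materializes the field as java.lang.String rather than the default org.apache.avro.util.Utf8, which is exactly the mismatch in the stack trace above.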

    Any other tips/advice for troubleshooting this issue would be greatly appreciated.

  • #18476

    Hi Bobby,
    Maybe you can try fetching just a single row, to see whether the problem can be isolated to the data. If one row consistently works fine, then perhaps a specific row or rows is causing the issue.
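    To make that concrete, one way to narrow it down (a sketch assuming the daily_12 table described above):

    ```sql
    -- Start with the simplest column on a single row.
    SELECT t.uid FROM daily_12 t LIMIT 1;

    -- Then query the complex columns one at a time to find which field fails.
    SELECT t.userdatabydataprovider['1'].userid FROM daily_12 t LIMIT 1;
    SELECT t.audiencedata FROM daily_12 t LIMIT 1;
    ```

    If only the queries touching string fields inside the nested structs fail, that points at the Avro string type rather than a bad row of data.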


    Niels Basjes

    I’m running into similar problems with Pig.
    What I've found so far is that when writing an Avro file using Java, you can specify the class to be used for the string type as an argument to the Avro compiler.

    As far as I can tell, this option causes the actual Avro file to be different, and a file written with "String" (instead of the default Utf8) is not fully supported by Pig. I would not be surprised if Hive has similar problems.
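    For reference, the compiler option I mean looks roughly like this (the jar version, schema file name, and output directory below are just placeholders):

    ```shell
    # Generate Java classes whose string fields use java.lang.String
    # instead of the default org.apache.avro.util.Utf8.
    java -jar avro-tools-1.7.7.jar compile -string schema user.avsc ./generated-src
    ```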

    Larry Liu

    Hi, Bobby,

    Can you please give more background on how you are using Avro? How did you install it, and what steps did you follow?

    I am reading the book Hadoop: The Definitive Guide. There is a section on Avro; I hope it is helpful.


    Larry Liu

    Another quick question, Bobby, what version of HDP are you using?

The forum ‘Hive / HCatalog’ is closed to new topics and replies.
