Our engineering team is currently hitting an issue using Avro in our Hive installation and is seeing an exception similar to the following when running a fairly simple query:
java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.avro.util.Utf8
The table definition in Hive is the following:
hive> describe daily_12;
uid string from deserializer
userdatabydataprovider map<string,struct<dataproviderid:int,userid:string,info:map,events:map<string,array<struct<timestamp:bigint,attributes:map>>>,lastaccesstime:bigint>> from deserializer
audiencedata array<struct> from deserializer
And the query having the issue is similar to the following:
select t.uid, t.userdatabydataprovider['1'].userid, t.userdatabydataprovider['1'].info, t.userdatabydataprovider['1'].events, audiencedata from daily_12 t limit 5;
Supposedly, the same code/query works in a different HDP cluster (I still need to confirm this), and I’m trying to determine whether this is a cluster/Hive issue. We are currently running Hive 0.10.0.21 in our cluster while the other cluster is running Hive 0.10.0.22, and I’m looking for other significant differences between the two.
I’m not a true Java/Avro guy (I know enough to be dangerous), but what I’ve found so far is that other people have had issues with strings where Avro expects Utf8. I plan to sit down with the engineers to look at the Avro schema(s) being used. I haven’t found anything specific to Hive, but I have seen some discussions related to Pig where people hit a similar error when using "avro.java.string" instead of "avro.util.Utf8" for the String property in schema fields.
Any other tips/advice for troubleshooting this issue would be greatly appreciated.