Hive parquet problem


This topic contains 0 replies, has 1 voice, and was last updated by  Kamil Malachowski 9 months, 1 week ago.


    Kamil Malachowski

    Hi guys,
    I have a problem reading Hive tables stored in Parquet format. It gives the following error:

    Caused by: java.io.IOException: can not read class parquet.format.PageHeader: null
    at parquet.format.Util.read(Util.java:50)
    at parquet.format.Util.readPageHeader(Util.java:26)
    at parquet.hadoop.ParquetFileReader$Chunk.readAllPages(ParquetFileReader.java:418)
    at parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:361)
    at parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:100)
    at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:172)
    at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:130)
    at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:95)
    at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:66)
    at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:51)

    The tables were created with parquet 1.2.5 and copied with distcp to a Hortonworks 2.1 cluster with Hive 0.13 and, I guess, parquet 1.3.5.
    I found that my issue may be related to https://github.com/Parquet/parquet-mr/pull/349
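
    Since the reader version above is only a guess, it may help to confirm which writer actually produced a given file. The following is a minimal sketch, not a full Parquet reader: it relies on the file layout (a 4-byte little-endian footer length followed by the `PAR1` magic at the end of the file) and then searches the raw footer bytes for the `created_by` string instead of decoding the Thrift metadata. The function names `read_footer` and `guess_created_by` are made up for this example, and the assumption that `created_by` begins with `parquet-mr` may not hold for every writer.

```python
import re
import struct

MAGIC = b"PAR1"  # magic bytes at the start and end of every Parquet file


def read_footer(path):
    """Return the raw (still Thrift-encoded) footer bytes of a Parquet file."""
    with open(path, "rb") as f:
        f.seek(0, 2)                 # jump to end of file to learn its size
        size = f.tell()
        if size < 12:                # leading magic + footer length + trailing magic
            raise ValueError("file too small to be Parquet")
        f.seek(size - 8)
        tail = f.read(8)
        footer_len = struct.unpack("<I", tail[:4])[0]
        if tail[4:] != MAGIC:
            raise ValueError("missing PAR1 trailer; file may be truncated")
        if footer_len > size - 12:
            raise ValueError("footer length exceeds file size")
        f.seek(size - 8 - footer_len)
        return f.read(footer_len)


def guess_created_by(footer):
    """Heuristic: find a printable run starting with 'parquet-mr'.

    Assumption: the writer stored a created_by string with this prefix.
    The match may include stray printable bytes that follow the string.
    """
    match = re.search(rb"parquet-mr[ -~]*", footer)
    return match.group(0).decode("ascii") if match else None
```

    Usage, on a data file first copied to local disk (for example with `hadoop fs -get`; the path here is hypothetical): `print(guess_created_by(read_footer("/tmp/000000_0")))`. A `ValueError` about the trailer would point to a damaged copy rather than a version mismatch.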

    Is there any quick workaround, e.g. some settings, that will resolve my problem?

    Best Regards
    Kamil

