
The legacy Hortonworks Forum is now closed. A read-only version of the former site remains available; the site will be taken offline on January 31, 2016.

Pig Forum

HiveColumnarLoader not loading int columns

  • #52648
    Greg Smorag

I have a problem with weird behaviour of HiveColumnarLoader, and I wonder if anybody can help me with it.

The issue: when I attempt to load data from my Hive table via Pig with a script that looks more or less like the code below, Pig never loads the int columns. Everything completes successfully, but the output of dumping Z has null values in place of the ints. When I change `dump Z` into something else, e.g. a MongoDB store, it just saves nulls in place of the int values.

    register /opt/hadoop/pig/contrib/piggybank/java/piggybank.jar;
    register /opt/hadoop/hive/lib/hive-common-0.12.0.jar;
    register /opt/hadoop/hive/lib/hive-exec-0.12.0.jar;
A = load '<VALID_HIVE_PATH>/<HIVE_TABLE>' USING org.apache.pig.piggybank.storage.HiveColumnarLoader('column_a int,column_b int,column_c string,column_d string,column_e int');
    L = foreach A generate *;
    Z = filter L by fk_client == '<PARTITION VALUE>';
    dump Z;

HIVE_TABLE is a partitioned Hive table stored as RCFile. The partition column is fk_client.

    I am using hadoop 2.3, hive 0.12.0 and pig 0.12.1.

Can you please help me and point out what might be the reason, or what I am possibly doing wrong? I am running out of options.

    Thank you very much.

  • #52742
    Thejas Nair

I doubt that HiveColumnarLoader is widely used. You might want to try HCatalog's HCatLoader with Pig instead.
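A minimal sketch of the suggested HCatLoader approach, assuming the table is registered in the Hive metastore under the name <HIVE_TABLE> (placeholders mirror the original script; the exact loader package depends on the HCatalog/Hive version installed):

```pig
-- Run with HCatalog on the classpath, e.g.: pig -useHCatalog script.pig
-- In older HCatalog releases the class lives at org.apache.hcatalog.pig.HCatLoader.
A = LOAD '<HIVE_TABLE>' USING org.apache.hive.hcatalog.pig.HCatLoader();

-- HCatLoader reads the table schema (including int columns) and the
-- partition layout from the metastore, so no column-type string is needed.
Z = FILTER A BY fk_client == '<PARTITION VALUE>';
DUMP Z;
```

Because the filter is on the partition column, HCatLoader can also prune partitions and read only the matching one, rather than scanning the whole table path.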

The forum ‘Pig’ is closed to new topics and replies.
