Pig Forum

DBStorage Incompatibility with other storage in script

  • #26333
    Hardik Shah

    DBStorage does not work alongside other storage in a Pig script. In other words, it stops working as soon as the script contains more than one STORE statement.

    What I tried:
    1) Storing one output with DBStorage and storing the same or a different output to the file system with a plain STORE.
    2) Storing with DBStorage and with my own custom store function.

    In both cases no data is stored in the database. If I comment out the other store, DBStorage works properly.

    It does not even throw an exception or log an error on the reducer's machine.

    Can anyone point out the problem?

    The sample below shows DBStorage combined with a plain STORE to the file system. It only works when the DBStorage store is the sole STORE statement in the script.

    pv_by_industry = GROUP profile_view BY viewee_industry_id;

    pv_avg_by_industry = FOREACH pv_by_industry GENERATE
        group AS viewee_industry_id, AVG(profile_view) AS average_pv;

    STORE pv_avg_by_industry INTO '/tmp/hardik';

    STORE pv_avg_by_industry INTO '/tmp/hardik/db' INTO
        'jdbc:mysql://hostname/dbname', 'user',
        'INSERT INTO table (viewee_industry_id,average_pv) VALUES(?,?)');

  • #26334
    Hardik Shah

    Hi again,

    A few things came to light while I was debugging this.

    DBStorage sets auto-commit to false,
    so executing the batch does not commit it automatically.
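
To make the consequence of disabled auto-commit concrete, here is a minimal Java sketch. It is a toy in-memory model, not the JDBC API and not DBStorage's code: `ToyConnection`, `rowsVisibleAfter`, and the sample rows are all hypothetical names introduced for illustration. It assumes that closing a connection without committing discards the pending rows, which many drivers do but which is not guaranteed everywhere.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of auto-commit being off. Not the JDBC or DBStorage API. */
final class AutoCommitSketch {

    /** Hypothetical stand-in for a connection with auto-commit disabled. */
    static final class ToyConnection {
        private final List<String> pending = new ArrayList<>();   // written, not yet committed
        private final List<String> committed = new ArrayList<>(); // visible in the table

        /** Like executing a batch: rows only join the open transaction. */
        void executeBatch(List<String> rows) {
            pending.addAll(rows);
        }

        /** Makes the pending rows visible. */
        void commit() {
            committed.addAll(pending);
            pending.clear();
        }

        /** Assumption: closing without commit is modeled as a rollback. */
        void close() {
            pending.clear();
        }

        List<String> visibleRows() {
            return new ArrayList<>(committed);
        }
    }

    /** Runs one batch and returns how many rows end up visible. */
    static int rowsVisibleAfter(boolean callCommit) {
        ToyConnection con = new ToyConnection();
        con.executeBatch(Arrays.asList("1,10.5", "2,7.0"));
        if (callCommit) {
            con.commit();
        }
        con.close();
        return con.visibleRows().size();
    }

    public static void main(String[] args) {
        System.out.println("with commit:    " + rowsVisibleAfter(true));
        System.out.println("without commit: " + rowsVisibleAfter(false));
    }
}
```

In this model the batch itself always succeeds, so a missing commit produces no error at all, only an empty table.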

    After the batch is executed, the commitTask method of the OutputCommitter defined inside DBStorage (an inline class) is called, and that is where the commit is issued:

    if (ps != null) {
        try {
            System.out.println("Executing Batch in commitTask");
            ps.executeBatch();
            con.commit();
            ps.close();
            ps = null;
            con = null;
        } catch (SQLException e) {
            System.out.println("Exception in commitTask");
            log.error("ps.close", e);
            throw new IOException("JDBC Error", e);
        }
    }

    This method is called by PigOutputCommitter:

    public void commitTask(TaskAttemptContext context) throws IOException {
        if (HadoopShims.isMap(context.getTaskAttemptID())) {
            for (Pair<OutputCommitter, POStore> mapCommitter : mapOutputCommitters) {
                if (mapCommitter.first != null) {
                    TaskAttemptContext updatedContext = setUpContext(context,
                            mapCommitter.second);
                    mapCommitter.first.commitTask(updatedContext);
                }
            }
        } else {
            for (Pair<OutputCommitter, POStore> reduceCommitter : reduceOutputCommitters) {
                if (reduceCommitter.first != null) {
                    TaskAttemptContext updatedContext = setUpContext(context,
                            reduceCommitter.second);
                    reduceCommitter.first.commitTask(updatedContext);
                }
            }
        }
    }

    But by the time this commitTask is called, the Connection and PreparedStatement objects are already null, so nothing is committed and the data never reaches the database.

    If the script contains only the DBStorage store, with no other STORE statement, it works properly.
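
The silent failure described here follows from the `if (ps != null)` guard quoted earlier: a null statement means the commit is skipped without any exception. The Java sketch below models only that guard. `ToyStatement` and `ToyCommitter` are hypothetical names, not Pig or piggybank classes, and the sketch does not try to explain why the fields end up null in the multi-store case.

```java
/** Toy model of a commit that is silently skipped. Not Pig's actual classes. */
final class CommitGuardSketch {

    /** Hypothetical stand-in for the prepared statement held by a store function. */
    static final class ToyStatement {
        boolean committed = false;
    }

    /** Hypothetical committer using the same null guard as the quoted snippet. */
    static final class ToyCommitter {
        ToyStatement ps; // null mimics the state observed in the multi-store run

        /** Returns true only when a commit actually happened. */
        boolean commitTask() {
            if (ps != null) {
                ps.committed = true; // stands in for executing the batch and committing
                ps = null;
                return true;
            }
            return false; // no statement: no commit, and no exception either
        }
    }

    public static void main(String[] args) {
        ToyCommitter healthy = new ToyCommitter();
        healthy.ps = new ToyStatement();
        System.out.println("statement present: committed=" + healthy.commitTask());

        ToyCommitter broken = new ToyCommitter(); // ps was never set on this instance
        System.out.println("statement missing: committed=" + broken.commitTask());
    }
}
```

Returning a flag (or logging in the `else` branch) is one way such a guard could surface the skipped commit while debugging; that is a suggestion, not something DBStorage is known to do.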

    Any clues???

    Larry Liu

    Hi, Hardik

    Is this command working for you?

    STORE pv_avg_by_industry INTO
        'jdbc:mysql://hostname/dbname', 'user',
        'INSERT INTO table (viewee_industry_id,average_pv) VALUES(?,?)');

    I am not sure whether STORE supports multiple INTO clauses, so I removed "INTO '/tmp/hardik/db'" from your command.


    Hardik Shah

    Sorry, it should be like this:
    STORE pv_avg_by_industry INTO '/tmp/hardik/db' USING DBStorage(
        'jdbc:mysql://hostname/dbname', 'user',
        'INSERT INTO table (viewee_industry_id,average_pv) VALUES(?,?)');

    Hardik Shah

    My production code is too large to post, so I put together a small sample just to explain the situation.



    Are any Pig devs out there watching who can help this guy out?

