Loading data to MySQL table from Hive using Sqoop/Oozie


This topic contains 1 reply, has 2 voices, and was last updated by  Amudhan K 8 months, 2 weeks ago.

  • Creator
    Topic
  • #53663

    Rio Mario
    Participant

    I am facing a serious issue (tried unsuccessfully 196 times) while loading data into a MySQL table using Sqoop (a sqoop command inside Oozie). If there is only one column of data in HDFS (here foo) there is no issue, but when there is more than one column, e.g. 2 columns, the data does not get loaded into MySQL.

    If I run the Sqoop command by itself, the data gets loaded into MySQL, but when I put it inside Oozie, the data does not get loaded.

    The workflow.xml has 2 parts: the 1st loads data from a Hive table to HDFS, and the 2nd loads data from HDFS to MySQL.

    I am using the Cloudera VM.

    Error message is:

    Caused by: java.lang.NumberFormatException: For input string: "1 a"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
    at java.lang.Integer.parseInt(Integer.java:458)

    hive> CREATE EXTERNAL TABLE IF NOT EXISTS foo (
    name string,
    city string
    )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\t'
    LINES TERMINATED BY '\n'
    STORED AS TEXTFILE
    LOCATION '/user/cloudera/foo';

    $ vi foo
    1 a
    4 b

    hive> load data local inpath '/home/cloudera/foo' into table foo;
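    Whether the two columns in that file are separated by a real tab or by spaces matters here, because both the Hive tables and the Sqoop export expect '\t'. One quick way to check (assuming GNU cat is available on the Cloudera VM) is:

    $ cat -A /home/cloudera/foo

    In that output a tab prints as ^I and a line end as $; if the columns are separated by plain spaces instead, the delimiters will not line up.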

    mysql> CREATE TABLE foo (id int(11) DEFAULT NULL, name varchar(22) DEFAULT NULL );

    workflow.xml:

    <?xml version="1.0" encoding="UTF-8"?>
    <workflow-app xmlns="uri:oozie:workflow:0.2" name="etl-wf">
    <start to="hive-node"/>

    <action name="hive-node">
    <hive xmlns="uri:oozie:hive-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <job-xml>hive-site.xml</job-xml>
    <script>script.q</script>
    </hive>
    <ok to="sqoop-node"/>
    <error to="fail"/>
    </action>

    <action name="sqoop-node">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <command>export --connect jdbc:mysql://www.abc.net/test --username rio --password r005 --table foo --export-dir /user/cloudera/test --input-fields-terminated-by '\t' --input-lines-terminated-by '\n'</command>
    </sqoop>
    <ok to="end"/>
    <error to="fail"/>
    </action>

    <kill name="fail">
    <message>Hive failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>

    </workflow-app>
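    For reference, the standalone invocation that works outside Oozie presumably looks roughly like the following (reconstructed from the <command> element above, with the same connection details and flags):

    $ sqoop export --connect jdbc:mysql://www.abc.net/test --username rio --password r005 --table foo --export-dir /user/cloudera/test --input-fields-terminated-by '\t' --input-lines-terminated-by '\n'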

    Note: here the 1st part works, i.e. the data gets loaded into the test table in Hive, but it does not get loaded from HDFS (/user/cloudera/test) into the MySQL table foo.

    vi script.q:

    CREATE EXTERNAL TABLE IF NOT EXISTS test (
    id int,
    city string
    )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\t'
    LINES TERMINATED BY '\n'
    STORED AS TEXTFILE
    LOCATION
    '/user/cloudera/test';

    INSERT OVERWRITE table test SELECT * FROM foo;
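    Before the sqoop-node runs, the files that this INSERT OVERWRITE writes into the export directory can be inspected to see which delimiter actually ended up between the fields (a minimal check, assuming the standard hadoop client on the Cloudera VM; piping through cat -A prints a tab as ^I):

    $ hadoop fs -cat /user/cloudera/test/* | cat -A

    If the id and city values are not separated by ^I, then --input-fields-terminated-by '\t' in the Sqoop export will treat the whole line as a single field, which would match the NumberFormatException for the input string "1 a".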

    stderr logs

    Note: /tmp/sqoop-mapred/compile/d4f769ef09667984820f21a38ae27bb4/foo.java uses or overrides a deprecated API.
    Note: Recompile with -Xlint:deprecation for details.
    java.io.IOException: Can't export data, please check task tracker logs
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
    at org.apa


  • Author
    Replies
  • #54247

    Amudhan K
    Participant

    Hello Rio Mario,
    The problem you are facing is not an Oozie problem; it comes from Hive and Sqoop. Did you run a SELECT * FROM test query in Hive and check the result? When we use TERMINATED BY '\t', sometimes the data gets loaded into HDFS but the records cannot be read back, which is why MySQL is unable to store the records from HDFS. What I would suggest is to change the queries you are executing in both Hive and Sqoop from TERMINATED BY '\t' to a space character, i.e. TERMINATED BY ' '. That worked correctly for me.
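    To make that concrete, a minimal sketch of the suggestion, assuming the data file really is space-separated rather than tab-separated: in the Hive DDL (for both foo and test) the delimiter clause would become

    FIELDS TERMINATED BY ' '

    and the Sqoop command in workflow.xml would use the matching option

    --input-fields-terminated-by ' '

    The point is simply that the data file, the Hive tables, and the Sqoop export all have to agree on the same field delimiter; keeping '\t' everywhere also works if /home/cloudera/foo genuinely contains tab characters between the columns.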
