HOWTO: Using Apache Sqoop for Data Import from Relational DBs

ISSUE

How do I use Apache Sqoop for importing data from a relational DB?

SOLUTION

Apache Sqoop can be used to import data from any JDBC-compliant relational database into HDFS, Hive, or HBase.

To import data into HDFS, use the sqoop import command and specify the relational DB table and connection parameters:

sqoop import --connect <JDBC connection string> --table <tablename> --username <username> --password <password>

This imports the table into HDFS as a set of comma-delimited text files (by default, one file per map task) under a directory named after the table in the user's HDFS home directory.
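For illustration, a concrete invocation might look like the following. The host, port, database name, table, and username here are hypothetical placeholders; adjust them for your environment. Using -P (prompt for password) avoids putting the password on the command line, where it would be visible in the shell history and process list.

```shell
# Hypothetical example: import the "customers" table from a MySQL
# database named "sales" into HDFS. All names are placeholders.
sqoop import \
  --connect jdbc:mysql://db.example.com:3306/sales \
  --table customers \
  --username sqoop_user \
  -P \
  --target-dir /user/sqoop_user/customers \
  -m 4
```

The optional --target-dir flag overrides the default output directory, and -m sets the number of parallel map tasks (and therefore the number of output files).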

To import data into Hive, add the --hive-import option to the sqoop import command:

sqoop import --connect <JDBC connection string> --table <tablename> --username <username> --password <password> --hive-import

This imports the data into a Hive table, mapping each column to an appropriate Hive data type.
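A hedged example of a Hive import follows; as above, the connection details and table names are placeholders, not values from the original article:

```shell
# Hypothetical example: import the "customers" table directly into
# a Hive table, creating the Hive table if it does not exist.
sqoop import \
  --connect jdbc:mysql://db.example.com:3306/sales \
  --table customers \
  --username sqoop_user \
  -P \
  --hive-import \
  --hive-table customers \
  --create-hive-table
```

The --hive-table flag names the target Hive table (it defaults to the source table name), and --create-hive-table makes the job fail rather than overwrite an existing table definition.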

Reference:

https://blogs.apache.org/sqoop/entry/apache_sqoop_overview