Home Forums Hortonworks Sandbox Import Microsoft SQL data into Sandbox

This topic contains 7 replies, has 6 voices, and was last updated by  Antonio Paternina 9 months ago.

  • Creator
    Topic
  • #28536

    Hi,

    Last week I tried to import some Microsoft SQL Server data into the Sandbox. As a Microsoft based developer I didn’t have experience with Linux nor Hadoop and it was difficult to get it working. I got it to work with the steps below, hopefully other MS developers can use it to get enthusiastic about Hadoop :-)

    We can use Sqoop to import data, this is already installed on the Sandbox. Sqoop commands can be executed via Bash in the web browser Hue shell or directly on the VM (logon root/hadoop).

    The Sqoop command to import data into the HDFS is: (– = 2 dashes!)
    sqoop import –connect –table –username –password

    A JDBC connection string is required to connect to the MS SQL server. This means a JDBC driver for SQL Server needs to be installed on the Hadoop environment. The Sandbox runs on CentOS, which is a Linux distribution (basic commands). These are the steps I followed:

    1. Find the JDBC drivers for MS SQL on Microsoft, currently sqljdbc_4.0.2206.100_enu.tar.gz.
    2. To download it to the VM, you need specific rights, so we need to switch accounts and create a new session with the root user. Type su – root and enter the password hadoop.
    3. Move to the sqoop directory: cd /usr/lib/sqoop
    4. Download the driver with this command: wget http://download.microsoft.com/download/0/2/A/02AAE597-3865-456C-AE7F-613F99F850A8/sqljdbc_4.0.2206.100_enu.tar.gz
    5. Unzip: gunzip sqljdbc_4.0.2206.100_enu.tar.gz
    6. Extract tar: tar –xvf sqljdbc_4.0.2206.100_enu.tar. This doesn’t seem to work in Bash, so you need to login to the VM itself to process this step…
    7. Copy the jar files to the lib: cp /sqljdbc_4.0/enu/*.jar /usr/lib/sqoop/lib

    Now the JDBC SQL Server driver is ready to be used. Because Sqoop tries to connect to your SQL Server from ‘the outside’, you need to adjust your local firewall settings. Make sure port 1433 is open for ‘public’.

    The sqoop command to import can look like this:

    sqoop import –connect “jdbc:sqlserver://192.168.25.1:1433;databasename=mssqldatabase;” –table mytable –username sqluseraccount –password mypwd –hive-import

    The IP address is the IP of my local machine. I added the optional - -hive-import flag which means that after the import into HDFS, it will import the data into a Hive table (and yes, sending password like this is insecure).

    The import can take a while, good luck!

Viewing 7 replies - 1 through 7 (of 7 total)

You must be logged in to reply to this topic.

  • Author
    Replies
  • #47607

    Antonio Paternina
    Participant

    Hi ¡ Jeroen, good your solution, but i am this problema…I usign HortonWork SandBox.

    java.lang.UnsupportedOperationException: Java Runtime Environment (JRE) version 1.6 is not supported by this driver. Use the sqljdbc4.jar class library, which provides support for JDBC 4.0.

    I tried some solutions, but they have not given me results. chague the driver to versión 3.0, but i get the same error.
    Thank ¡¡

    Collapse
    #46959

    Andrew Peterson
    Participant

    Hi Jeroen,

    thanks for the write up!

    Andrew

    Collapse
    #29239

    tedr
    Moderator

    Hi Ramirez,

    No definite timeline at this time, but keep your eyes on the Announcements forum for updates, not that doesn’t necessarily mean that it is going to be soon, it’s just that that is where we’ll let you know when it has happened.

    Thanks,
    Ted.

    Collapse
    #29182
    Collapse
    #28937

    Carter Shanklin
    Participant

    Jeroen, thanks for the writeup. Eventually the Sandbox will have a Sqoop UI but this is going to help a lot of people out in the meanwhile. As a refinement I’ll mention that tar -zxvf will extract the tar.gz in place without the need to uncompress it.

    Collapse
    #28813

    Hi Ted, you’re right, you only need to change the session when working in Bash.

    Collapse
    #28617

    tedr
    Moderator

    Hi Jeroen,

    Thanks for the information. I would like to point out, though, that since you are logged into the Sandbox shell as root, you probably do not need to ‘su – root’.

    thanks,
    Ted.

    Collapse
Viewing 7 replies - 1 through 7 (of 7 total)