

Hortonworks Sandbox Forum

Import Microsoft SQL data into Sandbox



    Last week I tried to import some Microsoft SQL Server data into the Sandbox. As a Microsoft-based developer I had no experience with Linux or Hadoop, and it was difficult to get it working. I got it working with the steps below; hopefully other MS developers can use them to get enthusiastic about Hadoop :-)

    We can use Sqoop to import data; it is already installed on the Sandbox. Sqoop commands can be executed via the Hue shell (Bash in the web browser) or directly on the VM (log on with root/hadoop).

    The basic Sqoop command to import data into HDFS is (note: each option takes two dashes):
    sqoop import --connect <jdbc-url> --table <table> --username <user> --password <password>

    A JDBC connection string is required to connect to the MS SQL Server, which means a JDBC driver for SQL Server needs to be installed in the Hadoop environment. The Sandbox runs on CentOS, a Linux distribution. These are the steps I followed:

    1. Find the JDBC driver for MS SQL Server on Microsoft's site, currently sqljdbc_4.0.2206.100_enu.tar.gz.
    2. Downloading it to the VM requires root rights, so switch accounts and start a new session as the root user: type su - root and enter the password hadoop.
    3. Move to the Sqoop directory: cd /usr/lib/sqoop
    4. Download the driver with this command: wget (followed by the download URL from step 1).
    5. Unzip: gunzip sqljdbc_4.0.2206.100_enu.tar.gz
    6. Extract the tar: tar -xvf sqljdbc_4.0.2206.100_enu.tar. This doesn't seem to work in the Hue shell, so you need to log in to the VM itself for this step.
    7. Copy the jar files to Sqoop's lib directory: cp sqljdbc_4.0/enu/*.jar /usr/lib/sqoop/lib
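    If you want to rehearse the unzip/extract/copy sequence before touching the Sandbox, the steps above can be simulated with a throwaway archive. The sqljdbc_4.0/enu layout is taken from step 7; here `lib` is just a scratch directory standing in for /usr/lib/sqoop/lib:

    ```shell
    # Build a dummy archive that mimics the sqljdbc download layout.
    mkdir -p sqljdbc_4.0/enu lib
    echo "dummy" > sqljdbc_4.0/enu/sqljdbc4.jar
    tar -cf sqljdbc_4.0.2206.100_enu.tar sqljdbc_4.0
    gzip sqljdbc_4.0.2206.100_enu.tar
    rm -rf sqljdbc_4.0

    # Now the same sequence as steps 5-7 above:
    gunzip sqljdbc_4.0.2206.100_enu.tar.gz   # step 5: .tar.gz -> .tar
    tar -xvf sqljdbc_4.0.2206.100_enu.tar    # step 6: extract the tar
    cp sqljdbc_4.0/enu/*.jar lib/            # step 7: jars into the lib dir
    ls lib/
    ```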

    Now the JDBC SQL Server driver is ready to be used. Because Sqoop tries to connect to your SQL Server from ‘the outside’, you need to adjust your local firewall settings. Make sure port 1433 is open for ‘public’.
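    A quick way to check from the Sandbox side that port 1433 is actually reachable is Bash's built-in /dev/tcp redirection (a sketch; this requires bash, not plain sh, and 127.0.0.1 below is only an example — substitute your SQL Server machine's IP):

    ```shell
    # Minimal TCP reachability probe using bash's /dev/tcp (no extra tools).
    # Prints "open" if a connection to host:port succeeds, "closed" otherwise.
    check_port() {
      local host=$1 port=$2
      if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
        echo "open"
      else
        echo "closed"
      fi
    }

    check_port 127.0.0.1 1433   # substitute your SQL Server's IP here
    ```

    If this prints "closed", revisit the firewall rule before blaming Sqoop or the driver.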

    The sqoop command to import can look like this:

    sqoop import --connect "jdbc:sqlserver://<ip-address>;databasename=mssqldatabase" --table mytable --username sqluseraccount --password mypwd --hive-import

    The IP address is that of my local machine. I added the optional --hive-import flag, which means that after the import into HDFS the data is also loaded into a Hive table (and yes, sending the password like this is insecure).
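    Since the same connection string gets reused (for example with sqoop list-tables), it can help to assemble it in a variable first. A sketch with placeholder values (192.168.1.10 and the database/table names are examples, not real servers); Sqoop's -P flag prompts for the password at run time instead of putting it on the command line:

    ```shell
    # All values below are examples -- substitute your own server and database.
    SQLSERVER_IP="192.168.1.10"
    CONNECT="jdbc:sqlserver://${SQLSERVER_IP};databasename=mssqldatabase"

    # Print the command that would be run (drop the echo to run it for real).
    echo sqoop import --connect "${CONNECT}" \
         --table mytable --username sqluseraccount -P --hive-import
    ```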

    The import can take a while, good luck!


    Hi Jeroen,

    Thanks for the information. I would like to point out, though, that since you are logged into the Sandbox shell as root, you probably do not need to 'su - root'.



    Hi Ted, you’re right, you only need to change the session when working in Bash.

    Carter Shanklin

    Jeroen, thanks for the writeup. Eventually the Sandbox will have a Sqoop UI, but this is going to help a lot of people in the meantime. As a refinement, I'll mention that tar -zxvf will extract the tar.gz in one step, without the need to uncompress it first.
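    Carter's tip can be tried out with a scratch archive (nothing here touches the Sandbox paths):

    ```shell
    # Build a small .tar.gz to demonstrate single-step extraction.
    mkdir -p scratch/enu
    echo "jar contents" > scratch/enu/demo.jar
    tar -czf scratch.tar.gz scratch
    rm -rf scratch

    # -z handles the gzip layer, so no separate gunzip step is needed:
    tar -zxvf scratch.tar.gz
    cat scratch/enu/demo.jar
    ```

    This replaces steps 5 and 6 of the driver install with a single command.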


    Hi Ramirez,

    No definite timeline at this time, but keep your eyes on the Announcements forum for updates. Note that this doesn't necessarily mean it is going to be soon; it's just that that is where we'll let you know when it has happened.


    Andrew Peterson

    Hi Jeroen,

    thanks for the write up!


    Antonio Paternina

    Hi Jeroen, nice solution, but I have a problem. I am using the Hortonworks Sandbox.

    java.lang.UnsupportedOperationException: Java Runtime Environment (JRE) version 1.6 is not supported by this driver. Use the sqljdbc4.jar class library, which provides support for JDBC 4.0.

    I tried some solutions, but none of them worked. I changed the driver to version 3.0, but I get the same error.
    Thanks!

The forum ‘Hortonworks Sandbox’ is closed to new topics and replies.
