Import from Microsoft SQL Server into the Hortonworks Sandbox using Sqoop

This tutorial is from the Community section of the tutorial series for the Hortonworks Sandbox – a single-node Hadoop cluster running in a virtual machine. Download the Sandbox to run this and other tutorials in the series.

This community tutorial was submitted by flacrosse, with source available on GitHub. Feel free to contribute edits or your own tutorial and help the community learn Hadoop.


For a simple proof of concept, I wanted to get data from MS SQL Server into the Hortonworks Sandbox in an automated fashion using Sqoop. Apache Sqoop provides a way of efficiently transferring bulk data between Apache Hadoop and relational databases. This tutorial will show you how to use Sqoop to import data into the Hortonworks Sandbox from a Microsoft SQL Server data source.

Preparing the SQL Server Environment

On the SQL Server side there are a few key settings that need to be configured to ensure easy connectivity. For the purposes of this tutorial I’m assuming you’ll be trying to connect to a local install of SQL Server.

Create a login using SQL Server Authentication

When you create this user, make sure to put on your DBA hat and grant all the permissions necessary to connect and access the data (Server Roles, User Mapping).
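As a sketch, creating such a login in T-SQL might look like the following. The login name, password, and database here match the ones used later in this tutorial, but they are illustrative; adjust them and the granted roles to your own environment.

```sql
-- Hypothetical example: create a SQL-authenticated login with read access
CREATE LOGIN hadoop WITH PASSWORD = 'hadoop1';
GO
USE AdventureWorks2008R2;
GO
-- Map the login to a database user and grant read permissions
CREATE USER hadoop FOR LOGIN hadoop;
EXEC sp_addrolemember 'db_datareader', 'hadoop';
GO
```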

Enable Mixed Mode Authentication

Make sure Mixed Mode Authentication is enabled at the server level so that the new user can connect.
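If you want to check the current authentication mode from a query window rather than the server properties dialog, this quick check (not part of the original steps) works:

```sql
-- Returns 1 if the server allows Windows Authentication only,
-- 0 if Mixed Mode (SQL Server and Windows Authentication) is enabled
SELECT SERVERPROPERTY('IsIntegratedSecurityOnly') AS windows_auth_only;
```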

Enable TCP/IP Network Protocol

If you’re trying this against a local install of SQL Server, chances are good that the default configuration is set to use the Shared Memory network protocol. You can check this setting by running the script below in SQL Server Management Studio.

   SELECT
      CONNECTIONPROPERTY('net_transport') AS net_transport,
      CONNECTIONPROPERTY('protocol_type') AS protocol_type,
      CONNECTIONPROPERTY('auth_scheme') AS auth_scheme,
      CONNECTIONPROPERTY('local_net_address') AS local_net_address,
      CONNECTIONPROPERTY('local_tcp_port') AS local_tcp_port,
      CONNECTIONPROPERTY('client_net_address') AS client_net_address;

If the code returns Shared Memory as the net_transport (as shown below), you’ll need to go into the SQL Server Configuration Manager, disable Shared Memory, make sure TCP/IP is enabled, and reboot the server.


Once you have Shared Memory disabled the code should return an IP address. Make note of the IP address and the Local TCP Port number as we’ll be using them later in the Sqoop connection string.


NOTE: It’s probably a good idea at this point to make sure the user is able to connect to the server and access some data. Log in through Management Studio with the new credentials and run a SELECT against a table you want to import. If you can’t get to the data this way, Sqoop isn’t going to be able to either.
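For example, a quick sanity check against the AdventureWorks table imported later in this tutorial might be:

```sql
-- Connected through Management Studio as the new SQL login:
SELECT TOP 10 * FROM Sales.SalesOrderDetail;
```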

Preparing the Hortonworks Sandbox

By default, the Microsoft SQL Server JDBC driver is not included in the Sqoop library that ships with the Hortonworks Sandbox so we need to get that driver before we’ll be able to connect the systems.

Login To the Virtual Machine

With the Sandbox running, jump into the shell by pressing Alt+F5.

The login and password are root/hadoop.

Download the SQL Server JDBC Driver

Enter the following command to get to the /usr/local directory

cd /usr/local

Download and unzip the driver (substitute the Microsoft SQL Server JDBC driver download URL inside the quotes)

curl -L '' | tar xz

You should now see a folder in the directory called sqljdbc_4.0 when you run the ls command.

Copy the driver to the Sqoop library

While still in the /usr/local/ directory, run the following command to copy the driver to the Sqoop library folder

cp sqljdbc_4.0/enu/sqljdbc4.jar /usr/lib/sqoop/lib

Once this runs you should be able to see the driver in the list of available Sqoop drivers:

ls /usr/lib/sqoop/lib


Now that the driver has been added I recommend restarting the Sandbox completely. I’m sure there are ways to restart certain services so that the new driver becomes available but I had a much easier time just rebooting before continuing on.

Transfer data using Sqoop

Now we’re finally ready to use Sqoop to connect to SQL and pull data out of Microsoft’s sample AdventureWorks database.

Build the connection string

Before we start pulling data, let’s verify our connection from Sqoop to SQL Server by listing the available databases. For the purposes of demonstration I’ll be hard-coding the password into the connection string. The command for listing databases uses the following syntax:

sqoop list-databases --connect jdbc:sqlserver://<IP Address>:<Port Number> --username <Username> --password <Password>

So for my connection I run the following command, using the IP address and port number I saw earlier along with the username and password I set up:

sqoop list-databases --connect jdbc:sqlserver:// --username hadoop --password hadoop1

With that command, Sqoop should return a listing of all the available databases to connect to.

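As an optional follow-up check (not in the original tutorial), Sqoop can also list the tables in a specific database using the same connection details:

```shell
# Hypothetical example; substitute your own IP address, port, and credentials
sqoop list-tables --connect "jdbc:sqlserver://<IP Address>:<Port Number>;database=AdventureWorks2008R2" --username hadoop --password hadoop1
```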

Import a Table into Hive

With the AdventureWorks database I will import the Sales.SalesOrderDetail table. The command to import the table is structured like this:

sqoop import --connect "jdbc:sqlserver://<IP Address>;database=<Database Name>;username=<Username>;password=<Password>" --table <Table Name> --hive-import -- --schema <Schema Name>

So for my import I end up with the following connection string

sqoop import --connect "jdbc:sqlserver://;database=AdventureWorks2008R2;username=hadoop;password=hadoop1" --table SalesOrderDetail --hive-import -- --schema Sales


Note: the schema argument is specific to the SQL Server driver and must be passed at the end of the command after an empty -- (dash dash) argument. If you are importing from the default schema of the database (like dbo) you do not need to specify the schema with this additional argument.
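For instance, importing a table that lives in the default dbo schema would simply drop the trailing schema argument. This is a sketch; the table name here is an assumption for illustration:

```shell
# No "-- --schema" needed for tables in the default dbo schema
sqoop import --connect "jdbc:sqlserver://<IP Address>;database=AdventureWorks2008R2;username=hadoop;password=hadoop1" --table ErrorLog --hive-import
```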

After you execute the command, Hadoop is going to go to work and kick off a MapReduce job. You’ll see a lot of information flying past your eyes, but eventually you’ll be left with results indicating your table has been successfully imported. (Use Shift+PageUp/PageDown if you want to navigate back through the command prompt output that flew past, which is especially useful for debugging.)


Querying the Results

At this point we’re ready to query the Hive table through Beeswax. Go back into the Hue interface, click on the Hive tab, and enter a simple select query against the table you imported. After the query runs you will see the results returned.

SELECT * FROM SalesOrderDetail

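Beyond a simple SELECT *, any HiveQL the imported table supports should work here. As a sketch (assuming the import completed and using Hive’s lowercased column names):

```sql
-- Count imported order lines and sum line totals per product
SELECT productid, COUNT(*) AS order_lines, SUM(linetotal) AS total
FROM salesorderdetail
GROUP BY productid
LIMIT 10;
```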


Sqoop is a powerful tool and can be used with RDBMS systems to provide a number of benefits including incremental imports and exports as well as connectivity to HBase. You can find out more by checking out the Apache Sqoop project page.
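As one example of those capabilities, an incremental import that appends only new rows might look like the following. This is a sketch; the check column and connection details are assumptions based on the table imported above:

```shell
# Append only rows whose key exceeds the last imported value
sqoop import --connect "jdbc:sqlserver://<IP Address>;database=AdventureWorks2008R2;username=hadoop;password=hadoop1" \
  --table SalesOrderDetail --incremental append --check-column SalesOrderDetailID --last-value 0 -- --schema Sales
```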


Tri Nguyen
June 23, 2014 at 6:20 am

High quality tutorial, very appreciated.

BTW, in SQL Server Configuration Manager there is no need to disable Shared Memory; you can just enable the TCP/IP protocol in addition to Shared Memory. You must also configure Windows Firewall to open port 1433.

July 15, 2014 at 10:44 am

Nice article thanks. I successfully imported a table.

Has anyone else had an issue with Hive after this import – error msg:
“Could not connect to”?

I’ve found some threads on this but haven’t found a solution yet.

Thank you folks.

August 4, 2014 at 8:14 am

Hi, I have two very basic questions. 1) I imported the table data using the --as-textfile option (without --target-dir) and the job succeeded. Where did the file get created? I checked in ‘File Browser’ but haven’t found any file. Does the target file get the name of the table the data was imported from? 2) In which directory do the logs for this Sqoop statement get stored? I checked in YARN but the log directory there is empty. Please advise.

August 28, 2014 at 11:24 am

Like Cary, I ran into his error and more! Any updates?

These links work fine:

Following links do not work:

BEESWAX (HIVE UI) and HCATALOG take 2-3 minutes before throwing error “Could not connect to”

FILE BROWSER and JOB BROWSER links take good 2-3 minutes before throwing error “”

OOZIE EDITOR/DASHBOARD: Waits for about a minute and errors with “The Oozie server is not running: “

Tyrone Williams
September 14, 2014 at 8:15 pm

When I enable the TCP/IP network protocol and disable Shared Memory, I cannot log into SQL Server Management Studio. I get SQL Network Interface Error 26: Error Locating Server/Instance Specified. SQL Server stops and starts just fine, but I cannot log back into SQL Server Management Studio.

Giri R
October 1, 2014 at 10:47 am

I am getting the error below when I Sqoop the data from SQL Server. I verified I have sqljdbc4.jar in /usr/lib/sqoop/lib. It works for a simple eval statement, but when I try a hive-import I get this error:

Error: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.UnsupportedOperationException: Java Runtime Environment (JRE) version 1.7 is not supported by this driver. Use the sqljdbc4.jar class library, which provides support for JDBC 4.0.

Bryan Goodrich
November 14, 2014 at 4:11 pm

This worked great when I used Sandbox 2.1, but now I’m previewing HDP 2.2 and this tutorial is leading me into a number of frustrating errors that I’m not equipped to resolve! This includes finding the right locations to move files (now /usr/hdp/) and odd compilation errors with the Java that prevent it from actually importing the data. I’m looking forward to working with the latest version of HDP, but for now I can’t even redo the first steps of my last adventure into 2.1.

    Krishna Reddy Munnangi
    May 1, 2015 at 1:15 pm

    Let me know if you are still facing the issue. I can help you if possible.

December 17, 2014 at 7:13 pm

Hi,when i sqoop the data from SQL Server getting this error .

sqoop import --connect "jdbc:sqlserver://;database=AdventureWorks2008;username=xxxx;password=xxxx" --table SalesOrderDetail --hive-import -- --schema Sales

ERROR tool.ImportAllTablesTool: Encountered IOException running import job: Failed to run job : Application application_1418353490929_0039 submitted by user root to unknown queue: default

    September 25, 2015 at 2:06 pm

    You may have done the earlier ambari tutorial in setting up priority queues. Revert to the v1 of these yarn settings and it should work.

Krishna Reddy Munnangi
May 1, 2015 at 10:57 am

To get the SQL Server IP address you can use the following query after connecting to SQL Server with a SQL Server user account. No need to disable the Shared Memory protocol.

SELECT c.net_transport, c.local_net_address, c.local_tcp_port
FROM sys.dm_exec_connections AS c
WHERE c.session_id = @@SPID;

Krishna Reddy Munnangi
May 1, 2015 at 12:00 pm

Use the following command on HDP 2.2 version to copy the JDBC Driver

cp /usr/local/sqljdbc_4.0/enu/*.jar /usr/hdp/

Bharathi Santhanath
May 12, 2015 at 12:53 pm

A nice article and documentation for SQL server data imports

August 18, 2015 at 5:13 am

Thanks for the tutorial. I would like to pull data from SAP tables (non-HANA environment) into HDFS using Sqoop. Is it possible? I’m not sure if non-HANA SAP allows Sqoop to reach directly into its database tables. It’s an MSSQL database.

John Schroeder
September 9, 2015 at 4:25 pm

I get the error message: curl: (6) Couldn’t resolve host ‘’
