How To Install Hadoop on Windows with HDP 2.0
Installing the Hortonworks Data Platform 2.0 for Windows is straightforward. Let’s take a look at how to install a one-node cluster on your Windows Server 2012 R2 machine.
To start, download the HDP 2.0 for Windows package. The package is under 1 GB, and will take a few moments to download depending on your internet speed. Documentation for installing a single node instance is located here. This blog post will guide you through that instruction set to get you going with HDP 2.0 for Windows!
Here’s an outline of the process you’ll work through to deploy:
- Install the prerequisites
- Deploy HDP on your single node machine
- Start the services
- Run smoke tests to validate the install
Install the Pre-requisites
You’ll now install Java, Python, and the Microsoft Visual C++ runtime. Windows Server 2012 already includes an up-to-date .NET runtime, so you can skip that step.
Download the C++ runtime, and install it by double-clicking the downloaded MSI.
Download Python 2.7.x, and double-click the downloaded MSI to install the package.
Once Python is installed, make sure HDP can find it by updating the PATH system environment variable.
Go to Computer > Properties > Advanced System Settings > Environment variables. Then append the Python install path, for example C:\Python27, to the end of the existing PATH value, separated from the previous entry by a ‘;’.
Verify that your path is set up by opening a new PowerShell or Command Prompt window and typing python, which should start the Python interpreter. Type quit() to exit.
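If the interpreter does not start, it can help to check whether the install directory really made it into PATH. The following is a minimal sketch, not part of the HDP tooling: the helper name path_contains is made up for illustration, and C:\Python27 is simply the example path used above.

```python
import os


def path_contains(path_value, directory):
    """Return True if `directory` is one of the ';'-separated entries
    in a Windows-style PATH string."""
    def normalize(entry):
        # Windows paths are case-insensitive; also ignore surrounding
        # whitespace, quotes, and trailing slashes.
        return entry.strip().strip('"').rstrip("\\/").lower()

    wanted = normalize(directory)
    entries = [e for e in path_value.split(";") if e.strip()]
    return any(normalize(entry) == wanted for entry in entries)


if __name__ == "__main__":
    # Prints True when the example Python directory is on PATH.
    print(path_contains(os.environ.get("PATH", ""), r"C:\Python27"))
```

Run it from a newly opened prompt, since windows that were already open do not pick up changes to system environment variables.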
Set up Java, which you can get here. You will also need to set JAVA_HOME, which Hadoop requires. Make sure to install Java to a path that does not contain spaces: “Program Files” will not work!
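The no-spaces rule is easy to overlook, so here is a small, hedged sketch of a check you could run once JAVA_HOME has been set. The function name java_home_problems is hypothetical, and the check covers only the two conditions mentioned in this post: the variable is set, and its value contains no spaces.

```python
import os


def java_home_problems(java_home):
    """Return a list of problems found with a JAVA_HOME value.

    An empty list means the value passes the two checks below.
    """
    problems = []
    if not java_home:
        problems.append("JAVA_HOME is not set")
        return problems
    if " " in java_home:
        # Spaces in the path are the issue this post warns about.
        problems.append("JAVA_HOME contains a space: %r" % java_home)
    return problems


if __name__ == "__main__":
    # Prints one line per problem; prints nothing if the value looks fine.
    for problem in java_home_problems(os.environ.get("JAVA_HOME")):
        print(problem)
```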
To set JAVA_HOME, in Explorer, right-click Computer > Properties > Advanced System Settings > Environment variables. Then create a new System variable called JAVA_HOME that points to your Java install (in this case,
Install the MSI package
Now we have all the pre-requisites installed. The next step is to install the HDP 2.0 for Windows package.
Extract the MSI from the zip package you downloaded earlier. Open a PowerShell prompt in Administrator (“Run as Administrator”) mode, and run the MSI with this command:
msiexec /i "hdp-2.0.6.0.winpkg.msi"
The HDP Setup window appears pre-populated with the host name of the server, as well as default installation parameters. Now, complete the form with your parameters:
- Set the Hadoop User Password. This enables you to log in as the administrative user and perform administrative actions. The password must meet your local Windows Server password requirements. We recommend a strong password. Note the password you set, as we’ll use it later.
- Check ‘Delete Existing HDP Data’. This ensures that HDFS will be formatted and ready to use after you install.
- Check ‘Install HDP Additional Components’. Select this check box to install Zookeeper, Flume, and HBase as HDP services deployed to the single node server.
- Set the Hive and Oozie database credentials. Set ‘hive’ for all Hive Metastore entries, and ‘oozie’ for all Oozie Metastore entries.
- Select DERBY, and not MSSQL, as the DB Flavor in the dropdown selection. This will set up HDP to use an embedded Derby database, which is ideal for the single-node evaluation scenario.
When you have finished setting the installation parameters, click ‘Install’ to install HDP.
The HDP Setup window will close, and a progress indicator will be displayed while the installer is running. The installation will take a few minutes. Disregard the estimated time shown on the progress bar.
The MSI installer window will display an info prompt when the installation is finished and successful.
Start the services and run a job
Once the install is successful, you will start the HDP services on the single node.
Open a command prompt, and navigate to the HDP install directory. By default, the location is “C:\hdp”, unless you set a different location:
Validate the install by running the full suite of smoke tests. It’s easiest to run the smoke tests as the HDP super user: ‘hadoop’.
In a command prompt, switch to using the ‘hadoop’ user:
runas /user:hadoop cmd
When prompted, enter the password you had set up during install.
Run the provided smoke tests as the hadoop user to verify that the HDP 2.0 services work as expected:
This will fire up a MapReduce job on your freshly set up cluster. If it fails the first time, try running it again with the same command.
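If you would rather not rerun the command by hand, the “try it again” advice can be wrapped in a short script. This is only a sketch under stated assumptions: run_with_retry is a hypothetical helper, and the command to run is left as a parameter, so pass in whatever smoke test command you ran above.

```python
import subprocess


def run_with_retry(command, attempts=2, runner=subprocess.call):
    """Run `command` until it exits with code 0, at most `attempts` times.

    Returns the number of the attempt that succeeded, or 0 if every
    attempt failed. `runner` defaults to subprocess.call and can be
    swapped for a stand-in when testing.
    """
    for attempt in range(1, attempts + 1):
        exit_code = runner(command)
        if exit_code == 0:
            return attempt
        print("Attempt %d failed with exit code %s" % (attempt, exit_code))
    return 0
```

Two attempts matches the advice above. If the tests still fail after that, a real configuration problem is more likely than a slow first start.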
Congratulations, you are now Hadooping on Windows!