Automated Install of HDP 2.1 for Hadoop on Windows

HDP works across Linux or Windows, On-Prem or Cloud

Hortonworks Data Platform 2.1 for Windows is the 100% open source data management platform based on Apache Hadoop and available for the Microsoft Windows Server platform. I have built a helper tool that automates the deployment of a multi-node Hadoop cluster using the MSI available in HDP 2.1 for Windows.

Download HDP 2.1 for Windows

HDP on Windows MSI Overview

The HDP on Windows installation package is distributed as an MSI. Microsoft's MSI format uses Windows Installer, the installation and configuration service provided with Windows. The installer service makes corporate deployment easier and provides a standard format for component management. This blog post shows how the MSI package for Hortonworks Data Platform on Windows can be automated and customized for deployment within your company.

The documentation and prerequisites for preparing the environment for installation can be found here. The idea behind automating the MSI installation in your own environment is to follow the documented steps and automate every requirement, both what Hadoop needs and what your company standards dictate, so that you can provision HDP clusters and data nodes faster and in a more controlled manner.

Automated Installer Overview

With the MSI in hand, a little PowerShell remoting and .NET made it easy to automate the prerequisite installs and configuration settings needed to install multi-node Hadoop clusters with HDP on Windows. The goal is to provision HDP clusters on Windows with minimal effort. The automated installer performs the steps below on all hosts identified by the cluster configuration:

  • Enables PowerShell remoting
  • Sets TrustedHosts lists between all Hosts for communication
  • Runs a sequence of checks for host communication, directory creation, etc.
  • Configures and adds Firewall Ports required by Hadoop services
  • Disables IPv6
  • Installs Java and sets environment variables
  • Installs Python and sets environment variables
  • Installs Visual C++ redistributable (if required)
  • Installs .NET framework (if required)
  • Installs Hortonworks Data Platform
  • Starts all services in the cluster
  • Runs Smoke tests for all installed services
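
Since the installer relies on PowerShell remoting to reach every host, it can save time to confirm that the remoting port is reachable before starting. The sketch below is not part of the installer; it is a minimal pre-flight check that assumes the default WinRM HTTP port (5985) is in use. The function names `port_is_open` and `unreachable_hosts` are hypothetical helpers introduced for illustration.

```python
import socket

# Assumption: PowerShell remoting uses the default WinRM HTTP listener port.
WINRM_HTTP_PORT = 5985


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def unreachable_hosts(hosts, port=WINRM_HTTP_PORT, timeout=3.0):
    """Return the hosts from the list that do not accept connections on port."""
    return [host for host in hosts if not port_is_open(host, port, timeout)]


if __name__ == "__main__":
    # Host names taken from the sample cluster properties file.
    failed = unreachable_hosts(["WINNODE3", "WINNODE4"])
    print("Unreachable hosts:", failed)
```

An empty result only shows that the port accepts connections; it does not prove that remoting is enabled or that TrustedHosts is configured, which the installer handles itself.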

The automated installer is a .NET console application project created with Visual Studio. It can be found on GitHub at the following link: https://github.com/acesir/WindowsHDP

The Prebuilt Package folder contains everything needed to perform the installation except the HDP MSI file, which is left out because of its size (1GB). The GitHub repository also contains the project source should you want to add functionality based on your requirements or simply improve the experience.

The installer has the ability to install a new cluster, uninstall a cluster or add nodes and services to an existing cluster. For this example we will be installing a brand new cluster.

Before we install the cluster, let's take a look at a few important files located inside the extracted folder.

Installer Configuration

WinHDP.exe.config is the configuration file for the console installer. It lets you customize the installation and choose which functions the installer performs.

<appSettings>
<!-- Installation Applications and File Locations -->
<add key="Python" value="C:\WinHDP\Files\python-2.7.6.amd64.msi"></add>
<add key="VisualC" value="C:\WinHDP\Files\vcredist_x64.exe"></add>
<add key="DotNetFramework" value="C:\WinHDP\Files\dotNetFx40_Full_setup.exe"></add>
<add key="Java" value="C:\WinHDP\Files\jdk-6u31-windows-x64.exe"></add>
<add key="HDP" value="C:\WinHDP\Files\hdp-2.1.1.0.winpkg.msi"></add>
<add key="ClusterProperties" value="C:\WinHDP\Files\clusterproperties.txt"></add>
<!--Hadoop user password-->
<add key="HadoopPassword" value="YOURPASS"></add>
<!-- Knox master password-->
<add key="KnoxMasterKey" value="YOURKEY"></add>
<!-- Required for Server 2008.. DO NOT REMOVE-->
<add key="Powershell3" value="C:\WinHDP\Files\Windows6.1-KB2506143-x64.msu"></add>
<!-- Installation Directory structure-->
<add key="HDPDir" value="C:\HDP"></add>
<!-- Optional configuration -->
<add key="EnableFirewall" value="False"></add>
<add key="RestartForIPV6" value="False"></add>
<add key="StartServices" value="True"></add>
<add key="RunSmokeTests" value="True"></add>
</appSettings>

  • Python – location of the Python MSI
  • VisualC – location of the Visual C++ Redistributable executable
  • DotNetFramework – location of the .NET Framework executable
  • Java – location of the JDK executable (1.6 or 1.7)
  • HDP – location of the HDP MSI
  • HadoopPassword – password used for the Hadoop user. Keep in mind that it has to satisfy the password policy on the server.
  • KnoxMasterKey – master key for the Knox gateway
  • ClusterProperties – cluster configuration file used to define the cluster and services
  • HDPDir – directory used for installing HDP
  • EnableFirewall – enables the firewall on all cluster nodes after installation
  • RestartForIPV6 – restarts the nodes after IPv6 is disabled so that they resolve to IPv4 (leave this as False if you are running the installer from one of the cluster nodes)
  • StartServices – starts all Hadoop services after installation
  • RunSmokeTests – runs smoke tests for all components in HDP (Pig, Hive, MapReduce, etc.)

The entire configuration file can be left untouched by default if you extract the Prebuilt Package to the C:\ directory and run from there.
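
If you do edit the configuration, it can help to check that every file location key is filled in before launching the installer. The following is a minimal sketch in Python, not something the installer provides. It assumes the file is a standard .NET configuration file with the appSettings block shown above; the helper names `read_app_settings` and `missing_file_keys` are hypothetical, and the sample string is shortened for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened sample; the real file carries all the keys listed above.
SAMPLE_CONFIG = r"""<configuration>
  <appSettings>
    <!-- Installation Applications and File Locations -->
    <add key="Python" value="C:\WinHDP\Files\python-2.7.6.amd64.msi"></add>
    <add key="HDP" value="C:\WinHDP\Files\hdp-2.1.1.0.winpkg.msi"></add>
    <add key="HDPDir" value="C:\HDP"></add>
    <add key="StartServices" value="True"></add>
  </appSettings>
</configuration>"""

# Keys from the sample configuration whose values point at installer files.
FILE_KEYS = ("Python", "VisualC", "DotNetFramework", "Java", "HDP",
             "ClusterProperties", "Powershell3")


def read_app_settings(xml_text):
    """Return a dict of key -> value for every <add> element in the config."""
    root = ET.fromstring(xml_text)
    return {node.get("key"): node.get("value") for node in root.iter("add")}


def missing_file_keys(settings):
    """Return the file location keys that are absent or have an empty value."""
    return [key for key in FILE_KEYS if not settings.get(key)]


if __name__ == "__main__":
    settings = read_app_settings(SAMPLE_CONFIG)
    print("HDP MSI:", settings["HDP"])
    print("Missing file keys:", missing_file_keys(settings))
```

To check a real file rather than the sample string, read the file contents and pass them to `read_app_settings`. On the target machine you could additionally test each path with `os.path.isfile`.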

Cluster Properties

The next important file to look at is the cluster properties file, located in the Files directory of the extracted package. This file defines the Hadoop cluster master and worker nodes as well as all services installed on the cluster.

#Log directory
HDP_LOG_DIR=c:\hadoop\logs
 
#Data directory
HDP_DATA_DIR=c:\hdp\data
 
#hosts
NAMENODE_HOST=WINNODE3
SECONDARY_NAMENODE_HOST=WINNODE4
RESOURCEMANAGER_HOST=WINNODE3
HIVE_SERVER_HOST=WINNODE3
OOZIE_SERVER_HOST=WINNODE4
WEBHCAT_HOST=WINNODE3
SLAVE_HOSTS=WINNODE3,WINNODE4
CLIENT_HOSTS=WINNODE3,WINNODE4
HBASE_MASTER=WINNODE4
HBASE_REGIONSERVERS=WINNODE3,WINNODE4
ZOOKEEPER_HOSTS=WINNODE3,WINNODE4
FLUME_HOSTS=WINNODE3,WINNODE4
FALCON_HOST=WINNODE3
STORM_NIMBUS=WINNODE3
STORM_SUPERVISORS=WINNODE4
KNOX_HOST=WINNODE4
IS_TEZ=yes
IS_PHOENIX=yes
 
#Database host
DB_FLAVOR=DERBY
DB_HOSTNAME=WINNODE3
DB_PORT=1527
 
#Hive properties
HIVE_DB_NAME=hive
HIVE_DB_USERNAME=hive
HIVE_DB_PASSWORD=YOURPASS
 
#Oozie properties
OOZIE_DB_NAME=oozie
OOZIE_DB_USERNAME=oozie
OOZIE_DB_PASSWORD=YOURPASS

Use the provided sample cluster properties file as a starting point for building out your cluster. The sample file defines 2 nodes, but you can change this as needed. You do not have to install all components, but it may be easier to install everything and then disable certain services. HBase is an example of this, since it uses a lot of memory in the cluster. Also keep in mind that you may want to move the Data Node directory to a drive other than the OS drive if you are deploying to production or development environments.
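
When adapting the sample file, a quick way to see which machines the installer will touch is to collect every host name the file mentions. The sketch below is one possible approach in Python and is not part of the installer. It assumes that host assignments live in keys ending in _HOST or _HOSTS plus the HBase and Storm keys seen in the sample file; `parse_properties` and `cluster_hosts` are hypothetical helper names.

```python
# Shortened sample; the real file carries all the keys listed above.
SAMPLE_PROPERTIES = """#hosts
NAMENODE_HOST=WINNODE3
SECONDARY_NAMENODE_HOST=WINNODE4
SLAVE_HOSTS=WINNODE3,WINNODE4
HBASE_MASTER=WINNODE4
STORM_SUPERVISORS=WINNODE4
IS_TEZ=yes

#Database host
DB_FLAVOR=DERBY
"""

# Assumption: these keys hold host names even though they lack a _HOST suffix.
EXTRA_HOST_KEYS = {"HBASE_MASTER", "HBASE_REGIONSERVERS",
                   "STORM_NIMBUS", "STORM_SUPERVISORS"}


def parse_properties(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def cluster_hosts(props):
    """Return the sorted, de-duplicated host names across all host keys."""
    hosts = set()
    for key, value in props.items():
        if key.endswith(("_HOST", "_HOSTS")) or key in EXTRA_HOST_KEYS:
            hosts.update(h.strip() for h in value.split(",") if h.strip())
    return sorted(hosts)


if __name__ == "__main__":
    props = parse_properties(SAMPLE_PROPERTIES)
    print("Hosts in cluster:", cluster_hosts(props))
```

The resulting list is a convenient input for checking that every node is reachable and that the account you plan to use has administrative rights on each one.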

One thing to note is that HA (High Availability for the Name Node) is not included in this example. The installer can install the HA components, but it cannot enable HA automatically because some HDFS commands have to be run manually. We will show this in the next blog post.

OS Requirements

If you are installing on Server 2012, nothing needs to be installed manually before running the installer. If you are installing on Server 2008, you need to install .NET Framework 4.0 and PowerShell 3.0 first. Both are located inside the Files directory in the Prebuilt Package.

  • .NET Framework 4.0 – dotNetFx40_Full_setup.exe
  • PowerShell 3.0 – Windows6.1-KB2506143-x64.msu

Installation Steps

Now we can finally start the installation. Make sure you are running the installer with an account that has Administrative privileges on all nodes targeted for the install and follow the steps below in the correct order to install HDP on Windows:

  1. Download the project from https://github.com/acesir/WindowsHDP and extract the PrebuiltPackage\WinHDP folder
  2. Download HDP 2.1 for Windows (you can drop it in the Files directory of the extracted package, or anywhere else as long as you update the configuration file accordingly)
  3. Disable the firewall on all nodes (this includes Local and Domain level)
  4. For Server 2008, make sure to install .NET Framework 4.0 (dotNetFx40_Full_Setup) and PowerShell 3.0 (Windows6.1-KB2506143) on all nodes (these files can be found in the Files directory outlined above)
  5. Open the WinHDP.exe.config file and populate or update the required fields such as HDPDir, the Hadoop password and the file locations and names
  6. Edit the cluster properties file with your target install cluster information
  7. Run the installer with an account that has Administrator permissions on all nodes in the cluster
  8. Choose the Install option

When the installation completes successfully you will find hyperlinks to the NameNode status page, the YARN status page and the HBase Master status page (if HBase was installed), as well as a shortcut to the Hadoop FS shell.


If certain services did not start after installation, you can open the Services console (Win+R: services.msc) and start or restart them as needed. You will also notice that all the services run under the Hadoop username. Keep this in mind if you get permission errors when running Hadoop FS commands: run the Hadoop shell as the Hadoop user and change permissions accordingly.


Adding Nodes

You can also use this console application to add nodes with specific services to an existing cluster. The one requirement is that you use an up-to-date cluster properties file for the currently deployed cluster.

To add nodes follow the steps below:

  • Choose 3. Add Node
  • Add nodes and services in the following format: $servicename=nodename1,nodename2|$servicename=nodename1
  • $servicename needs to be the service name from the clusterproperties.txt file
  • Node list should be comma “,” delimited and service list pipe “|” delimited
  • Example: $SLAVE_HOSTS=WINNODE6,WINNODE7,WINNODE8|$FLUME_HOSTS=WINNODE6,WINNODE7|$HBASE_REGIONSERVERS=WINNODE8
  • Finally, hit enter and wait for the installation to finish
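
Because the add-node string is typed by hand, it is easy to misplace a delimiter. The sketch below shows one way to sanity-check such a string in Python before pasting it into the console. It follows the format described in the list above but is not how the installer itself parses the input; `parse_add_node_spec` is a hypothetical helper name.

```python
def parse_add_node_spec(spec):
    """Split '$SERVICE=node1,node2|$SERVICE=node1' into {service: [nodes]}."""
    assignments = {}
    for part in spec.split("|"):
        part = part.strip()
        if not part:
            continue
        name, sep, nodes = part.partition("=")
        name = name.strip()
        if not sep or not name.startswith("$") or len(name) < 2:
            raise ValueError("expected $SERVICE=node1,node2 but got %r" % part)
        node_list = [node.strip() for node in nodes.split(",") if node.strip()]
        if not node_list:
            raise ValueError("no nodes listed for %s" % name)
        # Drop the leading '$' so the key matches the cluster properties file.
        assignments[name[1:]] = node_list
    return assignments


if __name__ == "__main__":
    example = ("$SLAVE_HOSTS=WINNODE6,WINNODE7,WINNODE8"
               "|$FLUME_HOSTS=WINNODE6,WINNODE7"
               "|$HBASE_REGIONSERVERS=WINNODE8")
    for service, nodes in parse_add_node_spec(example).items():
        print(service, "->", ", ".join(nodes))
```

The returned service names can then be compared against the keys in clusterproperties.txt to catch a misspelled service before the installer runs.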

Troubleshooting

If you run into issues, there are several logs you can review depending on the errors you encountered. If the installer itself fails, look for a log called WinHDP.log in the directory of the console executable from which you launched the installation. This log contains all the runtime logging information from the installation.

For issues related to Hadoop, such as services not starting correctly, review the log called HDPInstall located in the HDP Root directory\WinHDP\Logs


You should also review the Hadoop service logs located in Hadoop\logs, which contains logs for each installed service on each individual node in the cluster.


Once you have found the cause of the installation failure, start the installer again and run the Uninstall option, which cleans out all the installed applications, the HDP cluster and the environment variables. This leaves the cluster in a fresh state ready for the next install, and it must be done if the previous installation encountered errors.
