Hortonworks Data Platform 2.1 for Windows is the 100% open source data management platform based on Apache Hadoop and available for the Microsoft Windows Server platform. I have built a helper tool that automates deploying a multi-node Hadoop cluster using the MSI available in HDP 2.1 for Windows.
The HDP on Windows installation package ships as an MSI. Microsoft's MSI format relies on Windows Installer, the installation and configuration service provided with Windows. The installer service makes corporate deployments easier to manage and provides a standard format for component management. This blog will give you an idea of how Hortonworks Data Platform on Windows and the MSI package it offers can be automated and customized for deployment within your company.
The documentation and prerequisites for preparing the environment are covered in the HDP for Windows installation documentation. The idea behind automating the MSI installation in your own environment is to follow the documented steps and script every requirement, both what Hadoop needs and what your company standards dictate, so that you can provision HDP clusters and data nodes faster and in a more controlled manner.
With the MSI in hand and a little PowerShell remoting and .NET, it was easy to automate the installation of prerequisite applications and the configuration settings needed to install multi-node Hadoop clusters with HDP on Windows. The goal is to provision HDP clusters on Windows with minimal effort. The automated installer performs the steps below on all hosts identified by the cluster configuration:
The automated installer is a .NET console application project created with Visual Studio. It can be found on GitHub at the following link: https://github.com/acesir/WindowsHDP
The Prebuilt Package folder contains everything needed to perform the installation except the HDP MSI file, which is left out because of its size (1GB). The GitHub repository also contains the project source, should you want to add functionality based on your requirements or simply improve the experience.
The installer has the ability to install a new cluster, uninstall a cluster or add nodes and services to an existing cluster. For this example we will be installing a brand new cluster.
Before we install the cluster, let's take a look at a few important files located inside the extracted folder.
WinHDP.exe.config is the configuration file for the console installer which will allow you to customize the installation and which functions it will perform.
<appSettings>
  <!-- Installation Applications and File Locations -->
  <add key="Python" value="C:\WinHDP\Files\python-2.7.6.amd64.msi"></add>
  <add key="VisualC" value="C:\WinHDP\Files\vcredist_x64.exe"></add>
  <add key="DotNetFramework" value="C:\WinHDP\Files\dotNetFx40_Full_setup.exe"></add>
  <add key="Java" value="C:\WinHDP\Files\jdk-6u31-windows-x64.exe"></add>
  <add key="HDP" value="C:\WinHDP\Files\hdp-22.214.171.124.winpkg.msi"></add>
  <add key="ClusterProperties" value="C:\WinHDP\Files\clusterproperties.txt"></add>
  <!--Hadoop user password-->
  <add key="HadoopPassword" value="YOURPASS"></add>
  <!-- Knox master password-->
  <add key="KnoxMasterKey" value="YOURKEY"></add>
  <!-- Required for Server 2008.. DO NOT REMOVE-->
  <add key="Powershell3" value="C:\WinHDP\Files\Windows6.1-KB2506143-x64.msu"></add>
  <!-- Installation Directory structure-->
  <add key="HDPDir" value="C:\HDP"></add>
  <!-- Optional configuration -->
  <add key="EnableFirewall" value="False"></add>
  <add key="RestartForIPV6" value="False"></add>
  <add key="StartServices" value="True"></add>
  <add key="RunSmokeTests" value="True"></add>
</appSettings>
The entire configuration file can be left untouched if you extract the Prebuilt Package to the root of the C: drive and run the installer from there.
The next important file is the cluster properties file, located in the Files directory of the extracted package. This file defines the Hadoop cluster master and worker nodes as well as all services installed on the cluster.
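If you do change the paths, it can help to check them before launching the installer. The following is a minimal Python sketch, not part of the WinHDP tool: it reads the appSettings keys from a config string and reports which file entries do not exist on disk. The function names and the trimmed sample config are assumptions made for illustration.

```python
import os
import xml.etree.ElementTree as ET

# Trimmed, illustrative copy of the appSettings section (not the full file).
SAMPLE_CONFIG = r"""
<configuration>
  <appSettings>
    <add key="Python" value="C:\WinHDP\Files\python-2.7.6.amd64.msi"></add>
    <add key="HDPDir" value="C:\HDP"></add>
    <add key="StartServices" value="True"></add>
  </appSettings>
</configuration>
"""

def read_app_settings(xml_text):
    """Return every <add key=... value=...> pair as a dictionary."""
    root = ET.fromstring(xml_text)
    return {node.get("key"): node.get("value") for node in root.iter("add")}

def missing_files(settings, file_keys):
    """Return the keys whose value is absent or does not point to a file."""
    return [key for key in file_keys
            if not os.path.isfile(settings.get(key, ""))]

if __name__ == "__main__":
    settings = read_app_settings(SAMPLE_CONFIG)
    # Hypothetical choice of keys to verify; extend it to match your config.
    for key in missing_files(settings, ["Python", "Java", "HDP"]):
        print("Missing installer file for key:", key)
```

To check a real installation you would pass the contents of WinHDP.exe.config instead of the sample string and list every key that points to a file.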
#Log directory
HDP_LOG_DIR=c:\hadoop\logs
#Data directory
HDP_DATA_DIR=c:\hdp\data
#hosts
NAMENODE_HOST=WINNODE3
SECONDARY_NAMENODE_HOST=WINNODE4
RESOURCEMANAGER_HOST=WINNODE3
HIVE_SERVER_HOST=WINNODE3
OOZIE_SERVER_HOST=WINNODE4
WEBHCAT_HOST=WINNODE3
SLAVE_HOSTS=WINNODE3,WINNODE4
CLIENT_HOSTS=WINNODE3,WINNODE4
HBASE_MASTER=WINNODE4
HBASE_REGIONSERVERS=WINNODE3,WINNODE4
ZOOKEEPER_HOSTS=WINNODE3,WINNODE4
FLUME_HOSTS=WINNODE3,WINNODE4
FALCON_HOST=WINNODE3
STORM_NIMBUS=WINNODE3
STORM_SUPERVISORS=WINNODE4
KNOX_HOST=WINNODE4
IS_TEZ=yes
IS_PHOENIX=yes
#Database host
DB_FLAVOR=DERBY
DB_HOSTNAME=WINNODE3
DB_PORT=1527
#Hive properties
HIVE_DB_NAME=hive
HIVE_DB_USERNAME=hive
HIVE_DB_PASSWORD=YOURPASS
#Oozie properties
OOZIE_DB_NAME=oozie
OOZIE_DB_USERNAME=oozie
OOZIE_DB_PASSWORD=YOURPASS
Use the provided sample cluster properties file as the starting point for your cluster. The sample defines two nodes, but you can change this as needed. You do not have to install every component, although it may be easier to install everything and then disable the services you do not need. HBase is a good candidate for disabling, since it uses a lot of memory in the cluster. Also keep in mind that you may want to move the data directory to a drive other than the OS drive if you are deploying to a production or development environment.
Note that HA (High Availability for the NameNode) is not included in this example. The installer can install the HA components, but it cannot enable HA automatically because manual HDFS commands need to be run. We will cover this in the next blog.
If you are installing on Server 2012, no software needs to be installed manually before running the installer. If you are installing on Server 2008, you need to install .NET Framework 4.0 and PowerShell 3.0 first. Both are located inside the Files directory in the Prebuilt Package.
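Before handing the file to the installer, you may want to confirm which machines it actually targets. Here is a minimal Python sketch, separate from the installer itself, that parses the KEY=VALUE format shown above and collects the distinct host names. The function names are hypothetical, and the list of host-bearing keys is simply taken from the sample file.

```python
# Keys from the sample file whose values are host names (comma-separated).
HOST_KEYS = (
    "NAMENODE_HOST", "SECONDARY_NAMENODE_HOST", "RESOURCEMANAGER_HOST",
    "HIVE_SERVER_HOST", "OOZIE_SERVER_HOST", "WEBHCAT_HOST", "SLAVE_HOSTS",
    "CLIENT_HOSTS", "HBASE_MASTER", "HBASE_REGIONSERVERS", "ZOOKEEPER_HOSTS",
    "FLUME_HOSTS", "FALCON_HOST", "STORM_NIMBUS", "STORM_SUPERVISORS",
    "KNOX_HOST", "DB_HOSTNAME",
)

# Shortened excerpt of the sample cluster properties file.
SAMPLE_PROPERTIES = """\
#hosts
NAMENODE_HOST=WINNODE3
SECONDARY_NAMENODE_HOST=WINNODE4
SLAVE_HOSTS=WINNODE3,WINNODE4
IS_TEZ=yes
DB_PORT=1527
"""

def parse_cluster_properties(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props

def cluster_hosts(props):
    """Return the sorted, de-duplicated host names across all host keys."""
    hosts = set()
    for key in HOST_KEYS:
        for host in props.get(key, "").split(","):
            if host.strip():
                hosts.add(host.strip())
    return sorted(hosts)

if __name__ == "__main__":
    print(cluster_hosts(parse_cluster_properties(SAMPLE_PROPERTIES)))
```

Every host in the resulting list needs to be reachable with the administrative account used to run the installer.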
Now we can finally start the installation. Make sure you are running the installer with an account that has Administrative privileges on all nodes targeted for the install and follow the steps below in the correct order to install HDP on Windows:
When the installation completes successfully you will find hyperlinks to the NameNode status page, the YARN status page and the HBase Master status page (if HBase was installed), as well as a shortcut to the Hadoop FS shell.
If certain services did not start after installation, open the Services console (Win+R, then services.msc) and start or restart them as needed. You will also notice that all the services run under the Hadoop user. Keep this in mind if you get permission errors when running Hadoop FS commands: run the Hadoop shell as the Hadoop user and change permissions accordingly.
You can also use this console application to add nodes with specific services to an existing cluster. The one requirement is an up-to-date cluster properties file for the currently deployed cluster.
To add nodes follow the steps below:
$servicename needs to be the service name from the clusterproperties.txt file
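When preparing the updated cluster properties file, it is easy to lose track of what changed. The sketch below is an illustration only and not something the installer does: it compares the deployed properties with the updated ones and lists the hosts that were added per key. It assumes both files have already been loaded into dictionaries that contain only host-bearing keys, and WINNODE5 is a made-up host name.

```python
def split_hosts(value):
    """Split a comma-separated host list, dropping empty entries."""
    return [host.strip() for host in value.split(",") if host.strip()]

def added_hosts(deployed, updated):
    """Map each key to the hosts present in `updated` but not in `deployed`."""
    changes = {}
    for key, value in updated.items():
        before = set(split_hosts(deployed.get(key, "")))
        new = [host for host in split_hosts(value) if host not in before]
        if new:
            changes[key] = new
    return changes

if __name__ == "__main__":
    # Properties of the cluster as currently deployed (host keys only).
    deployed = {
        "SLAVE_HOSTS": "WINNODE3,WINNODE4",
        "HBASE_REGIONSERVERS": "WINNODE3,WINNODE4",
    }
    # Updated file with one hypothetical new worker node.
    updated = {
        "SLAVE_HOSTS": "WINNODE3,WINNODE4,WINNODE5",
        "HBASE_REGIONSERVERS": "WINNODE3,WINNODE4",
    }
    print(added_hosts(deployed, updated))
```

Keys that are unchanged are left out of the result, so an empty dictionary means the two files describe the same set of hosts.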
If you run into issues, there are multiple logs you can review depending on the errors you encountered. If the installer itself fails, look for a log called WinHDP.log in the directory from which you launched the console executable. This log contains all the runtime logging information from the installation.
For Hadoop-related issues, such as services not starting correctly, review the log called HDPInstall located in WinHDPLogs under the HDP root directory.
You should also review the Hadoop service logs in the Hadoop log directory (HDP_LOG_DIR in the cluster properties file), which holds logs for each installed service on each individual node.
Once you have found the cause of an installation failure, start the installer again and run the Uninstall option, which cleans out all the installed applications, the HDP cluster and the environment variables. This leaves the cluster in a fresh state for the next install and must be done whenever the previous installation encountered errors.
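These logs can get long, so a quick filter helps to find where to start reading. The following Python sketch makes no assumption about the actual log layout; it only returns lines that contain common failure keywords. The keyword list and the function name are my own choices, not something defined by the installer or by HDP.

```python
import re

# Case-insensitive keywords that usually mark a problem worth reading.
FAILURE_PATTERN = re.compile(r"error|exception|fail", re.IGNORECASE)

def suspicious_lines(lines):
    """Return (line_number, text) pairs for lines matching a failure keyword."""
    return [
        (number, line.rstrip("\r\n"))
        for number, line in enumerate(lines, start=1)
        if FAILURE_PATTERN.search(line)
    ]
```

A typical use is to open WinHDP.log (or one of the service logs), pass the file object to suspicious_lines, and print the result; the line numbers let you jump to the surrounding context in the full log.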