- PREPARE: Confirm that your system meets the Requirements, that you have Planned and Prepared for your cluster, and that you have performed the Pre-Flight Operations Checklist.
- DOWNLOAD: Download and Install the software.
- PROVISION: Use the wizard to Provision Your Cluster.
- LEARN: Browse Additional Resources to learn more.
The Hortonworks Data Platform (HDP) includes two major components:
- Hortonworks Management Center (HMC) is a web-based tool for provisioning, managing, and monitoring your Hadoop cluster, based on the Apache Ambari project. This software is installed on a single HMC Server, which serves as the primary management point for your cluster.
- Hadoop Components include the most popular Apache Hadoop ecosystem projects, such as Core (HDFS + MapReduce), Pig, Hive, HBase, ZooKeeper, Sqoop, Oozie, and HCatalog. This software is installed on your target cluster hosts and is managed by HMC.
| Component | Requirements |
|---|---|
| Hortonworks Management Center (HMC) (1) | Operating Systems: RHEL / CentOS 5 (64-bit), RHEL / CentOS 6 (64-bit). Web Browsers: Mozilla Firefox – latest stable version (12 or later) |
| Hadoop Components | Operating Systems (2): RHEL / CentOS 5 (64-bit), RHEL / CentOS 6 (64-bit) |
- (1) The Hortonworks Management Center runs on the HMC Server.
- (2) To install the Hadoop Components on other operating systems (such as SUSE Linux Enterprise Server), use gsInstaller.
| Install Type | Description |
|---|---|
| Single Node Install | The HMC Server runs on the same machine as the Hadoop Component services. This type of install is appropriate for initial evaluation only. |
| Multi-Node Cluster Install | The HMC Server runs on a dedicated host, and the Hadoop Component services run on a cluster of one or more hosts. In general, we recommend at least three nodes in the Hadoop cluster (one master and two slaves). |
In order to prepare your cluster for HDP, you will need to perform steps on each host that will be part of your cluster, as well as prepare the entire cluster to accept installation of the HDP software. This section provides information on those configurations.
Perform the following steps on the HMC Server and each host you plan to include as part of your cluster.
- Confirm the Fully Qualified Domain Name (FQDN) for each host.
If deploying your cluster to Amazon EC2, be sure to use the Private DNS host name.
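The guide does not show the command itself; on RHEL/CentOS the usual way to check is `hostname -f` (an assumption based on standard Linux tooling), e.g.:

```shell
#!/bin/sh
# Print this host's fully qualified domain name. The HDP installer
# expects this name (not the IP address) for every cluster host.
fqdn="$(hostname -f 2>/dev/null || hostname)"
echo "FQDN: ${fqdn}"

# A usable FQDN is non-empty and contains no spaces.
case "$fqdn" in
  ""|*" "*) echo "WARNING: could not determine a valid FQDN" ;;
  *)        echo "FQDN looks usable" ;;
esac
```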
- Confirm each host has Internet access via HTTP, HTTPS and FTP. When performing the HDP install, each host in the cluster will access the Internet to obtain software packages required for installation.
If your hosts will use a proxy to access the Internet, configure each host machine to use an Internet proxy. Check with your IT or network team for these settings.
If you do not have Internet access available to your cluster hosts refer to the Hortonworks Documentation on how to setup a Local Mirror Repository.
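As a sketch of the proxy configuration (the proxy URL below is a placeholder, not a value from this document): shell-level proxy variables cover most command-line tools, while `yum` reads its proxy from `/etc/yum.conf` rather than the environment.

```shell
#!/bin/sh
# Placeholder proxy endpoint -- substitute the address supplied by
# your IT or network team.
PROXY_URL="http://proxy.example.com:3128"

# Most command-line tools (curl, wget, etc.) honor these variables.
export http_proxy="$PROXY_URL"
export https_proxy="$PROXY_URL"
export ftp_proxy="$PROXY_URL"

# yum takes its proxy from /etc/yum.conf instead; as root, append:
# echo "proxy=$PROXY_URL" >> /etc/yum.conf
echo "proxy variables set to $PROXY_URL"
```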
- Remove or disable any existing Puppet agent configurations. HDP performs the software installation (and ongoing cluster management) using Puppet. With HDP, the HMC Server is the Puppet master and each host in your cluster acts as a Puppet Agent.
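A minimal sketch of checking for and disabling an existing agent; the package and service are both assumed to be named `puppet`, as in stock RHEL/CentOS + EPEL setups:

```shell
#!/bin/sh
# Look for an existing Puppet agent package before HDP installs its own.
if rpm -q puppet >/dev/null 2>&1; then
  service puppet stop        # stop any running agent (root required)
  chkconfig puppet off       # keep it from starting at boot
  # Alternatively, remove the agent package entirely:
  # yum -y erase puppet
else
  echo "No puppet package found; nothing to disable."
fi
```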
- Disable SELinux
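A minimal sketch of both the immediate and the persistent change on RHEL/CentOS (run as root):

```shell
# Switch SELinux to permissive mode for the current boot (root required).
setenforce 0

# Persist across reboots by disabling SELinux in its config file.
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config

# Verify: should report "Permissive" now, "Disabled" after a reboot.
getenforce
```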
- Enable NTP on the cluster to synchronize the clocks across the hosts.
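On RHEL/CentOS 5 and 6, enabling NTP typically looks like the following (run as root; a sketch, not part of the original guide):

```shell
# Install the NTP daemon.
yum -y install ntp

# Start it now and enable it at boot.
service ntpd start
chkconfig ntpd on

# Confirm the daemon can see its time sources.
ntpq -p
```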
- Prepare Password-less SSH Login for the `root` user between the HMC Server and each host in the cluster. This enables the HMC Server to reach each host in the cluster via SSH without prompting for a password.
Password-less SSH Login is required for the HMC Server to access each host in the cluster and install the necessary software components. For more information, please refer to the Hortonworks Documentation.
Confirm that the HMC Server can SSH to itself without prompting for a password.
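The key setup can be sketched as follows (run as root on the HMC Server; the worker host name is a placeholder, not a name from this document):

```shell
#!/bin/sh
# Generate an RSA key pair without a passphrase, if none exists yet,
# so logins are non-interactive.
mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
[ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"

# Authorize the key for the HMC Server itself...
cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
chmod 600 "$HOME/.ssh/authorized_keys"

# ...and copy it to each cluster host (placeholder host name):
# ssh-copy-id root@worker01.example.com

# Both logins should now succeed without a password prompt:
# ssh root@localhost true
# ssh root@worker01.example.com true
```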
- Check the dependencies on each host in the cluster using the `yum info [dependency]` command. Confirm that the following are either not installed or, if installed, match these versions.
| Dependency Name | Version-Release |
|---|---|
| Ruby | 2.7.9-2 |
| Ruby Rack | 1.1.0-2.el5 |
| Ruby Passenger | 3.0.12-1.el5.centos |
| Nagios Plugins | 1.4.15-2.el5 |
| Nagios Common | |
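The check can be scripted; the RPM package names below are assumptions (actual names vary by repository) and should be adjusted to match yours:

```shell
#!/bin/sh
# Report name/version/release for each dependency; absence is fine.
for pkg in ruby rubygem-rack rubygem-passenger nagios-plugins nagios-common; do
  echo "== $pkg =="
  yum info "$pkg" 2>/dev/null | grep -E '^(Name|Version|Release)' \
    || echo "not installed (OK)"
done
```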
Complete the Preparing Your Cluster steps above and confirm you have the following handy before you install and start HMC:
| Item | Description |
|---|---|
| SSH Private Key | Obtain the SSH Private Key (typically …) used for the password-less SSH login set up in Preparing Your Cluster. |
| Host names text file | Create a text file of the host names that will be part of your cluster, with one target host name per line. Refer to the Preparing Your Cluster section for more information on obtaining the host name for each host in your cluster. The host name should be the FQDN for the host, not the IP address. For more information, refer to the Hortonworks Documentation. |
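Creating the file can be sketched as follows (the host names are placeholders for your own cluster hosts):

```shell
#!/bin/sh
# Write one FQDN per line; replace these placeholder names with the
# actual hosts in your cluster.
cat > hostnames.txt <<'EOF'
master01.example.com
worker01.example.com
worker02.example.com
EOF

# Sanity check: warn if any line is a bare IP address instead of a name.
if grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$' hostnames.txt; then
  echo "WARNING: replace IP addresses with FQDNs"
fi
wc -l hostnames.txt
```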
- On the server you plan to use to host HMC, download and install the appropriate HDP RPM based on your HMC Server platform.
| HMC Server Platform | HDP RPM |
|---|---|
| RHEL / CentOS 5 (64-bit) | `rpm -Uvh http://public-repo-1.hortonworks.com/HDP-188.8.131.52/repos/centos5/hdp-release-184.108.40.206-1.el5.noarch.rpm` |
| RHEL / CentOS 6 (64-bit) | `rpm -Uvh http://public-repo-1.hortonworks.com/HDP-220.127.116.11/repos/centos6/hdp-release-18.104.22.168-1.el6.noarch.rpm` |
- Install Extra Packages for Enterprise Linux (EPEL) with the following:
yum install epel-release
- ** RHEL / CentOS 5 HMC Server installs only **. Install the “PHP Extension Community Library for JSON” with the following:
yum install php-pecl-json
- Install HMC using the following:
yum install hmc
- Confirm HMC is installed by querying the RPM database for `hmc` with the following:
rpm -qa | grep hmc
- Start the HMC service. You will be prompted to agree to the Oracle Java License and download the binaries:
service hmc start
- Stop the firewall.
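The command itself is not shown above; on RHEL/CentOS 5 and 6 the firewall service is `iptables` (an assumption based on the platforms this guide targets), so the step would typically be (as root):

```shell
# Stop the firewall for this session...
service iptables stop
# ...and keep it from starting at boot while the cluster is provisioned.
chkconfig iptables off
# Hosts with IPv6 rules may need the same for ip6tables:
# service ip6tables stop
```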
- Proceed to Provisioning Your Cluster.
- Be sure you have performed steps in the Pre-Flight Operations Checklist and confirm you have your Host names text file and HMC Server SSH Private Key file handy.
- Browse to the HMC start page.
- Click the “Get Started” button.
- Follow the wizard instructions to provision your cluster.
Learn more about our products and Hadoop, and participate in the community, with the following resources:
|Talend Open Studio for Big Data|
Talend Open Studio for Big Data is a powerful and versatile open source data integration tool. Talend provides data managers, operators, and analysts a graphical tool that abstracts the underlying Hadoop complexities and dramatically improves the efficiency of job design through an easy-to-use Eclipse development environment.
|Download Talend Open Studio for Big Data (tar.gz) » Documentation »|