Download & Install: Hortonworks Data Platform 1.0


Hortonworks Data Platform is designed to be installed by Operators and IT Administrators using Linux-friendly installation tools. These quick-start instructions help you prepare your cluster and install the software. We recommend that you review this information thoroughly before performing an installation. For complete installation instructions, refer to the Hortonworks Documentation.

System Requirements

The Hortonworks Data Platform (HDP) includes two major components:

  • Hortonworks Management Center (HMC) is a web-based tool, based on the Apache Ambari project, for provisioning, managing and monitoring your Hadoop cluster. This software is installed on a single HMC Server, which becomes the primary management point for your cluster.
  • Hadoop Components include the most popular Apache Hadoop ecosystem projects, such as Core (HDFS + MapReduce), Pig, Hive, HBase, ZooKeeper, Sqoop, Oozie and HCatalog. This software is installed on your target cluster hosts and managed by HMC.
Supported platforms by component:
  • Hortonworks Management Center (HMC) (1)
      Operating Systems: RHEL / CentOS 5 (64-bit), RHEL / CentOS 6 (64-bit)
      Web Browsers: Mozilla Firefox, latest stable version (12 or later)
  • Hadoop Components
      Operating Systems (2): RHEL / CentOS 5 (64-bit), RHEL / CentOS 6 (64-bit)
  1. The Hortonworks Management Center runs on the HMC Server.
  2. To install the Hadoop Components on other operating systems (such as SUSE Linux Enterprise Server), use gsInstaller.
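The platform requirements above can be checked on a candidate host with a short bash function. This is a minimal sketch: supported_platform is a hypothetical name, and it assumes the host identifies itself through /etc/redhat-release and uname -m, as RHEL and CentOS do.

```shell
#!/bin/bash
# Hypothetical check that a host matches the supported platforms listed above.

# supported_platform RELEASE_TEXT ARCH: succeed for RHEL or CentOS 5 or 6
# on 64-bit x86 (x86_64); fail for anything else.
supported_platform() {
  local re='^(Red Hat Enterprise Linux|CentOS).* release ([56])(\.|[[:space:]])'
  [[ $1 =~ $re ]] && [ "$2" = "x86_64" ]
}
```

For example: supported_platform "$(cat /etc/redhat-release)" "$(uname -m)" && echo supported. Matching on the release text keeps the function free of side effects, so it can be tested without a real RHEL host.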
Planning Your Installation
There are two install types:
  • Single Node Install: The HMC Server runs on the same machine as the Hadoop Component services. This type of install is appropriate for initial evaluation only.
  • Multi-Node Cluster Install: The HMC Server runs on a dedicated host, and the Hadoop Component services run on a cluster of one or more hosts. In general, we recommend at least three nodes in the Hadoop cluster (one master and two slaves).
Preparing Your Cluster

To prepare your cluster for HDP, you need to perform steps on each host that will be part of your cluster, and prepare the cluster as a whole to accept the HDP software. This section describes those configurations.

Perform the following steps on the HMC Server and each host you plan to include as part of your cluster.

  1. Confirm the Fully Qualified Domain Name (FQDN) for each host using the command hostname -f.

    If deploying your cluster to Amazon EC2, be sure to use the Private DNS host name.

  2. Confirm that each host has Internet access via HTTP, HTTPS and FTP. During the HDP install, each host in the cluster downloads the software packages required for installation from the Internet.

    If your hosts use a proxy to access the Internet, configure each host to use that proxy. Check with your IT or network team for the proxy settings.

    If your cluster hosts do not have Internet access, refer to the Hortonworks Documentation for how to set up a Local Mirror Repository.

  3. Remove or disable any existing Puppet agent configurations. HDP performs the software installation (and ongoing cluster management) using Puppet. With HDP, the HMC Server is the Puppet master and each host in your cluster acts as a Puppet Agent.
  4. Disable SELinux.
  5. Enable NTP on the cluster to synchronize the clocks across the hosts.
  6. Set up password-less SSH login for the root user between the HMC Server and each host in the cluster. This lets the HMC Server reach each host in the cluster via SSH without prompting for a password.

    Password-less SSH Login is required for the HMC Server to access each host in the cluster and install the necessary software components. For more information, please refer to the Hortonworks Documentation.

    Confirm that the HMC Server can SSH to itself without prompting for a password by running ssh root@localhost on the HMC Server.

  7. Check the dependencies on each host in the cluster using the yum info [dependency] command. Confirm that each of the following is either not installed or, if installed, matches the version listed.
    Name (package): Version-Release
    Ruby (ruby): 1.8.5-24.el5
    Puppet (puppet): 2.7.9-2
    Ruby Rack (rubygem-rack): 1.1.0-2.el5
    Ruby Passenger (rubygem-passenger): 3.0.12-1.el5.centos
    Nagios (nagios): 3.0.12-1.el5.centos
    Nagios Plugins (nagios-plugins): 1.4.15-2.el5
    Nagios Common (nagios-common): 2.12-10.el5
    MySQL (mysql): 5.*
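Most of the per-host steps above can be verified with a script before you start the install. The following is a minimal sketch, assuming bash on RHEL/CentOS 5 or 6, run as root on each host, with curl installed. The function names are hypothetical, the init script names ntpd and puppet are assumptions about how those packages are installed, only HTTP (not HTTPS or FTP) access is probed, and the script reports problems rather than fixing them. The expected versions are copied from the dependency table above.

```shell
#!/bin/bash
# Hypothetical pre-install checks for one HDP host (RHEL/CentOS 5 or 6).
# Run as root. Reports problems only; nothing on the host is changed.

# looks_like_fqdn NAME: succeed if NAME is dotted and not an IPv4 address.
looks_like_fqdn() {
  local ip_re='^[0-9]+(\.[0-9]+){3}$'
  local fqdn_re='^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$'
  [[ $1 =~ $ip_re ]] && return 1
  [[ $1 =~ $fqdn_re ]]
}

# expected_version PKG: Version-Release from the dependency table (globs allowed).
expected_version() {
  case "$1" in
    ruby)              echo "1.8.5-24.el5" ;;
    puppet)            echo "2.7.9-2" ;;
    rubygem-rack)      echo "1.1.0-2.el5" ;;
    rubygem-passenger) echo "3.0.12-1.el5.centos" ;;
    nagios)            echo "3.0.12-1.el5.centos" ;;
    nagios-plugins)    echo "1.4.15-2.el5" ;;
    nagios-common)     echo "2.12-10.el5" ;;
    mysql)             echo "5.*" ;;
    *)                 return 1 ;;
  esac
}

# dep_version_ok PKG INSTALLED: succeed if PKG is absent (INSTALLED empty)
# or INSTALLED matches the expected Version-Release.
dep_version_ok() {
  local expected
  expected=$(expected_version "$1") || return 1
  [ -z "$2" ] && return 0
  [[ $2 == $expected ]]   # unquoted right side is matched as a glob
}

check_host() {
  local fqdn selinux pkg installed
  # Step 1: the host name this host reports must be an FQDN.
  fqdn=$(hostname -f)
  looks_like_fqdn "$fqdn" || echo "WARN: hostname -f gave '$fqdn', not an FQDN"

  # Step 2 (HTTP only): any HTTP response from the repository host is enough.
  curl -s -o /dev/null --max-time 10 http://public-repo-1.hortonworks.com/ \
    || echo "WARN: cannot reach public-repo-1.hortonworks.com over HTTP"

  # Step 3: a running Puppet agent would conflict with the HMC Puppet master.
  service puppet status >/dev/null 2>&1 && echo "WARN: a Puppet agent is running"

  # Step 4: SELinux should report Disabled.
  selinux=$(getenforce 2>/dev/null)
  [ "$selinux" = "Disabled" ] || echo "WARN: SELinux is '${selinux:-unknown}'"

  # Step 5: ntpd keeps the clocks in sync.
  service ntpd status >/dev/null 2>&1 || echo "WARN: ntpd is not running"

  # Step 7: compare installed dependency versions with the table.
  for pkg in ruby puppet rubygem-rack rubygem-passenger nagios \
             nagios-plugins nagios-common mysql; do
    installed=$(rpm -q --qf '%{VERSION}-%{RELEASE}' "$pkg" 2>/dev/null) || installed=""
    dep_version_ok "$pkg" "$installed" \
      || echo "WARN: $pkg is $installed, expected $(expected_version "$pkg")"
  done
}
```

Source the file and run check_host on each host; every WARN line points back to a step to revisit. Step 6 (password-less SSH) is checked from the HMC Server instead, since it concerns the connection between hosts.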
Pre-Flight Operations Checklist

Complete the Preparing Your Cluster steps above and confirm you have the following handy before you install and start HMC:

  • SSH Private Key: Obtain the SSH private key (typically id_rsa) to use during the installation. Refer to the Hortonworks Documentation for more information on how the private key is used during cluster provisioning.
  • Host names text file: Create a text file listing the host names that will be part of your cluster, one host name per line. Refer to the Preparing Your Cluster section for more information on obtaining the host name of each host in your cluster.

    Each host name should be the host's FQDN, not its IP address. For more information, refer to the Hortonworks Documentation.
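The checklist items, together with step 6 of Preparing Your Cluster, can be checked from the HMC Server before you open the wizard. This is a minimal sketch, assuming bash and OpenSSH and run as root; the function names are hypothetical, and ~/.ssh/id_rsa is the usual OpenSSH default location rather than a path HMC requires. It rejects IP addresses and unqualified names in the host names file, distributes root's public key, and confirms that key-based login works without a password prompt.

```shell
#!/bin/bash
# Hypothetical pre-flight helpers, run as root on the HMC Server.

# is_ipv4 NAME: succeed if NAME is a dotted-quad IPv4 address.
is_ipv4() {
  local re='^[0-9]+(\.[0-9]+){3}$'
  [[ $1 =~ $re ]]
}

# validate_hosts_file FILE: report entries that are IP addresses or are not
# fully qualified; fail if any were found. Blank lines are ignored.
validate_hosts_file() {
  local line n=0 bad=0
  while IFS= read -r line || [ -n "$line" ]; do
    n=$((n + 1))
    line=${line%$'\r'}                 # tolerate Windows line endings
    [ -z "$line" ] && continue
    if is_ipv4 "$line"; then
      echo "line $n: $line is an IP address; use the FQDN"; bad=1
    elif [[ $line != *.* ]]; then
      echo "line $n: $line is not fully qualified"; bad=1
    fi
  done < "$1"
  return "$bad"
}

# distribute_key FILE: create root's key pair if needed (empty passphrase),
# then copy the public key to localhost and every listed host. ssh-copy-id
# asks for each host's root password once.
distribute_key() {
  local host
  [ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
  for host in localhost $(tr -d '\r' < "$1"); do
    ssh-copy-id -i ~/.ssh/id_rsa.pub "root@$host"
  done
}

# check_ssh FILE: confirm root can log in to localhost and every listed host
# without a password. BatchMode=yes makes ssh fail instead of prompting.
check_ssh() {
  local host failed=0
  for host in localhost $(tr -d '\r' < "$1"); do
    if ssh -o BatchMode=yes "root@$host" true 2>/dev/null; then
      echo "ok: $host"
    else
      echo "FAIL: $host"; failed=1
    fi
  done
  return "$failed"
}
```

With a host names file called, say, hosts.txt (a hypothetical name): validate_hosts_file hosts.txt && distribute_key hosts.txt && check_ssh hosts.txt. Because BatchMode also suppresses the host-key confirmation prompt, check_ssh only passes for hosts already in known_hosts, which the earlier ssh-copy-id connection records once you accept each key.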

Download and Install
  1. On the server you plan to use as the HMC Server, download and install the HDP release RPM for that server's platform:
    RHEL / CentOS 5 (64-bit):
      rpm -Uvh http://public-repo-1.hortonworks.com/HDP-1.0.1.14/repos/centos5/hdp-release-1.0.1.14-1.el5.noarch.rpm
    RHEL / CentOS 6 (64-bit):
      rpm -Uvh http://public-repo-1.hortonworks.com/HDP-1.0.1.14/repos/centos6/hdp-release-1.0.1.14-1.el6.noarch.rpm
  2. Install Extra Packages for Enterprise Linux (EPEL) with the following:
    yum install epel-release
  3. RHEL / CentOS 5 HMC Servers only: install the "PHP Extension Community Library for JSON" package with the following:
    yum install php-pecl-json
  4. Install HMC using the following:
    yum install hmc
  5. Confirm that HMC is installed by querying hmc from the RPM list with the following:
    rpm -qa | grep hmc
  6. Start the HMC service. You will be prompted to agree to the Oracle Java License and download the binaries:
    service hmc start
  7. Stop the firewall with the following:
    /etc/init.d/iptables stop
  8. Proceed to Provisioning Your Cluster.
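Steps 1 through 7 can be wrapped in one script on the HMC Server. This is a minimal sketch, assuming bash, root access and a RHEL/CentOS 5 or 6 host that identifies itself in /etc/redhat-release; the function names are hypothetical, and yum -y (answer yes to prompts) is a choice made here, not part of the original steps. The RPM URLs are the ones listed in step 1.

```shell
#!/bin/bash
# Hypothetical wrapper for Download and Install steps 1-7 on the HMC Server.
# Run as root on RHEL/CentOS 5 or 6.

# el_major_from_release TEXT: major release number from /etc/redhat-release
# text, e.g. "CentOS release 5.8 (Final)" -> 5.
el_major_from_release() {
  local re='release ([0-9]+)'
  [[ $1 =~ $re ]] && echo "${BASH_REMATCH[1]}"
}

# hdp_release_rpm_url MAJOR: the HDP release RPM listed for that platform.
hdp_release_rpm_url() {
  local base=http://public-repo-1.hortonworks.com/HDP-1.0.1.14/repos
  case "$1" in
    5) echo "$base/centos5/hdp-release-1.0.1.14-1.el5.noarch.rpm" ;;
    6) echo "$base/centos6/hdp-release-1.0.1.14-1.el6.noarch.rpm" ;;
    *) return 1 ;;
  esac
}

# install_hmc: the body runs in a subshell, so set -e stops at the first
# failing command without changing the caller's shell options.
install_hmc() (
  set -e
  major=$(el_major_from_release "$(cat /etc/redhat-release)")
  url=$(hdp_release_rpm_url "$major") || { echo "unsupported release" >&2; exit 1; }
  rpm -Uvh "$url"                        # step 1
  yum -y install epel-release            # step 2
  if [ "$major" = 5 ]; then
    yum -y install php-pecl-json         # step 3, RHEL/CentOS 5 only
  fi
  yum -y install hmc                     # step 4
  rpm -qa | grep hmc                     # step 5, fails if hmc is missing
  service hmc start                      # step 6, prompts for the Java license
  /etc/init.d/iptables stop              # step 7
)
```

Call it as a plain command (install_hmc). Bash ignores set -e inside a function whose exit status is being tested, so wrapping the call in if or || would disable the stop-on-error behavior.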
Provisioning Your Cluster
  1. Make sure you have completed the Pre-Flight Operations Checklist and have your host names text file and the HMC Server SSH private key file at hand.
  2. Browse to the HMC start page:
    http://{your.hmc.server}/hmc/html
  3. Click the “Get Started” button.
  4. Follow the wizard instructions to provision your cluster.
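Before opening the browser, you may want to confirm that the start page answers, for example to verify that the firewall was stopped. This is a minimal sketch, assuming bash and curl on the machine you browse from; the function names are hypothetical, and the retry count and 2-second interval are arbitrary choices.

```shell
#!/bin/bash
# Hypothetical helper: wait until the HMC start page responds over HTTP.

# hmc_url HOST: the wizard start page for the given HMC Server host name.
hmc_url() {
  echo "http://$1/hmc/html"
}

# wait_for_hmc HOST [TRIES]: poll every 2 seconds, up to TRIES times (default 30).
wait_for_hmc() {
  local url i tries=${2:-30}
  url=$(hmc_url "$1")
  for ((i = 1; i <= tries; i++)); do
    # -f treats HTTP errors (4xx/5xx) as failure; -L follows redirects.
    if curl -sfL -o /dev/null --max-time 5 "$url"; then
      echo "HMC is up: $url"
      return 0
    fi
    sleep 2
  done
  echo "No response from $url after $tries attempts" >&2
  return 1
}
```

For example, wait_for_hmc hmc.example.com (a hypothetical host name) prints the URL to open once the page answers.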
Additional Resources

Learn more about our products and Hadoop, and participate in the community, with the following resources:

Add-Ons
Talend Open Studio for Big Data

Talend Open Studio for Big Data is a powerful and versatile open source data integration tool. Talend provides data managers, operators and analysts with a graphical tool that abstracts the underlying Hadoop complexities and improves the efficiency of job design through an easy-to-use Eclipse development environment.
