February 04, 2014

Deploying a Hadoop Cluster on Amazon EC2 with HDP2

In this post, we’ll walk through the process of deploying an Apache Hadoop 2 cluster on the EC2 cloud service offered by Amazon Web Services (AWS), using Hortonworks Data Platform (HDP).

Both EC2 and HDP offer many knobs and buttons to cater to your specific performance, security, cost, data size, data protection and other requirements. I will not discuss most of these options in this post, as the goal is to walk through one particular deployment path to get started.

Let’s go!

Prerequisites

  • An Amazon Web Services account with the ability to launch 7 large EC2 instances.
  • A Mac or a Linux machine. You could also use Windows, but you will have to install additional software such as SSH and SCP clients.
  • Lastly, we assume that you have basic familiarity with EC2, i.e. that you have created EC2 instances and SSH’d into them before.

Step 1: Creating a Base AMI with all the OS-level configuration common to all nodes

Navigate to your EC2 console from the AWS Dashboard and then click on ‘Launch Instance’:

Let’s select the 64-bit RHEL image and go to the next step:

Let’s select a large instance with adequate processing power and memory:

Here we adjust storage as required:

We are ready for Review and Launch:

Before you launch the instance, make sure you have downloaded the private key. Keep the private key safe, then launch:
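
The console choices above (64-bit RHEL image, large instance type, storage size, key pair) correspond to parameters of EC2's RunInstances call. The helper below is a minimal sketch for readers who script the launch with the boto3 library instead of using the console. It only builds the keyword arguments. The AMI ID, instance type, key pair name, volume size and root device name are hypothetical placeholders, not values taken from this walkthrough.

```python
"""Sketch: keyword arguments for boto3's EC2 run_instances() call.

All concrete values used below are hypothetical placeholders.
"""


def build_run_instances_params(ami_id, instance_type, key_name,
                               root_volume_gb, count=1,
                               root_device="/dev/sda1"):
    """Return a dict suitable for ec2_client.run_instances(**params)."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return {
        "ImageId": ami_id,              # the 64-bit RHEL AMI (or your private AMI)
        "InstanceType": instance_type,  # a "large" instance type
        "KeyName": key_name,            # key pair whose private key you downloaded
        "MinCount": count,              # launch exactly `count` instances
        "MaxCount": count,
        "BlockDeviceMappings": [{
            "DeviceName": root_device,  # assumed root device name for the AMI
            "Ebs": {"VolumeSize": root_volume_gb,  # size in GiB
                    "DeleteOnTermination": True},
        }],
    }


if __name__ == "__main__":
    params = build_run_instances_params(
        ami_id="ami-00000000",        # hypothetical AMI ID
        instance_type="m1.large",     # hypothetical instance type
        key_name="hdp-cluster-key",   # hypothetical key pair name
        root_volume_gb=100,           # hypothetical volume size
    )
    print(params["InstanceType"], params["MinCount"])
    # With boto3 installed and AWS credentials configured:
    #   import boto3
    #   boto3.client("ec2").run_instances(**params)
```

Launching the six additional nodes later in this step would use the same call with `count=6`, presumably with `ami_id` pointing at the private AMI, and with the same `key_name`. Reusing the key is what keeps password-less SSH working across the cluster.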

Everything looks good. Let’s view the instances.

Now that we have the instance up and running, we will need its public DNS name to connect to it:

Let’s SSH in:
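
The SSH command itself appeared only in a screenshot. The sketch below shows one way to build it, assuming the path of the downloaded key file and the instance's public DNS name as inputs. The login user defaults to `ec2-user` because the RHEL AMIs reject direct `root` logins, as one of the comments below shows. The host name and key path in the usage lines are hypothetical.

```python
import os
import shlex
import stat


def ssh_command(public_dns, key_path, user="ec2-user"):
    """Return the argv for an SSH session to an EC2 instance."""
    # Expand "~" here: it is a shell feature and would not be expanded
    # if the argv were passed straight to subprocess.
    return ["ssh", "-i", os.path.expanduser(key_path), f"{user}@{public_dns}"]


def restrict_key_permissions(key_path):
    """Equivalent of chmod 400: ssh refuses private keys others can read."""
    os.chmod(os.path.expanduser(key_path), stat.S_IRUSR)


if __name__ == "__main__":
    argv = ssh_command("ec2-203-0-113-10.compute-1.amazonaws.com",  # hypothetical
                       "/path/to/hdp-cluster-key.pem")              # hypothetical
    print(shlex.join(argv))
```

Run `restrict_key_permissions` once on the downloaded `.pem` file before connecting. You can then paste the printed command into a terminal.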

Now let’s prep the instance:
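
The prep commands themselves were shown only in a screenshot and are not reproduced here. The sketch below is purely an illustration. It assumes your prep includes disabling SELinux, a prerequisite commonly listed in Ambari install guides of this era; this is not necessarily what the author ran. It rewrites the `SELINUX=` line in the text of `/etc/selinux/config` and leaves every other line untouched.

```python
import re


def disable_selinux(config_text):
    """Return /etc/selinux/config text with SELINUX set to 'disabled'.

    Only lines starting with "SELINUX=" are rewritten; comments and the
    SELINUXTYPE= line are left as they are.
    """
    return re.sub(r"(?m)^SELINUX=.*$", "SELINUX=disabled", config_text)


if __name__ == "__main__":
    sample = "# SELinux config\nSELINUX=enforcing\nSELINUXTYPE=targeted\n"
    print(disable_selinux(sample))
```

Writing the result back to the real file requires root, and the change takes effect after a reboot. In the meantime, `setenforce 0` switches the running system to permissive mode. Because this happens before the AMI is created, every node launched from the image inherits the setting.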

That is all the prep we need, so we are going to create a private AMI. Go to the EC2 console, select the instance, and from the Actions menu select “Create Image”:

Make sure you check ‘No reboot’ before you click Create Image, as we would like to continue working on this instance:
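
If you prefer scripting to the console, the ‘No reboot’ checkbox corresponds to the `NoReboot` parameter of EC2's CreateImage call. The sketch below assumes boto3 is used. The instance ID and image name are hypothetical.

```python
def build_create_image_params(instance_id, name, no_reboot=True):
    """Return a dict suitable for ec2_client.create_image(**params)."""
    return {
        "InstanceId": instance_id,
        "Name": name,  # name shown in the AMI list
        # True mirrors the 'No reboot' checkbox: the instance keeps running,
        # at the cost of AWS not guaranteeing file system integrity of the image.
        "NoReboot": no_reboot,
    }


if __name__ == "__main__":
    params = build_create_image_params("i-0123456789abcdef0",  # hypothetical
                                       "hdp2-base-rhel")       # hypothetical
    print(params)
    # With boto3, creating the image and waiting for it could look like:
    #   ec2 = boto3.client("ec2")
    #   image_id = ec2.create_image(**params)["ImageId"]
    #   ec2.get_waiter("image_available").wait(ImageIds=[image_id])
```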

Wait for the creation of the AMI to be complete:

Let’s configure this instance for password-less SSH to all the other nodes in the cluster. The first step is to copy the private key onto this instance.

We need to move the private key to the .ssh folder and rename it to id_rsa:
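
Moving and renaming the key amounts to a `mv` plus a `chmod`. The sketch below is a minimal Python equivalent. It assumes the key was copied to the instance (for example with `scp`) under a name such as `hdp-cluster-key.pem`, which is hypothetical. `~/.ssh/id_rsa` is one of the default identity files that `ssh` tries, which is why the rename lets password-less SSH work without an `-i` flag.

```python
import os
import shutil
import stat


def install_private_key(pem_path, ssh_dir="~/.ssh"):
    """Move a downloaded .pem key to <ssh_dir>/id_rsa with mode 0600.

    Returns the path of the installed key. On POSIX systems an existing
    id_rsa in ssh_dir is silently replaced.
    """
    ssh_dir = os.path.expanduser(ssh_dir)
    os.makedirs(ssh_dir, mode=0o700, exist_ok=True)  # ssh expects a private ~/.ssh
    target = os.path.join(ssh_dir, "id_rsa")
    shutil.move(pem_path, target)
    # ssh refuses to use a private key that group or others can read.
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)
    return target
```

For example, running `install_private_key("/home/ec2-user/hdp-cluster-key.pem")` as `ec2-user` installs the key for that user. The path is hypothetical.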

Let’s provision the other nodes now:

Select the size of the node instances:

I will select 6 more nodes here, so that, counting the first instance, 3 nodes are dedicated to all the management daemons and 4 nodes to DataNodes. Then click on ‘Review and Launch’:

Click on the “Launch” button:

Make sure you are using the same key as before, so that password-less SSH works between the Ambari node and the rest of the new nodes. Click on ‘Launch Instance’:

While the instances are launching, we will copy the private DNS names of all the instances we have launched so far into a text file:

We will end up with a list of private DNS names, one per line:
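
Instead of copying the names by hand, you can extract them from an EC2 DescribeInstances response. The sketch below assumes the dict returned by boto3's `describe_instances()`, whose shape is `Reservations`, then `Instances`, then `PrivateDnsName`. The sample host names are hypothetical.

```python
def private_dns_names(response, states=("pending", "running")):
    """Return sorted PrivateDnsName values from a DescribeInstances response.

    Instances in other states (e.g. 'terminated') are skipped, as are
    instances without a private DNS name. A single, unpaginated response
    is assumed, which is enough for a handful of instances.
    """
    names = []
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            state = instance.get("State", {}).get("Name")
            dns = instance.get("PrivateDnsName")
            if dns and state in states:
                names.append(dns)
    return sorted(names)


def write_host_file(names, path):
    """Write one host name per line, ready to paste into Ambari."""
    with open(path, "w") as fh:
        fh.write("\n".join(names) + "\n")


if __name__ == "__main__":
    sample = {"Reservations": [{"Instances": [
        {"PrivateDnsName": "ip-10-0-0-12.ec2.internal", "State": {"Name": "running"}},
        {"PrivateDnsName": "ip-10-0-0-11.ec2.internal", "State": {"Name": "pending"}},
        {"PrivateDnsName": "", "State": {"Name": "terminated"}},
    ]}]}
    print(private_dns_names(sample))
    # With boto3: private_dns_names(boto3.client("ec2").describe_instances())
```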

Step 2: Customize the security groups to minimize attack surface area while not blocking essential communication channels

We have to add rules to the security groups that were created by default when we launched the instances.

The first security group should have been created when we launched the first instance. We are running the Ambari server on this instance, so we have to ensure that we can reach it and that it can communicate with the rest of the instances we launched later:

Then we also need to open up the ports for IPs internal to the datacenter:
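
The exact rules were shown only in screenshots. The sketch below expresses the two ideas above in the `IpPermissions` format used by boto3's `authorize_security_group_ingress()`. SSH and the Ambari web port 8080 are opened only to your own workstation. All TCP ports are opened within the internal address range, so that, among other things, Ambari agents can reach the server on port 8440 (the port seen in an agent log in the comments). The workstation address is hypothetical. The `172.31.0.0/16` range is assumed to be the default VPC range, matching the internal host names in the comments. Adapt both to your setup.

```python
import ipaddress


def tcp_rule(from_port, to_port, cidr):
    """One IpPermissions entry allowing TCP from_port..to_port from cidr."""
    ipaddress.ip_network(cidr)  # raises ValueError if cidr is not a valid network
    return {"IpProtocol": "tcp", "FromPort": from_port, "ToPort": to_port,
            "IpRanges": [{"CidrIp": cidr}]}


def cluster_ingress_rules(admin_cidr, internal_cidr):
    """Rules for the cluster's security groups (a sketch, not a hardened policy)."""
    return [
        tcp_rule(22, 22, admin_cidr),       # SSH from your workstation only
        tcp_rule(8080, 8080, admin_cidr),   # Ambari web UI from your workstation only
        tcp_rule(0, 65535, internal_cidr),  # all TCP between cluster nodes (incl. 8440)
    ]


if __name__ == "__main__":
    rules = cluster_ingress_rules("203.0.113.10/32",  # hypothetical workstation IP
                                  "172.31.0.0/16")    # assumed default VPC range
    for rule in rules:
        print(rule)
    # With boto3 (group ID hypothetical):
    #   boto3.client("ec2").authorize_security_group_ingress(
    #       GroupId="sg-0123456789abcdef0", IpPermissions=rules)
```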

Step 3: Setting up Ambari

Get the HDP bits and add them to the repo:

Next, we will refresh the repo:
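
Adding a repository to yum typically means placing a `.repo` file under `/etc/yum.repos.d/` and then letting yum re-read its metadata, for example with `yum clean all` followed by `yum repolist`. The actual repository URL was shown only in a screenshot, so the sketch below takes it as a parameter. The repo ID, name and URL in the usage lines are hypothetical.

```python
import configparser
import io


def render_repo_file(repo_id, name, baseurl, gpgkey=None):
    """Return the text of a yum .repo file defining one repository."""
    # interpolation=None keeps any '%' in URLs literal.
    parser = configparser.ConfigParser(interpolation=None)
    section = {"name": name, "baseurl": baseurl, "enabled": "1",
               "gpgcheck": "1" if gpgkey else "0"}
    if gpgkey:
        section["gpgkey"] = gpgkey
    parser[repo_id] = section
    buf = io.StringIO()
    parser.write(buf, space_around_delimiters=False)  # yum-style key=value lines
    return buf.getvalue()


if __name__ == "__main__":
    print(render_repo_file("ambari-example",                    # hypothetical ID
                           "Ambari example repository",         # hypothetical name
                           "http://repo.example.com/ambari/"))  # hypothetical URL
```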

Then we will install the Ambari server:

Agree to download bits:

Agree to download the key:

Ambari Server bits are installed:

Now, we will configure the bits:

Accept the default options for all the prompts by pressing Enter:

Let’s start the Ambari Server:

That’s it: we are all set to use Ambari to bring up the cluster.
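
The install, setup and start steps above presumably come down to three commands run as root on the Ambari node. The sketch below runs them in order with Python's `subprocess`. By default it performs a dry run that only lists the commands. When it runs for real, the `yum` confirmation prompts and the interactive `ambari-server setup` questions still appear in your terminal.

```python
import subprocess

# Commands assumed to correspond to this step (run as root).
AMBARI_SERVER_STEPS = [
    ["yum", "install", "ambari-server"],  # confirm the download and the GPG key
    ["ambari-server", "setup"],           # interactive: press Enter to accept defaults
    ["ambari-server", "start"],
]


def run_steps(steps, dry_run=True):
    """Run each command in order, stopping at the first non-zero exit status.

    With dry_run=True nothing is executed; the command lines are only returned.
    """
    executed = []
    for argv in steps:
        if not dry_run:
            # stdin/stdout are inherited, so prompts appear in the terminal;
            # check=True raises CalledProcessError on failure.
            subprocess.run(argv, check=True)
        executed.append(" ".join(argv))
    return executed


if __name__ == "__main__":
    for line in run_steps(AMBARI_SERVER_STEPS):
        print(line)
```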

Step 4: Using Ambari to deploy the cluster

Copy the public DNS name of the Ambari server instance:

Navigate to port 8080 of the public DNS name from your browser. You should see the Ambari login page. The default username and password are both ‘admin’:
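
Several comments below report that the login page does not load. A quick first check is whether port 8080 is reachable from your machine at all. The sketch below uses only Python's standard `socket` module. The host name in the usage line is hypothetical.

```python
import socket


def ambari_url(public_dns, port=8080):
    """URL of the Ambari web UI on the given host."""
    return f"http://{public_dns}:{port}"


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host name not resolvable
        return False


if __name__ == "__main__":
    print(ambari_url("ec2-203-0-113-10.compute-1.amazonaws.com"))  # hypothetical
```

Calling `port_open(<public DNS>, 8080)` from your workstation may return `False` even when `ambari-server status` on the instance reports the server as running. In that case, the security group rule for 8080 is a likelier culprit than Ambari itself.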

This is where we start creating the cluster. Enter any cluster name of your choosing:

We are going to create a HDP 2.0 cluster:

Remember the list of private DNS names that you copied into a text file? Paste it into the Target host input box. On this page, we will also upload the private key that we have been using:

We are all set to go. All the hosts should come back green, with no warnings:

At this stage, we need to decide what services we need:

For this demonstration, I will select everything, although in a real deployment you would want to be more judicious and select the bare minimum needed for your requirements:

After we are done selecting the services, it’s time to decide where they will run. Ambari suggests a reasonable placement, but if you have a specific topology in mind you might want to move these around:

The next step is to choose which nodes will run DataNodes and Clients. I like to have clients on multiple instances for convenience.

In the next step, we have to configure the credentials for some of the services. The services that need credentials are marked with a number on a red background:

Once we are done with all the inputs, we are ready to review and then start the deployment:

At this point, it will take a while (~30 minutes) to complete the deployment and test the services:
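
Besides the web UI, Ambari exposes a REST API under `/api/v1` on the same port. This is handy for checking on the cluster from scripts, and one of the comments below asks about RESTful access. The sketch below builds an authenticated request for `/api/v1/clusters` using only the standard library. It assumes the default `admin`/`admin` credentials mentioned earlier, which you should change on a real cluster. The host name is hypothetical.

```python
import base64
import json
import urllib.request


def ambari_request(public_dns, path="/api/v1/clusters",
                   user="admin", password="admin", port=8080):
    """Build a GET request for the Ambari REST API with HTTP basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(f"http://{public_dns}:{port}{path}")
    request.add_header("Authorization", f"Basic {token}")
    return request


def fetch_json(request, timeout=10):
    """Send the request and decode the JSON body (needs network access)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    req = ambari_request("ec2-203-0-113-10.compute-1.amazonaws.com")  # hypothetical
    print(req.full_url)
    # print(fetch_json(req))  # lists the clusters once Ambari is reachable
```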

Voila!! We now have a fully functional and tested cluster on EC2. Happy Hadooping!!!

@saptak

Comments

  • Hi,

    Can I get steps to deploy manually (the hard way) instead of using Apache Ambari?

    Thanks

  • Why doesn’t HDP2 support the Amazon Linux AMI? It is derived from Red Hat and has no license cost.

  • Step 4: Using Ambari to deploy the cluster

    The login page for Ambari is not opening. I am opening <my Ambari server public DNS>:8080/login.

    I followed the security step to add port 8080 for this server.

    Please help.

  • An excellent introduction to using Ambari on Amazon.
    – Is there an automated script that you can use instead of the manual steps?
    – Is the base machine image created as a publicly accessible AMI?

    Thank you for the well written tutorial.

  • I got the below error message when I invoked yum install ambari-server

    file /usr/lib64/python2.6/zipfile.pyo from install of python26-2.6.8-2.el5.x86_64 conflicts with file from package python-2.6.6-37.el6_4.x86_64

  • I got the below error at: yum install ambari-server

    file /usr/lib64/python2.6/zipfile.pyo from install of python26-2.6.8-2.el5.x86_64 conflicts with file from package python-2.6.6-37.el6_4.x86_64

  • This document seems to be missing a few elements; I will update it and provide a link soon.

    A question if someone could answer. I am trying to follow the doc but got stuck at Confirm Hosts – It’s failing and here is the message

    Host checks were skipped on 7 hosts that failed to register.

    Can someone please help me troubleshoot?

  • I just installed HDP via Ambari on AWS. Everything went okay. I am on login prompt, trying to install the cluster now but it’s failing on confirm hosts screen. Here is the message —

    Host checks were skipped on 7 hosts that failed to register.

    Can someone help me troubleshoot? Please please.

  • Are medium instances sufficient? What is the minimum number of EC2 instances required for master/slave for a quick setup?

  • When I run yum install ambari-server, I get the following error:

    Transaction Check Error:
    file /usr/lib64/python2.6/distutils/README from install of python26-2.6.8-2.el5.x86_64 conflicts with file from package python-2.6.6-37.el6_4.x86_64
    file /usr/lib64/python2.6/site-packages/README from install of python26-2.6.8-2.el5.x86_64 conflicts with file from package python-2.6.6-37.el6_4.x86_64
    file /usr/lib64/python2.6/bsddb/__init__.py from install of python26-2.6.8-2.el5.x86_64 conflicts with file from package python-2.6.6-37.el6_4.x86_64
    file /usr/lib64/python2.6/compiler/__init__.py from install of python26-2.6.8-2.el5.x86_64 conflicts with file from package python-2.6.6-37.el6_4.x86_64
    file /usr/lib64/python2.6/ctypes/__init__.py from install of python26-2.6.8-2.el5.x86_64 conflicts with file from package python-2.6.6-37.el6_4.x86_64
    file /usr/lib64/python2.6/ctypes/macholib/__init__.py from install of python26-2.6.8-2.el5.x86_64 conflicts with file from package python-2.6.6-37.el6_4.x86_64
    file /usr/lib64/python2.6/curses/__init__.py from install of python26-2.6.8-2.el5.x86_64 conflicts with file from package python-2.6.6-37.el6_4.x86_64
    file /usr/lib64/python2.6/distutils/__init__.py from install of python26-2.6.8-2.el5.x86_64 conflicts with file from package python-2.6.6-37.el6_4.x86_64

    • This is also a complete showstopper for me.
      RHEL has the python package installed, which is apparently version 2.6.
      IMHO the easiest solution is to modify the dependencies of Ambari, because replacing the “python” package with “python26” is close to impossible.

  • I had to install Ambari 1.5 instead of 1.4.3.38, and then open up TCP on the 2nd security group to all addresses, to get this to work.

  • I also had to edit the Ambari config file and change the web port from 8080 to 80 in Amazon EC2. I would have liked to leave it at the original value but I could not change the port to 8080 in the security group in EC2.

  • Hi Saptak,
    I tried to find the url for HUE but couldn’t, is HUE included in this install, or would it have to be installed separately on top?
    Thanks,
    Gerry

  • hi,

    I followed the steps one by one, but there is an error when creating the HDP stack.

    STDOUT

    STDERR
    Please login as the user “ec2-user” rather than the user “root”.

    scp /usr/lib/python2.6/site-packages/ambari_server/os_type_check.sh done for host ip-172-31-35-90.us-west-2.compute.internal, exitcode=1
    Copying os type check script finished
    ERROR: Bootstrap of host ip-172-31-35-90.us-west-2.compute.internal fails because previous action finished with non-zero exit code (1)

    Can you help me?

  • Hi

    I’ve run through this tutorial before and it worked fine.
    I’m redoing it now, and I can’t get past the step where I register all the hosts. The wizard gets through most of it until what I think is the end and then I keep getting:

    Setting up agent finished
    Registering with the server…
    Registration with the server failed.

    Is there anywhere to check the logs to see what is happening and why it’s failing?
    I can ssh from the original node to all the other nodes as both ec2-user and root. Is there something else I need to do?

    Regards
    Brian

  • Hi,

    I’m trying to follow this setup in Amazon EC2 + VPC but I get a warning message in the confirm host step:

    All bootstrapped hosts registered but unable to retrieve cpu and memory related information hortonworks

    The problem is that the Next button is disabled, so I cannot continue with the installation.

  • I got to the beginning of Step 4. But when I try to enter “http://(my public dns):8080”, the browser is unable to connect to it. Any suggestions for how I may fix this?

    Thanks!

  • You should add that for AWS, the default user when adding the hosts should be set to ec2-user; root will fail.

  • Things went fine till I reached Ambari Installation. Only the Ambari host is started successfully. All the other instances have failed Registration. I see the below error in logs

    Agent log at: /var/log/ambari-agent/ambari-agent.log
    (‘INFO 2014-10-02 13:21:03,954 NetUtil.py:74 – Server at https://myAmbariHostPrivateDNS:8440 is not reachable, sleeping for 10 seconds…
    INFO 2014-10-02 13:21:13,965 NetUtil.py:41 – Connecting to the following url https://myAmbariHostPrivateDNS:8440/cert/ca
    INFO 2014-10-02 13:22:16,966 NetUtil.py:55 – Failed to connect to https://myAmbariHostPrivateDNS:8440/cert/ca due to [Errno 110] Connection timed out

    Anyone faced a similar issue ?

    • Hi,

      I’ve encountered the same problem. The only viable solution I’ve come across is updating OpenSSL; however, this still doesn’t resolve the problem for me either.

      This is for HDP 2.2 and Ambari 1.7.0. Have you had much luck?

      Thanks!

  • Everything in this blog was a single line of code with the now-abandoned Whirr, which also offered a distributed script execution capability to install custom software. I don’t think we share the same definition of progress.

  • One big problem with this approach is that as soon as the instances are shut off and turned back on, the IP addresses and hostnames change, which means the cluster doesn’t know where to find its various services. What’s a good way to deal with this by using Elastic IPs or a VPC or something?

  • I keep getting this error.

    Error: Package: ambari-server-1.2.3.7-1.noarch (Updates-ambari-1.2.3.7)
    Requires: python26

    I am on an EC2 RHEL 7.1 image. I used the CentOS 6 version of Ambari.

    I tried many of the solutions for this (installing development tools, clearing the repo, installing python). It still doesn’t work.

  • Thank you for the information.
    After performing the above steps on Amazon, is there any way I can use the Amazon API and RESTful services for HDP, so that I can access the data from another web application?
    Can anyone help me with this?

  • Failed in Registering host:

    I am installing HDP 2.0 using Ambari on AWS EC2. I installed Ambari and am able to open the console, but registering the host list fails. I am not able to get the logs either.
