December 18, 2013

How To Use Local Repositories with Apache Ambari

The network and security teams at your company do not allow internet access from the machines where you plan to install Hadoop. What do you do? How do you install your Hadoop cluster without having access to the public software packages? Apache Ambari supports local repositories and in this post we’ll look at the configuration needed for that support.

When installing Hadoop with Ambari, there are three repositories at play: one for Ambari (which primarily hosts the Ambari Server and Ambari Agent packages) and two for the Hortonworks Data Platform (which host the HDP Hadoop Stack packages and other related utilities).

General Steps for Building a Local Repository

Whether it’s the Ambari repository or the HDP repositories, below we summarize two options for building a local repository. For more background, you can review this Hortonworks document that covers installing Hadoop in data centers with network restrictions. The document contains a good amount of detail on building local repositories, as well as information on where to get the Ambari and HDP repository tarballs (if you choose Option 2 below).

  • Option 1: If you can get temporary internet access, you can use the public repository to build the local repository via “reposync”. In short, you “reposync” the packages – syncing all of the software packages from the public repository to your local host – then construct the repository by using Linux tools to create the necessary repodata, and host all of those packages from your Apache web server to have a local repo.

    1. Reposync the repository packages locally
    2. Construct the repository repodata for those local packages
    3. Host from apache web server
  • Option 2: If you cannot get temporary internet access, you can download a repository tarball which contains all of the software packages, extract it into your Apache web server for hosting and voilà, you have a local repo.

    1. Download the repository tarball locally
    2. Extract software packages
    3. Host from apache web server
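The steps for both options can be sketched as shell commands. This is a minimal sketch, not the exact procedure from the Hortonworks document: the repo ID `HDP-2.x`, the web root `/var/www/html/hdp`, and the tarball name are assumptions you would adjust for your environment. The sketch writes the commands to a helper script and syntax-checks it.

```shell
# Sketch of the local-repo build steps (paths and repo IDs are assumptions).
cat > /tmp/build_local_repo.sh <<'EOF'
#!/bin/bash
# Option 1: mirror the public repo from a host with temporary internet access
reposync -r HDP-2.x -p /var/www/html/hdp      # 1. sync packages locally
createrepo /var/www/html/hdp/HDP-2.x          # 2. build the repodata directory

# Option 2: no internet access at all -- extract a repository tarball instead
# tar -xzvf HDP-2.x-repo.tar.gz -C /var/www/html/hdp

service httpd start   # either way, serve the directory over HTTP
EOF
bash -n /tmp/build_local_repo.sh && echo "script syntax OK"
```

Either path ends with the packages and their repodata served by your Apache web server.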

Regardless of your choice above, the end result is a local repository inside your network that is addressable by a Base URL – the URL of the directory in which the repository’s repodata directory is located.
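For illustration, a yum .repo file pointing at such a local repository might look like the following. The hostname `repo.internal.example.com`, the repo ID, and the paths are placeholders, not values from the Hortonworks documentation:

```shell
# Hypothetical .repo file for a local HDP repository (all values are placeholders).
cat > /tmp/hdp.repo <<'EOF'
[HDP-2.x]
name=HDP 2.x local repository
baseurl=http://repo.internal.example.com/hdp/HDP-2.x/
enabled=1
gpgcheck=0
EOF
cat /tmp/hdp.repo
```

The `baseurl` is exactly the Base URL described above: the directory that contains repodata.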

What about the JDK?

During Ambari Server setup, Ambari will optionally download and install the JDK. The JDK is hosted publicly, so if you do not have internet access, you need to download the JDK and install it on your hosts yourself. Then, when you run Ambari Server setup, specify the -j option to indicate the location of your JDK.

ambari-server setup -j /path/to/your/installed/jdk

Note: This is the JDK install scenario we typically see: hosts already have a JDK installed, and by using the -j option you instruct Ambari to use that already-installed JDK instead of trying to download and install one from the internet.

Installing HDP Stack with the Local Repository

For Ambari to install the Hortonworks Data Platform (HDP) Stack, you need the HDP repository available. So with the HDP Stack local repository Base URL in hand (the one you created earlier), and with the Ambari Server installed and set up, start the Ambari Cluster Install Wizard.


Log in, and on the Select Stack wizard screen you will find an area for Advanced Repository Options.


Expand the Options area and you’ll see (by default) the Base URLs for the HDP Stack public repositories. Since HDP supports multiple operating systems (OS), and each set of OS packages is in its own repository, there is a Base URL per OS.


Based on what OS (or OSes) you plan to use in your cluster, replace the public Base URL with your local repository Base URL (that you created earlier). You can uncheck the OSes you do not plan to use in your cluster. Click Next and continue along with the cluster install process.

Ambari will validate that your Ambari Server host can reach this repository and that the Base URL points to a valid repository (just in case you mistyped or misconfigured your local repository). And during host registration, if any of the hosts you plan to include in the cluster use a different OS than the one(s) you specified in Advanced Repository Options, you will see a warning.
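You can perform a similar sanity check yourself before running the wizard: a Base URL is valid if repodata/repomd.xml is reachable beneath it. The URL below is a placeholder for your own local repository, so this is only a sketch of the check, not what Ambari itself runs:

```shell
# Manually check a Base URL: a valid repo serves repodata/repomd.xml (URL is a placeholder).
BASE_URL="http://repo.internal.example.com/hdp/HDP-2.x"
if curl -sf "${BASE_URL}/repodata/repomd.xml" > /dev/null; then
  echo "Base URL looks valid: ${BASE_URL}"
else
  echo "No repodata found at ${BASE_URL} -- check the URL and your web server"
fi
```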

After validation, click Next and continue with your install. After you click Deploy and Ambari installs the Hadoop packages on your hosts, each host will access the local repository to obtain the packages and not go out to the internet.

That’s about it. I should point out that local repositories are not only for installing Hadoop without internet access. They can also minimize internet bandwidth usage when downloading software packages, which helps make cluster installs faster. And by keeping a local repository for a specific Stack version, you can rest assured that the software packages for that Hadoop Stack will be available for future installs. I think you’ll agree that local repositories are critical when you do not have internet access, and will come in handy to help speed package installs.

Get started today using the latest Ambari release. And as always, to find out more about Ambari, please visit the Apache Ambari Project page. You can also join the Ambari User Group and attend Meetup events.




Blair ELzinga says:

I’ve tried Ambari a couple of times and every time I get stuck on the validation of the public repository URL – for example the default value is “”.

The validation fails, but there is no log entry in ambari-server.log, and I can’t find any information on what tool the validation is using in order to debug why it is failing. The Ambari host has internet access: wget, curl, and yum all work fine. What is Ambari using to do the validation (or can you give me a hint which code to look at)?

venkat says:

We set up the cluster with the public repo. Now we have a local repo available and want to replace the public repo URLs with the local repo URLs. We can’t rebuild the cluster – that’s not an option.
Please suggest how we can do that.

Rahul says:

I am getting the error below:

Creating target directory…

Command start time 2015-09-23 10:40:37
Traceback: Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ambari_server/", line 218, in try_to_execute
    retcode = action()
  File "/usr/lib/python2.6/site-packages/ambari_server/", line 655, in createTargetDir
    retcode =
  File "/usr/lib/python2.6/site-packages/ambari_server/", line 138, in run
  File "/usr/lib64/python2.6/", line 642, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.6/", line 1234, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
ERROR: Bootstrap of host fails because previous action finished with non-zero exit code (177)
ERROR MESSAGE: Execute of '<bound method BootstrapDefault.createTargetDir of >' failed
STDOUT: Try to execute '<bound method BootstrapDefault.createTargetDir of >'

Rahul says:

It was due to the ssh command not being installed on the host that was being bootstrapped.

Sadly, the log does not give a more elaborate error.

Also, the file /etc/yum.repos.d/ambari.repo is required on the server being bootstrapped.

anil says:

I am trying to install a 3-node Hortonworks cluster on Ubuntu 14.04 LTS. How do I install it at the cluster level? If any documents are available, please send the links or documents to my email.

Leave a Reply

Your email address will not be published. Required fields are marked *

If you have specific technical questions, please post them in the Forums