September 08, 2011

Set Up Apache Hadoop in Minutes with RPMs

We have some great news for developers and researchers who want to start using Apache Hadoop quickly. Today's release of Apache Hadoop 0.20.204 includes, for the first time, RPMs that make it much simpler to set up a basic Hadoop cluster. This lets you focus on using Hadoop's features instead of first having to learn how they were implemented.

Before we begin, a caveat: these instructions do not tune Hadoop settings for performance. We will leave Hadoop optimization for another day.

Download software

Download Java JDK RPM.

Download Apache Hadoop RPM from Apache mirrors.

Single node system setup

1) Install JDK on a Red Hat or CentOS 5+ system.

sudo ./

This installs Java and sets JAVA_HOME to /usr/java/default.
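A quick sanity check, assuming the default install location mentioned above (adjust the path if your JDK landed elsewhere):

```shell
# Default install location from the step above; adjust if needed.
export JAVA_HOME=/usr/java/default

if [ -x "$JAVA_HOME/bin/java" ]; then
  "$JAVA_HOME/bin/java" -version
else
  echo "No JDK found at $JAVA_HOME -- check the installer output"
fi
```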

2) Install Apache Hadoop 0.20.204.

sudo rpm -i hadoop-
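For illustration, a hedged sketch of the full command; the RPM filename below is hypothetical, so substitute the exact file you downloaded from the mirrors:

```shell
# Hypothetical filename -- substitute the exact RPM you downloaded
# from the Apache mirrors.
HADOOP_RPM=hadoop-0.20.204.0-1.i386.rpm

if [ -f "$HADOOP_RPM" ]; then
  sudo rpm -i "$HADOOP_RPM"
else
  echo "RPM not found in current directory: $HADOOP_RPM"
fi
```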

3) Set up the Apache Hadoop configuration and start the Hadoop processes.

sudo /usr/sbin/

The setup wizard walks you through a list of questions to set up Hadoop. Hadoop should be running after you answer 'Y' to all of them.

4) Create a user account on HDFS for yourself.

sudo /usr/sbin/ -u $USER
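A small, hedged check that the account was created; it assumes the script puts your HDFS home directory under /user/<username>, and it skips the check if hadoop is not yet on your PATH:

```shell
# The user-creation script should have made a home directory for you
# on HDFS; list it to confirm. The /user/<name> layout is an assumption.
HDFS_HOME_DIR="/user/${USER:-$(id -un)}"

if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -ls "$HDFS_HOME_DIR"
else
  echo "hadoop not on PATH; expected home dir is $HDFS_HOME_DIR"
fi
```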

Multi-node setup

1) Install both the JDK and Hadoop RPMs on all nodes.

2) Generate the Hadoop configuration on all nodes:

sudo /usr/sbin/  

Replace ${namenode} and ${jobtracker} with the hostnames of your namenode and jobtracker.
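A hedged sketch of the substitution; the hostnames are hypothetical, and the flag names and ports are assumptions for the 0.20.204 setup scripts (the script path itself is elided above):

```shell
# Hypothetical hostnames -- replace with your own machines.
namenode=nn.example.com
jobtracker=jt.example.com

# Flag names and ports here are assumptions for the 0.20.204 scripts.
NAMENODE_URL="hdfs://${namenode}:9000/"
JOBTRACKER_URL="${jobtracker}:9001"

# sudo /usr/sbin/<setup script> \
#   --namenode-url="$NAMENODE_URL" \
#   --jobtracker-url="$JOBTRACKER_URL"
echo "namenode: $NAMENODE_URL  jobtracker: $JOBTRACKER_URL"
```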

3) Format the namenode and set up the default HDFS layout.

sudo /usr/sbin/

4) Start all data nodes.

sudo /etc/init.d/hadoop-datanode start

5) Start job tracker node.

sudo /etc/init.d/hadoop-jobtracker start

6) Start task tracker nodes.

sudo /etc/init.d/hadoop-tasktracker start

7) Create a user account on HDFS for yourself.

sudo /usr/sbin/ -u $USER

Verify Hadoop

Run the word count example.
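A hedged sketch of running word count end to end; the examples jar path is an assumption (locate yours with `ls /usr/share/hadoop/hadoop-examples-*.jar`), and the block skips itself if hadoop is not on the PATH:

```shell
# Classic smoke test: count words in a small input with the bundled
# examples jar. Paths on HDFS assume the /user/<name> home layout.
INPUT="/user/${USER:-$(id -un)}/wordcount-in"
OUTPUT="/user/${USER:-$(id -un)}/wordcount-out"

if command -v hadoop >/dev/null 2>&1; then
  echo "hello hadoop hello rpm" > /tmp/words.txt
  hadoop fs -mkdir "$INPUT"
  hadoop fs -put /tmp/words.txt "$INPUT/"
  # The jar location/version is an assumption -- adjust to your install.
  hadoop jar /usr/share/hadoop/hadoop-examples-*.jar wordcount "$INPUT" "$OUTPUT"
  hadoop fs -cat "$OUTPUT/part-*"
else
  echo "hadoop not on PATH; log out and back in so /etc/profile.d is sourced"
fi
```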

I hope this information is helpful. For questions about Hadoop RPMs, please contact me directly at eyang at hortonworks dot com.

— Eric Yang



Jagane Sundar says:

Hello Eric,

The 205 release has broken a few things in your blog post above. I installed the 205 rpm, and then ran the script as described in your blog post above.
Issue #1: The param --namenode-url has changed to --namenode-host, and --jobtracker-url has changed to --jobtracker-host.
Issue #2: The RPM creates a Linux user mapred, whereas the default user is "mr" in your script. The additional param --mapreduce-user=mapred needs to be added in order to make the script play well with the RPM.

Hmm. OK. I actually cannot get the non-secure version to work when I create a configuration using this script. Oh well…

Jagane Sundar says:

OK. Ignore the last comment I made about not being able to get it to run. I did get 205 to run using the rpm install, and the instructions in your blog post above.

I had to do two more things:

Issue #3: After running the script /usr/sbin/, I needed to log out and log in again because the env variables set in /etc/profile.d/ were not sourced in my shell, so /usr/sbin/ was failing.

Issue #4: I needed to add the new parameter --format to /usr/sbin/ in order to format HDFS.

Subsequently, the JT and TTs started up without any problems.

For reference, here is the command line I used:

# /usr/sbin/

Cheers, Eric. It would be great if you can keep this blog post and the RPM/scripts current and working. This is the easiest way to get Hadoop up and running.

gopal says:


While creating the HDFS user I am getting the error:

JAVA_HOME is not set.

But when I try echo $JAVA_HOME I can retrieve the path. I also included the JAVA_HOME value in the file /usr/sbin/

Any idea?


Eric Yang says:

Hi Gopal,

HDFS user creation can fail if an existing user already has the same uid. Please check that your system does not already have a uid assigned for the HDFS user. In addition, JAVA_HOME may not be exported to the child process when the scripts spawn additional shells. Hence, it is best to explicitly pass the --java-home setting to the setup script.

Hope this helps.
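Eric's advice above boils down to making sure JAVA_HOME is exported, not merely set, so that shells spawned by the setup scripts inherit it. A minimal sketch (the setup-script name is elided in the post, so it is left as a comment):

```shell
# Export (not just set) JAVA_HOME so child shells spawned by the
# setup scripts inherit it.
JAVA_HOME=/usr/java/default
export JAVA_HOME

# Or pass it explicitly to the setup script (name elided in the post):
# sudo /usr/sbin/<setup script> --java-home="$JAVA_HOME" ...

# A child shell should now see the variable:
sh -c 'echo "child sees JAVA_HOME=$JAVA_HOME"'
```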


hadoop-user59 says:

Where does rpm install the conf directory? I need to change the core-site.xml but can’t find its location for the Amazon AMI linux instance.


hadoop-user59 says:

Okay. I found it on /etc/hadoop/core-site.xml. I changed the port from 8020 to 9000. How do I re-start hadoop?


Eric Yang says:

Hi hadoop-user59,

Use /etc/init.d/hadoop-* scripts. For example, to start datanode:

sudo /etc/init.d/hadoop-datanode start
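To pick up a config change such as the port edit above, each daemon can be bounced with its init script. A hedged sketch; the hadoop-namenode service name is an assumption, since the post only shows the other three:

```shell
# Service names as used elsewhere in this post; hadoop-namenode is an
# assumption -- check /etc/init.d on your namenode.
HADOOP_SERVICES="hadoop-namenode hadoop-datanode hadoop-jobtracker hadoop-tasktracker"

for svc in $HADOOP_SERVICES; do
  if [ -x "/etc/init.d/$svc" ]; then
    sudo "/etc/init.d/$svc" stop
    sudo "/etc/init.d/$svc" start
  else
    echo "no init script for $svc on this node"
  fi
done
```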


David Tucker says:

There are a few other problems with the script. Most notably, the template *-site.xml files appear to be insufficiently aligned with the environment settings from the script. For example, the mapred-site.xml template has two separate *.dir entries that are hard-coded to "/mapred/" rather than "$HADOOP_MAPRED_DIR/". The result is that attempts to start the Hadoop services fail because the directories cannot be created.

It’s possible that this has been fixed in later revisions. I was using the CDH4 tarball of hadoop-2.0 from Cloudera.

Chris says:

Hadummy posting here (first-time installer). After running the hadoop-setup-conf script from the initial blog post, I received a pretty large chunk of errors. I ran Jagane Sundar's script instead and got fewer errors, but still got hit with "chown: invalid user: `mr:hadoop'". I've Googled, Binged, and Yahoo'd to no avail. Apparently I'm not the first person to get this error, but no one has a method for getting around it. (For the record, this error also appears with the initial script in the blog; it's just followed by a more verbose string of issues.)

I've executed this via sudo and as root; the errors and the result are the same. (Three of the above-mentioned errors, followed by "configuration setup is completed", then failures with permission errors in the log and run directories.)

I’m on CentOS 6.3, using hadoop-1.1.1-1.x86_64.rpm as my install package.

Eric Yang says:

David, the script works on stock Apache Hadoop 1.x only. Cloudera has its own instructions for installing the CDH4 RPM.

Hadummy, the running system does not have an mr user in the hadoop group. The script is designed to run as root only, so that it can set up the Linux task controller properly.

Kent Brodie says:

After a whole ton of googling, I *finally* came across this blog entry. Wow, I wish this was included with the RPM kit(s). I downloaded and installed 1.0.4 stable and had spent a day or so configuring things manually before I discovered the setup scripts that were included (oops!), but I still needed something like this post to figure out how to do things. VERY helpful. (Yes, since 1.0.4 a few changes were required per the replies above; nothing major.)


vennela says:

At step 3 of the multi-node setup I'm getting a "JAVA_HOME not set" error, but I have set JAVA_HOME. When I type "echo $JAVA_HOME" it gives me the path to the JDK. Please help me with this.

radhika says:


sudo -u hdfs hadoop fs -mkdir /var

When I try to execute the above command it says JAVA_HOME is not set. Can you please help me?

I exported the following two lines in all the scripts:

export JAVA_HOME=/usr/java/jdk1.7.0_21
export PATH=$PATH:$JAVA_HOME/bin


echo $JAVA_HOME retrieves the correct path.

Can anyone please help me?
Thanks,

Srikanth says:

Good one; a similar article with more detailed information can be found at

Syed Muhammad Shoaib says:

Replace export JAVA_HOME with export JAVA_HOME=${JAVA_HOME} and the error will be removed. It worked for me.

Prem says:

Hi Eric,

Everything went well except step 7, creating a user account.

sudo /usr/sbin/ -u $USER

The message I get is

mkdir: failed to create /user/hadoopuser
chown: could not get status for '/user/hadoopuser': File /user/hadoopuser does not exist.

Any suggestions?

Also, once I create an account, how do I access Hadoop from the CentOS terminal?

Thank you so much
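One possible cause of the error above is that the parent /user directory does not exist yet and only the HDFS superuser can create it. A hedged sketch of a workaround, assuming the superuser account is named hdfs (check your setup):

```shell
TARGET_USER=${USER:-$(id -un)}

if command -v hadoop >/dev/null 2>&1; then
  # Create the parent directory as the HDFS superuser, then hand the
  # home directory to the target user. The "hdfs" account name is an
  # assumption -- it may differ on your install.
  sudo -u hdfs hadoop fs -mkdir /user
  sudo -u hdfs hadoop fs -mkdir "/user/$TARGET_USER"
  sudo -u hdfs hadoop fs -chown "$TARGET_USER" "/user/$TARGET_USER"
else
  echo "hadoop not on PATH; cannot inspect /user/$TARGET_USER"
fi
```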

Nash says:

How long does it take to install Hadoop? Please give me a ballpark.
