
HDP on Linux – Installation Forum

HDP 1.3 installation from RPMs

  • #30990
    Tunde Balint

I’ve installed HDP 1.3 on a RHEL cluster:
– 1 namenode
– 1 secondary namenode + jobtracker
– 4 datanodes/tasktrackers
    I’ve disabled all the firewalls, set ulimit to 32768 for mapred/hdfs/hadoop users and set the dfs.datanode.max.xcievers to 4096. When I try to copy or retrieve a file from HDFS it works, and the file is replicated (I’ve checked the log of the namenode and fsck). When I try to run a simple MR job (hadoop jar /usr/lib/hadoop/hadoop-examples.jar sleep -m 1 -r 1) I get a lot of warnings:

    13/08/06 18:15:01 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block Bad response 1 for block blk_4230015220173267878_1043 from datanode
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$

Sometimes the job starts and finishes after a fairly long time, and sometimes it doesn’t even start, giving me the following error: 60000 millis timeout while waiting for channel to be ready for read
    In the namenode log I see:

    2013-08-06 18:15:06,243 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: PendingReplicationMonitor timed out block blk_1864448312313307039_1029
    2013-08-06 18:15:06,243 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: PendingReplicationMonitor timed out block blk_5881133419735879626_1034
    2013-08-06 18:15:09,338 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask to replicate blk_5881133419735879626_1034 to datanode(s)

    On the datanodes I see errors like:

    ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(, storageID=DS-1865011095-, infoPort=50075, ipcPort=8010):DataXceiver
    org.apache.hadoop.hdfs.server.datanode.BlockAlreadyExistsException: Block blk_-7183920363775195741_1024 has already been started (though not completed), and thus cannot be created.

My machines have multiple Ethernet interfaces, but since HDFS put/get works I don’t think the network is the problem. When I run hadoop fsck it reports that the filesystem is healthy and shows a few under-replicated blocks in /user/hadoop/.staging/
I have already uninstalled and reinstalled HDP, deleted all the datanode data directories and reformatted the namenode… but it didn’t help.
    Could anybody tell me what I should check or what would fix my problem?
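
For reference, the transceiver setting described above would look something like this in hdfs-site.xml (a sketch only; the value is the one stated in the post, and the misspelling “xcievers” really is the Hadoop 1.x property name):

```xml
<!-- hdfs-site.xml: cap on concurrent DataXceiver threads per datanode
     (the misspelling "xcievers" is the actual Hadoop 1.x property name) -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
```

The ulimit change would typically be made persistent with entries such as `hdfs - nofile 32768` in /etc/security/limits.conf on each node.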

    Kind regards,
    Tunde Balint

  • #31181
    Sasha J

Hard to say right away…
    Did you check that time is in sync on all nodes?
    Could you temporarily disable all the extra NICs on your boxes and run your test again?
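
A minimal sketch of that time-sync check, assuming passwordless SSH and ntpq on each node (the hostnames below are hypothetical):

```shell
# Helper: read "host offset_ms" pairs on stdin and print the largest
# absolute offset, so a skew threshold can be applied to the result.
max_abs_offset_ms() {
  awk '{ o = ($2 < 0 ? -$2 : $2); if (o > max) max = o } END { print max + 0 }'
}

# On a live cluster the offsets could be gathered roughly like this
# (ntpq -pn prints the offset to the currently selected peer, the line
# marked with "*"):
#   for h in namenode snn dn1 dn2 dn3 dn4; do
#     printf '%s %s\n' "$h" "$(ssh "$h" "ntpq -pn | awk '\$1 ~ /^\*/ {print \$9}'")"
#   done | max_abs_offset_ms
```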

    It may be related to multiple network interfaces….

    Thank you!

    Tunde Balint

    Hi Sasha,

I have ntpd running, so I checked; the time is in sync.
And I cannot disable the rest of the interfaces.

I thought I would just try the automatic installation with Ambari, but then I ran into another issue: the Ambari server starts, I create an SSH tunnel to the machine and get the login screen, but when I try to log in with admin/admin I just get the login screen back.

In the server log files I see:

    11:54:43,761 INFO AmbariLocalUserDetailsService:62 - Loading user by name: admin
    11:54:44,691 INFO AmbariLocalUserDetailsService:62 - Loading user by name: nagiosadmin
    11:54:44,693 INFO AmbariLocalUserDetailsService:67 - user not found


    Sasha J

I think there are some problems with your networking configuration…
Are you using LDAP or similar?
Could you reset the Ambari server, wipe out the logs and then start it again?
If the error still exists, please post the whole ambari-server log.
It would also be very useful if you could provide your system configuration details…

    Thank you!

    Tunde Balint

    Hi Sasha,

I am not using LDAP, and I gave up on the Ambari server installation, as it was just a test to see whether I could get the cluster working properly. My goal is to install the cluster from RPMs.

I did make some progress. I played around with the interfaces and noticed that the cluster and the MapReduce jobs work properly if I install them using eth0. If I try to make the Hadoop traffic use eth1 or eth2, I get the error described initially.

Do you know if there is something I need to set to force the MapReduce jobs to use a different interface?
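
For what it’s worth, Hadoop 1.x exposes DNS-interface settings that control which interface a daemon uses when resolving its own hostname; a hedged sketch, assuming the interface is named eth1:

```xml
<!-- hdfs-site.xml on the datanodes: resolve the datanode's hostname
     via a specific interface (sketch; interface name assumed) -->
<property>
  <name>dfs.datanode.dns.interface</name>
  <value>eth1</value>
</property>

<!-- mapred-site.xml on the tasktrackers: same idea -->
<property>
  <name>mapred.tasktracker.dns.interface</name>
  <value>eth1</value>
</property>
```

Whether this actually pins data traffic to eth1 also depends on routing: the routing tables and reverse DNS on every node have to agree with the interface these settings select.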

    Best regards,

    Tunde Balint

    Hi Sasha,
    It turned out to be a network problem: the namenode and jobtracker had MTU set to 1500 and the datanodes had MTU set to 9000. When we set everything to the same value, the error disappeared.
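
For anyone hitting the same symptoms, a quick sketch of an MTU consistency check (hostnames and the eth0 interface name are assumptions):

```shell
# Helper: read "host mtu" pairs on stdin; exit 0 if every MTU matches,
# non-zero otherwise.
mtus_consistent() {
  awk '{ seen[$2] = 1 } END { n = 0; for (m in seen) n++; exit (n <= 1 ? 0 : 1) }'
}

# On a live cluster the MTUs could be collected like:
#   for h in namenode jobtracker dn1 dn2 dn3 dn4; do
#     printf '%s %s\n' "$h" "$(ssh "$h" cat /sys/class/net/eth0/mtu)"
#   done | mtus_consistent || echo "MTU mismatch - align with: ip link set eth0 mtu 9000"
```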

    Thanks for your help!

    Seth Lyubich

    Hi Tunde,

    Thanks for letting us know that the issue is resolved. We will consider enhancing error logging for such issues.

