Failed to start HDFS service after installation

This topic contains 8 replies, has 2 voices, and was last updated by  Dave 9 months, 3 weeks ago.

  • Creator
    Topic
  • #38200

    Tanzir
    Participant

    Little background: I have been using HDP 1.2 for a long time and haven’t faced this issue so far.

    Yesterday, I installed a new HDP 1.3.2 cluster through Ambari (on Amazon EC2). After installation, all services and smoke tests were running successfully. Then I stopped all the services by clicking the new “Stop All” button, and then I stopped all the Ambari agents and finally the server.

    This morning, I started my instances (used for HDP 1.3.2) again and started the Ambari server. I tried to start the services one by one this time (I did try the “Start All” button but faced the same issue). I tried to start the HDFS service first, and it is failing.

    Under HDFS, all DataNodes started successfully. But when it tries to start the client, NameNode, and secondary NameNode, it gets stuck partway through:

    10.0.0.149 (client machine) – It stops after 50%
    10.0.0.75 (name node) – It stops after 35%
    10.0.0.76 (job tracker, secondary name node) – It stops at 0%

    I am using a VPC in Amazon EC2, so the private IPs remain unchanged after a restart/reboot. I have exactly the same configuration as with HDP 1.2, and I didn’t face this issue there.

    After waiting several minutes, the NameNode start task shows the following error (stderr):
    ———————-
    none

    Puppet has been killed due to timeout
    ———————-

    Any information will be highly appreciated. Thanks in advance.


The topic ‘Failed to start HDFS service after installation’ is closed to new replies.

  • Author
    Replies
  • #38637

    Tanzir
    Participant

    Hi Dave,
    Thanks a lot for your help. I have found the issue. When I created those instances for HDP, I forgot to remove the ephemeral storage from them. What happened is that during installation the Oozie data directory was pointed to /mnt/hadoop/oozie/data by default. I thought that unless I specified it during installation, it would not use the /mnt mount point for Oozie or other services.

    As a result, after I stopped/started the instances, all data on that mount point (ephemeral storage) was lost, and hence Oozie couldn’t find its schema. This is also the reason behind the NameNode formatting issue (my other thread related to HDFS).

    To be sure about that, I just installed HDP 1.3.2 again on another cluster, and this time I removed the ephemeral storage from the instances. This time the Oozie data path points to /hadoop/oozie/data. Now everything seems to be working, and even the “Start All” and “Stop All” buttons work as expected. The NameNode issue is also resolved now.
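
    For anyone else hitting this on EC2, here is a rough spot-check I would run on each node (the /mnt mount point and /mnt/hadoop/oozie/data are from my setup; any other paths are assumptions, so verify the real values under each service’s Configs tab in Ambari):

    mount | grep ' /mnt '                             # the ephemeral volume; anything stored on it is wiped on stop/start
    find /mnt/hadoop -maxdepth 3 -type d 2>/dev/null  # any service data directory that shows up here will be lost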

    Thanks again,

    - Tanzir

    #38404

    Tanzir
    Participant

    Please ignore my last post, as I have found the issue and it’s not related to those buttons (details: http://hortonworks.com/community/forums/topic/oozie-smoke-test-is-getting-failed/).

    #38351

    Tanzir
    Participant

    Looks like I can reproduce this issue.

    -> All services are down and you want to start your cluster.
    -> Clicking the new “Start All” button corrupts the NameNode. After 5-6 minutes it fails, and in the NameNode log I see this:

    [root@ip-10-0-0-75 hdfs]# tail -n 20 hadoop-hdfs-namenode-ip-10-0-0-75.log
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:466)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:432)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:302)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:585)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1523)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1532)
    2013-09-27 16:36:41,021 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.IOException: NameNode is not formatted.
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:316)

    -> But if I do not start the services through the “Start All” button and instead start them one by one, I do not see this issue.

    Is it a known issue? I thought the “Start All” button maintained the start order of the services internally.

    #38233

    Tanzir
    Participant

    Thanks a lot Dave, after applying that command I was able to start the HDFS service. Thanks again.

    #38207

    Dave
    Moderator

    Hi Tanzir,

    Sure no problem, you need to run:

    su -l hdfs -c "hadoop namenode -format"

    Select Yes, and then the NameNode will start and all the other services should be able to connect.
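
    If you want to double-check that the format took effect, something like this should work (the /hadoop/hdfs/namenode path is an assumption based on the usual HDP default for dfs.name.dir, so confirm the actual value under the HDFS configs in Ambari):

    su -l hdfs -c "ls /hadoop/hdfs/namenode/current"   # should list VERSION, fsimage and edits after a successful format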

    Thanks

    Dave

    #38206

    Tanzir
    Participant

    Looks like I need to execute the following line:

    su - hdfs -c "/usr/lib/hadoop/bin/hadoop namenode -format"

    Please correct me if I’m wrong. But my question is: is this common? I never faced this issue earlier.

    #38204

    Tanzir
    Participant

    Hi Dave,
    Thanks a lot for your quick response. I just ran the following command:

    [root@ip-10-0-0-75 hdfs]# tail -n 50 hadoop-hdfs-namenode-ip-10-0-0-75.log

    And I got this:

    2013-09-26 18:23:37,360 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStateMBean and NameNodeMXBean
    2013-09-26 18:23:37,408 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
    2013-09-26 18:23:37,408 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names occuring more than 10 times
    2013-09-26 18:23:37,410 INFO org.apache.hadoop.hdfs.util.GSet: Computing capacity for map INodeMap
    2013-09-26 18:23:37,410 INFO org.apache.hadoop.hdfs.util.GSet: VM type = 64-bit
    2013-09-26 18:23:37,410 INFO org.apache.hadoop.hdfs.util.GSet: 1.0% max memory = 1052770304
    2013-09-26 18:23:37,410 INFO org.apache.hadoop.hdfs.util.GSet: capacity = 2^20 = 1048576 entries
    2013-09-26 18:23:37,410 INFO org.apache.hadoop.hdfs.util.GSet: recommended=1048576, actual=1048576
    2013-09-26 18:23:37,466 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
    java.io.IOException: NameNode is not formatted.
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:316)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:144)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:466)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:432)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:302)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:585)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1523)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1532)
    2013-09-26 18:23:37,475 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.IOException: NameNode is not formatted.
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:316)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:144)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:466)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:432)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:302)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:585)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1523)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1532)

    2013-09-26 18:23:37,477 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at ip-10-

    #38201

    Dave
    Moderator

    Hi Tanzir,

    You will need to look in the NameNode logs in /var/log/hadoop/hdfs.

    This should give you some indication as to why the NameNode cannot start.
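
    On the NameNode host, something along these lines should surface the first error (the log file name is usually hadoop-hdfs-namenode-<hostname>.log, so adjust it to whatever you see in that directory):

    cd /var/log/hadoop/hdfs
    ls -lt | head                          # newest log files first
    tail -n 50 hadoop-hdfs-namenode-*.log  # look for the first ERROR or exception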

    Thanks

    Dave
