HDP on Linux – Installation: HDFS down and not coming back

This topic contains 16 replies, has 5 voices, and was last updated by  Dave 1 year, 3 months ago.

  • Creator
    Topic
  • #30842

    Ardavan Moinzadeh
    Participant

    Hello,
    I have uploaded my error log for the HDFS failure, named “Snamenode failed.txt”. Can you tell me what caused the failure?

    I have a cluster of three nodes, A, B, and C, with the following architecture:
    A: NameNode / Nagios / Ganglia Collector / HiveServer2 / Hive Metastore / WebHCat / HBase Master / Oozie Server / ZooKeeper
    B: SNameNode / JobTracker / ZooKeeper
    C: ZooKeeper
    What is the solution to bring HDFS up again?



  • Author
    Replies
  • #32966

    Dave
    Moderator

    Hi Ardavan,

    Thanks for letting us know that this issue is now resolved.

    Dave

    #32828

    Ardavan Moinzadeh
    Participant

    ISSUE RESOLVED!
    I had an incompatible namespaceID issue. There are two ways you can fix this:
    A: Delete the data directory, reformat HDFS, and start the service again. This is not recommended if your cluster is in production.

    B: Edit the namespaceID value in the current/VERSION file so that it matches the current NameNode’s namespaceID, then restart the service (a rough check is sketched below).
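
    A rough way to compare the IDs before editing anything (the paths below are only examples based on this cluster’s /data/* layout; substitute your actual dfs.name.dir and dfs.data.dir values):

    # On the NameNode host: note the authoritative namespaceID
    grep namespaceID /data/b/hadoop/hdfs/namenode/current/VERSION

    # On each DataNode: compare against the data directory's VERSION file
    # (the exact dfs.data.dir path below is an assumption; use yours)
    grep namespaceID /data/b/hadoop/hdfs/data/current/VERSION

    # If the two differ, edit namespaceID in the DataNode's VERSION file to match
    # the NameNode's value, then restart the DataNode.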

    Thank you

    #31212

    Seth Lyubich
    Keymaster

    Hi Ardavan,

    You can also check whether any PID files are owned by root, since you tried to start the processes as user root. If you find any such files, you can remove them and try restarting the process.
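
    For example, something like this (assuming the default HDP 1.x PID location under /var/run/hadoop; adjust the path if your install differs):

    # Look for Hadoop PID files owned by root
    ls -l /var/run/hadoop/*/*.pid

    # Remove any stale root-owned PID file for a daemon that should run as hdfs or
    # mapred, then restart that service from Ambari (the filename below is an
    # assumed example):
    rm /var/run/hadoop/hdfs/hadoop-hdfs-secondarynamenode.pid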

    Thanks,
    Seth

    #31173

    Sasha J
    Moderator

    The best way to handle this situation is to wipe out everything and start from scratch.
    Something seems to be seriously damaged in there…

    Thank you!
    Sasha

    #31115

    Ardavan Moinzadeh
    Participant

    I can’t even start or stop services anymore! I tried restarting ambari-server and ambari-agent, but it didn’t help.

    #31113

    Ardavan Moinzadeh
    Participant

    Suddenly Nagios is not showing the alerts in the tab on the right in Ambari. Accessing it through the web UI, I see the following error: “Error: Could not read object configuration data!”

    Does this have anything to do with my HDFS issue?

    #31074

    Ardavan Moinzadeh
    Participant

    Hello Seth,
    Yes, at first I did try starting the process as root, but later on I continued as the hdfs user. Some of the permissions had been changed, so I matched them with the ones on other working clusters.

    On node B, where the SNN is installed, a directory called namenode showed up under /data/b …. /data/i; based on the initial configuration this folder should not be there, so I deleted it. After formatting the NameNode and then (roughly as sketched below):
    a: starting the NameNode
    b: starting all DataNodes
    c: starting the SecondaryNameNode
    it seems like none of my 3 DataNodes are coming up, and the same goes for the SNN.
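
    (For reference, on HDP 1.x the manual sequence for a–c looks roughly like the following; the /usr/lib/hadoop path and the /etc/hadoop/conf config directory are assumptions about the stock layout.)

    # a: on the NameNode host
    su -l hdfs -c "/usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf start namenode"

    # b: on every DataNode host
    su -l hdfs -c "/usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf start datanode"

    # c: on node B (the SecondaryNameNode host)
    su -l hdfs -c "/usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf start secondarynamenode"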

    This is what I have so far!

    What do you suggest?
    Thank you

    #31053

    Seth Lyubich
    Keymaster

    Hi Ardavan,

    Can you please let us know at which point you are getting the error and which user you are using to start the process? Also, looking at your log file:


    Recovering storage directory /data/b/hadoop/hdfs/namesecondary from failed checkpoint

    Access denied for user hdfs. Superuser privilege is required

    Did you change any permissions or try to start the process as user root? You might need to check the permissions of the /data/b/hadoop/hdfs/namesecondary directory.
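
    For example (hdfs:hadoop is the usual HDP owner/group for these directories, so treat that as an assumption):

    ls -ld /data/b/hadoop/hdfs/namesecondary

    # If it ended up owned by root, hand it back to the HDFS user:
    chown -R hdfs:hadoop /data/b/hadoop/hdfs/namesecondary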

    Hope this helps,

    Thanks,
    Seth

    #31047

    Ardavan Moinzadeh
    Participant

    Robert,

    I was able to resolve that issue. Now that I am trying to bring up the SNN, it fails on me! For some reason I can’t log in to your FTP to upload my log files (FTP Listing of Root at http://ftp.support.hortonworks.com).

    This is part of my log:
    2013-08-06 23:58:18,240 INFO org.apache.hadoop.hdfs.server.common.Storage: Recovering storage directory /data/b/hadoop/hdfs/namesecondary from failed checkpoint
    2013-08-06 23:58:18,252 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint:
    2013-08-06 23:58:18,253 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.security.AccessControlException: Access denied for user hdfs. Superuser privilege is required
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkSuperuserPrivilege(FSPermissionChecker.java:93)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkSuperuserPrivilege(FSNamesystem.java:5927)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:5824)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.rollEditLog(NameNode.java:1022)
    at sun.reflect.GeneratedMethodAccessor122.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:587)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1444)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1440)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1438)

    at org.apache.hadoop.ipc.Client.call(Client.java:1118)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
    at $Proxy5.rollEditLog(Unknown Source)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:512)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:396)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:360)
    at java.lang.Thread.run(Thread.java:662)

    Thank you

    #31045

    Robert
    Participant

    Hi Ardavan,
    What are the permissions on the files located in /data/b/hadoop/hdfs/namenode/? Ideally, they should be owned by hdfs:hadoop (a quick check is sketched below).
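
    A quick way to check, and to spot anything (such as in_use.lock) left behind owned by root; the hdfs:hadoop owner/group is the usual HDP default, so adjust to your setup:

    ls -l /data/b/hadoop/hdfs/namenode/

    # Restore the expected ownership if anything there is owned by root:
    chown -R hdfs:hadoop /data/b/hadoop/hdfs/namenode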

    Regards,
    Robert

    #31001

    Ardavan Moinzadeh
    Participant

    /etc/hosts on all nodes is identical:
    (( Private IP address node1
    private IP address node 2
    Private IP address node 3
    127.0.0.1 localhost
    ))

    My attempt to upload the log file to ftp://ftp.hortonworks.com/ was not successful.
    This is the error I see in the NameNode log file:
    java.io.FileNotFoundException: /data/b/hadoop/hdfs/namenode/in_use.lock (Permission denied)
    Is this an SSH issue?

    #30963

    Sasha J
    Moderator

    Is the NameNode process running?
    Are the DataNode processes running?
    What do your /etc/hosts files contain?
    It seems to me like you are using “localhost” as the node name on all of them, which is incorrect.
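
    If that is the case, each node’s /etc/hosts should map the real interface IPs to real hostnames, along these lines (the IPs and hostnames here are placeholders):

    192.168.1.101   node-a.example.com   node-a
    192.168.1.102   node-b.example.com   node-b
    192.168.1.103   node-c.example.com   node-c
    127.0.0.1       localhost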

    What does the NameNode LOG file say? Not the .out file, but the .log file.

    Thank you!
    Sasha

    #30954

    Ardavan Moinzadeh
    Participant

    Why can’t I start the SNN?
    It says it cannot assign the requested address! What does that mean? I am able to SSH between all 3 nodes, and the /etc/hosts files are all correct!

    logging to /var/log/hadoop/root/hadoop-root-secondarynamenode-bddec1v6-0011.out
    localhost: Exception in thread “main” java.net.BindException: Cannot assign requested address
    localhost: at sun.nio.ch.Net.bind(Native Method)
    localhost: at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:126)
    localhost: at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
    localhost: at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
    localhost: at org.apache.hadoop.http.HttpServer.start(HttpServer.java:602)
    localhost: at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initializeHttpWebServer(SecondaryNameNode.java:278)
    localhost: at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:218)
    localhost: at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.<init>(SecondaryNameNode.java:150)
    localhost: at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:676)
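
    That BindException usually means the SNN web server is being told to bind to an address that does not belong to node B. Some things worth checking (the property name and port 50090 are Hadoop 1.x defaults, and /etc/hadoop/conf is an assumed config path):

    # Which address is the SNN web UI configured to bind to?
    grep -A1 dfs.secondary.http.address /etc/hadoop/conf/hdfs-site.xml

    # Does that address actually exist on this host?
    hostname -f
    ip addr show        # or: ifconfig -a

    # Is anything already listening on the default SNN web port?
    netstat -tlnp | grep 50090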

    #30939

    Sasha J
    Moderator

    This means you have something misconfigured somehow, or some permission issues.

    Take a look on the NN, DN and SNN logs.

    Sasha

    #30914

    Ardavan Moinzadeh
    Participant

    Before submitting this post I did try to start HDFS, both from Ambari and from the boxes by running ./start-dfs.sh.
    It’s not coming up!
    I have 3 alerts on my hosts:
    JobTracker and SNameNode are down on B and not coming up.
    Ironically, A and C are green, but HDFS is still down.
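
    One quick way to see which daemons are really up on each node, independent of the Ambari alerts (jps ships with the JDK; the log location is the usual HDP default, so an assumption):

    # Run on each of A, B and C
    jps

    # Any expected daemon that is missing (NameNode on A, SecondaryNameNode and
    # JobTracker on B, DataNodes wherever they are installed) is worth checking
    # in its .log file under /var/log/hadoop/.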

    #30911

    Sasha J
    Moderator

    Just start HDFS again.
    It should start normally on the second try.

    Thank you!
    Sasha
