Home Forums HDFS HDFS services is down

This topic contains 5 replies, has 3 voices, and was last updated by Robert Molina 8 months, 3 weeks ago.

  • Creator
    Topic
  • #43445

    Anupam Gupta
    Participant

    Hi All,

    I have installed HDP 1.3 (Ambari installation) on EC2 with a 2-node (CentOS 6.3) cluster; the instance type is m1.small (1.7 GB RAM). The installation succeeded and all services were running fine, but after some time HDFS went down (the NameNode stopped).
    I tried to start HDFS from the Ambari Web UI, but it fails to start.

    stdout:
    notice: /Stage[2]/Hdp-hadoop::Namenode/Hdp-hadoop::Namenode::Create_app_directories[create_app_directories]/Hdp-hadoop::Hdfs::Directory[/mapred]/Hdp-hadoop::Exec-hadoop[fs -chown mapred /mapred]/Hdp::Exec[hadoop --config /etc/hadoop/conf fs -chown mapred /mapred]/Anchor[hdp::exec::hadoop --config /etc/hadoop/conf fs -chown mapred /mapred::begin]: Dependency Exec[sleep 5; ls /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid >/dev/null 2>&1 && ps `cat /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid` >/dev/null 2>&1] has failures: true
    notice: /Stage[2]/Hdp-hadoop::Namenode/Hdp-hadoop::Namenode::Create_app_directories[create_app_directories]/Hdp-hadoop::Hdfs::Directory[/mapred]/Hdp-hadoop::Exec-hadoop[fs -chown mapred /mapred]/Hdp::Exec[hadoop --config /etc/hadoop/conf fs -chown mapred /mapred]/Exec[hadoop --config /etc/hadoop/conf fs -chown mapred /mapred]: Dependency Exec[sleep 5; ls /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid >/dev/null 2>&1 && ps `cat /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid` >/dev/null 2>&1] has failures: true
    notice: /Stage[2]/Hdp-hadoop::Namenode/Hdp-hadoop::Namenode::Create_app_directories[create_app_directories]/Hdp-hadoop::Hdfs::Directory[/mapred]/Hdp-hadoop::Exec-hadoop[fs -chown mapred /mapred]/Hdp::Exec[hadoop --config /etc/hadoop/conf fs -chown mapred /mapred]/Anchor[hdp::exec::hadoop --config /etc/hadoop/conf fs -chown mapred /mapred::end]: Dependency Exec[sleep 5; ls /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid >/dev/null 2>&1 && ps `cat /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid` >/dev/null 2>&1] has failures: true
    notice: /Stage[2]/Hdp-hadoop::Namenode/Hdp-hadoop::Namenode::Create_app_directories[create_app_directories]/Hdp-hadoop::Hdfs::Directory[/mapred]/Hdp::Exec[echo '/mapred' >> /var/log/hadoop/hdfs/namenode_dirs_created]/Anchor[hdp::exec::echo '/mapred' >> /var/log/hadoop/hdfs/namenode_dirs_created::begin]: Dependency Exec[sleep 5; ls /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid >/dev/null 2>&1 && ps `cat /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid` >/dev/null 2>&1] has failures: true
    notice: /Stage[2]/Hdp-hadoop::Namenode/Hdp-hadoop::Namenode::Create_app_directories[create_app_directories]/Hdp-hadoop::Hdfs::Directory[/mapred]/Hdp::Exec[echo '/mapred' >> /var/log/hadoop/hdfs/namenode_dirs_created]/Exec[echo '/mapred' >> /var/log/hadoop/hdfs/namenode_dirs_created]: Dependency Exec[sleep 5; ls /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid >/dev/null 2>&1 && ps `cat /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid` >/dev/null 2>&1] has failures: true
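
    For reference, the dependency reported as failed above is just a liveness check on the NameNode pid file. It can be run by hand on the NameNode host, roughly like this (paths taken from the log above):

    # Roughly the same check the failed Exec performs: does the pid file exist,
    # and is the process it names still alive?
    PID_FILE=/var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid
    if [ -f "$PID_FILE" ] && ps -p "$(cat "$PID_FILE")" >/dev/null 2>&1; then
        echo "NameNode is running (pid $(cat "$PID_FILE"))"
    else
        echo "NameNode is not running"
    fi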

    Kindly help.
    Thanks in advance,
    Sandy

Viewing 5 replies - 1 through 5 (of 5 total)


  • Author
    Replies
  • #45292

    Robert Molina
    Moderator

    Hi Sandy,
    In the logs you provided, there isn't any information showing that the NameNode process is shutting down. In your most recent NameNode log, if you scroll to the bottom of the file, you should see some kind of shutdown message if the NameNode process shut itself down. Do you see any messages of that type? If so, please post them.

    Regards,
    Robert

    #43469

    Anupam Gupta
    Participant

    Hi Dave,
    I started the NameNode manually using the command you gave, but it is not starting, and a new pid file is generated after I delete the old one. The current NameNode log is below. Kindly guide me: what is the issue now, and how can I resolve it?
    2013-11-11 14:46:11,706 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* addStoredBlock: blockMap updated: 10.165.16.32:50010 is added to blk_-8755223638506669119_1375 size 73
    2013-11-11 14:46:11,708 INFO org.apache.hadoop.hdfs.StateChange: Removing lease on /user/ambari-qa/.staging/job_201311111431_0003/job.splitmetainfo from client DFSClient_NONMAPREDUCE_-55172484_33
    2013-11-11 14:46:11,708 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: /user/ambari-qa/.staging/job_201311111431_0003/job.splitmetainfo is closed by DFSClient_NONMAPREDUCE_-55172484_33
    2013-11-11 14:46:12,006 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /user/ambari-qa/.staging/job_201311111431_0003/job.xml. blk_-3947213946175548025_1376
    2013-11-11 14:46:12,066 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* addStoredBlock: blockMap updated: 10.145.183.250:50010 is added to blk_-3947213946175548025_1376 size 40141
    2013-11-11 14:46:12,110 INFO org.apache.hadoop.hdfs.StateChange: Removing lease on /user/ambari-qa/.staging/job_201311111431_0003/job.xml from client DFSClient_NONMAPREDUCE_-55172484_33
    2013-11-11 14:46:12,110 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: /user/ambari-qa/.staging/job_201311111431_0003/job.xml is closed by DFSClient_NONMAPREDUCE_-55172484_33
    2013-11-11 14:46:12,249 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /mapred/system/job_201311111431_0003/job-info. blk_1714712068037228924_1377
    2013-11-11 14:46:12,264 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* addStoredBlock: blockMap updated: 10.165.16.32:50010 is added to blk_1714712068037228924_1377 size 119
    2013-11-11 14:46:12,267 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* addStoredBlock: blockMap updated: 10.145.183.250:50010 is added to blk_1714712068037228924_1377 size 119
    2013-11-11 14:46:12,268 INFO org.apache.hadoop.hdfs.StateChange: Removing lease on /mapred/system/job_201311111431_0003/job-info from client DFSClient_NONMAPREDUCE_727777343_1
    2013-11-11 14:46:12,269 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: /mapred/system/job_201311111431_0003/job-info is closed by DFSClient_NONMAPREDUCE_727777343_1
    2013-11-11 14:46:12,320 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /mapred/system/job_201311111431_0003/jobToken. blk_5244767158642097304_1378
    2013-11-11 14:46:12,353 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* addStoredBlock: blockMap updated: 10.165.16.32:50010 is added to blk_5244767158642097304_1378 size 239
    2013-11-11 14:46:12,354 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* addStoredBlock: blockMap updated: 10.145.183.250:50010 is added to blk_5244767158642097304_1378 size 239
    2013-11-11 14:46:12,359 INFO org.apache.

    #43450

    Robert Molina
    Moderator

    Hi Sandy,
    It looks like, for some reason, it is not able to get a pid. You can try launching the NameNode manually to verify whether you are still having the issue.

    Here is the command to do so; run it on the NameNode server:

    su - hdfs -c '/usr/lib/hadoop/bin/hadoop-daemon.sh start namenode'

    Tail the NameNode log and see what messages you get.

    Also check the pid file /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid: delete the old pid file and see whether a new one is generated at all.
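
    A rough sequence of those steps might look like this (paths as used in this thread; the exact log filename includes the hostname, so the wildcard is an assumption):

    # Remove any stale pid file, start the NameNode by hand, then check the result
    rm -f /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid
    su - hdfs -c '/usr/lib/hadoop/bin/hadoop-daemon.sh start namenode'
    ls -l /var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid    # was a new pid file written?
    tail -n 200 /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.log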

    Hope that helps.

    Regards
    Robert

    #43449

    Anupam Gupta
    Participant

    Hi Dave,

    I stopped all services and tried to start them again; after 2-3 attempts I am still not able to start HDFS.
    Some of the NameNode log entries follow:

    2013-11-11 14:42:44,825 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: /apps/hbase/data/.logs/ip-10-165-16-32.ec2.internal,60020,1384180397632/ip-10-165-16-32.ec2.internal%2C60020%2C1384180397632.1384180964492 is closed by DFSClient_hb_rs_ip-10-165-16-32.ec2.internal,60020,1384180397632_-1911689281_26
    2013-11-11 14:42:45,488 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /apps/hbase/data/.logs/ip-10-165-16-32.ec2.internal,60020,1384180397632/ip-10-165-16-32.ec2.internal%2C60020%2C1384180397632.1384180964784. blk_-3829807811695468028_1276
    2013-11-11 14:42:45,499 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* fsync: /apps/hbase/data/.logs/ip-10-165-16-32.ec2.internal,60020,1384180397632/ip-10-165-16-32.ec2.internal%2C60020%2C1384180397632.1384180964784 for DFSClient_hb_rs_ip-10-165-16-32.ec2.internal,60020,1384180397632_-1911689281_26
    2013-11-11 14:42:46,063 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 10.145.183.250:50010 to delete blk_-3640192982575671302_1115 blk_-2718466493571609272_1116
    2013-11-11 14:43:12,859 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* addToInvalidates: blk_-2146298840139643704 to 10.145.183.250:50010 10.165.16.32:50010
    2013-11-11 14:43:13,478 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 10.165.16.32:50010 to delete blk_-2146298840139643704_1249
    2013-11-11 14:43:16,487 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 10.145.183.250:50010 to delete blk_-2146298840139643704_1249
    2013-11-11 14:44:12,677 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 180 Total time for transactions(ms): 7 Number of transactions batched in Syncs: 5 Number of syncs: 114 SyncTimes(ms): 450
    2013-11-11 14:44:37,493 ERROR org.mortbay.log: /jmx
    javax.management.RuntimeMBeanException: getMBeanInfo threw RuntimeException
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBeanInfo(DefaultMBeanServerInterceptor.java:1381)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.getMBeanInfo(JmxMBeanServer.java:880)
    at org.apache.hadoop.jmx.JMXJsonServlet.listBeans(JMXJsonServlet.java:184)
    at org.apache.hadoop.jmx.JMXJsonServlet.doGet(JMXJsonServlet.java:160)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)

    Thanks,
    Sandy

    #43446

    Dave
    Moderator

    Hi Sandy,

    Can you attach some of the NameNode logs here? They are under:
    /var/log/hadoop/hdfs
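
    For example, something like this on the NameNode host should show the most recent NameNode log (the exact filename includes the hostname, so the pattern below is an assumption):

    # List the newest files in the HDFS log directory, then grab the tail of the NameNode log
    ls -lt /var/log/hadoop/hdfs/ | head
    tail -n 300 /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.log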

    Thanks

    Dave
