Additional Data Node Bring up

This topic contains 8 replies, has 2 voices, and was last updated by  Seth Lyubich 2 years ago.

  • #11697

    I have successfully gotten HMC to run on a single node installation.
    Subsequently, I was able to add a data node to this cluster.

    - Once the data node has been added, I am not seeing that node on the cluster, i.e. the Cluster Summary still shows details for only one node (the original single node).

    - The Monitoring page shows 2 nodes under up/down (both as UP), but the Cluster Summary shows only 1 DataNode. Cutting and pasting here:
    Cluster Summary
    HDFS
    NameNode Uptime 0day 3hr 30min
    HDFS Capacity 768.98 MB / 436.81 GB
    DataNodes (live/dead/decom) 1 / 0 / 0
    Under Replicated Block Count 682

    The MapReduce section shows only 1 tracker:

    MapReduce
    Job Tracker Uptime 0day 3hr 28min
    Trackers 1 / 0
    Running & Waiting Jobs 0 & 0

    On the Services segment:
    MapReduce and HDFS have a flashing 1 in red.

    In Nagios, under Problems:
    I have the following 2 issues on the new node:
    Service State Information
    Current Status: CRITICAL (for 3d 4h 0m 40s)
    Status Information: Connection refused
    Performance Data:
    Current Attempt: 3/3 (HARD state)
    Last Check Time: 10-29-2012 17:53:46
    Check Type: ACTIVE
    Check Latency / Duration: 0.252 / 0.005 seconds
    Next Scheduled Check: 10-29-2012 17:54:46
    Last State Change: 10-26-2012 13:53:51
    Last Notification: 10-26-2012 13:55:00 (notification 1)

    Service State Information
    Current Status: CRITICAL (for 3d 4h 0m 52s)
    Status Information: Connection refused
    Performance Data:
    Current Attempt: 3/3 (HARD state)
    Last Check Time: 10-29-2012 17:54:05
    Check Type: ACTIVE
    Check Latency / Duration: 0.139 / 0.006 seconds
    Next Scheduled Check: 10-29-2012 17:55:05
    Last State Change: 10-26-2012 13:54:10
    Last Notification: 10-26-2012 13:55:20 (notification 1)
    Is This Service Flapping? NO (0.00% state change)
    In Scheduled Downtime?

    Running jps on the second node shows DataNode and TaskTracker running:
    9853 HRegionServer
    12250 DataNode
    13380 Jps
    9086 TaskTracker

    What am I missing?

Viewing 8 replies - 1 through 8 (of 8 total)

The topic ‘Additional Data Node Bring up’ is closed to new replies.

  • #11821

    Seth Lyubich
    Keymaster

    Hi Anand,

    Thanks for the follow-up and your feedback. I will check on our side regarding your documentation suggestion.

    Thanks,
    Seth

    #11770

    Seth,
    Running "service hmc-agent start" as root on the added node helped.

    That resolved the issue for me.
    May I recommend that this instruction be added to the HDP installation notes?
    It has proved helpful to me in the last two issues…

    Thank you for your continued and prompt responses.

    #11711

    Seth Lyubich
    Keymaster

    The requirement for password-less SSH is:

    You must set up password-less SSH connections between the main installation host and all other machines.

    This means that you should be OK the way you have set it up.

    You can also check whether the hmc-agent service is running on the new host. On the new host, run:

    # service hmc-agent status
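
    A minimal sketch of that check, combined with the "start" action that is reported elsewhere in this thread to have resolved the problem. The function name ensure_hmc_agent is hypothetical, and the sketch assumes the init script's "status" action exits with 0 when the agent is running (the usual convention, not confirmed here). Run it as root on the new host.

```shell
# Sketch: start hmc-agent only if it is not already running.
# Assumption: "service hmc-agent status" exits 0 when the agent is up.
ensure_hmc_agent() {
    if service hmc-agent status >/dev/null 2>&1; then
        echo "hmc-agent is already running"
    else
        echo "hmc-agent is not running; starting it"
        service hmc-agent start
    fi
}
```

    Calling ensure_hmc_agent with no arguments performs the check. If the status action on your system does not follow the exit-code convention, inspect its printed output instead.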

    Hope this helps.

    Seth

    #11710

    Question:
    When you say passwordless SSH: it is set up so that the “Master” (the desktop) can access the data node (the laptop). Should it also work the other way around?
    I.e., do I need the desktop’s public key stored on the laptop?

    #11709

    Seth,
    Thank you.

    jps command output from the laptop:
    [root@mpc5lp2 ~]# /usr/jdk64/jdk1.6.0_31/bin/jps
    29556 TaskTracker
    28626 DataNode
    30445 HRegionServer
    31935 Jps

    I tried the passwordless SSH, and that works too.

    #11708

    Seth Lyubich
    Keymaster

    Hi Anand,

    Thanks for providing additional details.

    Can you please verify:

    1. That the TaskTracker and DataNode processes are running on the new node. Please provide the output of the jps command.
    2. That passwordless SSH is working between the nodes.
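
    The two checks above can be sketched as shell functions. The names missing_daemons and passwordless_ssh_ok are hypothetical. The first reads jps output on standard input and prints any of DataNode or TaskTracker that is absent; the second relies on the OpenSSH BatchMode option, which makes ssh fail rather than prompt for a password.

```shell
# Sketch: print each expected daemon that is missing from jps output.
# Input on stdin: one "<pid> <name>" pair per line, as jps prints it.
missing_daemons() {
    jps_output=$(cat)
    for daemon in DataNode TaskTracker; do
        # Exact match on the name column, so similar names do not count.
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'
        then
            echo "$daemon"
        fi
    done
}

# Sketch: succeed only if ssh to the host works without a password prompt.
passwordless_ssh_ok() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}
```

    For example, "jps | missing_daemons" on the new node prints nothing when both daemons are up, and "passwordless_ssh_ok mpc5lp2" run from the main installation host returns 0 when key-based login works. If jps is not on the PATH, call it by its full path under the JDK's bin directory.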

    If you still have issues we can try WebEx.

    Thanks,
    Seth

    #11707

    Seth,
    Status from original single node:
    [root@mpc5cp1 ~]# sestatus
    SELinux status: disabled
    [root@mpc5cp1 ~]# service iptables status
    Firewall is stopped.

    Status from added datanode:
    [root@mpc5lp2 ~]# sestatus
    SELinux status: disabled
    [root@mpc5lp2 ~]# service iptables status
    Firewall is stopped.

    In Nagios:
    Service
    DATANODE::Process down
    On Host
    mpc5lp2

    (mpc5lp2)
    Member of
    HDFS

    Service State Information
    Current Status: CRITICAL (for 4d 1h 3m 21s)
    Status Information: Connection refused
    Performance Data:
    Current Attempt: 3/3 (HARD state)
    Last Check Time: 10-30-2012 14:56:32
    Check Type: ACTIVE
    Check Latency / Duration: 0.168 / 0.005 seconds
    Next Scheduled Check: 10-30-2012 14:57:32
    Last State Change: 10-26-2012 13:53:51
    Last Notification: 10-26-2012 13:55:00 (notification 1)
    Is This Service Flapping? NO (0.00% state change)
    In Scheduled Downtime? NO
    Last Update: 10-30-2012 14:57:12 ( 0d 0h 0m 0s ago)

    Service
    TASKTRACKER::Process down
    On Host
    mpc5lp2
    (mpc5lp2)

    Member of
    MAPREDUCE

    Service State Information
    Current Status: CRITICAL (for 4d 1h 5m 14s)
    Status Information: Connection refused
    Performance Data:
    Current Attempt: 3/3 (HARD state)
    Last Check Time: 10-30-2012 14:58:51
    Check Type: ACTIVE
    Check Latency / Duration: 0.167 / 0.005 seconds
    Next Scheduled Check: 10-30-2012 14:59:51
    Last State Change: 10-26-2012 13:54:10
    Last Notification: 10-26-2012 13:55:20 (notification 1)
    Is This Service Flapping? NO (0.00% state change)
    In Scheduled Downtime? NO
    Last Update: 10-30-2012 14:59:22 ( 0d 0h 0m 2s ago)

    I did not follow the question about where the second node is located.
    It is a laptop and on the same network as the desktop. It is physically located next to the desktop.

    Info from NameNode:
    NameNode 'mpc5cp1:8020'
    Started: Tue Oct 30 13:01:34 PDT 2012
    Version: 1.0.3.16, r
    Compiled: Mon Oct 1 01:33:46 PDT 2012 by jenkins
    Upgrades: There are no upgrades in progress.

    Browse the filesystem
    Namenode Logs
    Go back to DFS home
    Live Datanodes : 1

    Node mpc5cp1
    LastContact 2
    Admin State In Service
    ConfiguredCapacity (GB) 436.81
    Used(GB) 0.77
    Non DFS Used (GB) 27.75
    Remaining(GB) 408.29

    Info from JobTracker:

    mpc5cp1 Hadoop Map/Reduce Administration
    Quick Links
    State: RUNNING
    Started: Tue Oct 30 13:03:29 PDT 2012
    Version: 1.0.3.16, r
    Compiled: Mon Oct 1 01:33:46 PDT 2012 by jenkins
    Identifier: 201210301303
    SafeMode: OFF
    Cluster Summary (Heap Size is 185.19 MB/740 MB)
    Running Map Tasks 0
    Running Reduce Tasks 0
    Total Submissions 3
    Nodes 1
    Occupied Map Slots 0
    Occupied Reduce Slots 0
    Reserved Map Slots 0
    Reserved Reduce Slots 0
    Map Task Capacity 2
    Reduce Task Capacity 2
    Avg. Tasks/Node 4.00
    Blacklisted Nodes 0
    Graylisted Nodes 0
    Excluded Nodes 0

    Clicking on the Nodes:
    Active Task Trackers
    Task Trackers
    Name tracker_mpc5cp1:mpc5cp1/127.0.0.1:33494
    Host mpc5cp1

    #11705

    Seth Lyubich
    Keymaster

    Hi Anand,

    Can you please check that iptables and SELinux are off on all nodes? Please run:

    # sestatus
    # service iptables status
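
    A rough sketch of running both checks against a node over ssh from the main host. The function names are hypothetical. The parsers match only the messages quoted in this thread ("SELinux status: disabled" and "Firewall is stopped."); other releases may word the firewall message differently, so treat the patterns as assumptions.

```shell
# Sketch: succeed if sestatus output (on stdin) reports SELinux as disabled.
selinux_disabled() {
    awk '/^SELinux status:/ { s = $NF } END { exit (s != "disabled") }'
}

# Sketch: succeed if "service iptables status" output (on stdin) says the
# firewall is stopped. Only the wording seen in this thread is matched.
firewall_stopped() {
    grep -q 'Firewall is stopped'
}

# Sketch: print a line for each check that fails on the given host.
check_node() {
    host=$1
    ssh "$host" sestatus | selinux_disabled ||
        echo "$host: SELinux is not disabled"
    ssh "$host" service iptables status | firewall_stopped ||
        echo "$host: firewall is not stopped"
}
```

    Running "check_node mpc5lp2" prints nothing when both checks pass. It needs working passwordless SSH to the node, so on a fresh host it may be simpler to run the two commands by hand first.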

    If you still have problems, can you please:

    1. Let us know which alerts are failing in Nagios.
    2. Tell us where the second node is located.
    3. Go to the NameNode and JobTracker UIs and let us know how many live nodes you see.

    Thanks,
    Seth
