I have run into a very basic and very common problem: I cannot recover a failed Hadoop DataNode gracefully without stopping all services on all nodes and then starting all services on all nodes again.
Originally, I had a Hadoop cluster with 5 nodes (1 NameNode, host001, plus 4 DataNodes: host002, host003, host004, host005). When I shut down one DataNode (host005), HMC detected that the host005 DataNode was down and showed a blinking warning on the HMC monitor. However, when I powered host005 back on, it did not recover its Hadoop services, such as the DataNode and the TaskTracker, so I still had only 4 working Hadoop nodes.
My current way to recover the host005 DataNode's services is to stop all services and then start all services from HMC, but that is clumsy and not practical in the real world. Can anybody suggest a better way?
What should I do to recover a failed DataNode without stopping all services and then starting them all again?
What is the correct procedure?