Failure of the Active NameNode in a Non-HA Deployment
The best approach to mitigating the risk of data loss due to a NameNode failure is to harden the NameNode system and components to meet the desired level of redundancy.
Because the journal is not flushed with every operation, the persisted on-disk state can be up to several seconds out of date. This latency determines the scope of potential data loss in the event of a NameNode failure.
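As a rough illustration of how that latency bounds the loss, the worst case can be estimated as the flush interval multiplied by the rate of namespace operations. The sketch below is only a back-of-the-envelope model: the function name and the figures (a 5-second interval, 200 operations per second) are hypothetical and are not taken from any Hadoop configuration or measurement.

```python
def operations_at_risk(flush_interval_s: float, ops_per_second: float) -> float:
    """Upper bound on namespace operations not yet persisted when the NameNode fails.

    Assumes a steady operation rate and that everything since the last
    journal flush is lost; both are simplifying assumptions.
    """
    if flush_interval_s < 0 or ops_per_second < 0:
        raise ValueError("inputs must be non-negative")
    return flush_interval_s * ops_per_second


# Hypothetical figures, for illustration only.
worst_case = operations_at_risk(flush_interval_s=5.0, ops_per_second=200.0)
print(f"Up to {worst_case:.0f} operations could be lost")
```

Halving the interval halves the bound, which is why the flush latency, rather than the size of the namespace, sets the scope of the loss.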
A highly fault-tolerant NameNode system mitigates the potential for data loss. In the future, when the NameNode is distributed, this latency will no longer be a concern and data loss scenarios will become much less probable.
This level of fault tolerance and availability can be reached through various mechanisms: hardware, software, or some combination of the two.
Until NameNode HA (High Availability) becomes available, the current solution is to set up a Secondary NameNode that stores a duplicate copy of the NameNode's metadata.