The Hortonworks Blog

More from Sanjay Radia

Introduction

A Highly Available NameNode for HDFS has been in development since last year. That effort focused singularly on the automatic failover of the NameNode for Hadoop 2.0. During that time we have realized two things.

First, we realized we should use an outside-in approach to the HA problem: start by designing the availability of the Hadoop system as a whole and then focus on the high-availability of individual components; that work lead to the Full Stack HA Architecture.…

Data integrity and availability are important for Apache Hadoop, especially for enterprises that use Apache Hadoop to store critical data.  This blog will focus on a few important questions about Apache Hadoop’s track record for data integrity and availability and provide a glimpse into what is coming in terms of automatic failover for HDFS NameNode.

What is Apache Hadoop’s Track Record for Data Integrity?

In 2009, we examined HDFS’s data integrity at Yahoo!…

Thank you for subscribing!