Viewing posts by: Sanjay Radia

X
FILTERS
ALL
TECHNICAL
BUSINESS

All Topics















All Channels











CLEAR FILTERS

Ram Venkatesh also contributed to this blog series. Why Apache Hadoop in the Cloud? Ten years ago, Hadoop, the elephant, started the Big Data journey inside the firewall of a data center: the Apache Hadoop components were deployed on commodity servers in a private data center. Now, the public cloud is another viable option for […]

Introduction: Today, organizations use the Apache Hadoop™ stack in the form of a central data lake to store their critical datasets and power their analytical processing workloads. A key requirement for the Hadoop cluster and the services running on it is to be highly available and to continue functioning flawlessly while software is being upgraded. […]

Traditionally, HDFS, Hadoop's storage subsystem, has focused on one kind of storage medium, namely spindle-based disks. However, a Hadoop cluster can contain significant amounts of memory, and with the continued drop in memory prices, customers are willing to add memory targeted at caching storage to speed up processing. Recently, HDFS generalized its architecture to include […]
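The excerpt above is truncated, so the details of the generalized architecture are not shown here. As a rough illustration only, the sketch below assumes a Hadoop 2.6+ cluster whose DataNodes are configured with RAM_DISK volumes, and uses the HDFS storage-policy API to direct a directory toward memory-backed storage; the path and class name are hypothetical, not taken from the post.

// Hypothetical sketch (not from the original post): asking HDFS to place new
// replicas for a "hot" directory on memory-backed storage via storage policies.
// Assumes a Hadoop 2.6+ cluster with DataNodes that expose RAM_DISK volumes.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class MemoryTierSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Storage policies are an HDFS-specific feature, so cast to the HDFS client.
        DistributedFileSystem dfs = (DistributedFileSystem) fs;

        // Illustrative path; any HDFS directory would do.
        Path hotData = new Path("/datasets/hot");

        // LAZY_PERSIST writes new replicas to memory first and persists them
        // to disk in the background, trading some durability for write speed.
        dfs.setStoragePolicy(hotData, "LAZY_PERSIST");

        dfs.close();
    }
}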

Introduction: A Highly Available NameNode for HDFS has been in development since last year. That effort focused singularly on the automatic failover of the NameNode for Hadoop 2.0. During that time we have realized two things. First, we should use an outside-in approach to the HA problem: start by designing the availability of […]

Data integrity and availability are important for Apache Hadoop, especially for enterprises that use Apache Hadoop to store critical data. This blog will focus on a few important questions about Apache Hadoop's track record for data integrity and availability and provide a glimpse into what is coming in terms of automatic failover for the HDFS NameNode. […]