Get Started


Ready to Get Started?

Download sandbox

How can we help you?

closeClose button

The Hortonworks Blog

Everyday more and more new devices—smartphones, sensors, wearables, tablets, home appliances—connect together by joining the “Internet of Things.” Cisco predicts that by 2020, there will be 50 billion devices connected to Internet of Things. Naturally, they all will emit streams of data, in short intervals. Obviously, these data streams will have to be stored, will […]

This is the third post in a series that explores the theme of supporting rolling-upgrades & downgrades of a Hadoop YARN cluster. See here for an introductory post. Introduction Carrying out a rolling upgrade/downgrade of all nodes in a Hadoop cluster can be a very disruptive process. Before HDP 2.2, if a NodeManager (NM) were […]

Introduction Today, organizations use the Apache Hadoop™ stack in the form of a central data lake to store their critical datasets and power their analytical processing workloads. A key requirement for the Hadoop cluster and the services running on it is to be highly available and flawlessly continue to function while software is being upgraded. […]

This is the second post in a series exploring the theme of long-running service workloads in YARN. See for the introductory post. Long-running services deployed on YARN are by definition expected to run for a long period of time—in many cases forever. Services such as Apache™ HBase, Apache Accumulo and Apache Storm can be run […]

Apache HBase is the online database natively integrated with Hadoop, making HBase the obvious choice for applications that rely on Hadoop’s scale and flexible data processing. With the Hortonworks Data Platform 2.2, HBase High Availability has taken a major step forward, allowing apps on HBase to deliver 99.99% uptime guarantees. This blog takes a look […]

YARN and Apache Storm: A Powerful Combination YARN changed the game for all data access engines in Apache Hadoop. As part of Hadoop 2, YARN took the resource management capabilities that were in MapReduce and packaged them for use by new engines. Now Apache Storm is one of those data-processing engines that can run alongside […]

The Journey Almost to the date, two years ago the Apache Hadoop community voted to make YARN a sub-project of Apache Hadoop followed by the GA release nearly a year ago last fall. Since then, it’s becoming plainly obvious that Apache Hadoop 2.x, powered by YARN as its architectural center, is the best platform for […]

This post’s Principal Author: Ming Ma, Software Development Manager, eBay.  With contribution from Mayank Bansal (eBay), Devaraj Das (Hortonworks), Nicolas Liochon (Scaled Risk), Michael Weng (eBay), Ted Yu (Hortonworks), John Zhao (eBay) eBay runs Apache Hadoop at extreme scale, with tens of petabytes of data. Hadoop was created for computing challenges like ours, and eBay […]

With HDP 1.3 and HDP 2.0 Beta, we introduced the ability to create snapshots to protect important enterprise data sets from user or application errors. HDFS Snapshots are read-only point-in-time copies of the file system. Snapshots can be taken on a subtree of the file system or the entire file system and are: Performant and Reliable: […]

The shift to a data-oriented business is happening. The inherent value in established and emerging big datasets is becoming clear. Enterprises are building big data strategies to take advantage of these new opportunities and Hadoop is the platform to realize those strategies. Hadoop is enabling a modern data architecture where it plays a central role: […]

In this blog, I’ll cover how we tested Full Stack HA with NameNode HA in Hadooop 1 with Hadoop and HBase as components of the stack. Yes, NameNode HA is finally available in the Hadoop 1 line. The test was done with Hadoop branch-1 and HBase-0.92.x on a cluster of roughly ten nodes. The aim […]

Introduction A Highly Available NameNode for HDFS has been in development since last year. That effort focused singularly on the automatic failover of the NameNode for Hadoop 2.0. During that time we have realized two things. First, we realized we should use an outside-in approach to the HA problem: start by designing the availability of […]

In Shaun Connolly’s post about balancing community innovation and enterprise stability, he discussed the importance of an open source project forging ahead with big improvements that are expected to be initially buggy and incomplete functionally but then stabilize over time. In the case of Apache Hadoop 2.0, currently in community Alpha release, the big improvements […]

We reached a significant milestone in HDFS: the Namenode HA branch was merged into the trunk. With this merge, HDFS trunk now supports HOT failover. Significant enhancements were completed to make HOT Failover work: Configuration changes for HA Notion of active and standby states were added to the Namenode Client-side redirection Standby processing journal from […]