Viewing posts by: Owen O'Malley


It was 10 years ago today (Feb 2) that my first patch (https://issues.apache.org/jira/browse/NUTCH-197) went into the code that two days later became Hadoop (https://issues.apache.org/jira/browse/HADOOP-1). I had been working on Yahoo Search’s WebMap, which was the back end that analyzed the web for the search engine.  We had been working on a C++ implementation of GFS and MapReduce, but […]

With YARN and HDFS at the architectural center, Hadoop has emerged as a key component of any modern data architecture. Today, enterprises utilize Hadoop to store critical datasets and power many of their critical workloads. With this in mind, the services and data within a Hadoop cluster needed to be highly available in the face of failures […]

Two weeks ago, Apache ORC became a top-level project within the Apache Software Foundation (ASF). This is a major step forward for the project, and it reflects the momentum built by a broad community of developers. What is ORC and why is it useful? Back in January 2013, we created […]

I’ve been working on MapReduce frameworks since mid-2005 (Hadoop’s since the start of 2006), and a fundamental feature has always been incredible throughput for accessing data, but no ACID transactions. That is changing. Recently, while working with a customer that is using Apache Hive to process terabytes (and growing quickly) of sales data, they […]

As the original architect of MapReduce, I’ve been fortunate to see Apache Hadoop and its ecosystem projects grow by leaps and bounds over the past seven years. Today, most of my time is spent as an architect and committer on Apache Hive. Hive is the gateway for doing advanced work on Hadoop Distributed File System […]

In February, we announced the Stinger Initiative, which outlined an approach to bring interactive SQL queries to Hadoop. Simply put, our choice was to double down on Hive and extend it so that it could address human-time use cases (i.e., queries in the 5-30 second range). So, with input and participation from the broader community, we […]

The Yahoo! Effect

While Yahoo! has received much credit since Hadoop was donated to the Apache Software Foundation in 2006, the full measure of its contributions, and of the impact it has had in making Apache Hadoop what it is today, is quite substantial. This blog will take a look at Yahoo!’s contributions to Apache Hadoop […]

As the former technical lead for the Yahoo! team that added security to Apache Hadoop, I thought I would provide a brief history. The motivation for adding security to Apache Hadoop actually had little to do with traditional notions of security in defending against hackers, since all large Hadoop clusters are behind corporate firewalls […]