
Viewing posts by: Owen O'Malley




It was 10 years ago today (Feb 2) that my first patch went into the code that two days later became Hadoop. I had been working on Yahoo Search’s WebMap, which was the back end that analyzed the web for the search engine. We had been working on a C++ implementation of GFS and MapReduce, but […]

With YARN and HDFS at the architectural center, Hadoop has emerged as a key component of any modern data architecture. Today, enterprises utilize Hadoop to store critical datasets and power many of their critical workloads. With this in mind, the services and data within a Hadoop cluster needed to be highly available in the face of failures […]

Two weeks ago, Apache ORC became an Apache top-level project within the Apache Software Foundation (ASF). This represents a major step forward for the project and reflects the momentum built by a broad community of developers. What is ORC and why is it useful? Back in January 2013, we created […]

I’ve been working on MapReduce frameworks since mid-2005 (Hadoop’s since the start of 2006), and a fundamental feature has always been incredible throughput to access data, but no ACID transactions. That is changing. Recently, while working with a customer that is using Apache Hive to process terabytes (and growing quickly) of sales data, they […]

As the original architect of MapReduce, I’ve been fortunate to see Apache Hadoop and its ecosystem projects grow by leaps and bounds over the past seven years. Today, most of my time is spent as an architect and committer on Apache Hive. Hive is the gateway for doing advanced work on Hadoop Distributed File System […]

In February, we announced the Stinger Initiative, which outlined an approach to bring interactive SQL queries to Hadoop. Simply put, our choice was to double down on Hive and extend it so that it could address human-time use cases (i.e., queries in the 5-30 second range). So, with input and participation from the broader community, we […]

The Yahoo! Effect

While much credit has been given to Yahoo! since Hadoop was donated to the Apache Software Foundation in 2006, the real measure of their contributions and the impact that they have had in making Apache Hadoop what it is today is quite substantial. This blog will take a look at Yahoo!’s contributions to Apache Hadoop […]

Overview: As the former technical lead for the Yahoo! team that added security to Apache Hadoop, I thought I would provide a brief history. The motivation for adding security to Apache Hadoop actually had little to do with traditional notions of security in defending against hackers, since all large Hadoop clusters are behind corporate firewalls […]