The Hortonworks Blog

More from Owen O'Malley

With YARN and HDFS at the architectural center, Hadoop has emerged as a key component of any modern data architecture. Today, enterprises use Hadoop to store critical datasets and power many of their critical workloads. With this in mind, the services and data within a Hadoop cluster need to be highly available in the face of failures and continue to function while the cluster is upgraded to the latest software version.

With the Hortonworks Data Platform (HDP) 2.2, we have enhanced the core platform packaging to put in place support for rolling upgrades of the HDP stack while the cluster is actively servicing users.…

Two weeks ago, Apache ORC became an Apache top-level project within the Apache Software Foundation (ASF). This is a major step forward for the project, and it reflects the momentum built by a broad community of developers.

What is ORC and why is it useful?

Back in January 2013, we created ORC files as part of the Stinger initiative to massively speed up Apache Hive and improve the storage efficiency of data stored in Apache Hadoop.…

I’ve been working on MapReduce frameworks since mid-2005 (Hadoop’s since the start of 2006), and a fundamental trait has always been incredible throughput for accessing data, but no ACID transactions. That is changing.

Recently, I was working with a customer that uses Apache Hive to process terabytes (and growing quickly) of sales data, and they asked how to handle a business requirement to update millions of records in their sales table each day.…

As the original architect of MapReduce, I’ve been fortunate to see Apache Hadoop and its ecosystem projects grow by leaps and bounds over the past seven years.

Today, most of my time is spent as an architect and committer on Apache Hive. Hive is the gateway for doing advanced work on the Hadoop Distributed File System (HDFS) and the MapReduce framework. We are on the verge of releasing major improvements to Apache Hive, in coordination with work going on in Apache Tez and YARN.…

In February, we announced the Stinger Initiative, which outlined an approach to bring interactive SQL queries to Hadoop. Simply put, our choice was to double down on Hive and extend it so that it could address human-time use cases (i.e., queries in the 5-30 second range). So, with input and participation from the broader community, we established a fairly audacious goal of 100X performance improvement and SQL compatibility.

Introducing Apache Hive 0.11 – 386 JIRA tickets closed

As representatives of this open, community-led effort, we are very proud to announce the first release of the new and improved Apache Hive, version 0.11. …

While much credit has been given to Yahoo! since Hadoop was donated to the Apache Software Foundation in 2006, the real measure of its contributions, and of their impact in making Apache Hadoop what it is today, is quite substantial. This post looks at those contributions and the impact they have had.…

Overview

As the former technical lead for the Yahoo! team that added security to Apache Hadoop, I thought I would provide a brief history.

The motivation for adding security to Apache Hadoop actually had little to do with traditional notions of security, such as defending against hackers, since all large Hadoop clusters are behind corporate firewalls that only allow employees access. Instead, the motivation was simply that security would allow us to use Hadoop more effectively to pool resources between disjoint groups.…