Hadoop Ecosystem

While Yahoo! has received much credit since Hadoop was donated to the Apache Software Foundation in 2006, the real measure of its contributions, and of their impact in making Apache Hadoop what it is today, is more substantial still. This blog takes a look at Yahoo!’s contributions to Apache Hadoop and the impact those contributions have had on the project.…

We are glad to have branched for the hadoop-0.23 release. We have already talked about some of the significant enhancements in the upcoming release, such as HDFS Federation and NextGen MapReduce, and we are excited to begin stabilizing the next release. Please check out this presentation for more details.

As always, this is a community effort and we are very thankful for all the contributions from the Apache Hadoop community.…

Delegation tokens play a critical part in Apache Hadoop security, and understanding their design and use is important for comprehending Hadoop’s security model.

Download our technical paper on adding security to Hadoop here.

Authentication in Apache Hadoop
Apache Hadoop provides strong authentication for HDFS data. All HDFS accesses must be authenticated:

1. Access from users logged in on cluster gateways
2. Access from any other service or daemon (e.g. HCatalog server)

As the former technical lead for the Yahoo! team that added security to Apache Hadoop, I thought I would provide a brief history.

The motivation for adding security to Apache Hadoop actually had little to do with traditional notions of security, such as defending against hackers, since all large Hadoop clusters sit behind corporate firewalls that allow access only to employees. Instead, the motivation was simply that security would let us use Hadoop more effectively to pool resources among otherwise separate groups.…

As enterprises increasingly adopt Apache Hadoop for critical data, high-quality releases of Apache Hadoop become even more crucial. Storage systems in particular require robustness and data integrity, since enterprises cannot tolerate data corruption or loss. Apache Hadoop also provides an execution engine for customer applications, which brings its own challenges. Apache Hadoop handles failures of disks, storage nodes, compute nodes, networks, and applications. Its distributed nature, scale, and rich feature set make testing Apache Hadoop non-trivial.…

For the first time in its history, OSCON, the premier open-source conference, held a special OSCON Data sub-conference, and Apache Hadoop had a full track dedicated to it there. This clearly reflected the interest in Big Data and the central role Apache Hadoop plays in the space. A special shout-out to Bradford Stephens and Sarah Novotny, the program chairs, who did a fantastic job with OSCON Data.…

