The Hortonworks Blog

Posts categorized by : Apache Hadoop

I’m pleased to announce the first in a series of videos featuring Hortonworks founders and executives sharing their thoughts on how Apache Hadoop is being extended to become the next generation enterprise data platform. Over the coming weeks and months, you will be hearing from folks such as Matt Foley, Arun Murthy, Sanjay Radia and Alan Gates, just to name a few.

The first video features Shaun Connolly, Hortonworks VP of Corporate Strategy, talking about the Hortonworks vision for Apache Hadoop.…

We reached a significant milestone in HDFS: the Namenode HA branch was merged into the trunk. With this merge, HDFS trunk now supports HOT failover.

Significant enhancements were completed to make HOT Failover work:

  • Configuration changes for HA
  • Notion of active and standby states were added to the Namenode
  • Client-side redirection
  • Standby processing journal from Active
  • Dual block reports to Active and Standby

We have extensively tested HOT manual failover in our labs over the last few months.…

Today we announced  that we were delivering on our earlier promise to help Microsoft bring Apache Hadoop to Windows. I’m pleased to share that Microsoft, with our collaboration and guidance, has now submitted a series of patches to Apache aimed at overcoming the challenges of running Apache Hadoop in Windows Server environments.

These patches, once vetted and approved by the community, will become part of the core Hadoop code base. They will also become available in the two major Apache Hadoop branches: hadoop-1.0 (the current stable branch, which is available as part of Hortonworks Data Platform v1.0) and hadoop-0.23 (the next generation of Apache Hadoop, which will be available as part of Hortonworks Data Platform v2.0).…

A very short while ago, Vinod blogged about some of the significant improvements in Hadoop.Next (a.k.a hadoop-0.23.1).

To recap, the Hortonworks and Yahoo! teams have done a huge amount of work to test, validate and benchmark Hadoop.Next, the next generation of Apache Hadoop that includes HDFS Federation, NextGen MapReduce (a.k.a. YARN) and many other significant features and performance improvements.

Today, I’m very excited to announce that the Apache Hadoop community voted to release hadoop-0.23.1 and it’s now available for all to use!…

I’ve been surprised by a couple of recent articles highlighting our recent leadership change.  These articles imply that our business model may be changing. Let me be clear, WE ARE NOT CHANGING OUR BUSINESS MODEL. We are committed to providing training and support of a 100% open source distribution of Apache Hadoop and related projects.

What has changed?

Rob Bearden has agreed to take on the role of CEO. I am moving from CEO to the role of CTO.…

Hadoop RPC is the primary communication mechanism between the nodes in an Apache Hadoop cluster. Maintaining wire compatibility, as new features are added to Apache Hadoop, has been a significant challenge with the current RPC architecture. In this blog, I highlight the architectural improvement in Hadoop RPC and how it enables wire compatibility and rolling upgrades.

Challenges for Wire Compatibility

Earlier Hadoop RPC used Writable serialization that made it difficult to evolve the protocols while maintaining wire compatibility.…

For those of you new to Apache ZooKeeper, it is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. To learn more about ZooKeeper, please visit the Apache ZooKeeper homepage.

As part of stabilizing Apache ZooKeeper 3.4 branch, ZooKeeper 3.4.3 has just been released. It is a bug fix release on the 3.4 branch and fixes 17 issues out of which 1 is very critical and can cause data inconsistency (ZOOKEEPER-1367).…

In our previous blogs and webinars we have discussed the significant improvements and architectural changes coming to Apache Hadoop .Next (0.23). To recap, the major ones are:

  • Federation for Scaling HDFS – HDFS has undergone a transformation to separate Namespace management from the Block (storage) management to allow for significant scaling of the filesystem. In previous architectures, they were intertwined in the NameNode.
  • NextGen MapReduce (aka YARN) – MapReduce has undergone a complete overhaul in hadoop-0.23, including a fundamental change to split up the major functionalities of the JobTracker, resource management and job scheduling/monitoring into separate daemons.

Today we announced our plans to release a public preview of the Hortonworks Data Platform (HDP) version 2. HDP v2 will leverage Apache Hadoop 0.23, which is the first major update to Hadoop in more than three years. Among other advancements, HDP v2 will include the NextGen MapReduce architecture, HDFS NameNode HA and HDFS Federation. It will also include the most up-to-date stable components including HCatalog, HBase, Hive and Pig; all fully integrated and tested at scale.…

Congratulations! The Hadoop Community has given itself a big holiday present: Release 1.0.0! This release has been six years in the making, and has involved:

  • Hard work and cooperation from dozens of software developers and contributors from across the industry, including of course Doug Cutting and Mike Cafarella’s early work in Nutch and the founding Hadoop team at Yahoo, Doug, Owen O’Malley and many others, with leadership from Eric14.  Special thanks to all the Hadoop committers.
Motivation

Apache Hadoop provides a high performance native protocol for accessing HDFS. While this is great for Hadoop applications running inside a Hadoop cluster, users often want to connect to HDFS from the outside. For examples, some applications have to load data in and out of the cluster, or to interact with the data stored in HDFS from the outside. Of course they can do this using the native HDFS protocol but that means installing Hadoop and a Java binding with those applications.…

As the Release Manager, it’s my privilege to present Apache Hadoop 0.23:

Release: http://hadoop.apache.org/common/releases.html Documentation: http://hadoop.apache.org/common/docs/r0.23.0/

I’ll present a short overview of the release in this post, more details are available in my recent talk on Apache Hadoop 0.23 at Hadoop World, 2011.…

We’ve been looking for the elephant in the room for some time. We knew he was there, but we just couldn’t find him. It’s clear that he is now here and his name is Hortonworks. As such, we are very excited to announce today that Index Ventures has made an investment in Hortonworks.

The elephant toy – Hadoop – has become a household name in the Big Data sector these days and we’ve been tracking it for some time at Index.…

I spent some time last week at ApacheCon NA 2011 in Vancouver, BC. It was a good experience and I enjoyed catching up with friends and colleagues involved in the Hadoop project and also meeting some of the executives of the Apache Software Foundation in person. It is clear that the Apache community is thriving and that interest in Hadoop remains very high.

Hortonworks is committed to supporting Apache and we are pleased to have been a gold sponsor of this event. …

As the framework architects and developers of Apache Hadoop MapReduce, we are always looking for ways to simplify the complex tasks associated with large-scale processing of data. We want users and organizations to spend their time on analyzing their growing data to gain valuable insights, not on menial tasks such as massaging their data for consumption or tediously parsing complex structures in their data. The Informatica HParser technology is extremely valuable in this regard.…

Go to page:« First...1020...2425262728