The Hortonworks Blog

Posts categorized by: MapReduce

In Shaun Connolly’s post about balancing community innovation and enterprise stability, he discussed the importance of an open source project forging ahead with big improvements that are expected to be buggy and functionally incomplete at first but to stabilize over time. In the case of Apache Hadoop 2.0, currently in community Alpha release, the big improvements have been underway for the past 3 years and include such things as:

  • Next-gen MapReduce (aka YARN) that opens up Hadoop’s job processing architecture to other application workloads beyond MapReduce,
  • A new HDFS pipeline to support append and flush,
  • HDFS federation and performance improvements that enable Hadoop to scale to thousands more nodes in a cluster, and
  • High availability improvements that address some of the single point of failure issues that are often used as examples of how Hadoop may not be as enterprise-ready as it could be.

    If you haven’t yet noticed, we have made Hortonworks Data Platform v1.0 available for download from our website. Previously, Hortonworks Data Platform was only available for evaluation for members of the Technology Preview Program or via our Virtual Sandbox (hosted on Amazon Web Services). Moving forward and effective immediately, Hortonworks Data Platform is available to the general public.

    Hortonworks Data Platform is a 100% open source data management platform, built on Apache Hadoop.…

    I wanted to take this opportunity to share some important news. Today, Hortonworks announced version 1.0 of the Hortonworks Data Platform, a 100% open source data management platform based on Apache Hadoop. We believe strongly that Apache Hadoop, and therefore, Hortonworks Data Platform, will become the foundation for the next generation enterprise data architecture, helping companies to load, store, process, manage and ultimately benefit from the growing volume and variety of data entering into, and flowing throughout their organizations.…

    The following press release was issued by Hortonworks today.

    Hortonworks Announces General Availability of Hortonworks Data Platform

    Industry’s First Apache Hadoop-based Platform to Include Management, Monitoring and Comprehensive Data Services, Making Hadoop Easy to Consume and Use in Enterprise Environments

    As the release manager for the Apache Hadoop 2.0 release, it gives me great pleasure to share that the Apache Hadoop community has just released Apache Hadoop 2.0.0 (alpha)! While only an alpha release (read: not ready to run in production), it is still an important step forward as it represents the very first release that delivers new and important capabilities, including:

    The third installment of the Hortonworks executive video series features Arun C. Murthy, co-founder of Hortonworks and VP of Apache Hadoop for the Apache Software Foundation. In this video, Arun shares his view of the power of Apache Hadoop and provides some insight into the future direction of MapReduce, including the ability to support alternate programming paradigms.

    A very short while ago, Vinod blogged about some of the significant improvements in Hadoop.Next (a.k.a. hadoop-0.23.1).

    To recap, the Hortonworks and Yahoo! teams have done a huge amount of work to test, validate and benchmark Hadoop.Next, the next generation of Apache Hadoop that includes HDFS Federation, NextGen MapReduce (a.k.a. YARN) and many other significant features and performance improvements.

    Today, I’m very excited to announce that the Apache Hadoop community voted to release hadoop-0.23.1 and it’s now available for all to use!…

    In our previous blogs and webinars we have discussed the significant improvements and architectural changes coming to Apache Hadoop.Next (0.23). To recap, the major ones are:

    • Federation for Scaling HDFS – HDFS has undergone a transformation to separate Namespace management from the Block (storage) management to allow for significant scaling of the filesystem. In previous architectures, they were intertwined in the NameNode.
    • NextGen MapReduce (aka YARN) – MapReduce has undergone a complete overhaul in hadoop-0.23, including a fundamental change that splits the two major functions of the JobTracker, resource management and job scheduling/monitoring, into separate daemons.
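
    The split described in the second bullet can be pictured with a small toy model. This is only an illustrative sketch, not the YARN API: the class names `ResourceManager` and `ApplicationMaster` are used here as hypothetical Python stand-ins, and the slot counting is an assumption made to keep the example short. One object does nothing but hand out cluster capacity, while a separate per-job object schedules and tracks that job's tasks.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceManager:
    """Toy cluster-wide component: it only tracks and hands out slots."""
    free_slots: int

    def allocate(self, requested: int) -> int:
        # Grant as many slots as are free, up to the requested number.
        granted = min(requested, self.free_slots)
        self.free_slots -= granted
        return granted

    def release(self, slots: int) -> None:
        # Return slots to the pool once a batch of tasks has finished.
        self.free_slots += slots


@dataclass
class ApplicationMaster:
    """Toy per-job component: it schedules and monitors one job's tasks."""
    job_name: str
    pending_tasks: list
    completed: list = field(default_factory=list)

    def run(self, rm: ResourceManager) -> None:
        while self.pending_tasks:
            granted = rm.allocate(len(self.pending_tasks))
            if granted == 0:
                raise RuntimeError("no capacity available in this toy model")
            # "Run" one batch of tasks, as many as slots were granted.
            batch = self.pending_tasks[:granted]
            self.pending_tasks = self.pending_tasks[granted:]
            self.completed.extend(batch)
            rm.release(granted)


# Hypothetical job with four tasks on a cluster that has two slots.
rm = ResourceManager(free_slots=2)
am = ApplicationMaster("wordcount", ["map-0", "map-1", "map-2", "reduce-0"])
am.run(rm)
```

    In this sketch the resource manager never needs to know what a task is, and the per-job object never needs to know about other jobs, which is the separation of concerns the bullet describes.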

    Today we announced our plans to release a public preview of the Hortonworks Data Platform (HDP) version 2. HDP v2 will leverage Apache Hadoop 0.23, which is the first major update to Hadoop in more than three years. Among other advancements, HDP v2 will include the NextGen MapReduce architecture, HDFS NameNode HA and HDFS Federation. It will also include the most up-to-date stable components including HCatalog, HBase, Hive and Pig; all fully integrated and tested at scale.…

    Congratulations! The Hadoop Community has given itself a big holiday present: Release 1.0.0! This release has been six years in the making, and has involved:

    • Hard work and cooperation from dozens of software developers and contributors from across the industry, including of course Doug Cutting and Mike Cafarella’s early work in Nutch and the founding Hadoop team at Yahoo, Doug, Owen O’Malley and many others, with leadership from Eric14. Special thanks to all the Hadoop committers.

    As the Release Manager, it’s my privilege to present Apache Hadoop 0.23:

    Release: Documentation:

    I’ll present a short overview of the release in this post; more details are available in my recent talk on Apache Hadoop 0.23 at Hadoop World, 2011.…

    As the framework architects and developers of Apache Hadoop MapReduce, we are always looking for ways to simplify the complex tasks associated with large-scale processing of data. We want users and organizations to spend their time on analyzing their growing data to gain valuable insights, not on menial tasks such as massaging their data for consumption or tediously parsing complex structures in their data. The Informatica HParser technology is extremely valuable in this regard.…

    There has been a lot of progress on hadoop-0.23. We’re continuing to crank through issues as we get ready to ship.

    We are mostly past the initial challenges of moving our entire build infrastructure to Maven. Many thanks to Alejandro, Tom, Giri & Eric Yang for making it happen.

    HDFS is nearly there:

    • HDFS Federation and Client-side mount tables have been tested with ~300 node clusters with security on.
    • HDFS upgrades have been tested from 0.20.2xx.

    We are glad to have branched for a hadoop-0.23 release. We have already talked about some of the significant enhancements coming in the upcoming release such as HDFS Federation and NextGen MapReduce and we are excited to be starting the journey to begin stabilizing the next release. Please check out this presentation for more details.

    As always, this is a community effort and we are very thankful for all the contributions from the Apache Hadoop community.…

    We are very excited to announce NextGen Apache Hadoop MapReduce is getting close. We just merged the code base to Apache Hadoop mainline and Arun is about to branch a hadoop-0.23 to prepare for a release!

    We’ve talked about NextGen Apache Hadoop MapReduce and its advantages. The drawbacks of current Apache Hadoop MapReduce are both old and well understood. The proposed architecture has been in the public domain for over 3 years now.…
