The Hortonworks Blog

More from Eric Baldeschwieler

As Apache Hadoop has risen in visibility and ubiquity, we’ve seen a lot of other technologies and vendors put forth as replacements for some or all of the Hadoop stack. Recently, GigaOM listed eight technologies that can be used to replace HDFS (Hadoop Distributed File System) in some use cases. HDFS is not without flaws, but I predict a rosy future for it. Here is why…

To compare HDFS to other technologies, one must first ask: what is HDFS good at?

  • Extreme low cost per byte
    HDFS uses commodity direct attached storage and shares the cost of the network & computers it runs on with the MapReduce / compute layers of the Hadoop stack.

Last week was an important milestone for Hortonworks: our one year anniversary. Given all of the activity around Apache Hadoop and Hortonworks, it’s hard to believe it’s only been one year. In honor of our birthday, I thought I would look back to contrast our original intentions with what we delivered over the past year.

Hortonworks was officially announced at Hadoop Summit 2011. At that time, I published a blog on the Hortonworks Manifesto.…

In Shaun Connolly’s post about balancing community innovation and enterprise stability, he discussed the importance of an open source project forging ahead with big improvements that are expected to be initially buggy and incomplete functionally but then stabilize over time. In the case of Apache Hadoop 2.0, currently in community Alpha release, the big improvements have been underway for the past 3 years and include such things as:

  • Next-gen MapReduce (aka YARN) that opens up Hadoop’s job processing architecture to other application workloads beyond MapReduce,
  • New HDFS pipeline to support append and flush,
  • HDFS federation and performance improvements that enable Hadoop to scale to thousands more nodes in a cluster, and
  • High availability improvements that address some of the single point of failure issues that are often used as examples of how Hadoop may not be as enterprise-ready as it could be.…

    We are pleased to support today’s announcement from Citrix that they have contributed CloudStack to the Apache community. For those new to CloudStack, it is open source cloud computing software that helps organizations build and manage cloud infrastructures. It is similar to the Amazon Web Services EC2 environment, except that it enables organizations to build public, private or hybrid cloud environments using their own pooled computing resources.

    Citrix announced today that they were reaffirming their commitment to open source by working with the Apache Software Foundation to make CloudStack 3 an Apache project, released under the Apache License 2.0.…

    Thank you to the community members who cast over 8,000 votes during the Hadoop Summit Community Choice voting process. The turnout far exceeded our expectations and is further evidence that the momentum behind Apache Hadoop has never been stronger.

    As we announced, the sessions with the most votes in each track are automatically accepted into the Hadoop Summit agenda. As such, I am pleased to announce the winners of the Hadoop Summit Community Choice vote and the first confirmed sessions in the Hadoop Summit program:

    Future of Apache Hadoop track: Dynamic Namespace Partitioning with Giraffa File System, Konstantin Shvachko (eBay)

    Deployment and Operations track: Dynamic Reconfiguration of Apache Zookeeper, Alexander Shraer and Benjamin Reed (Yahoo!)

    Enterprise Data Architecture track: iMStor: Hadoop Storage-based Tiering Platform, Vishal Malik (Cognizant Technology Solutions)

    Applications and Data Science track: Hadoop & Cloud @Netflix: Taming the Social Data Firehose, Mohammad Sabah (Netflix)

    Analytics and Business Intelligence track: Mapping and Reducing Passenger Turbulence using Big Data, Farhan Hussain and Saad Patel (Open Source Architect)

    Hadoop in Action track: The Merchant Lookup Service at Intuit, Vrushali Channapattan (Intuit)…

    As I first mentioned when we announced Hadoop Summit 2012, we are focused on making Hadoop Summit the preeminent conference for the Apache Hadoop community. Today I’m pleased to tell you about Community Choice, a public online voting system that enables the entire Apache Hadoop community to have a say in the sessions chosen for Hadoop Summit. Anybody can vote and the top vote getters in each track will automatically be included in the Hadoop Summit agenda.…

    Today we announced an important strategic partnership with Talend, provider of the world’s most popular open source data integration platform. This is another win for both Hortonworks customers and the larger Apache Hadoop community. There were two key aspects of the announcement that I wanted to highlight:

    Talend releases Talend Open Studio for Big Data

    Based upon Talend’s very popular open source data integration platform, Talend Open Studio for Big Data adds connectors for HDFS, HBase, Pig, Sqoop and Hive.…

    Today we announced that we were delivering on our earlier promise to help Microsoft bring Apache Hadoop to Windows. I’m pleased to share that Microsoft, with our collaboration and guidance, has now submitted a series of patches to Apache aimed at overcoming the challenges of running Apache Hadoop in Windows Server environments.

    These patches, once vetted and approved by the community, will become part of the core Hadoop code base. They will also become available in the two major Apache Hadoop branches: hadoop-1.0 (the current stable branch, which is available as part of Hortonworks Data Platform v1.0) and hadoop-0.23 (the next generation of Apache Hadoop, which will be available as part of Hortonworks Data Platform v2.0).…

    I’ve been surprised by a couple of recent articles highlighting our recent leadership change. These articles imply that our business model may be changing. Let me be clear: WE ARE NOT CHANGING OUR BUSINESS MODEL. We are committed to providing training and support for a 100% open source distribution of Apache Hadoop and related projects.

    What has changed?

    Rob Bearden has agreed to take on the role of CEO. I am moving from CEO to the role of CTO.…

    I am pleased to report that Hortonworks has been named a leader in the recently released Forrester Wave report on Enterprise Hadoop Solutions. We scored well across all three rating areas: current offering, market presence and strategy.

    We appreciate the recognition, particularly this sentence that highlighted our role in the marketplace: “(Hortonworks) is the technology leader and ecosystem building for the entire Hadoop industry and has recently released its Hortonworks Data Platform, which incorporates purely open-source Apache Hadoop software.”

    Being named a Leader in the Forrester Wave on Enterprise Hadoop Solutions is one of many achievements for Hortonworks over the past seven months (stay tuned for a blog on this topic).…

    I am pleased to announce that Paul Cormier has joined the Hortonworks Board of Directors. Paul is currently President, Products and Technologies at Red Hat, where he leads the company’s engineering and products business units. Paul has an exceptional background in building enterprise-class open source software. He also has helped Red Hat achieve tremendous revenue growth by enabling a rich ecosystem of partners. We are pleased to add such a talented and experienced open source professional to our board.…

    Hi Folks,

    I’m happy to report that Hadoop Summit will be back for its 5th year. This year, Hortonworks and Yahoo are jointly hosting the conference, which will take place on June 13th and 14th at the San Jose Convention Center.

    This year’s event promises to be bigger and better than ever. We have extended the conference to a second day, added session tracks, and expect to showcase even more compelling and useful presentations.…

    Today we announced our plans to release a public preview of the Hortonworks Data Platform (HDP) version 2. HDP v2 will leverage Apache Hadoop 0.23, which is the first major update to Hadoop in more than three years. Among other advancements, HDP v2 will include the NextGen MapReduce architecture, HDFS NameNode HA and HDFS Federation. It will also include the most up-to-date stable components including HCatalog, HBase, Hive and Pig; all fully integrated and tested at scale.…

    I’m pleased to announce that Shaun Connolly has joined our executive management team as VP of Corporate Strategy. Shaun is a veteran enterprise software and open source executive who comes to us from VMware and previously held positions at SpringSource and JBoss.

    As VP of Corporate Strategy, Shaun will be responsible for helping us to achieve our business objectives by guiding corporate strategy and identifying new market opportunities for Apache Hadoop.…

    I spent some time last week at ApacheCon NA 2011 in Vancouver, BC. It was a good experience and I enjoyed catching up with friends and colleagues involved in the Hadoop project and also meeting some of the executives of the Apache Software Foundation in person. It is clear that the Apache community is thriving and that interest in Hadoop remains very high.

    Hortonworks is committed to supporting Apache and we are pleased to have been a gold sponsor of this event.…
