Balancing Community Innovation and Enterprise Stability

Having worked at JBoss and Red Hat from 2004 to 2008 and at SpringSource and VMware from 2008 to 2011, I’ve been focused on the world of open source software for a long while. I’ve been blessed to be able to serve enterprise customer needs with high-quality open source software such as JBoss Application Server, Hibernate, Drools, Apache Web Server, Apache Tomcat, Spring … and now Apache Hadoop.

As specific open source technologies mature and their use becomes mainstream, it becomes increasingly important to understand and communicate the balancing act that needs to happen between community innovation and enterprise stability.

Community innovation moves at a fast pace, where “ship early and often” is a key tenet. Open source projects need to visibly improve and keep innovating if they are to attract a vibrant following. As a project’s community grows, its members will expect big improvements and will accept early, buggy releases. After all, that’s part of the process.

On the other hand, widespread enterprise adoption requires stability as a prerequisite. Mainstream enterprises require integrated, tested, and predictable releases built from stable source code branches that receive regular sustaining engineering updates. Upgrading to the latest and greatest release is not always a viable option for enterprises widely deploying technology. They need to upgrade in a way that is repeatable and manageable.

Let’s use Geoffrey Moore’s famous “crossing the chasm” model to illustrate this from a technology adoption lifecycle perspective.

On the left of the chasm we have the innovators and early adopters. These folks feel very comfortable with the “ship early and often” mantra that fuels community innovation. The need for stability and predictability becomes increasingly important as mainstream enterprises, in the form of early majority (pragmatists) and late majority (conservatives), begin to adopt the technology.

Can a Workable Balance Between Innovation and Stability Be Achieved?

When I worked for SpringSource, we had a team of engineers focused on innovating and maintaining Apache Web Server and Apache Tomcat for both our enterprise customers and the broader community. Jim Jagielski, for example, provided strong leadership for Apache Web Server by helping to ensure regular sustaining engineering releases occurred on the widely deployed and stable branches. Not only did this benefit the community, it was also critical to the thousands of organizations betting their business on Apache Web Server.

The release of Apache Tomcat 7.0 provides another great example of how to properly set community and enterprise expectations. Back in 2010, Mark Thomas, committer and release manager for Apache Tomcat, was interviewed regarding the release of Tomcat 7.0. One of the questions posed to Mark was:

Should I plan to adopt Tomcat 7 for my application and if so, when?

Mark’s answer was pragmatic and spot on, and I’ve paraphrased him here:

“I think it depends on the level of risk / severity of bugs you are prepared to accept and on which features you want to use. If you want rock-solid stability, then stick with Tomcat 6. Release 7.0.0 has way too many major bugs. For production use, I would be watching the release notes very carefully and testing my application thoroughly. The more folks that use Tomcat 7, the faster it will become stable, but early adopters can expect to find bugs… it usually takes 6-12 months for a release to become stable.”

What Does Apache Hadoop Need to Do to Achieve This Balance?

As Hadoop moves through its technology adoption lifecycle, enterprises and the broader ecosystem of open source projects and solution providers want [and need] to understand what “stable” versions of Hadoop they should consume and when. This means the process of establishing and communicating clear major lines of open source development is vitally important.

Earlier this year, the Apache Hadoop community took major strides towards this goal by declaring Hadoop 1.0 and Hadoop 2.0 as the two major lines of development.

  • Hadoop 1.0 represents the most stable version of Hadoop to date and is the culmination of years of effort and production deployments. This version provides a stable base for enterprises to embrace and ecosystem vendors to build upon.
  • Hadoop 2.0 represents a major architectural update across both MapReduce and the Hadoop Distributed File System (HDFS). It recently reached an Alpha release within the community and continues to make strong progress towards a final stable release. Once that goal is reached, the process of ongoing hardening and stabilization can begin so the more pragmatic and conservative adopters have the confidence to get on board (a small version-check sketch follows this list).
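
For downstream applications and vendors who need to confirm which of these lines a given build comes from, Hadoop’s build metadata can be read at runtime. The snippet below is a minimal sketch, assuming the org.apache.hadoop.util.VersionInfo utility class (present on both the 1.0 and 2.0 lines) and a hypothetical “stable vs. fast-moving” policy; it is meant as an illustration, not as prescribed tooling.

    import org.apache.hadoop.util.VersionInfo;

    // Minimal sketch: report which Hadoop line the jars on the classpath come from.
    public class HadoopLineCheck {
        public static void main(String[] args) {
            String version = VersionInfo.getVersion(); // e.g. "1.0.3" or "2.0.0-alpha"
            String branch = VersionInfo.getBranch();   // source branch the build was cut from

            System.out.println("Hadoop version: " + version + " (branch: " + branch + ")");

            // Hypothetical policy: treat the 2.0 line and pre-release builds as fast-moving.
            if (version.startsWith("2.") || version.contains("alpha") || version.contains("SNAPSHOT")) {
                System.out.println("This build is on the 2.0 line or a pre-release; plan for extra validation.");
            } else {
                System.out.println("This build tracks the stable 1.0 line.");
            }
        }
    }

Compiled and run with the Hadoop client jars on the classpath, this prints the version and branch the build was produced from, which is often enough to catch an accidental mix of stable and alpha bits in a deployment.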

At Hortonworks, we continue to make major investments that directly benefit these two major lines of development. This ensures that the Apache code we build into our Hortonworks Data Platform stays as close as possible to what the extended community (open source developers, end users, and vendors) can see for themselves in the Apache source code repositories.

If we expand this discussion to include Hadoop-related projects, the challenge of achieving a balance between innovation and stability across multiple independently run open source projects quickly becomes nontrivial, to say the least.

This complexity is further reason why it’s so critical for Apache Hadoop to have clearly established and maintained lines of open source development. Driving clarity around major Hadoop releases gives the broader ecosystem the confidence that they are building on a stable foundation rather than on unpredictably shifting sands.

The Bottom Line

Achieving this balance clearly requires cooperation, understanding, and teamwork across the extended community of open source developers, end users, and solution providers / vendors. At Hortonworks, we are firmly focused on helping each major version of Apache Hadoop successfully move along its technology adoption lifecycle.

I hope you are able to join us at Hadoop Summit where we, and the broader community, will be talking in more detail about the cool features and capabilities across Hadoop 1.0, Hadoop 2.0, and beyond.

And for Geoffrey Moore fans, since he is a keynote speaker at Hadoop Summit, you’ll get to hear his thoughts on Hadoop directly.

~ Shaun Connolly


Comments

June 6, 2012 at 6:56 pm

Thanks for the comments, Roman!
It’s definitely fun times.
Lots of hard work ahead!

Say hi to PCE for me over there at Cloudera.
I enjoyed working with him at SpringSource/VMware.
So treat him well! ;-)
And ask him for some free maple syrup…he has a farm in Vermont that makes kickazz maple syrup.

June 6, 2012 at 6:01 pm

Hi Cos,

Thanks for the comment. The “asparagus chart”, as I like to refer to it, was conceived in 2006 by Jon Atkins (whom I worked with at JBoss / Red Hat, and who is still at Red Hat). The chart was used to describe how the various open source parts of the JBoss platform were integrated and tested together. So the credit for the concept goes to Jon Atkins.

The point of the graphic is that any nontrivial open source platform (one that has to corral a bunch of technologies into one tested whole) suffers the same problem; Hadoop is not unique in this. Actually, Linux has this same problem in spades.

As far as BigTop goes, we at Hortonworks are using parts of BigTop for the HDP platform builds, so thanks for the efforts there!

At the end of the day, my post was more about the importance of managing expectations across community innovation and enterprise stability. This is actually more of a people and perspective issue than a technology/tool issue.

    June 6, 2012 at 6:33 pm

    Shaun, just to clarify: our mission at Apache Bigtop is not simply to be a place where packaging code is developed, but to be a cutting-edge upstream Hadoop ecosystem integration platform. That is why we cover the full integration lifecycle: build, packaging, deployment, and validation. Here’s a link providing a more detailed overview: http://bit.ly/KI0aXz

    You’re absolutely right in saying that Linux has had this very problem for years, and that’s precisely why every enterprise Linux company now manages customer expectations by maintaining two artifacts: an enterprise distribution and a cutting-edge, community-driven integration platform (RHEL/Fedora, SLES/OpenSUSE, etc.); we even tried it at Sun with Solaris/OpenSolaris, but that’s a different story. It is my strong belief that the Apache Hadoop ecosystem will benefit from taking a page out of the Linux book here.

    It is my sincere hope that Apache Bigtop will prove to be just such an artifact. With Cloudera, Hortonworks, Trend Micro, and a couple of other big players already on board and using Bigtop to manage their enterprise-grade distributions of Hadoop, the momentum clearly seems to be there. Now it is up to the Bigtop community to rise to the challenge and be as viable as the Fedora, OpenSUSE, and Debian distributions are for the Linux crowd.

    Thanks for articulating the problem and best of luck with HDP!

Cos
June 6, 2012 at 4:18 pm

It really would be nice not to forget to give credit where it is due. The very last picture in the article refers to the vision, which is what Apache Bigtop has been implementing for quite some time now. And of course the concept behind Bigtop is taken from the presentation linked from this post: http://is.gd/06yvPM.
But in the end, we are all working towards the success of open source, so perhaps the credit isn’t really due.

