Balancing Community Innovation and Enterprise Stability
Having worked at JBoss and Red Hat from 2004 to 2008 and SpringSource and VMware from 2008 to 2011, I’ve been focused on the world of open source software for a long while. I’ve been blessed to be able to serve enterprise customer needs with high quality open source software such as JBoss Application Server, Hibernate, Drools, Apache Web Server, Apache Tomcat, Spring … and now Apache Hadoop.
As specific open source technologies mature and their use becomes mainstream, it becomes increasingly important to understand and communicate the balancing act that needs to happen between community innovation and enterprise stability.
Community innovation needs to have a fast pace, where “ship early and often” is a key tenet. Open source projects need to visibly improve and keep innovating if they are to attract a vibrant following. As an open source project’s community grows, its members will expect big improvements and will be fine with early, buggy releases. After all, that’s part of the process.
On the other hand, widespread enterprise adoption requires stability as a prerequisite. Mainstream enterprises require integrated, tested, and predictable releases built from stable source code branches that receive regular sustaining engineering updates. Upgrading to the latest and greatest release is not always a viable option for enterprises widely deploying technology. They need to upgrade in a way that is repeatable and manageable.
On the left of Geoffrey Moore’s technology adoption chasm we have the innovators and early adopters. These folks feel very comfortable with the “ship early and often” mantra that fuels community innovation. The need for stability and predictability becomes increasingly important as mainstream enterprises, in the form of early majority (pragmatists) and late majority (conservatives), begin to adopt the technology.
Can a Workable Balance Between Innovation and Stability be Achieved?
When I worked for SpringSource, we had a team of engineers focused on innovating and maintaining Apache Web Server and Apache Tomcat for both our enterprise customers and the broader community. Jim Jagielski, for example, provided strong leadership for Apache Web Server by helping to ensure regular sustaining engineering releases occurred on the widely deployed, stable branches. Not only did this benefit the community, it was also critical to the thousands of organizations betting their business on Apache Web Server.
The release of Apache Tomcat 7.0 provides another great example of how to properly set community and enterprise expectations. Back in 2010, Mark Thomas, committer and release manager for Apache Tomcat, was interviewed regarding the release of Tomcat 7.0. One of the questions posed to Mark was:
“Should I plan to adopt Tomcat 7 for my application and if so, when?”
Mark’s answer was pragmatic and spot on, and I’ve paraphrased him here:
“I think it depends on the level of risk / severity of bugs you are prepared to accept and on which features you want to use. If you want rock solid stability, then stick with Tomcat 6. Release 7.0.0 has way too many major bugs. For production use, I would be watching the release notes very carefully and testing my application thoroughly. The more folks that use Tomcat 7, the faster it will become stable, but early adopters can expect to find bugs…it usually takes 6-12 months for a release to become stable.”
What Does Apache Hadoop Need to Do to Achieve This Balance?
As Hadoop moves through its technology adoption lifecycle, enterprises and the broader ecosystem of open source projects and solution providers want [and need] to understand what “stable” versions of Hadoop they should consume and when. This means the process of establishing and communicating clear major lines of open source development is vitally important.
Earlier this year, the Apache Hadoop community took major strides towards this goal by declaring Hadoop 1.0 and Hadoop 2.0 as the two major lines of development.
- Hadoop 1.0 represents the most stable version of Hadoop to date and is the culmination of years of effort and production deployments. This version provides a stable base for enterprises to embrace and ecosystem vendors to build upon.
- Hadoop 2.0 represents a major architectural update across both MapReduce and the Hadoop Distributed File System (HDFS). It recently reached an Alpha release within the community and continues to make strong progress towards a final stable release. Once that goal is reached, the process of ongoing hardening and stabilization can occur so that more pragmatic and conservative adopters have the confidence to get on board.
At Hortonworks, we continue to make major investments that directly benefit these two major lines of development. This ensures that the Apache code that we build into our Hortonworks Data Platform tracks as closely as possible to what the extended community (open source, end users, and vendors) can see for themselves within the Apache source code repositories.
If we expand this discussion to include Hadoop-related projects, we will see that the challenge of achieving a balance between innovation and stability across multiple independently run open source projects quickly becomes nontrivial, to say the least.
This complexity is further reason why it’s so critical for Apache Hadoop to have clearly established and maintained lines of open source development. Driving clarity around major Hadoop releases gives the broader ecosystem the confidence of knowing they are building on a stable foundation versus unpredictably shifting sands.
The Bottom Line
Achieving balance clearly requires cooperation, understanding, and teamwork across the extended community of open source developers, end users, and solution providers / vendors. At Hortonworks, we are firmly focused on helping each major version of Apache Hadoop successfully move along its technology adoption lifecycle.
I hope you are able to join us at Hadoop Summit where we, and the broader community, will be talking in more detail about the cool features and capabilities across Hadoop 1.0, Hadoop 2.0, and beyond.
And for Geoffrey Moore fans, since he is a keynote speaker at Hadoop Summit, you’ll get to hear his thoughts on Hadoop directly.
~ Shaun Connolly