Separating Open Source Signal from Enterprise Hadoop Noise

There have been many Apache Hadoop-related announcements over the past few weeks, making it difficult to separate the signal from the marketing noise. One thing is crystal clear, however: there is a large and growing appetite for Enterprise Hadoop because it helps unlock new insights and business opportunities in a way that was not previously technologically or economically feasible.

Enterprise and Open Source are NOT Mutually Exclusive

Dan Woods from Forbes recently penned an article entitled “Why SQL Matters, the Limits of Open Source, and Other Lessons of EMC Greenplum’s Pivotal HD” where he paints a picture of enterprise and open source in opposite corners. As an example, he closes his article with:

 “If you are a CIO what do you choose? Open source ideology or products that are made to solve enterprise problems by enterprise companies?”

I take issue with that either/or stance; just look at Red Hat, JBoss, SpringSource, and MySQL, as well as the broad enterprise use of Apache Web Server and Apache Tomcat, for examples of enterprise-class open source software. Our approach at Hortonworks is very much about providing a healthy mix of enterprise AND open source, with emphasis on the “AND”. Specifically, we identify and introduce enterprise requirements into the public domain (i.e. open source), we work with the community and partners to advance and incubate open source projects, and we apply enterprise rigor to provide the most stable and reliable distribution that our customers and partners can rely on.

While I take issue with the sentiment of the Forbes article, I agree with one of its thematic points: in order for Hadoop to flourish, it needs to factor in traditional enterprise “use-value participants”.

At Hortonworks, we work very closely with Teradata and Microsoft as “use-value participants” (to use the Forbes term) that are highly relevant to enterprise customers adopting big data strategies.

Why? For Enterprise Hadoop to be as impactful as it can be, our approach to the market needs to be BOTH direct and indirect. Working with partners like Teradata and Microsoft helps pull Enterprise Hadoop into the market in ways that are meaningful and valuable to enterprise customers.

Spotlight: Microsoft Adds Value By Working WITHIN The Community

Hortonworks and Microsoft engineers have worked side-by-side within the Apache community for the past 16 months. The focus has been on making Enterprise Hadoop easier to use and consume by mainstream enterprises. Specifically, the focus has been on Apache Hadoop and more recently Apache Hive (a la our Stinger Initiative aimed at making Hive 100X faster). We’ve also collaborated on making Hadoop applications faster and more secure by introducing new incubator projects such as Apache Tez and Apache Knox Gateway.

Moreover, a great example of the fruits of our joint efforts is our recent launch of the Hortonworks Data Platform for Windows, aimed at bringing the power of Hadoop to the large Windows ecosystem.

My point here is that Microsoft engineers have been spending serious time and energy working within the Apache Software Foundation on making various open source projects better. A perfect example, which many people may not be aware of: Chris Douglas, an engineer from Microsoft, was recently voted the V.P. of Hadoop. Chris earned this position by demonstrating leadership within the community.

We Feel One Of The Elephants Is Not Like The Others

By now, you’ve gotten the point that we believe enterprise and open source are NOT mutually exclusive. There are go-to-market approaches that can propagate or dispel this myth, however.

  1. Fork / Fragment: One approach is to forgo working within the open source community and simply choose to harvest the open source work of others and then modify/bend that technology for specific commercial interests. Changes to the open source technology are intentionally done outside of the community and held back as “important enterprise value add”. EMC’s Pivotal HD offering is an example of a strategy aimed at fragmenting the market in order to control a portion of the potential customers. See my recent blog post for more thoughts on this topic.
  2. Unite / Coalesce: Another approach is to work within the community on making the open source projects better and more capable of integrating seamlessly with enterprise-focused commercial offerings. Contributing all “value add” changes that should be in the open source projects directly into those projects helps ensure they become easier to use and consume by all. This approach is intended to enable a very large ecosystem to form around a common and consistent open source foundation. Hortonworks’ partnerships with Teradata and Microsoft are examples of how enterprise-focused solutions can be built on a common and interoperable base.

Both approaches are certainly valid…but with different consequences not only for the technology, but also for the broader market / ecosystem. How so? Well, I will simply leave it as an exercise to you, the reader, to consider lessons learned from the UNIX wars (fragmented market) versus Linux (unified market on top of a common Linux kernel).

At Hortonworks we are clearly encouraging the second approach, and we are excited to work with partners like Microsoft and others to add value directly into the open source projects in ways that make them easier to use and consume by enterprises.

We also believe that any company that considers itself “all in” on making open source Apache Hadoop into an enterprise-viable platform needs either to have key committers working on the open source technologies (Hortonworks has 50+ committers) or to partner with a company like Hortonworks that is focused on working with the ecosystem to ensure Hadoop integrates and interoperates well with existing enterprise systems and tools.

There Is Still Much Work To Be Done…So Join Us On The Journey!

Hortonworks engineers have been privileged to help Hadoop mature from the domain of a small number of web monsters (including Yahoo!) to a technology that has crossed the chasm and onto the agendas of a large number of CIOs across mainstream enterprises. And as I noted in a recent blog post, there is an interesting road ahead of us.

The rise of Enterprise Hadoop offers a refreshing opportunity for our customers to benefit from a data platform that provides a compelling combination of technology, economic and business benefits. And delivering that enterprise value directly as well as indirectly through partners is what we are focused on.
