At Hortonworks, our strategy is founded on an unwavering belief in the power of community-driven open source software. In the spirit of openness, we think it’s important to share our perspective on how Apache Hadoop and Hortonworks came to be, what we are doing now, and why we believe our unique focus is good for Apache Hadoop, for the ecosystem of Hadoop users, and for Hortonworks as well.
The core team here at Hortonworks started at Yahoo! where in 2005 Eric Baldeschwieler (aka “E14” and Hortonworks CTO) challenged Owen O’Malley (Hortonworks co-founder) and several others to solve a really hard problem: store and process the data on the internet in a simple, scalable and economically feasible way. They looked at traditional storage approaches but quickly realized they just weren’t going to work for the type of data (much of it unstructured) and the sheer quantity Yahoo! would have to deal with.
The team’s first reaction, as is the norm, was to lock themselves in a room and come up with a prototype of a closed, proprietary system. With fantastic vision and oversight from E14 and Raymie Stata (former CTO, Yahoo!), however, the team turned to the open-source community and in particular the Apache Software Foundation. That also meant growing a large development team, including Doug Cutting, Arun Murthy (Hortonworks co-founder) and others, who began to work with the community on what became known as Apache Hadoop, specifically HDFS and MapReduce.
The team quickly realized that by contributing their efforts to a community of like-minded individuals, the technology would advance far faster. At the same time, they’d enable other organizations to realize some of the same benefits that they were starting to see from their early efforts. When organizations such as Facebook, LinkedIn, eBay, Powerset, Quantcast and others began picking up Hadoop and innovating in areas beyond the initial focus, it confirmed that community-driven open source was the right choice.
A case in point: a small startup (Powerset) started working on a project to support tables on HDFS, inspired by Google’s BigTable paper, and that effort turned into what’s now Apache HBase! Need more? Facebook started an effort to build a SQL layer on top of MapReduce, which became Apache Hive!
Simply put: we believe the fastest way to innovate is to do our work within the open source community, to introduce enterprise feature requirements into that public domain, and to work diligently to progress existing open source projects and incubate new projects to meet those needs.
Like anything done in a big group, this can be a challenge at times. But for platform technologies like Hadoop, it has been proven time and again that community-driven open source outpaces the innovation of any single group of people or single company.
Apache Hadoop usage at Yahoo! has grown to the point that today Hadoop is a foundational technology underlying a wide range of business-critical applications. This is captured really well by Sumeet Singh, a Director of Product Management at Yahoo!, who recently outlined just how far their journey has come.
As the team tasked with architecting and operating that infrastructure over many of those years, our Hortonworks engineers gained critical insights that have been diligently funneled back into the community to be addressed in the appropriate place: the open source projects at the Apache Software Foundation. That process gave rise to a host of new projects that are now core to Hadoop (such as Apache Hadoop YARN, Apache HCatalog and Apache Ambari, which sit alongside Apache Pig, Apache Hive, Apache HBase and many others).
After many years architecting and operating the Hadoop infrastructure at Yahoo! and contributing heavily to the open source community, E14 and 20+ Hadoop architects and engineers spun out of Yahoo! to form Hortonworks in 2011. Having seen what Hadoop could do for Yahoo!, Facebook, eBay, LinkedIn and others, our singular objective is to make Apache Hadoop a platform that the broader market of enterprise customers and partners can easily use and consume.
And in doing so we maintain that same unwavering view as to how to approach the challenge:
To help us determine where to focus efforts, we spend a lot of time working with Hadoop users to understand the requirements for broader enterprise adoption, examples of which fall into the following categories:
Today, eight years into its development, there are numerous open source projects that augment core Hadoop to address these critical operational, data and platform requirements. Hortonworks Data Platform (HDP) packages up a dozen or so distinct open source projects into a single integrated distribution that provides the enterprise services businesses can rely on. Not only do Hortonworkers play key roles in the test and release process for each of those various projects, but we also take great pains to test and certify a consolidated distribution on large and complex clusters running across a range of operating platforms.
In fact, before we release any version of HDP, we first work with our colleagues at Yahoo! to test it at scale on their infrastructure, every time. This means that by the time HDP reaches any customer environment it has been validated at Yahoo!, which has arguably the richest test suite for Hadoop on the planet. Case in point: with help from Yahoo!, YARN has been significantly battle-tested, to the tune of nearly 14 million applications and 80,000 jobs per day per cluster.
Our mission when we started Hortonworks was to accelerate the adoption of Hadoop by providing a 100% open source, enterprise-grade distribution, and with it a truly open platform. The key reason partners such as Microsoft and Teradata choose Hortonworks as their strategic partner for Hadoop is this: our engineers are committed to working within the 100% open source Apache Software Foundation projects with no commercial holdbacks. This stands in contrast to other vendors who are taking a proprietary approach that can lead to closed interfaces and vendor lock-in.
We also ensure that the work we do with our partners makes it back into the community. For instance, our work on the Apache HCatalog project has been adopted and extended by Teradata with their SQL-H offering. We have also worked extensively with Microsoft to enable Hadoop to run on Windows, and contributed this work back to the broad community so that others can pick up and continue the work in ways that benefit everyone. Even better, partners like Microsoft now contribute significantly to the open-source project to ensure Apache Hadoop is fully supported on key platforms like Microsoft Azure, which is another illustration of the rising tide that is the open-source model.
Good for Hortonworks
We are passionate about the journey we are on. By staying true to our 100% open source philosophy and applying enterprise software rigor to the test and release process, we believe that we can accelerate the adoption of Hadoop in the ecosystem.
We love what we are doing, are committed to the approach, and can’t wait to see what the next chapter brings.