The Road Ahead for Hortonworks and Hadoop
I recently delivered a webinar entitled “Hortonworks State of the Union”. For those new to Apache Hadoop, I covered a brief history of Hadoop and Hortonworks’ role within the open source community. We also covered how the platform services, data services, and operational services required to enable Hadoop as an enterprise-viable platform evolved in 2012.
Finally, we discussed the important progress made on deeply integrating Hadoop within next-generation data architectures in a way that makes sense for the enterprise. Our partnership with Teradata provides a great example of how deep integration of BOTH the data services (via Apache HCatalog) AND the operational services (via Apache Ambari’s REST APIs) can deliver value in a way that addresses mainstream enterprise needs while preserving existing investments.
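To make the operational-services side of that integration concrete, here is a minimal sketch of calling Ambari's REST API from Python. The host name and the default `admin`/`admin` credentials are placeholders, not details from the Teradata integration itself; a real deployment would use its own endpoint and credentials.

```python
# Sketch: building an authenticated GET request against the Ambari REST API.
# Host, port, and credentials below are illustrative placeholders.
import base64
import urllib.request


def ambari_request(host, path, user="admin", password="admin"):
    """Build an authenticated GET request for an Ambari REST API endpoint."""
    url = f"http://{host}:8080/api/v1{path}"
    req = urllib.request.Request(url)
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    # Ambari expects this header on requests made outside its web UI.
    req.add_header("X-Requested-By", "ambari")
    return req


# Example: a request listing the clusters Ambari manages.
# (Sending it would require a live Ambari server.)
req = ambari_request("ambari.example.com", "/clusters")
print(req.full_url)  # http://ambari.example.com:8080/api/v1/clusters
```

The same pattern extends to any Ambari resource path (services, hosts, alerts), which is what makes a REST surface like this straightforward for partners to integrate against.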
If 2012 was a big year for Hadoop and big data, then 2013 should be HUGE.
As we enter 2013, I believe Hadoop has “crossed the chasm” from a framework for early adopters and technology enthusiasts to a strategic data platform embraced by early majority and pragmatic adopters. CTOs and CIOs across mainstream enterprises want to improve the performance of their companies and unlock new business opportunities, and they realize that including Hadoop as a deeply integrated “plus 1” to their data architectures provides them the fastest path to their goals while maximizing their existing investments.
The other side of the chasm is where vertical solutions (or “bowling pins” as Geoffrey Moore refers to them in his book) emerge in earnest. While we, Hortonworks, are interested in serving the needs of these vertical solutions, as an open source software infrastructure company we are keenly interested in identifying and enabling horizontal patterns of use that unlock Hadoop’s value for the widest range of use cases.
Refine, Explore, Enrich
- Refine is about capturing data from all sorts of sources into a platform where it can then be refined into formats that are more easily shared with downstream systems such as a data warehouse.
- Explore is about interactively surfing through these new lakes of data and unlocking business value with new and existing Business Intelligence (BI) tools.
- Enrich is about creating and deploying advanced analytics in a way that makes online applications, such as mobile commerce applications, more “intelligent” with respect to the experience delivered.
The key point to reiterate is that Hadoop is an important “plus 1” in next-generation data architectures powering these use cases.
So What’s in Store for 2013?
Our focus from 2012 continues into 2013: a) make Hadoop an enterprise-viable platform that is easy for the enterprise to use and consume, while b) ensuring the platform is interoperable with the broader data ecosystem. With that said, I outlined a range of initiatives that we, Hortonworks, will focus on in our efforts within the open source community: Interactive Query, Business Continuity (DR, Snapshots, etc.), Secure Access, as well as ongoing investments in Data Integration, Management (i.e. Ambari), and Online Data (i.e. HBase). We will be working in other areas, of course, but these are the areas our enterprise customers care about most.
Since the topic of Interactive Query is fairly popular these days, let me share some quick thoughts. Over the past few years, Apache Hive has matured into the de facto SQL interface to Hadoop data. Many of the top BI vendors support Hive today, and based on our customer interactions, more than 50% of Hadoop use cases depend on Hive for operational data processing and BI. That said, Hive needs work to support human-interactive BI use cases such as visualization and parameterized reporting.
Rather than abandon the Apache Hive community, Hortonworks is focused on working in the community to optimize Hive’s ability to serve big data exploration and interactive query in support of important BI use cases. Moreover, we are focused on enabling Hive to take advantage of YARN in Apache Hadoop 2.0, which will help ensure fast query workloads don’t compete for resources with the other jobs running in the cluster. Enabling Hadoop to predictably support enterprise workloads that span Batch, Interactive, and Online use cases is an important area of focus for us.
Over the coming weeks, we will roll out webinars and blog posts that cover each of these initiatives in more detail. We also expect to demonstrate some of the fruits of that labor at the Hadoop Summit in Amsterdam in March.
2013 should prove to be a fun and productive year!