The Hortonworks Blog

More from Shaun Connolly

As we approach Hadoop Summit in San Jose next week, the debate continues over where Hadoop really is on its adoption curve. George Leopold from Datanami was one of the first to beat the hornet’s nest with his article entitled Gartner: Hadoop Adoption ‘Fairly Anemic’. Matt Asay from TechRepublic and Virginia Backaitis from CMSWire volleyed back with Hadoop Numbers Suggest the Best is Yet to Come and Gartner’s Dismal Predictions for Hadoop Could Be Wrong, respectively.…

As we are finalizing our preparations for what will surely be another successful Hadoop Summit Europe event, one thing has become unequivocally clear: the Hadoop challenge is no longer about acceptance. It’s no longer about adoption. It’s about Hadoop being pervasive. Hadoop is everywhere.

As Mike Gualtieri of Forrester wrote in a recent report:

Hadoop is a must-have for large enterprises

I couldn’t agree more with Mike’s assessment, and I encourage you to read the report: “Predictions 2015: Hadoop Will Become a Cornerstone of Your Business Technology Agenda”.…

Today EMC is launching its EMC® Business Data Lake solution, the first fully-engineered, enterprise-grade solution for a Data Lake running on EMC infrastructure. At Hortonworks, we’ve been assisting customers on their journey to a data lake via a Modern Data Architecture (MDA). Our vision and EMC’s vision are highly complementary, so we’re delighted to be part of the EMC Business Data Lake.

The Data Lake enabled by a Modern Data Architecture allows enterprises to be a Data-First Enterprise.…

This is a unique moment in time. Fueled by open source, Apache Hadoop has become an essential part of the modern enterprise data architecture and the Hadoop market is accelerating at an amazing rate.

The impressive thing about successful open source projects is the pace of the “release early, release often” development cycle, also known as upstream innovation. The process moves through major and minor releases at a regular clip and the downstream users get to pick the releases and versions they want to consume for their specific needs.…

Since our founding in 2011, Hortonworks has had a fundamental belief: the only way to deliver infrastructure platform technology is completely in open source. Moreover, we believe that collaborative open source software development under the governance model of an entity like the Apache Software Foundation (ASF) is the best way to accelerate innovation that targets enterprise end users since it brings the largest number of developers together in a way that enables innovation to happen far faster than any single vendor could achieve and in a way that is free of friction for the enterprise.…

Since our founding in mid-2011, our vision for Hadoop has been that “half the world’s data will be processed by Hadoop”. With that long-term vision in mind, we focus on the mission to establish Hadoop as the foundational technology of the modern enterprise data architecture that unlocks a whole new class of data-driven applications that weren’t previously possible.

We use what we call the “Blueprint for Enterprise Hadoop” for guiding how we invest in Hadoop-related open source technologies as well as enabling the key integration points that are important for deploying Enterprise Hadoop within a modern data architecture, on-premises or in the cloud, in a way that enables the business and its users to maximize the value from their data.…

There are many projects that have been contributed to the Apache Software Foundation (ASF) by both vendors and users alike that greatly expand Apache Hadoop’s capabilities as an enterprise data platform.

While Hadoop – with YARN at its architectural center – provides the foundational capabilities for managing and accessing data at scale, a broader blueprint for Enterprise Hadoop has emerged that specifies how this array of Apache projects fit across five distinct pillars to form a complete enterprise data platform: data access, data management, security, operations and governance.…

Today we are excited to announce a deepening of our strategic partnership with HP. This news builds on the reseller partnership that we established in 2013 enabling HP to resell the Hortonworks Data Platform. It also allows us to build on the HP AllianceOne ConvergedSystems Partner of the Year Award that we received at the recent HP Discover 2014 conference for our strategic partnership.

Given the rapid adoption of Enterprise Hadoop as a core component of a modern data architecture combined with the fact that HP is the world’s leading server vendor in terms of shipments AND revenues according to IDC – meaning a significant number of those Hadoop nodes are being deployed with HP technologies – it’s hardly surprising that we’ve been collaborating closely.…

Merv Adrian couldn’t have said it better. In his blog post from the weekend, he continued in his quest to define Hadoop. And it is no easy quest, as Hadoop and its components are evolving at a pace that is, frankly, astounding.

The continuous evolution of Hadoop has even given rise to sentiments such as ‘Is Hadoop dead?’ The answer to that question is YES. And NO. …

We certainly live in interesting times. About 20 months ago, in an effort to find proprietary differentiation that could be used to monetize and lock in customers to their model, Cloudera unveiled Impala and at that time Mike Olson stated “Our view is that, long-term, this will supplant Hive”. Only 6 months ago in his Impala v Hive post, Olson defended his “decision to develop Impala from the ground up as a new project, rather than improving the existing Apache Hive project” stating “Put bluntly: We chose to build Impala because Hive is the wrong architecture for real-time distributed SQL processing.”

So, 20 months after abandoning Hive, and after repeated marketing attempts to throw Hive and many other SQL alternatives under the bus in favor of their “better” approach, I’m certainly puzzled as Cloudera unveils their plan to enable Apache Hive to run on Apache Spark; please see HIVE-7292 for details.…

Today, we announce certification of Apache Spark as YARN Ready. This certification ensures memory- and CPU-intensive Spark-based applications can co-exist within a single Hadoop cluster with all the other workloads you have deployed. Together, they allow you to use a single cluster with a single set of data for multiple purposes rather than silo your Spark workloads into a separate cluster.

If there’s one thing my interactions with our customers have taught me, it’s that Apache Hadoop didn’t disrupt the datacenter, the data did. The explosion of new types of data in recent years has put tremendous pressure on the datacenter, both technically and financially, and an architectural shift is underway where Enterprise Hadoop is playing a key role in the resulting modern data architecture.

Download our Whitepaper: Hadoop and a Modern Data Architecture.

With the flourishing of Apache Software Foundation projects that have emerged in recent years in and around the Apache Hadoop project, a common question I get from mainstream enterprises is: What is the definition of Hadoop?

This question goes beyond the Apache Hadoop project itself, since most folks know that it’s an open source technology born out of the experience of web scale consumer companies such as Yahoo!, Facebook and others who were confronted with the need to store and process massive quantities of data.…

The Apache Software Foundation (ASF) provides valuable stewardship and guide-rails for projects interested in attracting the broadest possible community involvement, especially across a wide range of vendors and end users. While the ASF’s role is not about guaranteeing wild success for every project, they do a great job of providing a place where the broadest community of people, ideas, and code can come together and raise an elephant, so to speak.…

Ever since I was a kid, I’ve used memorable movie quotes to help people understand a key point in a way that lightens the mood and generates some laughs. If you’re going to work hard, you gotta have fun, right???

“Don’t make me angry… you wouldn’t like me when I’m angry”

The big data market is rife with aspirational marketing misinformation, which among other things causes customer confusion, slows the path to value, and frankly, makes me a little angry.…