At Hortonworks, we are always watching emerging trends in the datacenter to find opportunities for deeper ecosystem integration with Apache Hadoop in simple and intuitive ways. We first partnered with OpenShift by Red Hat earlier this year, when we made it possible to call out to Hadoop services from OpenShift via cartridges. You can read more about that solution here. As Enterprise Cloud (e.g., PaaS) offerings have matured to support a broad set of workloads, a number of our customers have asked how Hadoop-centered Big Data and PaaS initiatives could work together, particularly now that Apache Hadoop YARN is the multi-workload resource manager for batch, interactive and real-time workloads on Hadoop. Docker and Google Kubernetes have rapidly growing communities and expanding awareness; even Microsoft has added some Kubernetes support for running on Azure. Another partner’s participation also caught our attention: OpenShift by Red Hat was moving to make these technologies the core of its next-generation PaaS platform.
To us, it seemed like a great opportunity to help bring the two worlds together for our customers – Hadoop and PaaS – and ensure that with YARN we can provide:
Our strategy for making this happen is to work closely with the open source community and develop these new capabilities upstream, so that the resulting innovation and integration can be brought to market in a stable and tested manner. We share this strategy with Red Hat, and we are working together to integrate YARN into the pluggable scheduler slot of Kubernetes as an option in OpenShift v3.
In the world of PaaS, there is a rapid shift from legacy, heavyweight virtual machines to lightweight and secure containers. Red Hat is a leading contributor to the Docker community, and OpenShift v3 is an exciting new initiative by Red Hat to provide a unified DevOps experience built on a best-of-breed technology stack leveraging Project Atomic, Docker and Google Kubernetes aligned with an intuitive user experience. OpenShift has managed to preserve its developer workflow while commoditizing its architecture to align with industry standards. In one simple motion OpenShift has grown its reference architecture to include that of both the Docker and Google communities. OpenShift has become an epicenter for how PaaS use cases are instrumented on these highly coveted technologies. This consolidation of open source technology initiatives promises to change the way applications are built and deployed in a PaaS environment.
In today’s Hadoop deployments, we at Hortonworks see very large clusters spanning thousands of machines and petabytes of data on commodity hardware in the customer datacenter. As organizations scale their Hadoop deployments, they want to run more analytic applications with different data access paradigms (batch, interactive, real-time, streaming and so on), all of which need to access the data simultaneously. These data lake deployments are enabled by a modern data architecture powered by YARN, which provides a robust and comprehensive solution for the most demanding Hadoop environments. In addition to powering a rich variety of Hadoop processing engines, YARN is being embraced by industry-leading analytic software vendors, who use it to draw on the compute and data resources of existing Hadoop clusters and to extend Hadoop with very rich analytic capabilities.
So far, enterprises have deployed and managed separate infrastructures for Hadoop and PaaS. This leads to fragmented compute resources and infrastructure silos with duplicate provisioning, management and monitoring tools. By using Hadoop YARN as the underlying resource management infrastructure for both workloads, we get the obvious benefit of seamlessly sharing resources between them.
Imagine this: as you roll out a seasonal campaign to customers, wouldn’t it be nice to temporarily, and painlessly, borrow a few resources from your data workloads via YARN and then return them after a few days or weeks? The alternative today, as many IT departments are painfully aware, is to plan in advance, procure new hardware and then decommission it at the end of the campaign a few weeks later. That is very involved indeed! It seriously affects the agility and speed with which enterprises can bring new products and services to market. YARN and OpenShift come to the rescue.
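To make the borrow-and-return idea concrete, here is a minimal sketch of a shared cluster that lends nodes from one workload to another and later hands them back. It is a toy model only: `SharedCluster`, `lend` and `return_all` are hypothetical names invented for illustration, the node counts are made up, and none of this is the YARN, Kubernetes or OpenShift API.

```python
from dataclasses import dataclass, field


@dataclass
class SharedCluster:
    """Toy model (hypothetical): one cluster whose nodes are split between workloads."""
    allocations: dict[str, int] = field(default_factory=dict)        # workload -> nodes held
    loans: list[tuple[str, str, int]] = field(default_factory=list)  # (lender, borrower, nodes)

    def lend(self, lender: str, borrower: str, nodes: int) -> None:
        """Temporarily move `nodes` from the lender's share to the borrower's."""
        if nodes <= 0:
            raise ValueError("nodes must be positive")
        if self.allocations.get(lender, 0) < nodes:
            raise ValueError(f"{lender} cannot spare {nodes} nodes")
        self.allocations[lender] -= nodes
        self.allocations[borrower] = self.allocations.get(borrower, 0) + nodes
        self.loans.append((lender, borrower, nodes))

    def return_all(self, borrower: str) -> None:
        """Hand every node the borrower holds on loan back to its lender."""
        remaining = []
        for lender, who, nodes in self.loans:
            if who == borrower:
                self.allocations[borrower] -= nodes
                self.allocations[lender] += nodes
            else:
                remaining.append((lender, who, nodes))
        self.loans = remaining


# Illustrative numbers only: a 100-node cluster, mostly running data workloads.
cluster = SharedCluster(allocations={"hadoop": 90, "paas": 10})

cluster.lend("hadoop", "paas", 20)   # seasonal campaign starts
during = dict(cluster.allocations)

cluster.return_all("paas")           # campaign ends, capacity goes back
after = dict(cluster.allocations)

print(during)  # {'hadoop': 70, 'paas': 30}
print(after)   # {'hadoop': 90, 'paas': 10}
```

The point of the sketch is that no hardware is procured or decommissioned: capacity moves between workloads inside one pool and the original split is restored afterwards.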
Seen through the lens of data workers in the Hadoop world, this integration provides a very important capability: using OpenShift to present the insights teased from their datasets in simple and intuitive ways. For example, one can now use Pig or Hive to cleanse data and build models, and then immediately turn that analysis into an interactive web application by deploying Shiny in OpenShift. This allows data to turn into actionable insight seamlessly!
By integrating OpenShift with Hadoop YARN, Red Hat and Hortonworks customers can now benefit from:
Integrating OpenShift by Red Hat, Google Kubernetes, and Docker with Apache Hadoop YARN provides tremendous benefits to customers using OpenShift and Hadoop. It is a great example of the openness and vision shown by Red Hat, and of how this community-driven innovation is helping Hadoop YARN emerge as an intelligent scheduler plugin for OpenShift in the datacenter and the public cloud.
We hope you will join us on this exciting journey!