The Open Hybrid Architecture Initiative

Bring Cloud Native to On-Premises


The Open Hybrid Architecture initiative, the last mile of our endeavor to deliver on the promise of hybrid, is a broad effort across the open-source communities, the partner ecosystem and Hortonworks platforms. Its goal is to give enterprises a consistent experience by bringing the cloud architecture on-premises. A consistent architecture allows customers to move data and workloads across on-premises and multiple clouds using platforms such as Hortonworks DataPlane Service.

Through this initiative, we also deliver an architecture where it does not matter where your data is: in any cloud, on-premises or at the edge. Enterprises can use open-source analytics in a secure and governed manner. A consistent architecture is the key to a seamless experience, and we make it possible by focusing on the following areas: Storage, Compute, Workloads, and Security and Governance.

Blog: Introducing The Open Hybrid Architecture Initiative

Blog: Open Hybrid Architecture - Bringing Cloud Native to On-Prem


Storage Environment: Decouple storage in a containerized world

Apache Hadoop Ozone, or O3, is the next-generation object store at the foundation of the Open Hybrid Architecture Initiative. It is designed to:

  • Scale to trillions of files and thousands of nodes to address the large amounts of data created by connected devices or originating in the cloud
  • Consolidate tiers of secondary storage including Apache Hadoop, archive, and backup
  • Speak multiple protocols including Hadoop API, S3 API, iSCSI block, and NFS for today’s diverse workloads
  • Set the stage for the containerized world, with a storage interface called Container Storage Interface (CSI) for Kubernetes and Apache YARN

Blog: Open Hybrid Architecture - O3 The New Rocket Ship


Compute Environment - Run containerized big data workloads

In the big data world, customers have business analysts running interactive sub-second queries for reporting, data engineers running batch ETL jobs, and data scientists training GPU-intensive deep learning models. Their needs differ, but they all require a scheduler that can handle thousands of big data jobs running in a shared multi-tenant cluster.

Apache YARN with advanced capabilities can handle these diverse workloads from real-time interactive queries to batch workloads at scale in an elastic manner. This gives YARN the opportunity to become a powerful job scheduler for the hybrid environment. It complements Kubernetes, the container orchestrator, which does not have a capacity scheduler like YARN.
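
To make the idea of elastic, capacity-based scheduling concrete, here is a minimal toy sketch. It is not YARN's actual algorithm or API: the `allocate` function, the queue names and the percentages are all hypothetical, chosen only to show how guaranteed shares and borrowing of idle capacity can coexist in a shared cluster.

```python
def allocate(total_units, queues):
    """Split total_units across queues.

    queues maps a queue name to (guaranteed_percent, demand_units).
    Returns a dict mapping queue name to allocated units.
    """
    # Pass 1: every queue receives its demand, capped at its guaranteed share.
    allocation = {
        name: min(demand, total_units * percent // 100)
        for name, (percent, demand) in queues.items()
    }
    spare = total_units - sum(allocation.values())

    # Pass 2 (elasticity): idle capacity is lent to queues that still have
    # unmet demand, largest guarantee first.
    by_guarantee = sorted(queues.items(), key=lambda item: -item[1][0])
    for name, (_, demand) in by_guarantee:
        extra = min(spare, demand - allocation[name])
        allocation[name] += extra
        spare -= extra
    return allocation


# Hypothetical queues for the three personas described above.
queues = {
    "interactive": (50, 20),  # business analysts
    "etl": (30, 60),          # data engineers
    "training": (20, 40),     # data scientists
}
shares = allocate(100, queues)
print(shares)
```

In this sketch the interactive queue uses only part of its guarantee, so the batch ETL queue can grow beyond its own guaranteed share while the cluster stays fully used.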

Blog: Open Hybrid Architecture - Running Stateful Containers on YARN
Blog: Containerized Apache Spark on YARN in Apache Hadoop 3.1
Blog: Trying Out Containerized Applications on Apache Hadoop YARN 3.1
Article: The Rise of Kubernetes Epitomizes the Transition from Big Data to Flexible Data


Workloads - Enable agility with a consistent architecture and user experience

Our data environment exists so that various processing workloads can deliver the insights our customers need to drive real business transformation at their organizations. Many of these workloads, such as EDW (Enterprise Data Warehouse), data science and engineering platforms, have different release cadences.

The Hortonworks architecture lets customers change the software revision of a component independently of the underlying infrastructure, avoiding a giant monolithic upgrade. It also lets customers accommodate and adjust to different release cadences. It enables Hortonworks to provide on-demand workload creation with a self-service, persona-focused user interface for thousands of tenants in the big data environment. The same architecture can be applied to on-premises and multi-cloud environments.

Blog: Open Hybrid Architecture - Real World Use Case


Why CNCF - Journey to cloud native big data architecture

Hortonworks has collaborated with many customers on their container journey for many years, leading to one of our major recent launches (Hortonworks Data Platform 3.x). Kubernetes started as a container orchestrator for stateless applications such as web applications and is now on its way to supporting data-intensive applications. Running the Big Data stack on Kubernetes introduces many new challenges and opportunities.

Kubernetes has been a community effort based on collaboration across a wide variety of technologies, which aligns with Hortonworks’ commitment to the open source community. We want to participate in the Cloud Native Computing Foundation (CNCF) and jointly work towards a common Cloud Native Big Data Architecture.

Infographic: Cloud Native Journey for Big Data


Shared Security and Governance

To deploy containerized workloads with cloud-like agility, customers want a shared and persistent security and governance layer to enforce access control and data governance centrally. As data is distributed across the Apache Hadoop file system and cloud object storage, common security and governance controls become necessary.

With Hortonworks, customers can integrate existing security and governance capabilities via Apache Ranger, Apache Atlas and Apache Knox to provide consistent management, security and governance across multiple data types.

  • Manage your security across your data globally from a single pane of glass
  • Enable attribute-based and role-based security policies to give you fine-grained access controls
  • Utilize flexible authentication schemes to avoid credential sprawl
  • Manage metadata and governance policies centrally
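
As a rough illustration of how role-based and attribute-based checks can combine into one centrally defined policy, consider the sketch below. It is a simplified model, not the Apache Ranger or Apache Atlas API: the `Policy` class, the `is_allowed` function, the paths and the `PII` tag are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    resource_prefix: str      # resources this policy covers
    allowed_roles: frozenset  # role-based part: who may access
    blocked_tags: frozenset   # attribute-based part: data tags that deny access


def is_allowed(policy, user_roles, resource, resource_tags):
    """Return True only if both the role check and the tag check pass."""
    if not resource.startswith(policy.resource_prefix):
        return False  # the policy does not cover this resource
    if not policy.allowed_roles & set(user_roles):
        return False  # the user holds none of the permitted roles
    # Deny when the resource carries any blocked attribute tag.
    return not (policy.blocked_tags & set(resource_tags))


# One policy, defined once, applied wherever the data lives.
sales_policy = Policy(
    resource_prefix="/warehouse/sales/",
    allowed_roles=frozenset({"analyst"}),
    blocked_tags=frozenset({"PII"}),
)

print(is_allowed(sales_policy, {"analyst"}, "/warehouse/sales/q1", {"public"}))
print(is_allowed(sales_policy, {"analyst"}, "/warehouse/sales/customers", {"PII"}))
```

The first call passes both checks. The second is denied because the resource carries a blocked tag, even though the user holds a permitted role.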

Open Hybrid Architecture Initiative: Consistent Hybrid Architecture