Apache Knox Gateway

A single point of secure access for Hadoop clusters
The Knox Gateway (“Knox”) is a system that provides a single point of authentication and access for Apache™ Hadoop® services in a cluster. The goal of the project is to simplify Hadoop security for users who access the cluster data and execute jobs, and for operators who control access and manage the cluster. Knox runs as a server (or cluster of servers) that serves one or more Hadoop clusters.

What Knox Gateway Does

Knox Gateway provides security for multiple Hadoop clusters, with these advantages:

  • Provide perimeter security to make Hadoop security setup easier
  • Support authentication and token verification security scenarios
  • Deliver users a single cluster end-point that aggregates capabilities for data and jobs
  • Enable integration with enterprise and cloud identity management environments
  • Manage security across multiple clusters and multiple versions of Hadoop

How Knox Gateway Works

Knox aims to provide perimeter security that will integrate easily into existing security infrastructure.  Delivering security to the Hadoop ecosystem is a critical community project.  Knox needs to be woven into the very fabric of Hadoop in order to be effective, and being a part of the community will allow Knox to accomplish just that.

Currently, a Hadoop cluster is presented to consumers as a loose collection of independent services. This makes it difficult for users to interact with Hadoop, since each service maintains its own method of access and security. Configuring and administering a secure Hadoop cluster is complex, so many Hadoop administrators are faced with the choice of slowing their Hadoop rollout or running Hadoop without security.

The goal of the project is to provide coverage for all existing Hadoop ecosystem projects. In addition, the project will be extensible to allow for future proprietary Hadoop components without requiring changes to the Knox source code. Knox is expected to run in a DMZ environment, where it will provide controlled access to multiple Hadoop services. In this way, Hadoop clusters can sit behind a firewall and be reached only through the gateway. The authentication components of the gateway will be modular and extensible so that they can be easily integrated with existing security infrastructure.
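
To make the single point of access concrete, the sketch below shows how a client might address a Hadoop service through the gateway instead of contacting the service directly. It is an illustration under stated assumptions, not a description of any particular deployment: the host name knox.example.com, the port 8443, the "gateway" context path, the "sandbox" cluster name, the WebHDFS path, and the use of HTTP Basic authentication are all hypothetical values chosen for the example and are not taken from the text above.

```python
import base64
import urllib.request


def knox_url(gateway_host, cluster, service_path, port=8443, gateway_path="gateway"):
    """Build a URL that reaches a Hadoop service through the gateway.

    The port, gateway_path, and cluster values are assumptions for this
    sketch; a real deployment defines its own.
    """
    return (
        f"https://{gateway_host}:{port}/{gateway_path}/{cluster}/"
        f"{service_path.lstrip('/')}"
    )


def build_request(url, username, password):
    """Attach HTTP Basic credentials (an assumed authentication scheme)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


if __name__ == "__main__":
    # Hypothetical host, cluster name, and credentials.
    url = knox_url("knox.example.com", "sandbox", "webhdfs/v1/tmp?op=LISTSTATUS")
    request = build_request(url, "guest", "guest-password")
    print(request.full_url)
    # Sending the request needs a live gateway, so it is left commented out:
    # with urllib.request.urlopen(request) as response:
    #     print(response.read().decode("utf-8"))
```

In this sketch the client only ever sees the gateway address. Which internal service host handles the call stays hidden behind the firewall, and a different service would be reached by changing only the service path.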

Hortonworks Committers
6

Try Knox Gateway with Sandbox

Hortonworks Sandbox is a self-contained virtual machine with HDP running alongside a set of hands-on, step-by-step Hadoop tutorials.
