Securing Hadoop with Knox Gateway
Back in the day, all you needed to secure a Hadoop cluster was a firewall that restricted network access to authorized users only. This eventually evolved into a more robust security layer in Hadoop… a layer that could augment firewall access with strong authentication. Enter Kerberos. Around 2008, Owen O’Malley and a team of committers led this first foray into security, and today Kerberos is still the primary way to secure a Hadoop cluster.
Fast-forward to today… Widespread adoption of Hadoop is upon us. The enterprise has placed requirements on the platform to not only provide perimeter security, but to also integrate with all types of authentication mechanisms. Oh yeah, and all the while, be easy to manage and to integrate with the rest of the secured corporate infrastructure. Kerberos can still be a great provider of the core security technology but with all the touch-points that a user will have with Hadoop, something more is needed.
The time has come for Knox.
The only path to security in Hadoop is the community
The Knox Gateway aims to provide perimeter security that integrates easily into existing security infrastructure. Delivering this key component of the Apache Hadoop ecosystem is a critical project for the community. Security is not an afterthought. It needs to be woven into the very fabric of Hadoop in order to be effective. Being a part of the community will allow Knox to accomplish just that.
Already the community has rallied around the project and the vote has been positive thus far. Tomorrow we should see community approval of a new incubation project in the Apache Software Foundation for Knox, a security layer for the Hadoop ecosystem. The initial mentor list includes people from Hortonworks, Microsoft and NASA, among others.
What comprises the Knox Gateway?
The Knox Gateway (“Gateway” or “Knox”) is a system that provides a single point of authentication and access for Apache Hadoop services in a cluster. The goal is to simplify Hadoop security for both users (i.e. those who access the cluster data and execute jobs) and operators (i.e. those who control access and manage the cluster). The Gateway runs as a server (or cluster of servers) that serves one or more Hadoop clusters. It has a few key functions:
- Provide perimeter security to make Hadoop security setup easier
- Support authentication and token verification security scenarios
- Deliver users a single cluster end-point that aggregates capabilities for data and jobs
- Enable integration with enterprise and cloud identity management environments
- Manage security across multiple clusters and multiple versions of Hadoop
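To make the "single cluster end-point" idea concrete, here is a minimal sketch of how a perimeter gateway can map one public URL space onto services scattered across a cluster. The path shape, service names, and internal hosts below are illustrative assumptions, not Knox's actual configuration or API.

```python
# Hypothetical sketch of the perimeter-gateway routing idea: one public
# end-point in front of services spread across the cluster. The service
# names and internal hosts are made up for illustration.

# Internal service locations that the gateway hides from outside users.
SERVICES = {
    "webhdfs": "http://namenode.internal:50070/webhdfs/v1",
    "templeton": "http://hcat.internal:50111/templeton/v1",
}

def route(public_path: str) -> str:
    """Map a public path like '/gateway/<cluster>/<service>/<rest>'
    to the internal service URL the request should be proxied to."""
    parts = public_path.strip("/").split("/")
    if len(parts) < 3 or parts[0] != "gateway":
        raise ValueError("not a gateway path: " + public_path)
    service, rest = parts[2], "/".join(parts[3:])
    if service not in SERVICES:
        raise ValueError("unknown service: " + service)
    return SERVICES[service] + "/" + rest

print(route("/gateway/cluster1/webhdfs/tmp/data.csv"))
# → http://namenode.internal:50070/webhdfs/v1/tmp/data.csv
```

In a real deployment the gateway would also authenticate the caller before routing, which is exactly where the authentication and token-verification scenarios above come in.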
Knox will be able to provide a security layer for multiple clusters and multiple versions of Hadoop simultaneously, and will deliver a simple, intuitive management interface. Playing nice with others is always a security imperative, so Knox will integrate with existing frameworks for Active Directory/LDAP and will allow for extensions for custom authentication mechanisms.
The short term plan for the Knox team is to deliver a solid, working release in late March so that early adopters can begin to evaluate and provide valuable feedback. This critical step will ensure that the gateway fits nicely into customers’ infrastructure and makes Hadoop easier to use… and more secure.