August 02, 2018

Simplifying the Fortification of your Data Lake with Apache Knox

This blog is the first in a series of security-related blogs that we plan to publish in the near future.

It’s a myth that usability and security are mutually exclusive. In this blog, we’ll try to dispel it in the context of Apache Knox. For those who are not familiar with Apache Knox, it is:

  • an extensible reverse proxy framework
  • that can be deployed in the cloud and/or on-prem
  • for securely exposing REST APIs, web UIs, and WebSockets-based services

Apache Knox provides the following functionality out-of-the-box:

  • Proxying of HTTP services – REST, UIs, WebSockets
  • Authentication services – pluggable authentication and federation providers, plus token and SSO services
  • Client services – KnoxShell for consuming cluster services through Knox
  • And many other features…

As you may know, configuring Knox through XML/JSON files and the CLI is prone to human error, not to mention a joyless experience for a security admin. The pain is compounded when you need to repeat the process for multiple clusters in the data lake. To alleviate this pain, we introduced the Service Discovery and Topology Generation Framework in Apache Knox 0.14.0 (KNOX-1006), which will be available in HDP 3.0. Additionally, we have fronted it with the Knox Admin UI to provide a fully UI-driven experience for managing Knox topologies. We have also added an Apache Ambari script that users can run from the CLI to automate the configuration of Knox SSO.

Here are some salient features of the solution:

  • Supports remote creation and management of Knox topologies via a web UI, rather than editing XML files directly on the Knox hosts
  • Supports service endpoint discovery for Hadoop services in an Ambari-managed cluster, to simplify topology definition and management
  • Auto-configures Knox by monitoring ZooKeeper for changes in topology-related configuration and credentials, thereby making Knox stateless
  • Supports Knox HA when Knox instances are working with ZooKeeper, such that topology-related changes and credentials are automatically propagated across all participating Knox instances
  • Adapts to cluster changes that affect deployed topologies by regenerating and redeploying those topologies with updated cluster details
  • Supports NameNode Federation by allowing Knox to be configured for multiple NameNodes/namespaces
  • Eliminates the need to separately configure SSO for individual supported components
  • Reduces setup time for Knox (based on internal testing)

To illustrate, here is a sample workflow for setting up a WebHDFS proxy with LDAP authentication:

1. Log in to the Knox Admin UI.

2. Update the Shiro provider under the Provider Configuration/default provider tab to reflect your distinguished name and the URL of the proxy service, then save the provider configuration.

3. Create a service descriptor for WebHDFS and update the discovery information with the URL of the Ambari server.

4. Save the descriptor to trigger the generation and deployment of the Knox topology. That’s it!

5. Run a few HDFS commands using the Knox proxy URL to validate the setup.
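
For readers who prefer to see what the UI produces, the sketch below builds a descriptor of the kind created in step 3 and the proxy URL used in step 5. It is a minimal illustration rather than the Admin UI's actual output. The field names follow the simple descriptor format as described in the Knox user guide, so verify them against the guide for your version. The host names, cluster name, provider configuration name, user, password alias, and the helper functions `build_descriptor` and `webhdfs_url` are all hypothetical.

```python
import json


def build_descriptor(ambari_url, cluster, provider_config, services,
                     discovery_user=None, password_alias=None):
    """Return a dict shaped like a Knox simple descriptor (assumed field names)."""
    descriptor = {
        "discovery-type": "AMBARI",              # discover endpoints through Ambari
        "discovery-address": ambari_url,         # URL of the Ambari server (step 3)
        "provider-config-ref": provider_config,  # provider configuration saved in step 2
        "cluster": cluster,                      # cluster name as known to Ambari
        "services": [{"name": name} for name in services],
    }
    # Discovery credentials are optional in this sketch.
    if discovery_user:
        descriptor["discovery-user"] = discovery_user
    if password_alias:
        descriptor["discovery-pwd-alias"] = password_alias
    return descriptor


def webhdfs_url(knox_host, topology, path, op="LISTSTATUS",
                port=8443, gateway_path="gateway"):
    """Build a WebHDFS URL that goes through the Knox proxy (step 5).

    The default port and gateway path are assumptions; adjust them to your
    gateway configuration.
    """
    clean_path = path.lstrip("/")
    return (f"https://{knox_host}:{port}/{gateway_path}/{topology}"
            f"/webhdfs/v1/{clean_path}?op={op}")


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    descriptor = build_descriptor(
        ambari_url="http://ambari.example.com:8080",
        cluster="Sandbox",
        provider_config="default-providers",
        services=["WEBHDFS"],
        discovery_user="maria_doe",
        password_alias="ambari.discovery.password",
    )
    print(json.dumps(descriptor, indent=2))
    print(webhdfs_url("knox.example.com", "sandbox", "/tmp"))
```

The generated topology takes its name from the descriptor, so a descriptor saved as `sandbox` would be reachable under `/gateway/sandbox/`. To validate the setup, request the printed URL with any HTTP client, supplying LDAP credentials via basic authentication.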

Below is a screenshot that shows how you can configure Knox SSO using an Ambari script from the CLI without having to configure each service separately:

If you are interested in the magic behind the scenes, please refer to the technical documentation below:

https://community.hortonworks.com/articles/154912/apache-knox-dynamic-service-endpoint-discovery.html

https://cwiki.apache.org/confluence/display/KNOX/KIP-8+Service+Discovery+and+Topology+Generation

https://knox.apache.org/books/knox-0-14-0/user-guide.html#Externalized+Provider+Configurations

https://knox.apache.org/books/knox-0-14-0/user-guide.html#Deployment+Directories

https://knox.apache.org/books/knox-0-14-0/user-guide.html#Cluster+Configuration+Monitoring

https://knox.apache.org/books/knox-0-14-0/user-guide.html#Remote+Configuration+Monitor

https://knox.apache.org/books/knox-0-14-0/user-guide.html#Remote+Configuration+Registry+Clients

I hope this blog is helpful in explaining why security and usability are not mutually exclusive. Stay tuned for further updates on Apache Knox.
