HDP Advanced Security Webinar

Comprehensive security for enterprise Hadoop

This week we hosted a webinar entitled HDP Advanced Security: Comprehensive Security for Enterprise Hadoop. More than 135 people attended, and the session prompted an informative discussion and a series of questions.

The speakers outlined the HDP Advanced Security features and benefits in Hortonworks Data Platform and gave a demo. Thanks to our presenters Justin Sears (Hortonworks’ Product Marketing Manager), Balaji Ganesan (Hortonworks’ Senior Director, Enterprise Security Strategy), and Don Bosco Durai (Hortonworks’ Enterprise Security Architect).

The presentation covered:

  • Advanced Security features
  • Fine-grained access control in HDFS, HBase and Hive
  • Detailed access and policy auditing
  • Centralized administration and management of security across the Hadoop platform

If you missed the webinar, here is the complete recording.

And here is the presentation deck.

Webinar Q & A

Question Answer
How does XA Secure work with Hadoop Simple Authentication? The HDP Advanced Security (XA Secure) solution works with any authentication mechanism supported by Hadoop. HDP Advanced Security provides support for Kerberos-based authentication. Kerberos can be connected to corporate LDAP environments to centrally provision user information. HDP also provides perimeter authentication through Apache Knox for REST APIs and Web services.
Does the auditing show the actual query that was run from Hive? No, we show the action performed by the user and the resource accessed.
Are features of XA Secure available through the HDP 2.1 distribution? Yes, HDP Advanced Security (XA Secure) is available to download as an add-on to HDP 2.1.
Is Knox still used for Perimeter security? Apache Knox Gateway will continue to be the perimeter security solution. HDP Advanced Security adds additional security within the cluster while Knox assures security at the perimeter. We plan to integrate Knox with HDP Advanced Security bits in the near future so that we can manage the service level security measures through the UI.
Is policy data stored into a database or in HDFS itself? Policies are currently stored in a database.
What are the upcoming supported components and what is the timeline for delivering their integration? We want to incorporate all aspects of data governance in Falcon and data streaming in Storm into HDP Advanced Security. We plan to enable centralized authorization of users accessing services in Falcon and topologies in Storm. Secondly, through auditing, we want to keep a trail of how data came into Hadoop, who processed it, and whether they changed it.
Can you see all users with administrative rights across any system? Currently, we do not support that feature.
How much integration does HDP Advanced Security offer when it comes to Windows ADS users and groups? Out of the box, we support integration with both Active Directory and LDAP by synchronizing users and groups and leveraging those in HDP Advanced Security for managing policies. In addition, authentication can be deferred to LDAP or AD, where credentials are stored.
Can a Resource Path be externalized? We are working toward providing APIs to manage policies, outside of our UI. Part of the policy definition will be how a resource is defined, and we will publish guidelines as part of the API.
Is the policy pushed on to HDFS and enforced there? Policies are stored in a database, but a copy is pushed to HDFS, where it is enforced.
Will Hive or HBase table access be restricted by row and column? Yes, table access can be restricted at row and column levels.
Does Policy Manager override everything on HDFS? Yes. But there is a configurable option available to use HDFS ACLs as the secondary authorization.
Why do we still need Knox when we have HDP Advanced Security? Both serve different purposes. Knox is used in perimeter security around the cluster, whereas HDP Advanced Security provides fine-grained authorization and audit within the cluster.
Can you access the underlying table in HDFS? Assuming that this question refers to Hive, if impersonation is turned off, then HDFS level checks are enforced.
Do you have plans to easily install and configure Kerberos for Hadoop? As part of our roadmap and as part of improving security in enterprise Hadoop, we plan to make it easier to manage, install, and configure Kerberos. Today, you can manage some aspects of Kerberos through Apache Ambari, but we plan to extend that further so that Kerberos configuration becomes easier.
Do User and Group Permissions use underlying extended HDFS ACLs? The two sets of policies are managed separately. You can configure a fallback to HDFS ACLs if needed, which ensures backward compatibility for existing policies.
How do you ensure consistency among policies set by UI and those executed via native CLI? We continue to work toward achieving a single version of the truth, regardless of whether you manage policies via the UI or the CLI.
When you change Hive policy, does it alter Metastore or base file permissions? Currently the changes are only stored in the central XA database. The XA Agents maintain a replica of this policy locally. These permissions, however, are not reflected in Metastore.
How do we encrypt the data at rest? This is a broad topic as well as an interesting one. Enterprises approach data encryption or data masking in multiple ways. The consensus is that security is layered, so you need the basic layers (authentication, authorization, and firewall or perimeter protection) in place. As an administrator, you need to evaluate at what stage in the data journey you need data protection, encryption, or masking. Furthermore, you need to decide what aspects of the data you wish to encrypt: the entire data set or specific pieces of data? What we support as part of Hadoop is encryption of data in flight. Out-of-the-box encryption comes via Linux-level support in LUKS. However, if you want advanced data encryption, we continue to work with partners that provide strong data encryption at all levels, across the enterprise, including data at rest.
When you set a policy on a Hive table, how does that reflect or translate to the underlying HDFS file? It depends on how you’ve set up Hive. Hive can be configured to access underlying HDFS files as a “Hive user”. Hive can also be configured to “do-as”, where the underlying files would be accessed as the user. In the second scenario, at the HDFS level, we allow HDFS file permissions as a secondary user-level protection.
Will XA Secure UI be integrated with Apache Ambari? Our vision is simplified, centralized administration, which eventually means doing it via Ambari or Ambari Views.
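
Several answers above describe the same decision flow: a central policy is checked first, HDFS ACLs can optionally act as a secondary authorization, and Hive touches the underlying files either as a service user or, with “do-as”, as the end user. The sketch below illustrates that flow in plain Python. It is not the XA Secure implementation or API; the names `Policy`, `authorize`, `effective_hdfs_user`, and `hdfs_acl_check` are hypothetical, and the fall-through order (policy first, then ACL) is an assumption based on the answers above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, Optional


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy record: users allowed to perform actions under a path."""
    resource_path: str
    users: frozenset
    actions: frozenset


def covers(policy: Policy, path: str) -> bool:
    """True if the path is the policy's resource path or lies beneath it."""
    base = policy.resource_path.rstrip("/")
    return path == base or path.startswith(base + "/")


def effective_hdfs_user(end_user: str, do_as: bool, service_user: str = "hive") -> str:
    """Identity that accesses HDFS files on behalf of a Hive request.

    With do-as enabled the end user is used; otherwise the service user is.
    """
    return end_user if do_as else service_user


def authorize(
    user: str,
    action: str,
    path: str,
    policies: Iterable[Policy],
    hdfs_acl_check: Optional[Callable[[str, str, str], bool]] = None,
    fallback_to_acl: bool = False,
) -> Decision:
    """Check central policies first, then optionally fall back to HDFS ACLs."""
    for policy in policies:
        if covers(policy, path) and user in policy.users and action in policy.actions:
            return Decision.ALLOW
    # Assumed behavior: the ACL check runs only when no policy grants access
    # and the administrator has enabled the fallback.
    if fallback_to_acl and hdfs_acl_check is not None:
        return Decision.ALLOW if hdfs_acl_check(user, action, path) else Decision.DENY
    return Decision.DENY
```

For example, with a single policy granting `alice` read access under `/data/sales`, calling `authorize("alice", "read", "/data/sales/q1.csv", policies)` allows the request, while the same call for `bob` is denied unless `fallback_to_acl=True` and the supplied `hdfs_acl_check` callable grants it.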

What’s Next?

  • Visit the HDP Advanced Security Apache page to download the bits
  • If you have any further questions pertaining to HDP Security (documentation, code examples, tutorials), please post them on the Community forums under Security.
