January 08, 2015

Hadoop Security: Is it a Different Paradigm?

This guest blog post is from Srikanth Venkat, director of product management at Dataguise, a Hortonworks security partner.

Plus ça change, plus c’est la même chose
As Jean-Baptiste Alphonse Karr noted, “The more things change, the more they stay the same.” Often, that’s not what we hear about Hadoop security: people tend to call out how different Hadoop is, and how different its security solutions need to be. When it comes to protecting data in Hadoop, people will argue that, given the big changes in Big Data, existing enterprise security models don’t hold and a whole new paradigm is required. They’ll point out that the data is different, the processing is different, and the access and data sharing are different. As a security startup focused on the Hadoop market, we at Dataguise are as guilty of this as the next guys.

At the recent Big Data Security and Governance Meetup, Balaji Ganesan of Hortonworks (formerly co-founder of XA Secure) gave a presentation entitled “Apache Ranger: Current State and Future Directions”. The talk focused on how Apache Ranger provides differentiated authorization and auditing capabilities from an enterprise-wide compliance and governance perspective. I’ll cover some thoughts on those capabilities below. But hearing Balaji present on authorization models, I realized how many parallels one could draw between other security contexts and what I’ll call necessary “security non-inventions” in the Hadoop context. We’ve been so busy highlighting the new that we’ve forgotten the well-known and well-honed. Yes, big data brings some new architectural requirements around authentication, authorization, and auditing, but the foundational principles stay the same. Some of those fundamentals include:

  • Simplicity matters – I liked some of the decisions the Ranger team made to balance scalability (inheritance and hierarchical support for users/groups) with simplicity (a positive permission model with an “only-one-applicable” rule borrowed from XACML).
  • Performance matters – By embedding the policy enforcement points (PEPs) locally through a plugin-style framework, Ranger processes authorization decisions locally and so neatly sidesteps latency and performance issues.
  • Coverage matters – Perhaps most interesting to me, as an enterprise security enthusiast, was the breadth and speed with which Ranger, as a 100% open source project, could deliver authorization plugins across the fast-evolving Hadoop ecosystem (YARN, Solr, Kafka, etc. are all expected shortly).
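
To make the simplicity point concrete, here is a minimal Python sketch of a positive (allow-only) permission model with group membership and an “only-one-applicable” rule. It is an illustration of the idea, not Ranger’s implementation: `Policy`, `applicable_policies`, and `is_allowed` are hypothetical names, and the users, groups, and paths are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """An allow-only policy: it grants permissions, it never denies."""
    name: str
    resource: str                      # exact resource this policy covers
    users: frozenset = frozenset()
    groups: frozenset = frozenset()
    permissions: frozenset = frozenset()


def applicable_policies(policies, resource):
    """Return the policies whose resource matches the request."""
    return [p for p in policies if p.resource == resource]


def is_allowed(policies, user, user_groups, resource, permission):
    """Decide access under an 'only-one-applicable' rule.

    Zero matching policies -> deny (nothing grants access).
    More than one match    -> error (the policy set is ambiguous).
    Exactly one match      -> allow only if it grants the permission
                              to the user or to one of the user's groups.
    """
    matches = applicable_policies(policies, resource)
    if not matches:
        return False
    if len(matches) > 1:
        names = ", ".join(p.name for p in matches)
        raise ValueError(f"more than one policy applies to {resource}: {names}")
    policy = matches[0]
    subject_ok = user in policy.users or bool(policy.groups & set(user_groups))
    return subject_ok and permission in policy.permissions


# Hypothetical example data.
policies = [
    Policy("finance-read", "/data/finance",
           groups=frozenset({"analysts"}), permissions=frozenset({"read"})),
]
print(is_allowed(policies, "alice", ["analysts"], "/data/finance", "read"))
print(is_allowed(policies, "alice", ["analysts"], "/data/finance", "write"))
```

Because policies only ever grant access, there is no deny-versus-allow precedence to reason about, and the single-match rule means an administrator can point to exactly one policy that explains any decision.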

So, back to authorization models: why not reuse existing authorization frameworks in Hadoop? Standards such as XACML (the de facto standard for centralized entitlement management and for authorization policy evaluation and enforcement) do not apply well to the Hadoop context, because their policy framework is overly complex and hard to set up and because they cannot serve distributed deployments. Apache Ranger overcomes these limitations by offering a centralized security framework for managing fine-grained access control over Hadoop data. Policies govern access to files, folders, databases, tables, or columns at the individual user or group level, and users and groups can be synchronized with external enterprise directories such as AD or LDAP.
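
As a rough illustration of what fine-grained policies over files, folders, databases, tables, and columns might look like, the Python sketch below matches a request against a small, invented policy list. The policy layout, the `recursive` flag, the `*` wildcard, and the function names are assumptions made for this example and do not reflect Ranger’s actual policy format. To keep the focus on resource matching, it uses a simple “any matching policy grants” rule.

```python
from fnmatch import fnmatchcase

# Hypothetical policies: each names a resource pattern, the users and
# groups it applies to, and the access types it grants.
POLICIES = [
    {"resource": {"path": "/data/sales", "recursive": True},
     "groups": {"sales"}, "users": set(), "access": {"read"}},
    {"resource": {"database": "crm", "table": "customers", "column": "name"},
     "groups": set(), "users": {"bob"}, "access": {"select"}},
    {"resource": {"database": "crm", "table": "orders", "column": "*"},
     "groups": {"analysts"}, "users": set(), "access": {"select"}},
]


def resource_matches(pattern, request):
    """Check one policy's resource pattern against a requested resource."""
    if "path" in pattern:
        if "path" not in request:
            return False
        base = pattern["path"].rstrip("/")
        path = request["path"].rstrip("/")
        if path == base:
            return True
        # A recursive folder policy also covers everything beneath it.
        return pattern.get("recursive", False) and path.startswith(base + "/")
    # Database/table/column: every level must match; '*' is a wildcard.
    return all(
        level in request and fnmatchcase(request[level], pattern[level])
        for level in ("database", "table", "column")
    )


def check_access(user, groups, request, access):
    """Allow if any policy grants `access` on the resource to the subject."""
    for policy in POLICIES:
        subject_ok = user in policy["users"] or bool(policy["groups"] & set(groups))
        if (subject_ok and access in policy["access"]
                and resource_matches(policy["resource"], request)):
            return True
    return False


# A member of the sales group reading a file under the sales folder.
print(check_access("carol", ["sales"], {"path": "/data/sales/2014/q4.csv"}, "read"))
# bob may select the name column, but not any other column of that table.
print(check_access("bob", [], {"database": "crm", "table": "customers", "column": "ssn"}, "select"))
```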

As enterprises gradually adopt Hadoop as the Enterprise Data Platform, comprehensive Hadoop security that is pluggable and easy to use becomes a necessity, not only for survival but also for satisfying increasingly complex compliance and regulatory requirements as higher-risk data gets stored in Hadoop. This need goes beyond basic authorization via file permissions in HDFS, resource-level access control (via ACLs) for MapReduce, and coarser-grained access control at the service level. It is precisely this market need for pluggable security that Apache Ranger, a project that is the community offshoot of Hortonworks’ XA Secure acquisition, addresses. Apache Ranger delivers a ‘single pane of glass’ for centralized, consistent administration of security policy across Hadoop ecosystem tools and all Hadoop workloads, including batch, interactive SQL, and real-time. Ranger’s architectural components include:

  • A Ranger portal, where users create and update policies that are then stored in a policy database, together with an audit server that handles the audit data collected across Hadoop components (Hive, HDFS, HBase, etc.) via Ranger plugins;
  • Ranger plugins that enforce policies for the components; and
  • A user synchronization utility that imports users and groups from Unix, LDAP, or Active Directory.
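
The way these three pieces fit together can be sketched in a few lines of Python. Everything here is a simplified stand-in rather than Ranger code: `PolicyStore`, `AuditLog`, and `Plugin` are hypothetical classes, the user-to-group mapping is passed in as a plain dictionary to represent the output of user synchronization, and the pull-based `refresh` is an assumption of this sketch.

```python
import time


class PolicyStore:
    """Stand-in for the central policy database behind the portal."""

    def __init__(self):
        self.version = 0
        self._policies = {}   # (resource, group) -> set of permissions

    def grant(self, resource, group, permission):
        self._policies.setdefault((resource, group), set()).add(permission)
        self.version += 1

    def snapshot(self):
        copy = {key: set(perms) for key, perms in self._policies.items()}
        return self.version, copy


class AuditLog:
    """Stand-in for the audit server: collects events sent by plugins."""

    def __init__(self):
        self.events = []

    def record(self, event):
        self.events.append(event)


class Plugin:
    """Stand-in for a plugin embedded in one component (e.g. HDFS)."""

    def __init__(self, component, store, audit, user_groups):
        self.component = component
        self.store = store
        self.audit = audit
        self.user_groups = user_groups   # as produced by user sync
        self.version, self.policies = store.snapshot()

    def refresh(self):
        """Pull a fresh copy of the policies if the store has changed."""
        if self.store.version != self.version:
            self.version, self.policies = self.store.snapshot()

    def authorize(self, user, resource, permission):
        """Decide from the local cache only, then report an audit event."""
        groups = self.user_groups.get(user, set())
        allowed = any(
            permission in self.policies.get((resource, group), set())
            for group in groups
        )
        self.audit.record({
            "time": time.time(), "component": self.component, "user": user,
            "resource": resource, "permission": permission, "allowed": allowed,
        })
        return allowed


store, audit = PolicyStore(), AuditLog()
plugin = Plugin("hdfs", store, audit, {"alice": {"analysts"}})
store.grant("/data/logs", "analysts", "read")     # admin edits a policy centrally
plugin.refresh()                                  # plugin picks up the change
print(plugin.authorize("alice", "/data/logs", "read"), len(audit.events))
```

The design point the sketch tries to capture is that `authorize` never calls out to the central store: administration is centralized, while each decision is made against a local copy and reported afterwards for auditing.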

In the Hadoop universe, Apache Ranger now enables security personnel to perform richer audit tracking and deeper policy analytics, streamlining governance across an enterprise.


Balaji also outlined several exciting future capabilities planned for Apache Ranger, including:

  • Integration with Apache Falcon to leverage data lineage within the cluster
  • Deeper integration with Apache Ambari
  • New and improved permission schemes for cluster components
  • Interactive audit querying through Solr
  • Global tag based policies
  • Tighter support for administration of data protection policies with partner solutions, such as Dataguise.

As enterprises increasingly adopt Hadoop for complex business analytics and decision-making, the volume of sensitive customer and corporate data in the Hadoop ecosystem burgeons. Balaji remarked that, in addition to Hadoop toolsets for authorization and auditing such as Apache Ranger, enterprises increasingly need robust data-centric protection (masking, encryption, tokenization) to reduce data privacy risks in a practical, repeatable, and efficient manner and to adequately address insider threats to sensitive data stored in Hadoop. In this arena, partner solutions that offer data-centric protection for Hadoop data, such as Dataguise DgSecure for Hadoop, complement enterprise-ready Hadoop distributions (such as those from Hortonworks). DgSecure for Hadoop helps enterprises address their nuanced compliance and risk management needs by protecting data across the Hadoop data lifecycle, whether during ingest, in storage, or in use, as shown below.

[Figure: DgSecure architecture diagram]
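
For readers less familiar with the data-centric techniques mentioned above, the Python sketch below shows what masking and one simple form of tokenization do to a value. It is a generic illustration, not how DgSecure or any other product works: `mask` and `tokenize` are hypothetical helpers, the key and sample value are invented, and the keyed-hash tokenization shown here is one-way, whereas many tokenization systems keep a protected mapping so that authorized users can recover the original. Encryption is left out of the sketch.

```python
import hashlib
import hmac


def mask(value, visible=4, fill="*"):
    """Masking: hide all but the last `visible` characters."""
    if visible <= 0:
        return fill * len(value)
    hidden = max(len(value) - visible, 0)
    return fill * hidden + value[hidden:]


def tokenize(value, key, length=16):
    """Tokenization (keyed-hash variant): a stable, non-reversible surrogate.

    The same input and key always give the same token, so joins and
    group-bys on the tokenized column still work.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


secret = b"demo-key-not-for-production"   # hypothetical key for the example
ssn = "123-45-6789"                        # invented sample value
print(mask(ssn))                           # *******6789
print(tokenize(ssn, secret) == tokenize(ssn, secret))
```

Masking keeps a value recognizable to a human (for example, the last four digits) while hiding the rest; tokenization replaces the value entirely but preserves equality, which is often what analytics jobs need.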

Ultimately, I think that Hadoop security has its challenges: some look exactly like existing security challenges, and some are brand-spanking new, introduced by the architecture, features, and functions of Big Data. For managing authorization, continued innovation through efforts such as Apache Ranger is tackling many of these issues, but it’s comforting to know that some things never change.
