This guest blog post is from Srikanth Venkat, director of product management at Dataguise, a Hortonworks security partner.
Plus ça change, plus c’est la même chose
As Jean-Baptiste Alphonse Karr noted, "The more things change, the more they stay the same." Often, that's not what we hear when looking at Hadoop security: people tend to call out how different Hadoop is, and how different its security solutions need to be. For protecting data in Hadoop, people will highlight that with the big changes in Big Data, existing enterprise security models don't hold, and that a whole new paradigm is required. They'll call out that the data is different, the processing is different, and the access and data sharing are different. As a security startup focused on the Hadoop market, we at Dataguise are as guilty of this as the next guys.
At the recent Big Data Security and Governance Meetup, Balaji Ganesan of Hortonworks (formerly co-founder of XA Secure) gave a presentation entitled "Apache Ranger: Current State and Future Directions". The talk focused on how Apache Ranger provides differentiated capabilities around authorization and auditing from an enterprise-wide compliance and governance perspective. I'll cover some thoughts on those capabilities below. But in hearing Balaji present on authorization models, I realized how many parallels one could draw from other security contexts to what I'll call necessary "security non-inventions" in the Hadoop context. We've been so busy highlighting the new that we've forgotten the well-known and honed. Yes, there are some new architectural requirements around authentication, authorization, and auditing in big data, but the foundational principles stay the same. Some of those fundamentals include:
So, back to authorization models: why not re-use existing authorization frameworks in Hadoop? Standards such as XACML (the de facto standard for centralized entitlement management and authorization policy evaluation and enforcement) do not apply well to the Hadoop context, due to an overly complex policy framework that is hard to set up and an inability to serve distributed deployments. Apache Ranger overcomes these limitations with a centralized security framework for managing fine-grained access control over Hadoop data: policies can grant access to files, folders, databases, tables, or columns at the individual user or group level, and users and groups can be synchronized with external enterprise directories such as AD or LDAP.
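To make the fine-grained policy idea concrete, here is a minimal sketch of what a Ranger-style policy payload can look like, granting a directory-synced group SELECT access to a single Hive column. The service name, group, and database/table/column names are all hypothetical; the field layout follows the shape of Ranger's public REST policy model, and a real deployment would submit it through the Ranger Admin service rather than build it by hand.

```python
import json

# Hypothetical policy: let the AD/LDAP-synced group "analysts" run SELECT
# on the "email" column of sales.customers, and nothing else.
policy = {
    "service": "hadoopdev_hive",          # assumed Hive service name in Ranger
    "name": "customers_email_select",     # human-readable policy name
    "resources": {
        "database": {"values": ["sales"], "isExcludes": False},
        "table": {"values": ["customers"], "isExcludes": False},
        "column": {"values": ["email"], "isExcludes": False},
    },
    "policyItems": [
        {
            "groups": ["analysts"],       # synced from the enterprise directory
            "users": [],
            "accesses": [{"type": "select", "isAllowed": True}],
        }
    ],
}

# An admin tool would POST this JSON to the Ranger Admin REST endpoint;
# here we just serialize it to show the structure.
payload = json.dumps(policy, indent=2)
print(payload)
```

The point is less the exact schema than the granularity: one policy object names a single column and a single directory-backed group, which is exactly the level of control that file permissions alone cannot express.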
As enterprises gradually adopt Hadoop as the Enterprise Data Platform, comprehensive Hadoop security that is pluggable and easy to use becomes a necessity not only for survival, but also for satisfying an increasingly complex compliance and regulatory context as higher-risk data are stored in Hadoop. This need transcends basic authorization via file permissions in HDFS, resource-level access control (via ACLs) for MapReduce, and coarser-grained access control at the service level. It is precisely on this market need for pluggable security that Apache Ranger, the community offshoot of Hortonworks' XA Secure acquisition, focuses. Apache Ranger delivers a "single pane of glass" for centralized, consistent administration of security policy across Hadoop ecosystem tools and all Hadoop workloads, including batch, interactive SQL, and real-time. Ranger's architectural components include:
In the Hadoop universe, Apache Ranger now enables security personnel to perform richer audit tracking and deeper policy analytics, streamlining governance across an enterprise.
Balaji also outlined several exciting future capabilities that are being planned for Apache Ranger including:
As enterprises increasingly adopt Hadoop for complex business analytics and decision-making, sensitive customer and corporate data burgeons in the Hadoop ecosystem. Balaji remarked that, in addition to Hadoop toolsets for authorization and auditing such as Apache Ranger, enterprises increasingly need robust data-centric protection (masking, encryption, tokenization) to reduce data privacy risks in a practical, repeatable, and efficient manner, and to adequately address insider threats to sensitive data stored in Hadoop. In this arena, partner solutions that offer data-centric protection for Hadoop data, such as Dataguise DgSecure for Hadoop, complement enterprise-ready Hadoop distributions such as those from Hortonworks. Dataguise's DgSecure for Hadoop helps enterprises fully address their nuanced compliance and risk management needs by providing protection across the entire Hadoop data lifecycle, whether during ingest, in storage, or in use.
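The masking-versus-tokenization distinction mentioned above can be sketched in a few lines. This is a toy illustration, not Dataguise's implementation: real products use vetted format-preserving encryption or vaulted token stores. Masking irreversibly hides part of a value; tokenization replaces it with a stable surrogate so that joins and analytics still work without exposing the raw data.

```python
import hashlib
import hmac

def mask_ssn(ssn: str) -> str:
    """Irreversibly hide all but the last four digits of an SSN-shaped value."""
    return "XXX-XX-" + ssn[-4:]

def tokenize(value: str, key: bytes) -> str:
    """Replace a value with a keyed surrogate token (HMAC-SHA256, truncated).
    The same input and key always yield the same token, so records can still
    be joined on the token without revealing the underlying value."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()[:16]

ssn = "123-45-6789"          # fabricated example value
print(mask_ssn(ssn))         # XXX-XX-6789
print(tokenize(ssn, b"demo-key"))
```

Masking suits display contexts (a support screen that only needs the last four digits), while tokenization suits pipelines where downstream analytics must still group or join on the protected field.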
Ultimately, I think Hadoop security has its challenges: some look exactly like existing security challenges, and some are brand-new, introduced by the nature of the architecture, features, and functions of Big Data. For managing authorization, continued innovation through efforts such as Apache Ranger is tackling many of these issues, but ultimately, it's comforting to know that some things never change.