“Data is to information society what fuel was to the industrial economy: the critical resource powering the innovations that people rely on,” write Viktor Mayer-Schönberger and Kenneth Cukier in Big Data. Today, according to Forrester, big data fuels innovation in new products and services.
Just as countries’ fuel repositories need protection because they can come under attack, so do companies’ big data repositories. “Companies, markets, and countries are increasingly under attack from cyber-criminals. They need to get much better at protecting [and securing] themselves,” says Martin Giles of The Economist.
Hence, comprehensive and coordinated security in the Hadoop Enterprise ecosystem, at the perimeter as well as across all data access points, is a core capability of the Hadoop Enterprise blueprint.
Security administrators want to administer this core capability centrally, with security controls placed across an entire Hadoop cluster, from the perimeter down to the Hadoop stack. They want layered security controls that exercise the following:
These four mechanisms comprise the basic hygiene of comprehensive security, whether securing an entry point or accessing a data source.
Without proper security controls in place, a lapse can quickly become a serious problem, and can easily deprive security administrators and DevOps engineers of their sleep.
As @DevOps_Borat satirizes, you don’t want some rogue command or capricious code to drop your schema, delete your HDFS directory, kill a MapReduce job, submit a malicious client to YARN, or query unauthorized sensitive data:
Far from being sleepless or powerless against security breaches, security administrators can lock down their Hadoop clusters. The good news is that the Apache Hadoop community, through various Apache projects, shared best practices at the Hadoop Summit on how to address and manage complex security challenges.
Although Hadoop Summit San Jose 2014 has come and gone, its content is still available here. For Hadoop security administrators, we curated a few sessions under a general Hadoop security theme:
| Session | Recording | Slides |
|---|---|---|
| Improvements in Hadoop Security | Video | Slides |
| Hadoop REST API Security with the Apache Knox Gateway | Video | Slides |
| Using Hadoop and Machine Learning to Detect Security Risks and Vulnerabilities, and Predict Breaches in your Enterprise Environment | Video | Slides |
| The Future of Hadoop Security | Video | Slides |
We cherry-picked the sessions that best addressed these topics, but you can always browse the session descriptions on the schedule, for any track, time slot, or day, and follow whatever piques your curiosity.