Apache™ Accumulo is a high-performance data storage and retrieval system with cell-level access control. It is a scalable implementation of Google’s Bigtable design that works on top of Apache Hadoop® and Apache ZooKeeper.
Cell-level access control is important for organizations with complex policies governing who is allowed to see data. It enables the intermingling of different data sets with different access control policies and proper handling of individual data sets that have some sensitive portions.
Without Accumulo, those policies are difficult to enforce systematically. Accumulo encodes those rules for each individual data cell and allows fine-grained access control.
What Accumulo Does
Accumulo contains a variety of features for general administration, table design, data integrity and availability, performance, testing, client APIs, extensible behaviors and data management. Those features fall into categories including:
- Table design and configuration
- Integrity and availability
- Data management
- Internal capabilities
How Accumulo Works
Accumulo stores sorted key-value pairs. Sorting data by key allows rapid lookups of individual keys or scans over a range of keys. Since data is retrieved by key, the keys should contain the information that will be used to do the lookup.
- If retrieving data by a unique identifier, the identifier should be in the key.
- If retrieving data by its intrinsic features, such as values or words, the keys should contain those features.
The values may contain anything since they are not used for retrieval.
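The effect of sorting on lookups and range scans can be illustrated with a small in-memory model. This is only a sketch of the idea, not the Accumulo client API: the class `SortedKV`, its methods, and the key layouts such as `user:1001` and `word:accumulo:doc3` are hypothetical names chosen for illustration.

```python
import bisect


class SortedKV:
    """Toy in-memory model of a sorted key-value store (not the Accumulo API)."""

    def __init__(self):
        self._keys = []    # keys, always kept in sorted order
        self._values = []  # values, parallel to self._keys

    def put(self, key, value):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value          # overwrite an existing key
        else:
            self._keys.insert(i, key)        # insert at the sorted position
            self._values.insert(i, value)

    def get(self, key):
        """Binary-search lookup of a single key; returns None if absent."""
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def scan(self, start, end):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return list(zip(self._keys[lo:hi], self._values[lo:hi]))


store = SortedKV()
# Retrieval by unique identifier: the identifier is the key.
store.put("user:1002", "Bob")
store.put("user:1001", "Alice")
# Retrieval by an intrinsic feature (a word): the word leads the key,
# and the value can be empty because it is not used for retrieval.
store.put("word:hadoop:doc7", "")
store.put("word:accumulo:doc3", "")
store.put("word:accumulo:doc9", "")

alice = store.get("user:1001")
# ";" is the character after ":", so this range covers the "word:accumulo:" prefix.
accumulo_keys = [k for k, _ in store.scan("word:accumulo:", "word:accumulo;")]
```

Because the keys are sorted, every key sharing the prefix `word:accumulo:` sits in one contiguous run, so the scan touches only that run rather than the whole store.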
The original Bigtable design has a row and column paradigm. Accumulo extends the column with an additional “visibility” label that provides the fine-grained access control.
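A rough sketch of how a per-cell visibility label can filter what a reader sees is given here. It is a toy model under stated assumptions, not Accumulo's implementation or API: the function names `is_visible` and `scan`, the sample cells, and the labels (`doctor`, `nurse`, `finance`, and so on) are all hypothetical. The sketch assumes labels are combined with `&` (and), `|` (or) and parentheses, that an empty label means the cell is visible to everyone, and that the expression is well formed.

```python
import re


def is_visible(expression, authorizations):
    """Return True if a reader holding `authorizations` may see a cell.

    Simplification: operators are applied left to right, so expressions
    that mix & and | should use parentheses. No error handling for
    malformed expressions.
    """
    tokens = re.findall(r"[A-Za-z0-9_]+|[&|()]", expression)
    pos = 0

    def parse_term():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            result = parse_expr()
            pos += 1  # skip the closing ")"
            return result
        return token in authorizations  # a bare label

    def parse_expr():
        nonlocal pos
        result = parse_term()
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            op = tokens[pos]
            pos += 1
            right = parse_term()
            result = (result and right) if op == "&" else (result or right)
        return result

    # An empty expression places no restriction on the cell.
    return parse_expr() if tokens else True


# Hypothetical cells: (row, column, visibility, value)
cells = [
    ("patient42", "name", "", "J. Doe"),
    ("patient42", "diagnosis", "doctor|nurse", "flu"),
    ("patient42", "billing", "finance&(manager|auditor)", "$120"),
]


def scan(cells, authorizations):
    """Return only the cells whose visibility label the reader satisfies."""
    return [
        (row, column, value)
        for row, column, visibility, value in cells
        if is_visible(visibility, authorizations)
    ]


nurse_view = scan(cells, {"nurse"})
auditor_view = scan(cells, {"finance", "auditor"})
```

Two readers scanning the same row get different results: the nurse sees the name and diagnosis, while the finance auditor sees the name and billing cells. This is how data sets with different access policies can share one table.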
Accumulo is written in Java, but a Thrift proxy allows users to interact with Accumulo using C++, Python or Ruby. A pluggable user-authentication system allows LDAP connections to Accumulo. An HDFS class loader loads JARs from the Hadoop Distributed File System (HDFS) to multiple servers. Accumulo also has connectors to other Apache projects such as Hive and Pig.
Hortonworks works with the open source community to identify and develop enterprise requirements for Hadoop.