Apache Ranger provides centralized security for the Enterprise Hadoop ecosystem, including fine-grained access control and a centralized audit mechanism, both essential for Enterprise Hadoop. This blog covers the audit framework options available in Apache Ranger Release 0.4.0 in HDP 2.2 and how they can be configured.
The audit framework can be configured to send access audit logs generated by Apache Ranger plug-ins to one or more of the following destinations: an RDBMS, HDFS, and log4j appenders.
The Ranger audit framework supports saving audit logs to an RDBMS. Currently, MySQL and Oracle are supported, with other databases such as Postgres on the roadmap. Interactive audit reporting in the Ranger Administration portal reads the audit logs from the RDBMS.
The database schema for audit logging is generated during the installation of Ranger Admin. Before running setup.sh to set up Ranger Admin, please specify the audit database details in the install.properties file, as shown in the example below. (Please refer to the Ranger Admin installation documentation for details of install.properties and setup.sh.)
During setup of each Ranger plug-in (HDFS/Hive/HBase/Knox/Storm), i.e. before running enable-*-plugin.sh, please specify the audit database details in install.properties, as shown in the example below. (Please refer to the Ranger plug-in installation documentation for details of install.properties and enable-*-plugin.sh.) Make sure to provide the same database details that were used during Ranger Admin setup.
Audit logging to RDBMS can be configured to be synchronous or asynchronous. In synchronous mode, a call to audit blocks the calling thread until the audit log is committed to the database. In asynchronous mode, a call to audit returns quickly after adding the audit log to an in-memory queue; another thread in the audit framework reads from this queue and saves to the RDBMS. In asynchronous mode, a single database commit can include a number of audit logs (batch commit), which can result in significant performance improvements. If the in-memory queue is full, the audit log is dropped; periodic log messages are written to the component log file with the count of dropped audit logs.
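The asynchronous behaviour described above (bounded queue, drop when full, batch commit) can be illustrated with a minimal Python sketch. This is not Ranger's implementation: the class `AsyncAuditQueue`, its methods, and the `commit_batch` callback are hypothetical names chosen for illustration, and the writer thread is replaced by an explicit `flush()` call to keep the example deterministic.

```python
import queue


class AsyncAuditQueue:
    """Bounded in-memory queue that drops new entries when full (illustrative only)."""

    def __init__(self, max_size, commit_batch):
        self._queue = queue.Queue(maxsize=max_size)
        self._commit_batch = commit_batch  # hypothetical callable that persists a list of logs
        self.dropped = 0  # a real framework would report this count periodically

    def log(self, audit_event):
        """Called by the component thread; returns immediately instead of blocking."""
        try:
            self._queue.put_nowait(audit_event)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def flush(self, max_batch=100):
        """Called by the writer side; hands queued logs to the store as one batch."""
        batch = []
        while len(batch) < max_batch:
            try:
                batch.append(self._queue.get_nowait())
            except queue.Empty:
                break
        if batch:
            self._commit_batch(batch)
        return len(batch)


committed = []  # stands in for the database; each element is one batch commit
audit_queue = AsyncAuditQueue(max_size=3, commit_batch=committed.append)
results = [audit_queue.log({"id": i}) for i in range(5)]
flushed = audit_queue.flush()
```

With a queue size of 3, the fourth and fifth events are dropped and counted, and the three queued events reach the store in a single batch.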
The default mode for audit logging to RDBMS is asynchronous. To alter the default logging mode and other configurations, like the size of the in-memory queue, update the xasecure-audit.xml in the CLASSPATH, which is typically in the component’s configuration directory, for example /etc/hadoop/conf/xasecure-audit.xml. For any configuration changes to take effect, restart the component. A list of available configurations is provided in the Configuration section below.
To handle a high rate and volume of audit logs in your environment, we suggest you plan for appropriate database sizing, partitioning, and an automated way of purging old logs.
The Ranger audit framework can be configured to store audit logs in HDFS, in JSON format (example below). Audit logs in HDFS can later be processed by other applications, like Apache Hive, for querying and reporting. Please note that the audit reporting functionality in the Ranger Administration portal currently uses only the audit logs stored in RDBMS.
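As a rough illustration of downstream processing, the following Python sketch counts audit records per value of a chosen field. It assumes the log file holds one JSON object per line, and the field names used in the sample data (`reqUser`, `result`) are hypothetical placeholders rather than a statement of Ranger's actual schema.

```python
import json
from collections import Counter


def count_by_field(lines, field):
    """Count JSON audit records grouped by the value of `field`."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        counts[record.get(field, "<missing>")] += 1
    return counts


# Hypothetical sample records; real audit logs carry more fields.
sample_lines = [
    '{"reqUser": "alice", "result": 1}',
    '{"reqUser": "bob", "result": 0}',
    '',
    '{"reqUser": "alice", "result": 1}',
]
per_user = count_by_field(sample_lines, "reqUser")
```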
A sample Apache HBase access audit log in JSON format:
To minimize the performance impact, calls to create an audit log write the log to a staging file on the host where the component runs. The local staging file is rolled over periodically, every 10 minutes by default. After a rollover, another thread in the audit framework appends the staged file contents to an HDFS file. Depending on the rollover intervals configured for the HDFS file and the local staging file, multiple local staged files can be written to the same HDFS file.
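The staging-and-rollover flow can be sketched in Python as follows. This is only an illustration under simplifying assumptions: `StagingWriter` and its file names are hypothetical, a local file stands in for the HDFS destination, the rollover check happens lazily on each write rather than on a timer, and the append is done inline rather than by a separate thread.

```python
import os
import tempfile
import time


class StagingWriter:
    """Writes audit lines to a local staging file and rolls it over on an interval."""

    def __init__(self, staging_dir, destination_path, rollover_seconds=600, clock=time.time):
        self.staging_dir = staging_dir
        self.destination_path = destination_path  # stand-in for the HDFS file
        self.rollover_seconds = rollover_seconds
        self.clock = clock
        self._file = None
        self._opened_at = None
        self._sequence = 0

    def write(self, line):
        now = self.clock()
        if self._file is not None and now - self._opened_at >= self.rollover_seconds:
            self.rollover()
        if self._file is None:
            self._sequence += 1
            path = os.path.join(self.staging_dir, f"staging-{self._sequence}.log")
            self._file = open(path, "w", encoding="utf-8")
            self._opened_at = now
        self._file.write(line + "\n")

    def rollover(self):
        """Close the current staging file and append its contents to the destination."""
        if self._file is None:
            return
        path = self._file.name
        self._file.close()
        self._file = None
        with open(path, encoding="utf-8") as staged, \
                open(self.destination_path, "a", encoding="utf-8") as destination:
            destination.write(staged.read())


fake_now = [0.0]  # injectable clock so the example does not need to sleep
with tempfile.TemporaryDirectory() as workdir:
    destination = os.path.join(workdir, "destination.log")
    writer = StagingWriter(workdir, destination, rollover_seconds=600,
                           clock=lambda: fake_now[0])
    writer.write("event-1")
    fake_now[0] = 601.0
    writer.write("event-2")  # interval elapsed: first staging file is rolled over
    writer.rollover()        # ship the second staging file as well
    with open(destination, encoding="utf-8") as handle:
        shipped = handle.read().splitlines()
    staged_files = sorted(n for n in os.listdir(workdir) if n.startswith("staging-"))
```

Both staging files end up appended to the same destination file, which mirrors the case where several local staged files are written to one HDFS file.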
Saving audit logs to the local staging file can be either synchronous or asynchronous. In synchronous mode, a call to audit blocks the calling thread until the log is written to the staging file. By contrast, in asynchronous mode, a call to audit returns quickly after adding the audit log to an in-memory queue; a separate thread in the audit framework reads from this queue and writes to the local staging file. If the in-memory queue is full when an audit call is made, the audit log is not recorded. To keep track of such losses, a count of unrecorded audit logs is periodically written to the component log.
As with logging to RDBMS, the default mode for audit logging to HDFS is asynchronous. The logging mode and other configurations, like the size of the in-memory queue, rollover period, etc., can be changed by updating xasecure-audit.xml in the CLASSPATH (typically in the component’s configuration directory, for example /etc/hadoop/conf/xasecure-audit.xml). For changes to take effect, the component must be restarted. A list of available configurations is provided in the Configuration section below.
To help organize the audit logs in the file system, the Ranger audit framework supports various tags in file and directory names. At the time of file creation, the audit framework replaces these tags with appropriate values. The tags supported in file and directory names are described below:
Name of the current host in which the audit framework is executing.
Current time formatted using the given specification. For details on the supported format specification, please refer to the Java SimpleDateFormat documentation.
Unique identifier of the JVM instance in which the audit framework is executing – generated using Java VMID class.
Value of the given system property name in the JVM where audit framework is executing.
Value of the given environment variable in the JVM where audit framework is executing.
Type of the application the audit framework runs in: hdfs, hiveServer2, hbaseMaster, hbaseRegional, knox, storm.
The Ranger audit framework supports sending audit logs to log4j appender(s). Using this mechanism, you can send Ranger audit logs to any destination that has a log4j appender. To receive audit logs in JSON format, the component’s log4j configuration should be updated to specify the appender(s) in the following property:
The Ranger audit framework reads its configuration from xasecure-audit.xml in the CLASSPATH, typically in the conf directory of the Hadoop component in which the Ranger plug-in runs. This file is populated with values provided by the user during Ranger plug-in installation. The configurations supported in xasecure-audit.xml, along with details of the values for each, are listed in the following table; the file supports more settings than those exposed during installation. Please note that for changes to this file to take effect, the component needs to be restarted.
|Configuration Name||Default Value||Notes|
|xasecure.audit.is.enabled||true||Setting to enable/disable audit logging in the Ranger plug-in. true: enable audit log; false: disable audit log.|
|xasecure.audit.db.is.enabled||false||true: enable audit to RDBMS; false: disable audit to RDBMS.|
|xasecure.audit.db.is.async||false||true: send audit logs to DB asynchronously; false: send audit logs to DB synchronously.|
|xasecure.audit.db.async.max.queue.size||10240||Maximum number of audit logs to keep in the queue. An attempt to create an audit log while the queue is full results in the audit log being dropped.|
|xasecure.audit.db.async.max.flush.interval.ms||5000||Maximum interval between commits to database.|
|xasecure.audit.db.config.retry.min.interval.ms||15000||Interval between attempts to connect to the database, after a failure.|
|xasecure.audit.jpa.javax.persistence.jdbc.driver||None||JDBC driver to connect to the DB.|
|xasecure.audit.jpa.javax.persistence.jdbc.url||None||JDBC URL to connect to the DB.|
|xasecure.audit.jpa.javax.persistence.jdbc.password||None||Password to connect to the DB.|
|xasecure.audit.hdfs.is.enabled||false||true: enable audit to HDFS; false: disable audit to HDFS.|
|xasecure.audit.hdfs.is.async||false||true: send audit logs asynchronously; false: send audit logs synchronously.|
|xasecure.audit.hdfs.async.max.queue.size||10240||Maximum number of audit logs to keep in the queue. An attempt to create an audit log while the queue is full results in the audit log being dropped.|
|xasecure.audit.hdfs.config.destination.directory||None||Absolute path to the HDFS directory in which audit logs should be stored. See the note above on the tags supported in file/directory names.|
|xasecure.audit.hdfs.config.destination.file||None||Name of the HDFS file to which audit logs should be written. See the note above on the tags supported in file/directory names.|
|Interval between calls to hflush on destination HDFS file.|
|Interval between rollover of destination HDFS file.|
|Interval between calls to flush audit logs written to staging file.|
|Interval between rollover of staging file.|
|None||Absolute path to the local directory to store audit log files after sending to HDFS. See the note below on the tags supported on file/directory names.|
|xasecure.audit.hdfs.config.local.archive.max.file.count||None||Maximum number of files to store in archive directory.|
|xasecure.audit.log4j.is.enabled||false||true: enable audit to log4j; false: disable audit to log4j.|
|xasecure.audit.log4j.is.async||false||true: send audit logs asynchronously; false: send audit logs synchronously.|
|xasecure.audit.log4j.async.max.queue.size||10240||Maximum number of audit logs to keep in the queue. An attempt to create an audit log while the queue is full results in the audit log being dropped.|
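To check which settings a component will pick up, the file can be inspected with a short script. The Python sketch below assumes xasecure-audit.xml follows the usual Hadoop configuration layout (`<configuration>` containing `<property>` elements with `<name>` and `<value>` children); the helper names and the inline sample are illustrative, and the fallback values are the defaults listed in the table above.

```python
import xml.etree.ElementTree as ET

# Inline sample in the assumed Hadoop-style layout; a real file would be read from disk.
SAMPLE_XML = """<configuration>
  <property>
    <name>xasecure.audit.is.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>xasecure.audit.db.async.max.queue.size</name>
    <value>10240</value>
  </property>
</configuration>"""


def load_audit_settings(xml_text):
    """Return a dict of property name to value from Hadoop-style configuration XML."""
    root = ET.fromstring(xml_text)
    settings = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name:
            settings[name.strip()] = (prop.findtext("value") or "").strip()
    return settings


def get_bool(settings, name, default):
    """Interpret a setting as a boolean, falling back to `default` when absent."""
    value = settings.get(name)
    return default if value is None else value.lower() == "true"


settings = load_audit_settings(SAMPLE_XML)
audit_enabled = get_bool(settings, "xasecure.audit.is.enabled", True)
db_enabled = get_bool(settings, "xasecure.audit.db.is.enabled", False)
queue_size = int(settings.get("xasecure.audit.db.async.max.queue.size", "10240"))
```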