Snapshots for HDFS
This post covers our ongoing work on snapshots in Apache Hadoop HDFS. I will cover the motivation for the work, a high-level design, and some of the design choices we made. Having seen snapshots in use with various filesystems, I believe that adding snapshots to Apache Hadoop will be hugely valuable to the Hadoop community. With luck this work will be available to Hadoop users in late 2012 or 2013.
A snapshot is a point-in-time image of the entire filesystem or a subtree of a filesystem. Some of the scenarios where snapshots are very useful:
- Protection against user errors: Admin sets up a process to take read-only (RO) snapshots periodically in a rolling manner so that there are always x number of RO snapshots on HDFS. If a user accidentally deletes a file, the file can be restored from the latest RO snapshot that contains the file.
- Backup: Admin wants to back up the entire file system, a subtree in the file system, or just a file. Depending on the requirements, admin takes an RO snapshot and uses this snapshot as the starting point of a full backup. Incremental backups are then taken by doing a diff between two snapshots.
- Experimental/Test setups: A user wants to test an application against the main dataset. Without a full copy of the dataset, this is a very risky proposition because the test setup can corrupt or overwrite production data. Admin creates a read-write (RW) snapshot of the production dataset and assigns it to the user for the experiment. Changes made to the RW snapshot are not reflected in the production dataset.
- Disaster Recovery: RO snapshots can be used to create a consistent point-in-time image for replication, which can be copied to a remote site for disaster recovery.
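The incremental backup scenario amounts to comparing two point-in-time listings. Here is a minimal Python sketch of that idea. It assumes each snapshot can be represented as a mapping from path to a (length, modification time) pair; that representation and the name `diff_snapshots` are illustrative assumptions, not part of the HDFS design.

```python
from typing import Dict, List, NamedTuple, Tuple

# Hypothetical stand-in for the per-file metadata captured by a snapshot.
FileMeta = Tuple[int, int]  # (length in bytes, modification time)


class SnapshotDiff(NamedTuple):
    created: List[str]
    deleted: List[str]
    modified: List[str]


def diff_snapshots(older: Dict[str, FileMeta], newer: Dict[str, FileMeta]) -> SnapshotDiff:
    """Return the paths an incremental backup would need to act on."""
    created = sorted(p for p in newer if p not in older)
    deleted = sorted(p for p in older if p not in newer)
    modified = sorted(p for p in newer if p in older and newer[p] != older[p])
    return SnapshotDiff(created, deleted, modified)


# Two toy snapshots of the same subtree, taken a day apart.
monday = {"/a/b/c/foo.txt": (100, 1), "/a/b/c/bar.txt": (50, 1)}
tuesday = {"/a/b/c/foo.txt": (120, 2), "/a/b/c/baz.txt": (10, 2)}
changes = diff_snapshots(monday, tuesday)
```

An incremental backup would then copy the created and modified paths and record the deleted ones, rather than copying the whole subtree again.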
High Level Requirements
- Read-only (RO) snapshots: These are immutable copies of underlying elements of the file system.
- Read-write (RW) snapshots: RW snapshots can be modified by a user.
- Support for taking snapshots of the entire namespace, or a subtree.
- Support for a reasonable number of snapshots in a single namenode.
- Snapshots should be easy to browse using standard commands and tools, and copying of data from a snapshot should work with standard Hadoop commands and API.
High Level approaches
We considered two options for snapshots.
Option #1: Both datanodes and namenode are aware of the snapshots and save state internally about the snapshots. Datanode is aware of the fact that some of the blocks are for the snapshot files.
Option #2: Only namenode is aware of the snapshot. Datanode is not aware of the fact that some of the blocks are owned by snapshots of the original file.
We selected option #2 to keep the design simple. Taking snapshots is also very fast with option #2. Datanodes know nothing about snapshots and are not aware of block ownership issues between the root file system and its snapshots. Restricting the changes to the namenode simplifies the design immensely because it eliminates the need for distributed coordination.
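To make option #2 concrete, here is a toy Python model in which all snapshot state lives in the namenode and datanodes would only ever see block IDs. Everything in it (the class `ToyNameNode`, its fields, and the idea of copying the namespace map wholesale) is an assumption made for illustration; it is not how the prototype stores snapshots.

```python
from typing import Dict, List, Set


class ToyNameNode:
    """Toy model: snapshot bookkeeping is namenode-only metadata."""

    def __init__(self) -> None:
        self.live: Dict[str, List[int]] = {}                  # path -> block IDs
        self.snapshots: Dict[str, Dict[str, List[int]]] = {}  # name -> frozen namespace

    def create_snapshot(self, name: str) -> None:
        # Copies metadata only; no block data moves and no datanode is contacted.
        self.snapshots[name] = {path: list(blocks) for path, blocks in self.live.items()}

    def delete_file(self, path: str) -> Set[int]:
        """Remove a live file; return the block IDs that can now be reclaimed."""
        blocks = set(self.live.pop(path))
        still_referenced = {
            b for ns in self.snapshots.values() for blks in ns.values() for b in blks
        }
        return blocks - still_referenced


nn = ToyNameNode()
nn.live["/a/b/c/foo.txt"] = [1, 2]
nn.create_snapshot("hdfs1")
reclaimable = nn.delete_file("/a/b/c/foo.txt")  # blocks 1 and 2 are kept for hdfs1
```

In this sketch a datanode never learns why a block is still alive; the namenode simply does not ask for it to be deleted while a snapshot references it.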
Creating and Deleting Snapshots
A key requirement is that creating and deleting snapshots be very easy. Snapshot creation and deletion are admin-only capabilities. To create a snapshot, one specifies a snapshot name, the path to the root of the subtree to be snapshotted, and whether the snapshot is read-only or read-write. Deleting a snapshot requires just the snapshot name. A command to list all the snapshots in the filesystem will be provided.
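The three operations described above can be sketched as a small registry. This is a hedged illustration only: `SnapshotCatalog`, `SnapshotInfo`, and the `is_admin` flag are hypothetical names, and the uniqueness of snapshot names is an assumption inferred from the fact that deletion takes only a name.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SnapshotInfo:
    name: str
    root: str        # root of the subtree that was snapshotted
    read_only: bool


class SnapshotCatalog:
    """Hypothetical registry of snapshots, keyed by snapshot name."""

    def __init__(self) -> None:
        self._by_name: Dict[str, SnapshotInfo] = {}

    @staticmethod
    def _require_admin(is_admin: bool) -> None:
        if not is_admin:
            raise PermissionError("snapshot creation and deletion are admin-only")

    def create(self, name: str, root: str, read_only: bool, *, is_admin: bool) -> SnapshotInfo:
        self._require_admin(is_admin)
        if name in self._by_name:
            # Assumed: names are unique, since deletion takes only a name.
            raise ValueError(f"snapshot {name!r} already exists")
        info = SnapshotInfo(name, root, read_only)
        self._by_name[name] = info
        return info

    def delete(self, name: str, *, is_admin: bool) -> None:
        self._require_admin(is_admin)
        del self._by_name[name]  # raises KeyError for an unknown name

    def list_snapshots(self) -> List[SnapshotInfo]:
        return sorted(self._by_name.values(), key=lambda s: s.name)


catalog = SnapshotCatalog()
catalog.create("hdfs1", "/a/b", True, is_admin=True)
names = [s.name for s in catalog.list_snapshots()]
```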
Accessing Directories and Files in a Snapshot
Snapshots can be referenced with regular HDFS path names that include a reserved component, .snapshot_<name>.
This has the benefit that snapshots can be referenced with all existing Hadoop commands and APIs that take a pathname by adding a reserved snapshot string to the pathname.
Examples: Consider a directory structure of /a/b/c/foo.txt. Admin has created a snapshot hdfs1 at /a/b. To access data related to snapshot hdfs1, some examples of the commands would be:
hadoop dfs -ls /a/b/.snapshot_hdfs1/c/foo.txt
To copy foo.txt from the snapshot to /foobar, the command would be:
hadoop dfs -cp /a/b/.snapshot_hdfs1/c/foo.txt /foobar/.
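The path convention used in these commands can be illustrated with a short Python sketch that splits a pathname into the snapshot name, the directory where the snapshot was taken, and the corresponding path with the reserved component removed. The function `resolve` and the `ResolvedPath` type are hypothetical; only the `.snapshot_<name>` convention comes from the design.

```python
from typing import NamedTuple, Optional

SNAPSHOT_PREFIX = ".snapshot_"  # reserved string from the design


class ResolvedPath(NamedTuple):
    snapshot: Optional[str]  # snapshot name, or None for an ordinary path
    root: Optional[str]      # directory at which the snapshot was taken
    plain_path: str          # the path with the reserved component removed


def resolve(path: str) -> ResolvedPath:
    """Split a pathname on its first reserved .snapshot_<name> component."""
    parts = path.split("/")
    for i, part in enumerate(parts):
        if part.startswith(SNAPSHOT_PREFIX) and len(part) > len(SNAPSHOT_PREFIX):
            name = part[len(SNAPSHOT_PREFIX):]
            root = "/".join(parts[:i]) or "/"
            plain = "/".join(parts[:i] + parts[i + 1:]) or "/"
            return ResolvedPath(name, root, plain)
    return ResolvedPath(None, None, path)


resolved = resolve("/a/b/.snapshot_hdfs1/c/foo.txt")
```

Because the snapshot is named inside the pathname itself, any command or API that accepts a path needs no snapshot-specific option.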
One caveat is that an RO snapshot is immutable, so operations such as creating a new file, deleting a file, creating a new directory, or renaming a file or directory will fail when executed on a path inside the snapshot.
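A minimal sketch of such a check, assuming the set of read-only snapshot names is known to the caller; the operation names and the function `check_operation` are illustrative and do not correspond to actual HDFS calls.

```python
from typing import Set

SNAPSHOT_PREFIX = ".snapshot_"
# Illustrative names for the mutating operations listed above.
MUTATING_OPS = {"create", "delete", "mkdir", "rename"}


def check_operation(op: str, path: str, read_only_snapshots: Set[str]) -> None:
    """Raise PermissionError if `op` would modify a path inside an RO snapshot."""
    if op not in MUTATING_OPS:
        return  # reads are always allowed
    for component in path.split("/"):
        if component.startswith(SNAPSHOT_PREFIX):
            name = component[len(SNAPSHOT_PREFIX):]
            if name in read_only_snapshots:
                raise PermissionError(f"{op} not allowed: snapshot {name!r} is read-only")


ro = {"hdfs1"}
check_operation("ls", "/a/b/.snapshot_hdfs1/c/foo.txt", ro)  # read inside snapshot: allowed
check_operation("delete", "/a/b/c/foo.txt", ro)              # ordinary path: allowed
```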
Snapshots are a very useful feature to have in a mature filesystem. This is a work in progress, and we have a functional prototype. The first version of this feature will support RO snapshots only; support for RW snapshots will be added in subsequent releases. Several further features could be built on snapshots, such as a time to live with automatic deletion, schedule-based snapshot creation, marking specific directories as snapshot-worthy, quota-based restrictions on the space used by RW snapshots, and delegating authority to users for creating and deleting snapshots at specific locations.
To track the development of the snapshots feature in HDFS, please follow the JIRA HDFS-2802.
~ Hari Mankude