Importing snapshots from Amazon S3 to HBase


This topic contains 5 replies, has 2 voices, and was last updated by  John Cooper 8 months ago.

  • Creator
  • #58424



    We created snapshots and exported them to S3 using the Snapshot Export tool. We are trying to figure out how to import them into HBase so that they are a) visible as snapshots and b) cloneable into a viable table.

    To export (as hbase user):
    hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot '$snapshotname' -copy-to s3n://$bucket_name/$snapshotname -mappers 4
    with AWS credentials already configured in HDFS.

    We’re having trouble importing them to another cluster. There is no “Snapshot Import” tool. We’ve been attempting to use hadoop distcp to copy from S3 to the target HDFS:
    hadoop distcp s3n://$bucket_name/$snapshotname /apps/hbase/data/.hbase-snapshot
    But this produces a different file structure than we see when exporting a snapshot directly between clusters. The snapshots either a) don’t appear or b) are corrupted and cannot be cloned.

    Can anyone point us to the correct import procedure for copying an exported snapshot from S3 into the target DFS?

    Relevant information:
    HDP: 2.1
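
    For context, this is our understanding of where an imported snapshot needs to land for HBase to see it (a sketch, not verified; paths assume the HDP 2.1 default hbase.rootdir of /apps/hbase/data):

    ```
    # Sketch of the expected on-HDFS layout after import (assumes
    # hbase.rootdir = /apps/hbase/data; names are illustrative)
    /apps/hbase/data/.hbase-snapshot/$snapshotname/.snapshotinfo   # snapshot metadata
    /apps/hbase/data/.hbase-snapshot/$snapshotname/                # region manifests
    /apps/hbase/data/archive/                                      # HFiles the snapshot references
    ```

    If that is right, a distcp of the snapshot directory alone would not be enough; the archive directory would need to be copied as well.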

Viewing 5 replies - 1 through 5 (of 5 total)


  • Author
  • #59395

    John Cooper

    Managed to get the s3n:// import to work using this tool, but not the s3 block-store import. I’m looking at forking the tool and producing a how-to guide.


    John Cooper

    Found the problem: the S3 role was missing. I thought adding authentication would be enough, but it only allowed copying in, not copying out. Export snapshot now succeeds, and the snapshot-s3-util export works too. The import still fails when using the s3 block store:

    sudo -u hbase HADOOP_CLASSPATH=YOURHADOOPPATH/lib/hbase/lib/* hadoop jar target/snapshot-s3-util-1.0.0.jar com.imgur.backup.SnapshotS3Util --import --snapshot test5-snapshot-20140822_090717 -d /hbase -k key -s secret --bucketName mybucket
    14/08/22 09:12:02 WARN security.UserGroupInformation: PriviledgedActionException as:hbase (auth:SIMPLE) cause:org.apache.hadoop.fs.UnsupportedFileSystemException: No AbstractFileSystem for scheme: s3

    Trying s3n://, it doesn’t pick up the secret key:

    sudo -u hbase HADOOP_CLASSPATH=YOURHADOOPPATH/lib/hbase/lib/* hadoop jar target/snapshot-s3-util-2.0.0.jar com.imgur.backup.SnapshotS3Util --import --snapshot test1-snapshot-20140822_101514 -d /hbase -a true -k key -s secret --bucketName mybucket

    java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3n URL, or by setting the fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey
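
    Per the error message, the keys can also go into core-site.xml rather than the URL; something like this (property names taken from the error above, values are placeholders, untested on my side):

    ```xml
    <!-- core-site.xml fragment; YOUR_ACCESS_KEY / YOUR_SECRET_KEY are placeholders -->
    <property>
      <name>fs.s3n.awsAccessKeyId</name>
      <value>YOUR_ACCESS_KEY</value>
    </property>
    <property>
      <name>fs.s3n.awsSecretAccessKey</name>
      <value>YOUR_SECRET_KEY</value>
    </property>
    ```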

    Will see if I can fix this.


    John Cooper

    It looks like MapReduce (YARN) is failing to move/rename the temporary files it creates on S3; I’m not sure if this is a general issue with moving/renaming files/directories in S3. Exporting to a normal file system using file:/// works fine, and from there “aws s3 cp sourcefolder s3://mybucket/sourcefolder --recursive” copies the archive and .hbase-snapshot folders up to S3. The import using distcp should then work, as long as the .hbase-snapshot and archive folders are copied into the HBase root in HDFS (/hbase). I’ve also managed to use snapshot export to another HBase cluster and then export the files back from that cluster; hbase restore_snapshot then worked fine.
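
    Roughly, the round trip that worked looks like this (a sketch only; $SNAPSHOT and $BUCKET are placeholders, and it assumes the HBase root is /hbase and that ExportSnapshot wrote .hbase-snapshot and archive under the file:/// target):

    ```
    # 1. Export to a plain file system first (run as the hbase user)
    hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
      -snapshot "$SNAPSHOT" -copy-to file:///tmp/hbase-export -mappers 4

    # 2. Push both folders up to S3 with the AWS CLI
    aws s3 cp /tmp/hbase-export/.hbase-snapshot s3://$BUCKET/.hbase-snapshot --recursive
    aws s3 cp /tmp/hbase-export/archive s3://$BUCKET/archive --recursive

    # 3. On the target cluster, distcp both folders into the HBase root
    hadoop distcp s3n://$BUCKET/.hbase-snapshot /hbase/.hbase-snapshot
    hadoop distcp s3n://$BUCKET/archive /hbase/archive

    # 4. Then restore from the hbase shell:  restore_snapshot '$SNAPSHOT'
    ```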


    John Cooper

    Hi, I’ve tried this command on the same version on both Hortonworks and Cloudera, but the export to S3 fails because the snapshot info is missing; the actual data in the archive directory is there. Anything special about how you set up S3? I’m also working on the import, trying a wrapper around the export snapshot command. I’ve compiled it on 0.98 but it’s failing due to the missing snapshot info. Once I get that fixed I’m sure the util will run OK.

    org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read snapshot info from: s3n://key:secret@mybucket/hbase/.hbase-snapshot/test3s1/.snapshotinfo

    Hortonworks doesn’t complain but the .snapshotinfo is missing just the same.

    Also need to get s3:// auth working, as s3n:// has a 5 GB per-file limit.

    Thanks, John.



    Bump! Any thoughts?
