
HBase Forum

Importing snapshots from Amazon S3 to HBase

  • #58424


    We created snapshots and exported them to S3 using the Snapshot Export tool. We are trying to figure out how to import them into HBase so that they are a) visible as snapshots and b) can be cloned into a viable table.

    To export (as hbase user):
    hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot "$snapshotname" -copy-to s3n://$bucket_name/$snapshotname -mappers 4
    with AWS credentials already configured in HDFS.
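    For reference, s3n credentials are typically configured in core-site.xml with the standard Hadoop property names; a minimal fragment might look like this (the values are placeholders):

```xml
<!-- Sketch: s3n credentials in core-site.xml; values are placeholders -->
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>
```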

    We’re having trouble importing them to another cluster. There is no “Snapshot Import” tool. We’ve been attempting to use hadoop distcp to copy from S3 to the target HDFS:
    hadoop distcp s3n://$bucket_name/$snapshotname /apps/hbase/data/.hbase-snapshot
    But the import includes a different file structure than we see when we export a snapshot between clusters. Snapshots either a) don’t appear or b) are corrupted and cannot be cloned.

    Could someone indicate the correct import path when copying an exported snapshot from S3 back into HDFS?

    Relevant information:
    HDP: 2.1
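    A sketch of what the distcp import might look like, under stated assumptions: the HDP 2.1 default HBase root `/apps/hbase/data`, and an export layout with `.hbase-snapshot` and `archive` directories under the bucket prefix (both are assumptions — verify against what ExportSnapshot actually wrote to your bucket):

```shell
# Sketch only: all paths and names below are placeholders/assumptions.
HBASE_ROOT=/apps/hbase/data          # assumed HDP 2.1 default hbase.rootdir
BUCKET=my-bucket                     # placeholder
SNAP=my-snapshot                     # placeholder

# An exported snapshot has two parts, and both must be copied:
#   .hbase-snapshot/<name>  -> the snapshot metadata
#   archive/                -> the HFiles the metadata references
META_CMD="hadoop distcp s3n://$BUCKET/$SNAP/.hbase-snapshot/$SNAP $HBASE_ROOT/.hbase-snapshot"
DATA_CMD="hadoop distcp s3n://$BUCKET/$SNAP/archive $HBASE_ROOT"

# Echoed rather than executed so the commands can be reviewed first.
echo "$META_CMD"
echo "$DATA_CMD"
```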

  • #58530

    Bump! Any thoughts?

    John Cooper

    Hi, I've tried this command on the same version of Hortonworks and on Cloudera, but the export to S3 fails because the snapshot info is missing; the actual data in the archive directory is there. Did you do anything special to set up S3? I am also working on getting the import working, trying snapshot-s3-util, which is a wrapper for the export snapshot command. I've compiled it on 0.98, but it fails due to the missing snapshot info. Once I get that fixed I am sure the util will run OK.

    org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn’t read snapshot info from:s3n://key:secret@mybucket/hbase/.hbase-snapshot/test3s1/.snapshotinfo

    Hortonworks doesn’t complain but the .snapshotinfo is missing just the same.

    Also, I need to get s3:// auth working, as s3n:// has a 5TB limit.

    Thanks, John.

    John Cooper

    It looks like the MapReduce (YARN) job is failing to move/rename the temporary files it creates on S3. I'm not sure if this is an issue with moving/renaming files and directories in S3. Exporting to a normal file system using file:/// works fine, and I can then use "aws s3 cp sourcefolder s3://mybucket/sourcefolder --recursive" to copy the archive and .hbase-snapshot folders to S3. The import using distcp should work as long as the .hbase-snapshot and archive folders are copied into the HBase root in HDFS (/hbase). I've managed to use snapshot export to another HBase cluster and then, from that cluster, use snapshot export to copy the files back; hbase restore_snapshot then worked fine.
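    The restore step described above can be sketched roughly as follows; `list_snapshots`, `restore_snapshot` and `clone_snapshot` are standard hbase shell commands, while 'mysnapshot' and 'mytable_clone' are placeholders:

```shell
# Sketch: once .hbase-snapshot and archive are under the HBase root,
# the snapshot should appear in the shell and be restorable/cloneable.
# 'mysnapshot' and 'mytable_clone' are placeholders.
SHELL_CMDS="list_snapshots
restore_snapshot 'mysnapshot'
clone_snapshot 'mysnapshot', 'mytable_clone'"

# In a real run this would be piped into the hbase shell:
#   echo "$SHELL_CMDS" | hbase shell
echo "$SHELL_CMDS"
```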

    John Cooper

    Found the problem: the S3 role was missing. I thought adding authentication would be enough, but that only allows copying in, not copying out. So export snapshot is now successful and the snapshot-s3-util export works. The import fails using the s3 block store:

    sudo -u hbase HADOOP_CLASSPATH=YOURHADOOPPATH/lib/hbase/lib/* hadoop jar target/snapshot-s3-util-1.0.0.jar com.imgur.backup.SnapshotS3Util --import --snapshot test5-snapshot-20140822_090717 -d /hbase -k key -s secret --bucketName mybucket
    14/08/22 09:12:02 WARN security.UserGroupInformation: PriviledgedActionException as:hbase (auth:SIMPLE) cause:org.apache.hadoop.fs.UnsupportedFileSystemException: No AbstractFileSystem for scheme: s3

    and trying s3n, it doesn't pick up the secret key.

    sudo -u hbase HADOOP_CLASSPATH=YOURHADOOPPATH/lib/hbase/lib/* hadoop jar target/snapshot-s3-util-2.0.0.jar com.imgur.backup.SnapshotS3Util --import --snapshot test1-snapshot-20140822_101514 -d /hbase -a true -k key -s secret --bucketName mybucket

    java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3n URL, or by setting the fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey

    Will see if I can fix this.
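    As that IllegalArgumentException itself suggests, the s3n keys can also be supplied as -D properties rather than embedded in the URL (embedding is known to break when the secret contains a '/'). A sketch with placeholder values — whether the wrapper tool forwards these properties is an assumption, but they work with plain distcp:

```shell
# Sketch: pass s3n credentials via -D properties instead of putting them
# in the URL. All values below are placeholders.
ACCESS_KEY=AKIAEXAMPLE
SECRET_KEY=exampleSecret
CMD="hadoop distcp -Dfs.s3n.awsAccessKeyId=$ACCESS_KEY -Dfs.s3n.awsSecretAccessKey=$SECRET_KEY s3n://my-bucket/my-snapshot /tmp/import"

# Echoed rather than executed so the command can be reviewed first.
echo "$CMD"
```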

    John Cooper

    Managed to get the s3n:// import to work using this tool, but not the s3 block import. I'm looking at forking the tool and producing a how-to guide.

    Dale Bradman

    Hello, I am having an issue with exporting to S3 and was wondering if you could give any advice.

    I get an error saying:
    2015-04-27 05:39:49,547 INFO [IPC Server handler 0 on 40333] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1429544880663_0004_m_000000_0: Error: Could not get the output FileSystem with root=s3n://<ACCESS_KEY_ID>:<SECRET_ACCESS_KEY>@<BUCKET_NAME>/2HBASE-SNAP_X
    at org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper.setup(
    at org.apache.hadoop.mapred.MapTask.runNewMapper(
    at org.apache.hadoop.mapred.YarnChild$
    at Method)
    at org.apache.hadoop.mapred.YarnChild.main(
    Caused by: No FileSystem for scheme: s3n
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(
    at org.apache.hadoop.fs.FileSystem.createFileSystem(
    at org.apache.hadoop.fs.FileSystem.access$200(
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(
    at org.apache.hadoop.fs.FileSystem$Cache.get(
    at org.apache.hadoop.fs.FileSystem.get(
    at org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper.setup(
    ... 8 more

    MapReduce isn't my strongest area, and I have a feeling it could be to do with not specifying an output path for the mappers?

    The code I use to export the snapshot is:

    hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot "SNAP_X" -copy-to s3n://<ACCESS_KEY_ID>:<SECRET_ACCESS_KEY>@<BUCKET_NAME>/2HBASE-SNAP_X -mappers 3
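    The "No FileSystem for scheme: s3n" in the trace above usually means the mapper JVMs cannot resolve the s3n filesystem class. One common workaround (an assumption for this cluster — the property can equally go in core-site.xml) is to declare the implementation class explicitly; the class name is the standard Hadoop one, the rest mirrors the command above with placeholders:

```shell
# Sketch: declare the s3n filesystem implementation explicitly so that
# mapper JVMs can resolve the scheme. Bucket/snapshot names are placeholders.
IMPL=org.apache.hadoop.fs.s3native.NativeS3FileSystem
CMD="hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -Dfs.s3n.impl=$IMPL -snapshot SNAP_X -copy-to s3n://my-bucket/2HBASE-SNAP_X -mappers 3"

# Echoed rather than executed so the command can be reviewed first.
echo "$CMD"
```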

    Dale Bradman

    An update:

    In my S3 bucket I can see a folder structure for the "2HBASE-SNAP_X" snapshot, however nothing is actually written to it.

    The process fails after 2015-04-27 08:59:27,305 INFO [main] mapreduce.Job: map 0% reduce 0%

    Dale Bradman

    Further update:

    What is happening is that the folder structure is getting written inside a virtual folder in the bucket. I am aware that S3 has no concept of folders, but that is how it appears in the file browser UI.

    Once the job has failed, the path of the folder is <BUCKET_NAME>//2HBASE-SNAP-X. Notice the double forward slash there, which is different from the path the job is trying to write to, <BUCKET_NAME>/2HBASE-SNAP-X.

    Why is this virtual folder being created and how can I get it to write to the correct path?
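    One likely cause (an assumption, but worth checking) is a trailing slash somewhere in the configured bucket or prefix: S3 stores the empty segment as part of the key, so concatenation produces the double slash. Stripping the slash before building the URL avoids it:

```shell
# Sketch: a trailing slash on the bucket/prefix yields "bucket//prefix"
# when concatenated; ${var%/} strips it. Names are placeholders.
BUCKET_PATH="my-bucket/"               # note the trailing slash
BAD_URL="s3n://${BUCKET_PATH}/2HBASE-SNAP_X"
GOOD_URL="s3n://${BUCKET_PATH%/}/2HBASE-SNAP_X"
echo "$BAD_URL"    # s3n://my-bucket//2HBASE-SNAP_X
echo "$GOOD_URL"   # s3n://my-bucket/2HBASE-SNAP_X
```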

The forum ‘HBase’ is closed to new topics and replies.
