Home Forums HBase Importing snapshots from Amazon S3 to HBase

This topic contains 5 replies, has 2 voices, and was last updated by John Cooper 1 month ago.

  • Creator
    Topic
  • #58424

    techops_korrelate
    Participant

    Hello,

    We created snapshots and exported them to S3 using the Snapshot Export tool. We are trying to figure out how to import them into HBase so that they are a) visible as snapshots and b) can be cloned into a viable table.

    To export (as the hbase user, with AWS credentials already configured in HDFS):
    hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot "$snapshotname" -copy-to s3n://$bucket_name/$snapshotname -mappers 4

    We’re having trouble importing them to another cluster. There is no “Snapshot Import” tool. We’ve been attempting to use hadoop distcp to copy from S3 to the target HDFS:
    hadoop distcp s3n://$bucket_name/$snapshotname /apps/hbase/data/.hbase-snapshot
    But the copied data has a different file structure than what we see when exporting a snapshot directly between clusters, and the snapshots either a) don’t appear or b) are corrupted and cannot be cloned.

    Please help us identify the correct import path when copying an exported snapshot from a DFS source.
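
    Our working guess, based on the layout ExportSnapshot writes (snapshot metadata under .hbase-snapshot/ and HFiles under archive/), is that each piece has to land in the matching directory under the HBase root, along these lines (untested sketch):

    hadoop distcp s3n://$bucket_name/$snapshotname/.hbase-snapshot/$snapshotname /apps/hbase/data/.hbase-snapshot/
    hadoop distcp s3n://$bucket_name/$snapshotname/archive /apps/hbase/data/

    But we haven’t been able to confirm that this is the intended mapping.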

    Relevant information:
    HDP: 2.1
    HBase: 0.98.0.2.1

Viewing 5 replies - 1 through 5 (of 5 total)


  • Author
    Replies
  • #59395

    John Cooper
    Participant

    Managed to get the s3n:// import to work using this tool, but not the s3 block-store import. I’m looking at forking the tool and producing a howto guide.

    #59113

    John Cooper
    Participant

    Found the problem: the S3 role was missing. I thought adding authentication would be enough, but that only allows copying in, not copying out. The snapshot export is now successful and the snapshot-s3-util export works too. The import still fails when using the s3 block store:

    sudo -u hbase HADOOP_CLASSPATH=YOURHADOOPPATH/lib/hbase/lib/* hadoop jar target/snapshot-s3-util-1.0.0.jar com.imgur.backup.SnapshotS3Util --import --snapshot test5-snapshot-20140822_090717 -d /hbase -k key -s secret --bucketName mybucket
    14/08/22 09:12:02 WARN security.UserGroupInformation: PriviledgedActionException as:hbase (auth:SIMPLE) cause:org.apache.hadoop.fs.UnsupportedFileSystemException: No AbstractFileSystem for scheme: s3

    Trying s3n instead, it doesn’t pick up the secret key:

    sudo -u hbase HADOOP_CLASSPATH=YOURHADOOPPATH/lib/hbase/lib/* hadoop jar target/snapshot-s3-util-2.0.0.jar com.imgur.backup.SnapshotS3Util --import --snapshot test1-snapshot-20140822_101514 -d /hbase -a true -k key -s secret --bucketName mybucket

    java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3n URL, or by setting the fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey

    Will see if I can fix this.
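
    One thing to try (an untested sketch; the property names are taken straight from the error message above): do the copy step outside the utility with distcp, passing the s3n credentials as Hadoop -D properties, e.g.

    sudo -u hbase hadoop distcp -Dfs.s3n.awsAccessKeyId=key -Dfs.s3n.awsSecretAccessKey=secret s3n://mybucket/hbase/.hbase-snapshot/test1-snapshot-20140822_101514 /hbase/.hbase-snapshot/

    Setting the same two properties in core-site.xml should also work, per the message.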

    #59043

    John Cooper
    Participant

    It looks like the MapReduce job (YARN) is failing to move/rename the temporary files it creates on S3; I’m not sure if this is a general issue with moving/renaming files/directories in S3. Exporting to a normal file system using file:/// works fine, and then “aws s3 cp sourcefolder s3://mybucket/sourcefolder --recursive” copies the archive and .hbase-snapshot folders to S3.

    The import using distcp should work as long as the .hbase-snapshot and archive folders are copied into the HBase root in HDFS (/hbase). I’ve also managed to use snapshot export to another HBase cluster and then, from that cluster, use snapshot export to copy the files back; hbase restore_snapshot then worked fine.
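
    Concretely, the distcp import would look something like this (a sketch; the bucket and folder names are the placeholders from above, and /hbase is the HBase root on the target cluster):

    hadoop distcp s3n://mybucket/sourcefolder/.hbase-snapshot /hbase/
    hadoop distcp s3n://mybucket/sourcefolder/archive /hbase/

    After that the snapshot should show up in the hbase shell via list_snapshots and be restorable with restore_snapshot 'snapshotname'.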

    #58999

    John Cooper
    Participant

    Hi, I’ve tried this command on the same version of Hortonworks and on Cloudera, but the export to S3 fails because the snapshot info is missing; the actual data in the archive directory is there. Anything special about how you set up S3? I am also working on getting the import going and am trying https://github.com/lospro7/snapshot-s3-util, which is a wrapper around the export snapshot command. I’ve compiled it against 0.98, but it fails with the same missing snapshot info. Once that’s fixed I’m sure the util will run OK.

    org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn’t read snapshot info from:s3n://key:secret@mybucket/hbase/.hbase-snapshot/test3s1/.snapshotinfo

    Hortonworks doesn’t complain but the .snapshotinfo is missing just the same.
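
    A quick sanity check (path taken from the error above) is to list the snapshot directory and confirm whether the metadata file ever made it to S3:

    hadoop fs -ls s3n://key:secret@mybucket/hbase/.hbase-snapshot/test3s1/

    If .snapshotinfo isn’t listed there, the export never wrote it.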

    I also need to get s3:// auth working, as s3n:// has a 5 TB limit.

    Thanks, John.

    #58530

    techops_korrelate
    Participant

    Bump! Any thoughts?
