Hadoop distcp between two secured clusters


This topic contains 3 replies, has 2 voices, and was last updated by Tanzir 5 months, 1 week ago.

  • Creator
    Topic
  • #49300

    Tanzir
    Participant

    Hello everyone,
    I would like to know: does hadoop distcp work between two secured Hadoop clusters? If so, do we need to set up Kerberos differently than usual?

    Suppose, I have two clusters (HDP 1.3.3):
    Namenode#1 (source cluster): hdfs://nn1:8020
    Namenode#2 (dest cluster): hdfs://nn2:8020

    I want to copy some files from one cluster to another using hadoop distcp. Example: in the source cluster I have a file at "/user/testuser/temp/file-r-0000", and in the destination cluster the target directory is "/user/testuser/dest/". So what I want is to copy file-r-0000 from the source cluster to the target cluster's "dest" directory.

    I have tried both of the following commands:

    hadoop distcp hdfs://nn1:8020/user/testuser/temp/file-r-0000 hdfs://nn2:8020/user/testuser/dest
    hadoop distcp hftp://nn1:8020/user/testuser/temp/file-r-0000 hdfs://nn2:8020/user/testuser/dest

    I believe I do not need to use "hftp://" since both clusters run the same Hadoop version. I also tried running the commands from both clusters, but all I get are security-related exceptions.

    When running from destination cluster with hftp:

    14/02/26 00:04:45 ERROR security.UserGroupInformation: PriviledgedActionException as:testuser@realm cause:java.net.SocketException: Unexpected end of file from server
    14/02/26 00:04:45 ERROR security.UserGroupInformation: PriviledgedActionException as:testuser@realm cause:java.net.SocketException: Unexpected end of file from server
    14/02/26 00:04:45 INFO fs.FileSystem: Couldn't get a delegation token from nn1ipaddress:8020

    When running from source cluster:

    14/02/26 00:05:43 ERROR security.UserGroupInformation: PriviledgedActionException as:testuser@realm1 cause:java.io.IOException: Couldn't setup connection for testuser@realm1 to nn/realm2
    With failures, global counters are inaccurate; consider running with -i
    Copy failed: java.io.IOException: Call to nn1ipaddress failed on local exception: java.io.IOException: Couldn't setup connection for testuser@realm1 to nn/realm2

    Caused by: java.io.IOException: Couldn't setup connection for testuser@realm1 to nn/realm2
    at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:560)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
    at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:513)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:616)
    at org.apache.hadoop.ipc.Client$Connection.access$2100(Client.java:203)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1254)
    at org.apache.hadoop.ipc.Client.call(Client.java:1098)
    … 26 more

    It also tells me a host address is not present in the Kerberos database (I don't have the exact log for that). So, do I need to configure Kerberos differently in order to use distcp between them?

    Thanks in advance.
    Tanzir

Viewing 3 replies - 1 through 3 (of 3 total)

The topic ‘Hadoop distcp between two secured clusters’ is closed to new replies.

  • Author
    Replies
  • #53527

    Tanzir
    Participant

    Hey Robert,
    Thanks a lot for your response. Yes, I figured it out later. You were right, it was a cross-realm issue. I thought cross-realm trust was already configured in our clusters, but it was not. After configuring it, it started working.
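
    For anyone who finds this thread later, here is roughly what the cross-realm setup involves with MIT Kerberos. This is a sketch, not our exact config; REALM1, REALM2, and the mapping rule below are placeholders you would adapt to your own KDCs:

```ini
# 1) Create matching cross-realm ticket-granting principals in BOTH KDCs,
#    with identical passwords/keys (via kadmin):
#      addprinc krbtgt/REALM2@REALM1
#      addprinc krbtgt/REALM1@REALM2

# 2) Tell clients how to walk between the realms in /etc/krb5.conf:
[capaths]
REALM1 = {
    REALM2 = .
}
REALM2 = {
    REALM1 = .
}

# 3) Let Hadoop map remote-realm principals to local users through
#    hadoop.security.auth_to_local in core-site.xml, for example:
#      RULE:[1:$1@$0](.*@REALM2)s/@.*//
#      DEFAULT
```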

    Thanks again,
    Tanzir

    #49303

    Robert Molina
    Moderator

    Hi Tanzir,
    One thing you have to make sure of is that Java 1.7 is being used if you are using MIT Kerberos. Also, you would have to set up cross-realm trust, since it looks like your clusters are in different realms. A quick test to verify the trust is to run an HDFS client on a node from cluster A and see if you can put a file or list a directory on cluster B, and vice versa.
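
    Concretely, that check could look something like the following, run from a node in cluster A (the keytab path, principal, and nn2 address are placeholders taken from this thread):

```shell
# Get a ticket in the local realm from the headless keytab
kinit -kt /etc/security/keytabs/testuser.headless.keytab testuser@realm1

# List and write against cluster B's NameNode; with working cross-realm
# trust, both succeed without obtaining a separate ticket in realm2
hadoop fs -ls hdfs://nn2:8020/user/testuser
hadoop fs -put /tmp/probe.txt hdfs://nn2:8020/user/testuser/
```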

    Hope that helps.

    Regards,
    Robert

    #49301

    Tanzir
    Participant

    More information about my current kerberos setup:

    In destination cluster:

    [testuser@hostname1 bin]$ ./klist -k /etc/security/keytabs/testuser.headless.keytab
    Keytab name: FILE:/etc/security/keytabs/testuser.headless.keytab
    KVNO Principal
    ---- --------------------------------------------------------------------------
    2 testuser@realm1
    2 testuser@realm1
    2 testuser@realm1
    2 testuser@realm1
    2 testuser@realm1
    2 testuser@realm1

    In source cluster:

    [testuser@hostname2 bin]$ ./klist -k /etc/security/keytabs/testuser.headless.keytab
    Keytab name: FILE:/etc/security/keytabs/testuser.headless.keytab
    KVNO Principal
    ---- --------------------------------------------------------------------------
    2 testuser@realm2
    2 testuser@realm2
    2 testuser@realm2
    2 testuser@realm2
    2 testuser@realm2
    2 testuser@realm2

    Do I need to set up Kerberos so that it works across realms? Any information would be really helpful.

    Tanzir
