HDFS Forum

Hadoop distcp between two secured clusters

  • #49300
    Tanzir
    Participant

    Hello everyone,
    I would like to know whether hadoop distcp works between two secured Hadoop clusters. If so, do we need to set up Kerberos differently than usual?

    Suppose I have two clusters (HDP 1.3.3):
    Namenode#1 (source cluster): hdfs://nn1:8020
    Namenode#2 (dest cluster): hdfs://nn2:8020

    I want to copy some files from one cluster to another using hadoop distcp. Example: in the source cluster I have a file at "/user/testuser/temp/file-r-0000", and in the destination cluster the target directory is "/user/testuser/dest/". So what I want is to copy file-r-0000 from the source cluster to the target cluster's "dest" directory.

    I have tried both of the following commands:

    hadoop distcp hdfs://nn1:8020/user/testuser/temp/file-r-0000 hdfs://nn2:8020/user/testuser/dest
    hadoop distcp hftp://nn1:8020/user/testuser/temp/file-r-0000 hdfs://nn2:8020/user/testuser/dest

    I believe I do not need to use "hftp://" since both clusters run the same version of Hadoop. I also tried running the commands from both clusters, but all I get are security-related exceptions.
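
    (For context: distcp needs a valid ticket from the local realm before it runs. A minimal sketch of that step, assuming the headless keytab path shown later in this thread and REALM1 as a placeholder realm name:)

    # Obtain a ticket from the local keytab before running distcp
    kinit -kt /etc/security/keytabs/testuser.headless.keytab testuser@REALM1
    # Verify the ticket cache
    klist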

    When running from destination cluster with hftp:

    14/02/26 00:04:45 ERROR security.UserGroupInformation: PriviledgedActionException as:testuser@realm cause:java.net.SocketException: Unexpected end of file from server
    14/02/26 00:04:45 ERROR security.UserGroupInformation: PriviledgedActionException as:testuser@realm cause:java.net.SocketException: Unexpected end of file from server
    14/02/26 00:04:45 INFO fs.FileSystem: Couldn't get a delegation token from nn1ipaddress:8020

    When running from source cluster:

    14/02/26 00:05:43 ERROR security.UserGroupInformation: PriviledgedActionException as:testuser@realm1 cause:java.io.IOException: Couldn't setup connection for testuser@realm1 to nn/realm2
    With failures, global counters are inaccurate; consider running with -i
    Copy failed: java.io.IOException: Call to nn1ipaddress failed on local exception: java.io.IOException: Couldn't setup connection for testuser@realm1 to nn/realm2

    Caused by: java.io.IOException: Couldn't setup connection for testuser@realm1 to nn/realm2
    at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:560)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
    at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:513)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:616)
    at org.apache.hadoop.ipc.Client$Connection.access$2100(Client.java:203)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1254)
    at org.apache.hadoop.ipc.Client.call(Client.java:1098)
    … 26 more

    It also tells me the host address is not present in the Kerberos database (I don't have the exact log for that). So, do I need to configure Kerberos differently in order to use distcp between the clusters?
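
    (That "not present in Kerberos database" symptom usually means the client built a service principal from a hostname the KDC does not know, often due to a DNS mismatch. A hedged diagnostic sketch, assuming MIT Kerberos client tools and a hypothetical NameNode principal nn/nn1.example.com@REALM1:)

    # Forward and reverse DNS must agree, since the client derives the
    # service principal from the resolved hostname
    host nn1
    host <nn1-ip-address>
    # With a valid TGT, check that the service principal can be resolved
    # against the KDC (principal name here is an assumption)
    kvno nn/nn1.example.com@REALM1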

    Thanks in advance.
    Tanzir

  • #49301
    Tanzir
    Participant

    More information about my current kerberos setup:

    In destination cluster:

    [testuser@hostname1 bin]$ ./klist -k /etc/security/keytabs/testuser.headless.keytab
    Keytab name: FILE:/etc/security/keytabs/testuser.headless.keytab
    KVNO Principal
    ---- --------------------------------------------------------------------------
    2 testuser@realm1
    2 testuser@realm1
    2 testuser@realm1
    2 testuser@realm1
    2 testuser@realm1
    2 testuser@realm1

    In source cluster:

    [testuser@hostname2 bin]$ ./klist -k /etc/security/keytabs/testuser.headless.keytab
    Keytab name: FILE:/etc/security/keytabs/testuser.headless.keytab
    KVNO Principal
    ---- --------------------------------------------------------------------------
    2 testuser@realm2
    2 testuser@realm2
    2 testuser@realm2
    2 testuser@realm2
    2 testuser@realm2
    2 testuser@realm2

    Do I need to set up Kerberos for cross-realm trust to make this work? Any information would be really helpful.

    Tanzir

    #49303
    Robert Molina
    Moderator

    Hi Tanzir,
    One thing you have to make sure of is that Java 1.7 is being used if you are using MIT Kerberos. Also, you would have to set up cross-realm trust, since it looks like your clusters are in different realms. A quick test to verify the trust is working is to run an HDFS client on a node from cluster A and see if you can put a file or list a directory on cluster B, and vice versa.
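
    For what it's worth, here is a minimal sketch of the cross-realm pieces with MIT Kerberos; the realm names (REALM1, REALM2), KDC hostnames, and domain suffixes are all placeholders:

    # On BOTH KDCs, create the same pair of cross-realm principals,
    # with identical passwords and encryption types on each side:
    kadmin.local -q "addprinc krbtgt/REALM2@REALM1"
    kadmin.local -q "addprinc krbtgt/REALM1@REALM2"

    Every node in both clusters then needs a krb5.conf that can resolve both realms, for example:

    [realms]
      REALM1 = {
        kdc = kdc1.example.com
        admin_server = kdc1.example.com
      }
      REALM2 = {
        kdc = kdc2.example.com
        admin_server = kdc2.example.com
      }

    [domain_realm]
      .cluster1.example.com = REALM1
      .cluster2.example.com = REALM2

    [capaths]
      REALM1 = {
        REALM2 = .
      }
      REALM2 = {
        REALM1 = .
      }

    After that, the quick test I mentioned would look like:

    # From a node in cluster A, holding a ticket from REALM1
    hadoop fs -ls hdfs://nn2:8020/user/testuser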

    Hope that helps.

    Regards,
    Robert

    #53527
    Tanzir
    Participant

    Hey Robert,
    Thanks a lot for your response. Yes, I figured it out later. You were right: it was a cross-realm issue. I thought cross-realm trust was already configured in our clusters, but it was not. After configuring it, distcp started working.
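
    In case it helps others: besides the KDC-side trust, Hadoop typically also needs to map principals from the remote realm to local short names. A sketch of the core-site.xml property involved, with REALM2 as a placeholder for the remote realm:

    <property>
      <name>hadoop.security.auth_to_local</name>
      <value>
        RULE:[1:$1@$0](.*@REALM2)s/@.*//
        DEFAULT
      </value>
    </property>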

    Thanks again,
    Tanzir

