    Jan Peters

    I’m not a Kerberos wizard, so I’m on a bit of a learning curve. I’ve followed all of the Kerberos instructions in the HDP 2.1 documentation and run into an issue where my datanodes won’t start (3-node cluster). If I roll back all of the XML files to their non-Kerberos versions, I can start everything from the command line. When I shut down the cluster and roll in the Kerberos versions of the XML files, I’m able to start the namenode, but all of the datanodes refuse to start, and the only clue I have is the following:
    2014-07-24 11:04:22,181 INFO datanode.DataNode (SignalLogger.java:register(91)) - registered UNIX signal handlers for [TERM, HUP, INT]
    2014-07-24 11:04:22,399 WARN common.Util (Util.java:stringAsURI(56)) - Path /opt/hadoop/hdfs/dn should be specified as a URI in configuration files. Please update hdfs configuration.
    2014-07-24 11:04:23,055 INFO security.UserGroupInformation (UserGroupInformation.java:loginUserFromKeytab(894)) - Login successful for user dn/cvm0930.dg.local@DGKERB.COM using keytab file /etc/security/keytabs/dn.service.keytab
    2014-07-24 11:04:23,210 INFO impl.MetricsConfig (MetricsConfig.java:loadFirst(111)) - loaded properties from hadoop-metrics2.properties
    2014-07-24 11:04:23,274 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:startTimer(344)) - Scheduled snapshot period at 60 second(s).
    2014-07-24 11:04:23,274 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:start(183)) - DataNode metrics system started
    2014-07-24 11:04:23,279 INFO datanode.DataNode (DataNode.java:<init>(269)) - File descriptor passing is enabled.
    2014-07-24 11:04:23,283 INFO datanode.DataNode (DataNode.java:<init>(280)) - Configured hostname is cvm0932.dg.local
    2014-07-24 11:04:23,284 FATAL datanode.DataNode (DataNode.java:secureMain(2002)) - Exception in secureMain
    java.lang.RuntimeException: Cannot start secure cluster without privileged resources.
    at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:700)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:281)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1885)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1772)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1819)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1995)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2019)
    2014-07-24 11:04:23,287 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
    2014-07-24 11:04:23,289 INFO datanode.DataNode (StringUtils.java:run(640)) - SHUTDOWN_MSG:
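    For context: in the Hadoop 2.x line of this era (HDP 2.1 ships Hadoop 2.4), a DataNode with Kerberos security enabled raises this RuntimeException when it was not handed "privileged resources", i.e. sockets on ports below 1024 that the jsvc-based secure starter binds as root before dropping to the user named in HADOOP_SECURE_DN_USER. The Python sketch below is a hypothetical pre-flight check, not part of Hadoop or HDP: it flags a non-root launch, a missing HADOOP_SECURE_DN_USER, and unprivileged ports in dfs.datanode.address / dfs.datanode.http.address. The function names, the /etc/hadoop/conf path, and the assumption that the stock launch scripts are used are all this sketch's own.

```python
"""Hypothetical pre-flight check for a Kerberized DataNode (Hadoop 2.x era).

Assumption: the DataNode is launched through the stock hdfs / hadoop-daemon.sh
scripts, which use jsvc (and so give the DataNode privileged resources) only
when run as root with HADOOP_SECURE_DN_USER set.
"""
import os
import xml.etree.ElementTree as ET

# Hadoop 2.x default addresses, used when a property is absent (assumption).
DEFAULTS = {
    "dfs.datanode.address": "0.0.0.0:50010",
    "dfs.datanode.http.address": "0.0.0.0:50075",
}


def read_hadoop_xml(xml_text):
    """Return {name: value} from a Hadoop-style <configuration> document (str or bytes)."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="").strip()
        if name:
            props[name] = value
    return props


def secure_datanode_problems(hdfs_site_xml, euid, env):
    """List likely reasons a secure DataNode would start without privileged resources."""
    problems = []
    if euid != 0:
        problems.append("not started as root; jsvc cannot bind ports below 1024")
    if not env.get("HADOOP_SECURE_DN_USER"):
        problems.append("HADOOP_SECURE_DN_USER is not set")
    props = read_hadoop_xml(hdfs_site_xml)
    for key, default in DEFAULTS.items():
        address = props.get(key, default)
        port = int(address.rsplit(":", 1)[1])  # "host:port" -> port
        if port >= 1024:
            problems.append(f"{key} uses unprivileged port {port}")
    return problems


if __name__ == "__main__":
    path = "/etc/hadoop/conf/hdfs-site.xml"  # typical HDP location; an assumption
    if os.path.exists(path):
        with open(path, "rb") as f:
            for problem in secure_datanode_problems(f.read(), os.geteuid(), os.environ):
                print("WARN:", problem)
```

    Taking euid and the environment as parameters keeps the check testable; on a real node you would run it in the same shell (and as the same user) you use to start the datanode.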

    Thanks in advance for your collective help with this.
    Jan Peters

    OK, so we sorted this issue out. It turns out we weren’t paying attention to permissions and ownership in the Linux file system, or to which user was starting the various cluster processes. Yeah, this is sorta Hadoop 101 stuff, but it will bust your chops if you don’t pay attention to the details. We dug through a lot of items and found that a number of directory structures (logging included) were sensitive to permissions/ownership issues. We had the Kerberos implementation done correctly with the keytab files and such, but that’s only a fraction of the journey.

    To summarize the rest of our learnings: ‘who’ starts a process matters for getting the datanodes running (in secure mode the datanode is started as root so it can bind privileged ports, then drops to the unprivileged HDFS user). We also determined that the Linux container-executor was not properly configured in either container-executor.cfg or yarn-site.xml. Additionally, when we changed ownership on the container-executor binary, its special permission bits (setuid/setgid, which chown clears on Linux) were lost as well.

    Once we sifted through all the details, we were finally able to get our 3-node cluster running and run DgSecure discovery/masking tasks against it. I hope this posting (although never responded to) will help others through the Kerberizing process.
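    The container-executor problems described above come down to a few values that must agree. Per the Hadoop secure-mode documentation, the binary should be owned by root, group-owned by the group named in yarn.nodemanager.linux-container-executor.group, and have mode 6050; yarn-site.xml must select LinuxContainerExecutor and name the same group as container-executor.cfg. The Python sketch below is a hypothetical consistency check built on those assumptions; the function names are illustrative and it is not an HDP tool. The binary's stat fields are passed in so the logic can be tested without a root-owned file.

```python
"""Hypothetical consistency check for YARN's LinuxContainerExecutor setup.

Assumptions (from the Hadoop secure-mode docs, not from the post): the
container-executor binary is root-owned, group-owned by the configured
group, and has mode 6050 (setuid + setgid, group read/execute only).
"""
import stat
import xml.etree.ElementTree as ET

GROUP_KEY = "yarn.nodemanager.linux-container-executor.group"
CLASS_KEY = "yarn.nodemanager.container-executor.class"
LCE_CLASS = "org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor"


def parse_cfg(text):
    """Parse container-executor.cfg style key=value lines, skipping comments."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            cfg[key.strip()] = value.strip()
    return cfg


def parse_site(xml_text):
    """Return {name: value} from a Hadoop-style *-site.xml document."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name", "").strip(): p.findtext("value", "").strip()
            for p in root.findall("property")}


def lce_problems(cfg_text, yarn_site_xml, st_mode, st_uid, group_name):
    """Return mismatches; st_mode, st_uid and group_name describe the binary."""
    problems = []
    cfg = parse_cfg(cfg_text)
    site = parse_site(yarn_site_xml)
    if site.get(CLASS_KEY) != LCE_CLASS:
        problems.append("yarn-site.xml does not select LinuxContainerExecutor")
    if cfg.get(GROUP_KEY) != site.get(GROUP_KEY):
        problems.append("group differs between container-executor.cfg and yarn-site.xml")
    if st_uid != 0:
        problems.append("binary is not owned by root")
    if group_name != site.get(GROUP_KEY):
        problems.append("binary group does not match the configured group")
    if stat.S_IMODE(st_mode) != 0o6050:
        # chown(2) on Linux clears setuid/setgid, so chmod 6050 must follow any chown.
        problems.append(f"mode is {stat.S_IMODE(st_mode):o}, expected 6050")
    return problems
```

    On a live node the inputs would come from os.stat() on the container-executor binary and grp.getgrgid(st.st_gid).gr_name; its install path varies by distribution, so it is left out here.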
