OK, so we sorted this issue out. The root cause was that we weren't paying attention to permissions and ownership in the Linux file system, or to which user was starting the various cluster processes. Yes, this is Hadoop 101 stuff, but it will bust your chops if you don't mind the details. We dug through a lot of items and found that a number of directory structures (logging included) were sensitive to permission/ownership problems. We had the Kerberos implementation itself done correctly, keytab files and all, but that's only a fraction of the journey.

To summarize the rest of what we learned: which user starts a process matters, and getting that right is what finally brought the DataNodes up. We also determined that the LinuxContainerExecutor was misconfigured in both the container-executor.cfg and yarn-site.xml files. Additionally, when we changed ownership on the container-executor binary, its special permission bits (the setuid/setgid bits, which the executor requires) were silently cleared, so they had to be restored after the chown.

Once we had sifted through all the details, we finally got our 3-node cluster running and were able to run DgSecure discovery/masking tasks against it. I hope this post (although it never got a response) will help others through the Kerberizing process.
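For anyone hitting the same wall, here is a minimal sketch of the three pieces that bit us. The paths, the "hadoop" group name, and the banned/allowed user lists are assumptions for illustration — substitute your own layout and match the group to whatever your yarn-site.xml uses. The sketch writes illustration copies under /tmp so it doesn't touch a real install:

```shell
#!/bin/sh
# Sketch only: paths, group name, and user lists below are assumptions.

# 1. container-executor.cfg -- the group here must match the group named
#    in yarn-site.xml, and the file itself must be root-owned and not
#    group- or world-writable, or the executor refuses to run.
cat > /tmp/container-executor.cfg <<'EOF'
yarn.nodemanager.linux-container-executor.group=hadoop
banned.users=hdfs,yarn,mapred,bin
min.user.id=1000
allowed.system.users=nobody
EOF

# 2. yarn-site.xml snippet -- switch the NodeManager over to the
#    LinuxContainerExecutor and point it at the same group.
cat > /tmp/yarn-site-snippet.xml <<'EOF'
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.group</name>
  <value>hadoop</value>
</property>
EOF

# 3. The binary itself: on Linux, chown clears the setuid/setgid bits,
#    which is exactly how our permissions got wiped.  Restore mode 6050
#    (---Sr-s---) AFTER any chown, as root, against your real binary
#    path (hypothetical path shown):
#
#      chown root:hadoop /usr/lib/hadoop-yarn/bin/container-executor
#      chmod 6050        /usr/lib/hadoop-yarn/bin/container-executor
```

The ordering in step 3 is the gotcha: chown first, chmod second, because doing it the other way around leaves the binary without its setuid/setgid bits and the NodeManager failing with a permissions error.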
Mit freundlichen Grüßen (with friendly greetings),