Best Practices for Cluster Network Configuration
What should one keep in mind when configuring the network for a Hadoop cluster?
The following practices are recommended for a stable and performant Hadoop cluster network.
- Machines should be on a network isolated from the rest of the data center, so that no other applications or nodes share network I/O with the Hadoop infrastructure. Hadoop is I/O intensive, and removing outside interference helps keep the cluster performant.
- Machines should have static IPs. This keeps the network configuration stable. With dynamically assigned IPs, a machine’s IP address could change on reboot or when its DHCP lease expires, and this would cause the Hadoop services to malfunction.
- Reverse DNS should be set up. Reverse DNS ensures that a node’s hostname can be looked up from its IP address. Certain Hadoop functionality relies on reverse DNS and will not work without it.
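One way to verify the reverse DNS recommendation is to confirm that each node's hostname resolves to an IP address and that the same IP address resolves back to the hostname. The following is a minimal sketch in Python using only the standard `socket` module. The function name `check_dns_consistency` and its injectable `forward` and `reverse` parameters are illustrative assumptions, not part of Hadoop, and the check assumes nodes are addressed by the same name that reverse DNS returns.

```python
import socket
import sys


def check_dns_consistency(hostname,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Return (ok, detail) for a forward-then-reverse lookup of hostname.

    `forward` and `reverse` default to the real resolver calls; they are
    parameters only so the check can be exercised without a live DNS server.
    """
    try:
        ip = forward(hostname)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return False, f"forward lookup failed for {hostname}: {exc}"

    try:
        primary, aliases, _addresses = reverse(ip)
    except OSError as exc:  # socket.herror is a subclass of OSError
        return False, f"reverse lookup failed for {ip}: {exc}"

    # Accept the hostname if it matches the primary name or any alias.
    candidates = {primary.lower()} | {alias.lower() for alias in aliases}
    if hostname.lower() in candidates:
        return True, f"{hostname} <-> {ip}"
    return False, f"{ip} resolves back to {primary}, not {hostname}"


if __name__ == "__main__":
    # Hostnames to check are passed on the command line.
    for host in sys.argv[1:]:
        ok, detail = check_dns_consistency(host)
        print("OK  " if ok else "FAIL", detail)
```

Saved under a hypothetical name such as `check_dns.py`, the script could be run on each node with the cluster's hostnames as arguments. A `FAIL` line points to a missing or mismatched PTR record, or to a forward record that does not exist.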