June 19, 2015

Multihoming on Hadoop YARN clusters

Introduction

Multihoming is the practice of connecting a host to more than a single network. It is frequently used to provide network-level fault tolerance: if hosts are able to communicate on more than one network, the failure of one network will not render them inaccessible. There are other use cases for multihoming as well, including traffic segregation to isolate congestion and support for different network media optimized for different purposes. Multihoming can be physical, with multiple network interfaces; logical, with multiple IP addresses on one interface; or a combination of the two.

[Figure: mh_1]

Configuring Hadoop YARN for Multihomed environments

The Apache Hadoop YARN project has recently enhanced support for multihoming by completing the rollout of the bind-host parameter across the YARN suite of services, available in Apache Hadoop release 2.6 and the Hortonworks Data Platform (HDP) 2.2. The implementation is tracked in the Apache JIRA ticket YARN-1994 and complements similar functionality available in HDFS.

At a low level, every YARN server gains a new configuration property, “bind-host,” which lets an administrator control the binding argument passed to the Java socket listener and the underlying operating system, so that a service can be made to listen on a different interface than the one implied by its client-facing endpoint address. Typical usage of this feature is to have YARN services listen on all of the interfaces of a multihomed host by setting bind-host to the all-interfaces wildcard value, “0.0.0.0”. Note that there is no port component to the bind-host parameter; the port a service listens on is still determined by the behavior for the service’s address, meaning it uses the port configured for the service if there is one and falls back to the in-code default otherwise. For example, if yarn.resourcemanager.address is rm.mycluster.internal:9999 and yarn.resourcemanager.bind-host is set to 0.0.0.0, the ResourceManager will listen on all of the host’s addresses on port 9999.
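
As a minimal sketch of that example (the hostname rm.mycluster.internal and port 9999 come from the example above; adapt them to your environment), the two settings would sit side by side in yarn-site.xml:

<property>
<!-- client-facing endpoint; also determines the port the service listens on -->
<name>yarn.resourcemanager.address</name>
<value>rm.mycluster.internal:9999</value>
</property>
<property>
<!-- listen on all interfaces of the multihomed host -->
<name>yarn.resourcemanager.bind-host</name>
<value>0.0.0.0</value>
</property>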

For convenience, bind-host parameters cover all listeners for a given daemon. For example, inserting the following into yarn-site.xml will configure all of the ResourceManager services and web applications to listen on all interfaces, as is typical for a multihomed configuration:


<property>
<name>yarn.resourcemanager.bind-host</name>
<value>0.0.0.0</value>
</property>

Entries for other daemons will typically have the same value. Here is a list of the relevant names, grouped by configuration file:

  • yarn-site.xml
    • yarn.resourcemanager.bind-host
    • yarn.nodemanager.bind-host
    • yarn.timeline-service.bind-host
  • mapred-site.xml
    • mapreduce.jobhistory.bind-host

Setting all of these values to 0.0.0.0 as in the example above will cause the core YARN and MapReduce daemons to listen on all addresses and interfaces of the hosts in the cluster.
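
As a sketch, the remaining entries follow the same pattern as the ResourceManager example above; in yarn-site.xml:

<property>
<name>yarn.nodemanager.bind-host</name>
<value>0.0.0.0</value>
</property>
<property>
<name>yarn.timeline-service.bind-host</name>
<value>0.0.0.0</value>
</property>

and in mapred-site.xml:

<property>
<name>mapreduce.jobhistory.bind-host</name>
<value>0.0.0.0</value>
</property>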

Client connections to a service are not aware of the presence of the bind-host configuration; clients continue to connect to the service based on its address configuration. Different clients may connect to different interfaces depending on name resolution and their network location, but any given client will only connect to a single interface of the service at a time.

Distributed File System’s network considerations

Processes running on nodes in a Hadoop cluster (think YARN containers) typically need access to a distributed file system such as HDFS. Strictly speaking, it is not required that the DFS be accessible from all networks the physical cluster may be listening on via multihoming. As long as all the nodes resolve the DFS endpoints using reachable addresses, they will be able to use the storage. However, clients will generally want to access the storage as well, so more often than not HDFS will be configured for multihoming if YARN is also configured for multihoming. A guide to configuring HDFS for multihomed operation is available here.

(It is recommended that the parameters dfs.client.use.datanode.hostname and dfs.datanode.use.datanode.hostname also be set to true for best results, in addition to the HDFS bind-host parameters.)
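
As a hedged sketch of what those hdfs-site.xml entries might look like (the NameNode bind-host property names follow the HDFS multihoming guide; verify them against your Hadoop version):

<property>
<name>dfs.namenode.rpc-bind-host</name>
<value>0.0.0.0</value>
</property>
<property>
<name>dfs.namenode.servicerpc-bind-host</name>
<value>0.0.0.0</value>
</property>
<property>
<name>dfs.namenode.http-bind-host</name>
<value>0.0.0.0</value>
</property>
<property>
<name>dfs.client.use.datanode.hostname</name>
<value>true</value>
</property>
<property>
<name>dfs.datanode.use.datanode.hostname</name>
<value>true</value>
</property>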

Host Name Resolution

Properly managing name resolution is essential to a successful multihomed Hadoop rollout. This will generally be done via DNS, with different clients resolving cluster names to appropriate addresses depending on their network of origin, although host files may be an option for smaller clusters, ad-hoc clusters, or environments where local host files are managed automatically. Whatever the hostname management strategy, here are the ground rules to follow for a multihomed setup:

  • All addressing must be name-based in a multihomed cluster – nodes should always be referred to by their hostnames during configuration and access, never directly by their IP addresses.
  • A host must be reachable under the same hostname on all networks for all clients, even though that hostname may resolve to a different IP address depending on which network the client is coming from.
  • For any given client, the address the hostname resolves to must be reachable by that client; that address determines which network and interface the client will use.

It’s important to note that the same rules apply to the cluster nodes themselves (among other things, they also function as clients to the services running in the cluster). Although a node may resolve to several addresses for clients coming into the cluster, hosts within the cluster should resolve one another to addresses on the network intended to handle cluster traffic, e.g. a high-capacity network dedicated to that purpose. How cluster nodes resolve one another determines which network carries the cluster’s own traffic (between YARN nodes for map output and the like, and to/from HDFS).

Walkabout

Let’s take the example of a two-network multihomed cluster designed to give the cluster a high-performance backplane along with a separate, dedicated network for client access.

We’ll use one subnet, 10.2.x.x, for the cluster itself, and a second, 10.3.x.x, for client access. In this case the separation will be physical as well as logical: the 10.2 network runs on a high-bandwidth medium on a dedicated physical interface on each host, while the 10.3 network uses more standard networking technology as a general-purpose network. We’ll consider one job and a handful of nodes (from within a larger cluster). For simplicity we’ll let the final octet of the address be the same on each network interface of a given host, although there is no requirement that this be the case (only name resolution has to remain consistent for the host across clients, networks, and interfaces; the details of the IP addresses are not important).

[Figure: mh_2]

  1. The ResourceManager listens on 10.2.0.44 on the internal cluster network and 10.3.0.44 on the external client network.
  2. A client (CLIENT) at 10.3.0.99 submits a wordcount job to the ResourceManager, connecting to it on its 10.3.0.44 interface.
  3. The ResourceManager schedules the application and starts a MapReduce ApplicationMaster on an application host, which we’ll call APP1, by connecting to the NodeManager listening on that host’s cluster-network address, its internal IP 10.2.0.55.
  4. The ApplicationMaster connects to the ResourceManager on 10.2.0.44 and is allocated a container on its neighbor, APP2, whose NodeManager it contacts at 10.2.0.66 to start the map task.
  5. In the meantime, the user on CLIENT wants to see how things are going, so he/she navigates to the ResourceManager’s web interface, which resolves to 10.3.0.44 for the user. He/she clicks on the application and is taken to the ApplicationMaster on APP1, reaching it via its 10.3.0.55 address.
  6. The map task completes, and a reduce task is started on APP1, which pulls the data in using its 10.2.0.55 interface and writes the results back out to a node, DATA, via DATA’s internal IP, 10.2.0.77.
  7. At job completion the ApplicationMaster uploads its job summary to HDFS (on DATA), and the JobHistoryServer, JOBHIST, loads the job details via its internal 10.2.0.88 address.
  8. Since the job is complete, the next time the user checks on status he/she is redirected from the ResourceManager (reached on the 10.3 network) to JOBHIST, which he/she connects to on its 10.3.0.88 address.
  9. Finally, CLIENT pulls down the reducer output from HDFS using the second interface on DATA, 10.3.0.77, and the activity is complete.
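
To make the name-resolution rules concrete, here is a hypothetical host-file sketch of how these hosts might resolve on each network (the fully qualified names are illustrative; in a real deployment this would typically be handled by DNS giving different answers per network rather than by host files):

# As seen from the 10.2.x.x cluster network (e.g. a hosts file on APP1)
10.2.0.44  rm.mycluster.internal
10.2.0.55  app1.mycluster.internal
10.2.0.66  app2.mycluster.internal
10.2.0.77  data.mycluster.internal
10.2.0.88  jobhist.mycluster.internal

# As seen from the 10.3.x.x client network (e.g. a hosts file on CLIENT)
10.3.0.44  rm.mycluster.internal
10.3.0.55  app1.mycluster.internal
10.3.0.66  app2.mycluster.internal
10.3.0.77  data.mycluster.internal
10.3.0.88  jobhist.mycluster.internal

Each host keeps the same name everywhere; only the address a client sees changes with the client’s network, which is exactly what the rules above require.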

Kicking the Tires

If you have a local development environment which you use for testing purposes, you can try out multihoming easily just by using your existing network interface and the built-in localhost interface. If you configure one or more service addresses, e.g. yarn.resourcemanager.webapp.address, to your machine’s network name, you’ll find that the service in question is not accessible via the localhost address 127.0.0.1 (on the port configured for the service address). If you add the bind-host parameter for the service (yarn.resourcemanager.bind-host set to 0.0.0.0) and restart, you’ll find that you can now access the service via 127.0.0.1 as well, on that same port.
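
A minimal sketch of that local experiment, assuming a machine named mydevbox.example.com and the usual ResourceManager web UI port of 8088 (both illustrative):

<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>mydevbox.example.com:8088</value>
</property>
<!-- Without the entry below, http://127.0.0.1:8088 will not respond;
     with it (and a restart), both addresses serve the web UI. -->
<property>
<name>yarn.resourcemanager.bind-host</name>
<value>0.0.0.0</value>
</property>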

While this is not terribly useful in practice, it’s an easy way to see the feature in operation. You can also manually configure other non-conflicting logical addresses on your network interface, perhaps on a reserved subnet not in use on your network, and try it out that way. You would need to temporarily add an entry to your host file with your new address to fully vet the process. And if you have a mini-cluster of workstations, or a virtualized setup with bridging, you can configure them all and see things in operation on a small scale (they will all require host entries for one another in addition to themselves).

Taking a Test Drive

If you are considering deployment on an in-house cluster, you may want to gain some additional experience before taking the plunge. One option is to try it in the cloud: Amazon Web Services’ EC2 platform includes the Virtual Private Cloud (VPC) capability. A default VPC configuration with a public network, a private network, and a gateway server is enough to get started. Deploy a few boxes with virtual interfaces configured for both networks, ensure that the hosts resolve to one another via their private-network addresses, install Hadoop as you normally would, set the bind-host parameters as described above, and you now have a place to explore a multihomed YARN setup in more depth. Due to the isolation provided by the VPC, you may want to spin up a couple of workstations inside the VPC that you can remote-desktop into for full exploration; you’ll need some form of port pass-through from the gateway node (SSH tunneling is an option) to reach them. This is necessary to be able to hit the web applications via a web browser, and so on.

Multihoming and Fault Tolerance in Hadoop

Outside of Hadoop, multihoming is frequently used as a fault tolerance strategy, but for YARN its applicability to this purpose is not as strong.

Unlike many legacy or client-server systems, Hadoop availability is based on the fact that nodes have redundancy within the cluster by design. Failure of a network interface is, from the cluster’s perspective, simply a particular type of node failure.

Using multiple NICs and networks to ensure the availability of individual key hosts is not, generally, a necessity for fault tolerance in YARN. At the network level, too, clusters tend to live inside data centers, where backup networks to preserve connectivity between hosts are less important; network redundancy for remote clients that may access the cluster over unreliable links is generally handled in the externally routed network and does not tend to require multihomed Hadoop hosts as such.

That’s nice, what’s it good for?

In Hadoop YARN’s case, multihoming is a strategy for assuring network performance and predictability for both applications and management purposes and, to a lesser extent, a tool with uses in the security and fault-tolerance domains. The ability to expose a YARN cluster directly over multiple networks lets monitoring take place on a network that cannot be saturated by cluster traffic, ensuring that failure messages will not fail to reach management stations. It also allows dedicated high-speed interfaces to be established between nodes without competing with other traffic, and lets intrusion detection and other safeguards be introduced between the cluster and client traffic without impacting cluster traffic. For these and other reasons, many organizations already employ multihomed networking with their data management services. With the general availability of the bind-host configuration within YARN in HDP 2.2, they can now enjoy these same advantages with their YARN Hadoop services.
