Hoya (HBase on YARN): Application Architecture

At Hadoop Summit in June, we introduced a little project we’re working on: Hoya, HBase on YARN. Since then the code has been reworked and is now up on GitHub. It’s still very raw, and requires local builds of parts of Hadoop and HBase, but it is there for the interested.

In this article we’re going to look at the architecture, and a bit of the implementation.

We’re not going to look at YARN itself in this article; for that, we have a dedicated section of the Hortonworks site, including sample chapters of Arun Murthy’s forthcoming book. Instead, we’re going to cover how Hoya makes use of YARN.

Hoya Application Architecture

Hoya creates HBase clusters on top of YARN. It does this with a client application, the Hoya client, which creates the persistent configuration files, sets up the HBase cluster XML files, and then asks YARN to create an Application Master: the “Hoya AM”.

YARN copies all files listed in the client’s AM launch request from HDFS onto the local filesystem of the chosen server, then executes the command to start the AM, here a Java main() method.
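To make that concrete, here is a minimal sketch of how such an AM launch request can be assembled with the YARN client API. This is not Hoya’s actual client code: the HDFS path, the AM class name and the resource sizes are all illustrative.

import java.util.Collections;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.api.records.*;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.ConverterUtils;
import org.apache.hadoop.yarn.util.Records;

public class AmSubmissionSketch {
    public static void main(String[] args) throws Exception {
        YarnConfiguration conf = new YarnConfiguration();
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(conf);
        yarnClient.start();

        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
        appContext.setApplicationName("hoya");

        // List the AM's files as "local resources": YARN copies each one
        // from HDFS onto the chosen server before launching the process.
        Path amJarPath = new Path("hdfs:///apps/hoya/hoya.jar"); // illustrative path
        FileStatus stat = FileSystem.get(conf).getFileStatus(amJarPath);
        LocalResource amJar = Records.newRecord(LocalResource.class);
        amJar.setResource(ConverterUtils.getYarnUrlFromPath(amJarPath));
        amJar.setSize(stat.getLen());
        amJar.setTimestamp(stat.getModificationTime());
        amJar.setType(LocalResourceType.FILE);
        amJar.setVisibility(LocalResourceVisibility.APPLICATION);

        // The command YARN executes on that server: a Java main() method
        ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
        amContainer.setLocalResources(Collections.singletonMap("hoya.jar", amJar));
        amContainer.setCommands(Collections.singletonList(
            "$JAVA_HOME/bin/java -cp hoya.jar org.example.HoyaMaster")); // illustrative class

        appContext.setAMContainerSpec(amContainer);
        appContext.setResource(Resource.newInstance(512, 1)); // 512 MB, 1 vcore for the AM
        yarnClient.submitApplication(appContext);
    }
}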

The Hoya AM starts up and takes over. The first thing it does is start an HBase Master on the local machine; this is the sole HBase Master that Hoya currently manages. In parallel with the Master startup, Hoya asks YARN for the number of containers it needs for the number of workers this cluster has. For each of these containers, Hoya provides the commands to start HBase. This is a key point: Hoya does not run any Hoya-specific code on the worker nodes. The Hoya Application Master simply points YARN at the files that need to be on the worker nodes and the command to run, and YARN does the rest of the work. Because HBase clusters use Apache ZooKeeper to find each other, as do HBase clients, the HBase services locate each other automatically with neither Hoya nor YARN getting involved.
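For illustration, here is roughly what those two steps look like with YARN’s AM-side client libraries. Again, a sketch rather than Hoya’s code: imports and exception handling are elided, conf, desiredWorkers and an RM-granted allocatedContainer are assumed to be in scope, and the Region Server launch command is a placeholder.

// Ask the RM for one container per worker node the cluster needs.
AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
rmClient.init(conf);
rmClient.start();
rmClient.registerApplicationMaster("", 0, "");

Priority priority = Priority.newInstance(0);
Resource workerSize = Resource.newInstance(1024, 1); // per-worker memory/cores
for (int i = 0; i < desiredWorkers; i++) {
    // null node/rack preferences: any server will do, since the Region
    // Servers find the Master through ZooKeeper, not through YARN
    rmClient.addContainerRequest(new ContainerRequest(workerSize, null, null, priority));
}

// Later, when the RM grants a container, tell its Node Manager what to run:
NMClient nmClient = NMClient.createNMClient();
nmClient.init(conf);
nmClient.start();
ContainerLaunchContext workerCtx = Records.newRecord(ContainerLaunchContext.class);
workerCtx.setCommands(Collections.singletonList(
    "hbase/bin/hbase regionserver start")); // placeholder launch command
nmClient.startContainer(allocatedContainer, workerCtx);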

The Hoya AM gets notifications from YARN’s Resource Manager as containers are allocated and released, as well as worker-specific events from the Node Managers. It responds to the failure of HBase Region Servers, or the loss of containers, by requesting new containers and applying the same configuration and server launch process.
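The notification plumbing itself comes from YARN’s asynchronous AM-RM client. The callback methods below are part of the YARN API; the comments are stand-ins for what Hoya actually does with each event.

AMRMClientAsync.CallbackHandler handler = new AMRMClientAsync.CallbackHandler() {
    public void onContainersAllocated(List<Container> allocated) {
        // newly granted containers: launch a Region Server in each one
    }
    public void onContainersCompleted(List<ContainerStatus> completed) {
        // a completed container may mean a failed Region Server:
        // re-run the review logic so a replacement gets requested
    }
    public void onNodesUpdated(List<NodeReport> nodes) { /* node health changes */ }
    public void onShutdownRequest() { /* the RM asked the AM to shut down */ }
    public void onError(Throwable e) { /* RPC failure talking to the RM */ }
    public float getProgress() { return 0.0f; } // a long-lived service has no "progress"
};
AMRMClientAsync<ContainerRequest> asyncClient =
    AMRMClientAsync.createAMRMClientAsync(1000, handler); // 1s heartbeat interval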

The Hoya AM exports an RPC service for clients to talk to. There are currently only three operations:

public String getClusterStatus() throws IOException;
public void stopCluster() throws IOException;
public boolean flexNodes(int workers) throws IOException;
  • The getClusterStatus() call returns the current state of the cluster, including a list of all active worker nodes, as a JSON document.

  • The stopCluster() command shuts the cluster down. Currently this just tells YARN to release the containers; HBase doesn’t need anything more graceful.

  • The flexNodes(int workers) call is the most interesting: it allows the client to change the number of workers in a cluster, up or down (see the usage sketch after this list).
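From the client side, using the service looks something like the sketch below. The connectToHoyaAM() helper and the HoyaAMProtocol name are hypothetical, standing in for however the client binds to the AM’s RPC endpoint.

HoyaAMProtocol am = connectToHoyaAM();     // hypothetical helper
String statusJson = am.getClusterStatus(); // JSON description of the live cluster
am.flexNodes(5);                           // grow or shrink to 5 workers
am.stopCluster();                          // release all containers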

When the Hoya AM is told to flex the cluster (more on that another time), it updates the variables recording the desired number of nodes in the cluster. It then compares how many nodes it has against the number it needs, and either releases containers or requests new ones. This is fairly straightforward:

public boolean flexNodes(int workers) throws IOException {
    log.info("Flexing cluster count from {} to {}", numTotalContainers, workers);

    if (numTotalContainers == workers) {
        //no-op
        log.info("Flex is a no-op");
        return false;
    }

    //update the #of workers
    numTotalContainers = workers;
    clusterDescription.workers = workers;

    // ask for more containers if needed
    reviewRequestAndReleaseNodes();
    return true;
}

Here is another key point about Hoya: the reviewRequestAndReleaseNodes() method called here is the same one that Hoya uses to react to any container failure. Hoya does not differentiate between an unintended container loss triggered by a server failure and an explicit change to the cluster size. All it does is measure the difference: if it has too many containers, it releases some; if it has too few, it asks the RM for new containers and then, once they are allocated, tells the Node Managers how to run HBase Region Servers in them.
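In outline, the reconciliation looks something like this. It’s a sketch of the idea rather than the actual Hoya source: newWorkerRequest() and pickContainerToRelease() are hypothetical helpers, activeContainers stands for the AM’s record of the containers it currently holds, and amRMClient is the AM’s client for the Resource Manager.

private void reviewRequestAndReleaseNodes() {
    int delta = numTotalContainers - activeContainers.size();
    if (delta > 0) {
        // under target: ask the RM for the missing workers
        for (int i = 0; i < delta; i++) {
            amRMClient.addContainerRequest(newWorkerRequest()); // hypothetical helper
        }
    } else if (delta < 0) {
        // over target: hand surplus containers back to YARN
        for (int i = 0; i < -delta; i++) {
            Container surplus = pickContainerToRelease(); // hypothetical chooser
            amRMClient.releaseAssignedContainer(surplus.getId());
        }
    }
}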

That’s it. Once the basic logic is there to ask for a set of worker nodes, get the HBase files out to them and start the Region Servers, supporting flexing is trivial. YARN does the real work for us.

YARN keeps an eye on the health of the containers, telling the AM when there is a problem. It also monitors the Hoya AM itself: if the AM fails, YARN allocates a new container for it and restarts it. This gives Hoya an availability solution without Hoya having to implement one itself.
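How many restarts YARN will attempt is something the client can opt into when it builds the AM submission context, capped cluster-wide by the yarn.resourcemanager.am.max-attempts setting. A one-line sketch, reusing the appContext from the submission example above:

appContext.setMaxAppAttempts(2); // allow one retry if the Hoya AM container dies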

This shows a key design goal of YARN: to run more things in a Hadoop cluster than MapReduce jobs. MR jobs are great for analysis and transformation of bulk data, but big data applications need more than this. HBase is often part of the bigger application, and now Hoya lets users bring up HBase clusters whenever they need them.

Take a look at Hoya: HBase on YARN on GitHub, and find out more about YARN on the Hortonworks site.
