Meet the Committer, Part Two: Matt Foley
I hope you had fun pigging out to Hadoop with Alan Gates. We had interesting questions during the webinar, and as always, your participation in these discussions helps us understand the different use cases of Apache Pig and the growing community around this project. The recording is now available on our webinar site.
For the next installment of the “Future of Apache Hadoop” webinar series, I would like to introduce Matt Foley and Ambari. Matt is a member of the Hortonworks technical staff and a Committer and PMC member for the Apache Hadoop core project. He will be our guest speaker for the September 26, 2012 @10am PDT / 1pm EDT webinar: Deployment and Management of Hadoop Clusters with AMBARI.
Get to know Matt in this second installment of our “Meet the Committer” series.
Kim: What is your role with Apache Hadoop?
Matt: I’m a Committer and PMC member for Apache Hadoop. I’ve also been the Release Manager for the last several releases of Hadoop-1. I want Hadoop and HBase to be used by more and more companies, and to make that easier I’ve become very interested in deployment and monitoring issues, and have contributed to the Ambari project.
Kim: What’s an Ambari?
Matt: An Ambari is the platform or shelter that sits on top of the elephant, for a royal passenger to ride in comfort. Also known as a “howdah”.
Kim: How did this project come about?
Matt: While the Hortonworks engineers were still part of Yahoo’s Cloud Computing group, they saw the need for an Apache open source project to make it easier to deploy, monitor, and manage Hadoop clusters. These clusters can be multiple thousands of nodes, and it’s hard to deploy and manage clusters that large! So we started Ambari, as an Apache “incubator” project, to meet those needs.
Kim: Can you provide a brief use case showing why people should want to use/deploy Ambari?
Matt: Suppose you have a serious Big Data application that needs a cluster of even a hundred servers. You can’t possibly want to log in to all those servers and individually install Hadoop on each of them. And you don’t just want to install Hadoop, you also need HBase and Hive and Pig and Oozie and HCatalog, etc. You have to install them all, and you have to get the right versions of each so they’ll work together, and you need to start the various services in the right order, on all 100 servers. Furthermore, before you can install Hadoop, you have to set up quite a bit of configuration on each server, so that the service user IDs will exist and have the right permissions, and so the “install master” server, from which you’re doing all this work, has privileges to push the software to each of the other servers. Basically, to install manually would take you about half an hour per server, after you get good at it! So it’s obvious that you need an automation tool to do the deployment. Ambari can install a whole 100-node cluster in about 20 minutes, and a 1,000-node cluster in less than an hour.
Then, after you’ve installed and started up your cluster, you have to monitor it. In a cluster of a few thousand servers, you can expect to have a server or disk failure per day (although Hadoop will robustly adapt to such failures and keep running fine). You need a monitoring system to alert you when something goes wrong and tell you what the problem is, or the cluster will degrade over time. Also, you need to be aware of the load on the system, and whether your Hadoop and HBase jobs are being run efficiently, and whether you’ve provisioned the cluster appropriately. For all these things, Ambari will automatically set up a monitoring and alerting system, based on open source monitoring tools called Nagios and Ganglia, but configured specifically to monitor Hadoop clusters. There’s a lot of distilled expertise in Ambari, about how to monitor big Hadoop clusters.
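To put the install figures Matt quotes in perspective, here is a rough back-of-the-envelope sketch. It uses only the numbers from his answer (about half an hour per server by hand, about 20 minutes for Ambari on 100 nodes, under an hour on 1,000 nodes) and assumes, for illustration, that manual installs are done one server at a time by one person. The function names are hypothetical and are not part of Ambari.

```python
# Back-of-the-envelope comparison using the figures quoted in the interview.
# Assumption (not from the interview): manual installs happen serially.
MANUAL_MINUTES_PER_SERVER = 30  # "about half an hour per server"


def manual_install_hours(servers: int) -> float:
    """Total hours to install the given number of servers by hand, one at a time."""
    return servers * MANUAL_MINUTES_PER_SERVER / 60


def speedup(servers: int, automated_minutes: float) -> float:
    """How many times faster the automated install is than the serial manual one."""
    return servers * MANUAL_MINUTES_PER_SERVER / automated_minutes


if __name__ == "__main__":
    print(manual_install_hours(100))   # 50.0 hours by hand for 100 servers
    print(speedup(100, 20))            # 150.0x, given "about 20 minutes"
    print(manual_install_hours(1000))  # 500.0 hours by hand for 1,000 servers
    print(speedup(1000, 60))           # 500.0x, a lower bound for "less than an hour"
```

Under these assumptions, a 100-node cluster goes from more than a working week of manual effort to roughly 20 minutes, which is the gap Matt is pointing at.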
Kim: Can you give us a sneak peek of your presentation? What do you expect the key takeaways will be for folks attending this webinar?
Matt: This presentation will be very similar to the talk I gave at the Hadoop Summit in June. I’ll present:
- A brief history of Ambari, and how its architecture has evolved and will continue growing;
- An in-depth discussion of the Install, Monitor, and Management features, illustrated with screenshots of Ambari being used with an actual cluster.
After the presentation, participants should feel comfortable applying Ambari to create new Hadoop and HBase clusters, and will understand the value of the monitoring and alerting capabilities.
Get ready to geek out to Ambari with Matt. Join us on September 26, 2012 @10am PDT / 1pm EDT for “Deployment and Management of Hadoop Clusters with AMBARI”.