Apache Hadoop Operations at Scale

Hadoop Summit curated content for dev-ops

Hadoop Summit Content Curation

Although Hadoop Summit San Jose 2014 has come and gone, its invaluable content (keynotes, sessions, and tracks) is available here. Below I've selected a few sessions for Hadoop system administrators and dev-ops engineers, curating them under a general Hadoop operations theme.

Dev-ops engineers and system administrators know better than anyone that ease of operations and deployment can make or break a large Hadoop production cluster, which is why they care about all of the following:

  • how rapidly they can create or replicate a cluster;
  • how efficiently they can manage or monitor at scale;
  • how easily and programmatically they can extend or customize their operational scripts; and
  • how accurately they can foresee, forestall, or forecast resource starvation and capacity needs.

Falling short on any of these outcomes can cost them sleep. In large-scale Hadoop cluster operations and management, where the Enterprise Hadoop ecosystem comprises both traditional and modern infrastructure components, the operational tasks can be Herculean. As @DevOps_Borat wryly satirizes:

[Image: tweet from @DevOps_Borat]

The good news is that the people at the helm, at the nerve center of operations, shared at the Hadoop Summit their best practices for addressing and managing these complex challenges. Here are a few:

Hadoop Operations at Scale

Session Title | Watch | View
Lessons Learned from Building Big Data Platform From Ground Up | Video | Slides
Managing 2000 Node Cluster with Apache Ambari | Video | Slides
Hadoop 2 @ Twitter, Elephant Scale | Video | Slides
Lessons Learned – Monitoring the Data Pipeline at Hulu | Video | Slides
Collection of Small Tips on Stabilizing your Hadoop Cluster | Video | Slides
Hadoop and OpenStack | Video | Slides

I cherry-picked these sessions because they best address those topics, but you can always browse the schedule's session descriptions and pick any time slot, on any day, that piques your curiosity.

For example, when you hover over and click a session description, a popup appears in which you can choose to watch the video or view the slides.

What’s Next?

In the next blog post, I'll curate content on data access and management, in particular the role YARN plays as an architectural anchor and as the center of the Modern Data Architecture (MDA).

Categorized by: Administrator, Ambari, Apache Hadoop, Architect & CIO, Developer, HDFS, Operations & Management, YARN

Comments

Hari Sekhon | July 17, 2014 at 5:28 am

Speaking of monitoring – I released an Ambari monitoring plugin for Nagios that uses the Ambari REST API. There are also plugins for various other Hadoop ecosystem components, such as HBase and HDFS.

I think the form filters out comments containing links, so just search for "Advanced Nagios Plugins Collection github".

I know Ambari already has an embedded Nagios, but this is really meant to unify monitoring with the larger enterprise Nagios (or Nagios-compatible) installation that every good ops team should have, which lets you centralize enterprise alerting, SMS notifications, on-call rotas, etc. The collection already has over 100 plugins, also covering various other NoSQL and infrastructure technologies, with even more pending import.

Regards,

Hari Sekhon
Big Data Architect Contractor (Ex-Cloudera)
London, England
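Hari's plugin collection is the place to look for a production-grade check. To illustrate the general approach he describes (poll the Ambari REST API, then translate the answer into the Nagios plugin convention of one status line on stdout plus exit code 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN), here is a minimal Python sketch. It is not his plugin: the script name `check_ambari_service`, the `AMBARI_PASSWORD` environment variable, and the use of `ServiceInfo.state` as the health signal are assumptions, and the `/api/v1/clusters/<cluster>/services/<service>` path follows the Ambari v1 REST API as I understand it, so verify both against your Ambari version.

```python
import base64
import json
import os
import urllib.request

# Standard Nagios plugin exit codes and their status labels.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def fetch_service(base_url, cluster, service, user, password):
    """GET <base_url>/api/v1/clusters/<cluster>/services/<service> as parsed JSON."""
    url = f"{base_url}/api/v1/clusters/{cluster}/services/{service}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def evaluate(service_json, expected_state="STARTED"):
    """Map ServiceInfo.state (assumed field) to a Nagios (exit_code, message) pair."""
    info = service_json.get("ServiceInfo", {})
    name = info.get("service_name", "UNKNOWN_SERVICE")
    state = info.get("state")
    if state is None:
        return UNKNOWN, f"{name}: no state reported"
    if state == expected_state:
        return OK, f"{name} is {state}"
    return CRITICAL, f"{name} is {state}, expected {expected_state}"


def main(argv):
    """Run one check; print a single Nagios status line and return the exit code."""
    if len(argv) != 5:
        print("UNKNOWN: usage: check_ambari_service BASE_URL CLUSTER SERVICE USER")
        return UNKNOWN
    _, base_url, cluster, service, user = argv
    # Assumption: the password is read from a hypothetical AMBARI_PASSWORD
    # environment variable so it does not show up in the process list.
    password = os.environ.get("AMBARI_PASSWORD", "")
    try:
        data = fetch_service(base_url, cluster, service, user, password)
    except (OSError, ValueError) as exc:  # network/HTTP errors or malformed JSON
        print(f"UNKNOWN: could not query Ambari: {exc}")
        return UNKNOWN
    code, message = evaluate(data)
    print(f"{LABELS[code]}: {message}")
    return code
```

To run it from Nagios, an entry point such as `sys.exit(main(sys.argv))` would hand the exit code back to the scheduler, invoked for example as `check_ambari_service http://ambari-host:8080 mycluster HDFS admin` (host, cluster, and user are placeholders). In Ambari a service that is installed but stopped typically reports `INSTALLED`, so treating anything other than `STARTED` as CRITICAL is a deliberately simple policy; `WARNING` is defined for completeness but unused here.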
