July 31, 2014

The Future of Apache Ambari

It’s been a busy year for Apache Ambari. Keeping up with the rapid innovation in the open community certainly is exciting. We’ve already seen six releases this year to maintain a steady drumbeat of new features and usability guardrails. We have also seen some exciting announcements of new folks jumping into the Ambari community.

With all these releases and community activities, let’s take a break to talk about how the broader Hadoop community is affecting Ambari and how this is influencing what you will see from Ambari in the future.

Take a Look Around

To talk about the future of Ambari, we have to recognize what is happening outside of Ambari in the Hadoop community. We have to talk about Apache Hadoop YARN.

YARN is the operating system for data processing, making it possible to bring multiple workloads and processing engines to the data stored in Apache Hadoop 2. YARN enables a single platform for storage and processing that can handle different access patterns (batch, interactive and real-time). This fundamentally changes the future of data management.

The YARN Effect

As YARN reshapes the definition and capabilities of Hadoop, Ambari complements it by enabling Hadoop operators to provision, manage, and monitor their clusters efficiently and securely.

Different workloads, different processing engines, and different access patterns mean Ambari needs to be flexible while maintaining the stable, predictable operational capabilities that an enterprise expects.

To that end, we at Hortonworks focus on rallying the community around three Ambari areas: operate, integrate, and extend.

Operate: Provision, Manage, and Monitor

Ambari’s core function is to provision, manage, and monitor a Hadoop Stack.

Ambari has come a long way in standardizing the stack operations model, and Ambari Stacks demonstrates this progress. Stacks wrap services of all shapes and sizes with a consistent definition and lifecycle-control layer. With this wrapper in place, Ambari can rationalize operations over a broad set of services.

To the Hadoop operator, this means that however services differ in the way they install, start, stop, configure, and report status, each service can be managed and monitored with a consistent approach.
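To make the idea of a consistent lifecycle-control layer concrete, here is a minimal sketch in Python. It is not Ambari's actual Stack definition format or scripting API; the class names, the in-memory service, and the status strings are all hypothetical and only illustrate how one control loop can drive very different services once each exposes the same operations.

```python
from abc import ABC, abstractmethod


class ManagedService(ABC):
    """Hypothetical lifecycle contract (illustrative, not Ambari's own class)."""

    def __init__(self, name):
        self.name = name

    @abstractmethod
    def install(self): ...

    @abstractmethod
    def configure(self): ...

    @abstractmethod
    def start(self): ...

    @abstractmethod
    def stop(self): ...

    @abstractmethod
    def status(self): ...


class InMemoryService(ManagedService):
    """Stand-in service that only tracks state; a real one would run commands."""

    def __init__(self, name):
        super().__init__(name)
        self.installed = False
        self.running = False

    def install(self):
        self.installed = True

    def configure(self):
        pass  # a real service would write its configuration files here

    def start(self):
        if not self.installed:
            raise RuntimeError(f"{self.name} must be installed before start")
        self.running = True

    def stop(self):
        self.running = False

    def status(self):
        if self.running:
            return "RUNNING"
        return "STOPPED" if self.installed else "NOT_INSTALLED"


def bring_up(services):
    """Drive every service through the same lifecycle and report its status."""
    report = {}
    for service in services:
        service.install()
        service.configure()
        service.start()
        report[service.name] = service.status()
    return report


report = bring_up([InMemoryService("HDFS"), InMemoryService("GLUSTERFS")])
```

The operator-facing code (`bring_up`) never needs to know what sits behind each service, which is the property the Stacks wrapper is described as providing.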

This also provides a natural extension point for operators and the community to create their own custom stack definitions and “plug in” new services that can co-exist with Hadoop. For example, we have seen the Gluster team at Red Hat extend Ambari and implement a custom stack that deploys Hadoop on GlusterFS.

We expect to see more of this type of activity, and rapid adoption, as the community takes advantage of Stacks.

As for provisioning, the Stacks technology also rationalizes the cluster install experience across a set of services. Stacks enable Ambari Blueprints. Blueprints deliver a trifecta of benefits:

  • A repeatable model for cluster provisioning (for consistency);
  • A method to automate cluster provisioning (for ad hoc cluster creation, whether bare metal or cloud);
  • A portable and cohesive definition of a cluster (for sharing best practices on component layout and configuration).

As Shaun Connolly discussed during his Hadoop Summit 2014 keynote, Hadoop adoption spans data lifecycles (i.e. learn, dev/test, discovery, production). Blueprints enable this connected lifecycle and provide consistency, portability and deployment flexibility.

We expect Blueprints to become a shared language for defining a Hadoop cluster and for Ambari to become a key component for provisioning clusters in an automated fashion, whether using bare metal or cloud infrastructure.
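As a rough illustration of how a Blueprint separates the cluster definition from the hosts it lands on, the sketch below builds the two JSON documents involved: the blueprint itself and a cluster creation request that maps host groups to machines. The field names (`Blueprints`, `stack_name`, `stack_version`, `host_groups`, `components`, `blueprint`, `fqdn`) follow the Ambari Blueprints documentation as we understand it and should be checked against the release you run. The stack version, component layout, and hostnames are hypothetical examples, and no request is sent.

```python
import json


def build_blueprint(stack_name, stack_version, layout):
    """Build a blueprint body from a {host_group: [component, ...]} layout."""
    return {
        "Blueprints": {"stack_name": stack_name, "stack_version": stack_version},
        "host_groups": [
            {"name": group, "components": [{"name": c} for c in components]}
            for group, components in layout.items()
        ],
    }


def build_cluster_request(blueprint_name, hosts_by_group):
    """Build a cluster creation body that binds host groups to real hosts."""
    return {
        "blueprint": blueprint_name,
        "host_groups": [
            {"name": group, "hosts": [{"fqdn": fqdn} for fqdn in fqdns]}
            for group, fqdns in hosts_by_group.items()
        ],
    }


# Hypothetical layout: one master group and one worker group.
layout = {
    "master": ["NAMENODE", "RESOURCEMANAGER"],
    "worker": ["DATANODE", "NODEMANAGER"],
}
blueprint = build_blueprint("HDP", "2.1", layout)

# The same blueprint name is reused for two differently sized clusters.
dev_cluster = build_cluster_request(
    "small-hadoop",
    {"master": ["dev-m1.example.com"], "worker": ["dev-w1.example.com"]},
)
prod_cluster = build_cluster_request(
    "small-hadoop",
    {
        "master": ["prod-m1.example.com"],
        "worker": ["prod-w1.example.com", "prod-w2.example.com"],
    },
)

blueprint_json = json.dumps(blueprint, indent=2)
```

In this sketch the layout is written once and reused for both a dev and a production cluster, which is the repeatability and portability the list above describes. Registering the blueprint and creating the cluster would each be a POST to the Ambari REST API.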

Integrate: Enterprise Tools, Skills, and Systems

To create a modern data architecture, Hadoop must be integrated with existing data center management systems. Fortunately, the Apache developer community designed Ambari with a robust REST API that exposes cluster management controls and monitoring information.

This API facilitates Ambari’s integration with existing processes and systems to automate operational workflows (such as automatic host decommission/re-commission in alert scenarios). As adoption of Hadoop progresses and the types of workloads supported by Hadoop expand, data center operations teams will still be able to leverage their existing investments in people, processes, and tools.
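The following sketch shows what driving Ambari from an external tool might look like. It prepares, but does not send, a request that asks Ambari to change a service's desired state, which is a simpler operation than the decommission workflow mentioned above. The endpoint path, the `RequestInfo`/`ServiceInfo` body, and the `X-Requested-By` header reflect the Ambari REST API as we understand it; verify them against your Ambari version. The server URL, cluster name, and credentials are placeholders.

```python
import base64
import json
from urllib.request import Request


def service_state_request(base_url, cluster, service, state, user, password):
    """Prepare a PUT that sets a service's desired state (e.g. INSTALLED = stopped)."""
    body = {
        "RequestInfo": {"context": f"Set {service} to {state} via REST"},
        "Body": {"ServiceInfo": {"state": state}},
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return Request(
        url=f"{base_url}/api/v1/clusters/{cluster}/services/{service}",
        data=json.dumps(body).encode(),
        method="PUT",
        headers={
            "Authorization": f"Basic {token}",
            # Ambari expects this header on modifying requests.
            "X-Requested-By": "ambari",
        },
    )


# Placeholder server, cluster, and credentials; nothing is sent here.
req = service_state_request(
    "http://ambari.example.com:8080", "mycluster", "HDFS", "INSTALLED",
    "admin", "admin",
)
```

An alerting system could pass a request like this to `urllib.request.urlopen` from its own remediation hook, so operators keep their existing tooling in charge of the workflow.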

We have already seen our partners put their weight behind ease of Hadoop integration. For example, Ambari SCOM Management Pack leverages Ambari to bring Hadoop monitoring information into Microsoft System Center Operations Manager. Teradata Viewpoint uses Ambari as a single integration point for Hadoop management. As we look ahead, we anticipate other Hadoop ecosystem products and broader systems management products to follow this pattern.

Extend: Customizing the Interaction Experience for Operators and Users

This is where it really gets interesting. If someone extends Ambari with a custom service (via Stacks), then the rationalized operational controls (as defined in the Stack) should “just work”. The lifecycle definition makes it possible to expose the service controls consistently from both the Ambari Web UI and the Ambari REST API.

But as new services are brought into Ambari, they will introduce new requirements on how Ambari manages, organizes, and visualizes information about the cluster. With all these services under Ambari, certain capabilities are going to be unique and role-based. How do you provide an extensibility point for the community to easily “plug in” custom features that go beyond core operations? How do you expose (or limit) those capabilities to operators and end users with a consistent interaction experience? That’s where Ambari Views come in.

Ambari Views will enable the community and operators to develop new ways to visualize operations, troubleshoot issues and interact with Hadoop. They will provide a framework to offer those experiences to specific sets of users. Via the pluggable UI framework, operators will be able to control which users get certain capabilities and to customize how those users interact with Hadoop.

The community has been working on the Views Framework for some time, and this presentation (given at Hadoop Summit 2014) provides a good overview of the technology.

We look to release the Views Framework in the upcoming Ambari 1.7.0 release. In the meantime, you can get a preview of some Views and the Views Framework itself by digging into the examples or by grabbing one of the current contributions.

What’s Next

The building blocks for operating, integrating, and extending Ambari are in place, and all of these will empower Hadoop operators to harness Hadoop’s full range of capabilities. Keep watching Ambari’s progress. The best is yet to come.

Happy Hadooping!

Comments

  • Regarding enterprise integration see the “Advanced Nagios Plugins Collection” (on GitHub) which has Nagios Plugins for Ambari – this allows integration to a unified infrastructure monitoring platform leveraging all the usual features such as custom thresholding and alerting behaviours, SMS notifications, escalations, on-call rotas etc
