Hadoop tutorial: Apache Ambari

One of the impediments to Apache Hadoop adoption that enterprises cite most often is the perceived difficulty of managing data in a new format. Many companies become so preoccupied with keeping up with big data's barrage of information that they lose focus on how to analyze it effectively. Using outmoded data insight software to wrangle actionable feedback is often like trying to fit a box of square pegs into a single round hole. Apache Hadoop makes keeping up with data easier, but some organizations are just dipping a toe into the Hadoop pool when they could be diving in with Apache Ambari.

What Apache Ambari does
Apache Ambari is the tool with which a company can manage and monitor all of its Hadoop clusters. It is entirely open source, so it can be easily adapted to fit new Hadoop models, and it also provides an operative lens through which an organization can view its other components. Currently, Ambari supports the Hadoop Distributed File System and MapReduce, the two core features of Hadoop, as well as essential components like Hive, HBase, ZooKeeper, HCatalog and Pig. The platform also supports elements like Oozie and Sqoop, all significant contributors to the Hadoop ecosystem. Its importance in minimizing the difficulties of comprehensive Hadoop adoption makes it one of the top big data-related applications. Perhaps this is why GigaOM's Derrick Harris mentioned Ambari first on his list of the top five tools to utilize for effective Hadoop management.
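To make "manage and monitor" a little more concrete, here is a minimal Python sketch of how a script might ask an Ambari server which clusters it manages through Ambari's REST API. The host name, credentials and helper function names are hypothetical, and the sketch assumes the server listens on port 8080 and accepts HTTP basic authentication. It only builds the request and parses a response body, so nothing is sent until you pass the request to urllib.request.urlopen yourself.

```python
import base64
import json
import urllib.request


def build_clusters_request(host, user, password, port=8080):
    """Build (but do not send) a GET request for the cluster list.

    Assumption: the Ambari server exposes its REST API under /api/v1
    and accepts HTTP basic authentication.
    """
    url = f"http://{host}:{port}/api/v1/clusters"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {
        "Authorization": f"Basic {token}",
        # Ambari expects this header on modifying requests; it is
        # harmless on a read-only GET and included here for consistency.
        "X-Requested-By": "ambari",
    }
    return urllib.request.Request(url, headers=headers)


def cluster_names(response_body):
    """Extract cluster names from a JSON response body.

    Assumption: each entry in "items" carries a "Clusters" object
    with a "cluster_name" field.
    """
    payload = json.loads(response_body)
    return [item["Clusters"]["cluster_name"] for item in payload.get("items", [])]
```

In practice the body would come from urllib.request.urlopen(build_clusters_request(...)).read(); keeping the parsing separate makes it easy to try out against a saved response first.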

A guide to Apache Ambari components
Beyond the provisioning and management of Hadoop clusters, several facets of Apache Ambari make it an essential part of the ongoing process of monitoring the various components of Hadoop architecture.

1) Ganglia

Ganglia is a scalable, distributed monitoring system that collects metrics from Hadoop clusters, HBase in particular, and stores, shares and visualizes them efficiently.

2) Nagios

Nagios serves as a real-time monitor of Hadoop's operational health, checking that Hadoop's various parts are functioning properly and issuing automated alerts to help prevent stoppages and downtime.
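The idea behind an automated alert can be sketched in a few lines of Python. This is not Nagios itself, and it is not how Ambari wires Nagios in. It is only an illustration of the kind of check such a monitor performs, applied to service states in the shape that Ambari's REST API is assumed to report them (a "ServiceInfo" object with "service_name" and "state" fields). The function name and the sample data are hypothetical.

```python
import json


def services_needing_attention(response_body, healthy_state="STARTED"):
    """Return (service_name, state) pairs for services that are not healthy.

    Assumption: each entry in "items" carries a "ServiceInfo" object
    with "service_name" and "state" fields, and a running service
    reports the state "STARTED".
    """
    payload = json.loads(response_body)
    flagged = []
    for item in payload.get("items", []):
        info = item["ServiceInfo"]
        if info["state"] != healthy_state:
            flagged.append((info["service_name"], info["state"]))
    return flagged


# Hypothetical sample response, for illustration only.
sample = json.dumps({
    "items": [
        {"ServiceInfo": {"service_name": "HDFS", "state": "STARTED"}},
        {"ServiceInfo": {"service_name": "HBASE", "state": "INSTALLED"}},
    ]
})

for name, state in services_needing_attention(sample):
    print(f"ALERT: {name} is in state {state}")
```

A real monitor would run a check like this on a schedule and notify an operator, which is the part Nagios handles.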

Making the most of Hadoop architecture right from the beginning can have a huge impact down the road, according to Information Management contributor Hannah Smalltree.

"Analytics project leads, in particular, must also stay on top of industry and technology trends to anticipate future needs of the business," Smalltree wrote. "With analytics increasingly being a competitive advantage, businesses must move quickly to implement new capabilities and that may mean rethinking the underlying architecture."
