This guest post is from Steve Ratay, Viewpoint Architect at Teradata Corporation.
Teradata’s Unified Data Architecture is a powerful combination of the Teradata Enterprise Data Warehouse, the Aster Discovery Platform, Apache Hadoop (via the Hortonworks Data Platform) and Teradata Enterprise Management tools in a single architecture.
If you are a Teradata user managing an Enterprise Data Warehouse or Data Discovery platform, chances are you are using Teradata Viewpoint, a monitoring and management platform for Teradata systems. To complete Viewpoint's coverage of the different systems in Teradata's Unified Data Architecture, Viewpoint 14.10 adds support for monitoring multiple Hadoop clusters running in this architecture.
In an enterprise scenario, the biggest technical challenge in monitoring and managing Hadoop clusters through Teradata Viewpoint is collecting the necessary metrics reliably and continuously. Hadoop's components expose their data in a variety of ways, through tools such as Ganglia (for metric collection), Nagios (for alerting), JMX, and other interfaces that are unfamiliar to enterprise customers. Using these components directly for Hadoop monitoring raises several issues:

Each of these technologies exposes its data in a different format, so it would take significant development time to properly parse the data from each source. Locating and communicating with the nodes is a challenge in itself: just to collect data from the NameNode and JobTracker, the location of those services would have to be configured or discovered, and failover would have to be accounted for as well. Expanding the monitoring solution beyond that to collect data from every node poses connectivity and security issues, too. Surely there must be a better way!
Luckily, Apache Ambari addresses all of these challenges and concerns by providing a collection of RESTful APIs from which a wealth of Hadoop monitoring data can be obtained, translated into easy-to-understand metrics, and presented in a fashion that is already familiar to our users. There is no learning curve or additional training needed. Ambari handles the work of collecting monitoring data from the variety of technologies mentioned above, aggregates it, and exposes it through a series of RESTful APIs. These APIs can all be accessed by making web service calls against a central node in the Hadoop cluster, and all data is returned in JSON format, so it can easily be parsed from just about any programming language.
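To make this concrete, here is a minimal sketch of calling Ambari's REST API from Python and parsing the JSON it returns. The host name and credentials are placeholders you would replace with your own; the `/clusters` resource and the `items` / `Clusters` / `cluster_name` layout of its response come from Ambari's documented API.

```python
import base64
import json
import urllib.request

# Hypothetical Ambari endpoint and credentials -- substitute your own.
AMBARI_URL = "http://ambari-host:8080/api/v1"
USER, PASSWORD = "admin", "admin"

def fetch_json(path):
    """GET an Ambari REST resource and decode its JSON body.
    Ambari read-only calls use HTTP Basic authentication."""
    req = urllib.request.Request(AMBARI_URL + path)
    token = base64.b64encode(f"{USER}:{PASSWORD}".encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode())

def cluster_names(payload):
    """Pull the cluster names out of a /clusters response payload."""
    return [item["Clusters"]["cluster_name"]
            for item in payload.get("items", [])]

# Example (requires a reachable Ambari server):
#   print(cluster_names(fetch_json("/clusters")))
```

Because every resource comes back as JSON with the same nested-dictionary shape, the same two small functions generalize to hosts, services, and metrics by changing only the path.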
Teradata Viewpoint follows standard data collection practices to collect data from Apache Ambari and stores it in the Viewpoint database, which is scheduled for regular backups. The data is collected from Ambari every minute by default, so the database retains a view of the Hadoop system's state over the course of an hour, day, or week. This historical data is used to generate a variety of charts in the Viewpoint web portal, and it also powers Rewind, which lets users go back and see exactly what was occurring on the Hadoop cluster at a specific point in time. This makes troubleshooting far more efficient when issues actually occur and helps predict the future load on a particular system.
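The collect-every-minute-and-rewind pattern described above can be sketched in a few lines. This is an in-memory illustration only (Viewpoint stores samples in its database, not in process memory), and the class and parameter names are invented for the example:

```python
from collections import deque

class MetricHistory:
    """Rolling window of timestamped metric samples, illustrating the
    periodic-collection-plus-Rewind idea. Times are in seconds."""

    def __init__(self, retention_seconds=3600):
        self.retention = retention_seconds
        self.samples = deque()  # (timestamp, value), oldest first

    def record(self, ts, value):
        """Store one sample and drop anything past the retention window."""
        self.samples.append((ts, value))
        while self.samples and ts - self.samples[0][0] > self.retention:
            self.samples.popleft()

    def rewind(self, ts):
        """Return the most recent sample at or before ts, like Viewpoint's
        Rewind showing the cluster state at a chosen point in time."""
        best = None
        for t, v in self.samples:
            if t <= ts:
                best = (t, v)
        return best
```

A collector loop would call `record()` once a minute with whatever metric it pulled from Ambari, and the portal's Rewind view would call `rewind()` with the user's chosen timestamp.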
By leveraging Apache Ambari's capabilities, Teradata Viewpoint delivers a comprehensive monitoring solution for all the systems in your enterprise data architecture, including multiple Hadoop clusters. Teradata's Java and web developers stay focused on the tasks at which they excel: getting data from the source system (Ambari), transforming it into easy-to-understand metrics, and displaying it in Viewpoint's portlets. No time was wasted getting up to speed on Ganglia, JMX, or the details of Hadoop's inner workings. Ambari was a critical piece of technology in helping Viewpoint roll out this solution, and it enhances Teradata's Unified Data Architecture.
Teradata Viewpoint offers the following benefits to enterprise users:
Following are some of the dashboards, or portlets, developed specifically for Hadoop. For information on additional dashboards, please refer to the links at the bottom of this blog.