Extending Apache Ambari and Hadoop data to the Teradata Ecosystem
This is a guest post from Steve Ratay, Viewpoint Architect, Teradata Corporation.
Teradata’s Unified Data Architecture is a powerful combination of the Teradata Enterprise Data Warehouse, the Aster Discovery Platform, Apache Hadoop (via the Hortonworks Data Platform) and Teradata Enterprise Management tools in a single architecture.
If you are a Teradata user managing an Enterprise Data Warehouse or Data Discovery platform, chances are that you are using Teradata Viewpoint, a monitoring and management platform for Teradata systems. To extend that monitoring to every system in Teradata’s Unified Data Architecture, Viewpoint 14.10 adds support for monitoring multiple Hadoop clusters running in this architecture.
Challenges with Monitoring Hadoop
In an enterprise scenario, the biggest technical challenge in monitoring and managing Hadoop clusters via Teradata Viewpoint lies in collecting the necessary Hadoop metrics reliably and continuously. Different components of Hadoop expose their data in a variety of ways and rely on tools such as Ganglia (for metric collection), Nagios (for alerting), JMX and other interfaces that are unfamiliar to many enterprise customers. The primary issues in using these components for Hadoop monitoring are:
- Unfamiliarity with the Hadoop management tools, which creates a need for training and imposes a learning curve and ramp-up period on the existing teams responsible for monitoring and managing infrastructure.
- Lack of integration with existing enterprise tools, and the complexity of handling and understanding multiple user interfaces for the diverse systems in the environment.
- Locating and connecting to these tools on each Hadoop node, and then parsing the data from each one, is not straightforward.
- Significant development time is needed to unify the different data formats exposed by the different technologies on the Hadoop side.
- Challenges in locating and communicating with ALL the nodes to obtain the monitoring data. For example, to collect data from the NameNode and JobTracker, the location of these services needs to be configured. Connectivity to different nodes poses security issues as well.
- Additional backup, restore and archiving strategies need to be in place to account for Hadoop systems.
Each of these technologies exposes its data in a different format, and it would take significant development time to properly parse the data from each source. There’s also a challenge in locating and communicating with the nodes to obtain this data. Just to collect data from the NameNode and JobTracker, the location of these services would have to be configured or discovered, and failover would have to be accounted for as well. Expanding the monitoring solution beyond that to collect data from every node raises both connectivity and security issues. Surely there must be a better way!
Monitoring Solutions with Apache Ambari and Viewpoint
Luckily, Apache Ambari addresses all of these challenges and concerns. Ambari handles the work of collecting the monitoring data from the variety of monitoring technologies mentioned above. It then aggregates this data and exposes it through a collection of RESTful APIs, all of which can be accessed by making web service calls against a central node in the Hadoop cluster. All data is provided in JSON format, so it can easily be parsed by just about any programming language. Viewpoint obtains a wealth of Hadoop monitoring data from these APIs, translates it into easy-to-understand metrics and presents it in a fashion that is already familiar to our users, so no learning curve or additional training is needed.
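To make the JSON point concrete, here is a minimal Python sketch of what consuming such an API can look like. It is not Viewpoint's code. The URL layout and the `ServiceInfo` fields are modeled on the general shape of Ambari's v1 REST API and should be checked against your Ambari version; the host name, the cluster name `demo` and the sample response are invented for illustration. The network call itself is left out (a real client would also need to authenticate), so the sketch only builds the request URL and parses a response body.

```python
import json
from urllib.parse import quote


def services_url(base_url, cluster):
    """Build a URL that lists a cluster's services together with their state.

    Assumption: the path and the `fields` parameter follow Ambari's v1 API layout.
    """
    path = f"/api/v1/clusters/{quote(cluster, safe='')}/services"
    return f"{base_url.rstrip('/')}{path}?fields=ServiceInfo/state"


def parse_service_states(payload):
    """Turn a services JSON document into a {service_name: state} mapping."""
    doc = json.loads(payload)
    return {
        item["ServiceInfo"]["service_name"]: item["ServiceInfo"].get("state", "UNKNOWN")
        for item in doc.get("items", [])
    }


# Hypothetical response body, trimmed to the fields the parser reads.
SAMPLE_RESPONSE = """
{
  "items": [
    {"ServiceInfo": {"cluster_name": "demo", "service_name": "HDFS", "state": "STARTED"}},
    {"ServiceInfo": {"cluster_name": "demo", "service_name": "MAPREDUCE", "state": "INSTALLED"}}
  ]
}
"""

if __name__ == "__main__":
    print(services_url("http://ambari.example.com:8080", "demo"))
    print(parse_service_states(SAMPLE_RESPONSE))
```

Because every service comes back from one endpoint in one format, the collector needs a single parser rather than one per monitoring technology.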
Teradata Viewpoint follows standard data collection practices to collect data from Apache Ambari and stores it in the Viewpoint database, which is scheduled for regular backups. The data is collected from Ambari every minute by default, so the database holds a view of the state of the Hadoop system over the course of an hour, day, or week. This historical data is used to generate a variety of charts in the Viewpoint web portal, and it also powers Rewind, which lets users go back and see exactly what was occurring on the Hadoop cluster at a specific point in time. This makes troubleshooting highly efficient when issues occur, and it helps predict the future load on a particular system.
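The idea behind rewinding over per-minute samples can be sketched in a few lines of Python. This is an illustrative in-memory model only, not how the Viewpoint database or Rewind is implemented; the class name `SnapshotHistory` and the metric `hdfs_used_pct` are hypothetical. It assumes samples arrive in time order, as they would from a fixed-interval collector, and answers "what did the cluster look like at time T?" with the latest sample taken at or before T.

```python
import bisect


class SnapshotHistory:
    """Time-ordered store of metric snapshots with point-in-time lookup."""

    def __init__(self):
        self._times = []      # sample timestamps in seconds, ascending
        self._snapshots = []  # snapshot recorded at the matching timestamp

    def record(self, timestamp, snapshot):
        # Reject out-of-order samples so the timestamp list stays sorted.
        if self._times and timestamp < self._times[-1]:
            raise ValueError("samples must be recorded in time order")
        self._times.append(timestamp)
        self._snapshots.append(snapshot)

    def rewind(self, timestamp):
        """Return the latest snapshot taken at or before `timestamp`, or None."""
        idx = bisect.bisect_right(self._times, timestamp)
        return self._snapshots[idx - 1] if idx else None


# Four one-minute samples; the third captures a short-lived spike.
history = SnapshotHistory()
for minute, used_pct in enumerate([41, 43, 97, 44]):
    history.record(minute * 60, {"hdfs_used_pct": used_pct})

if __name__ == "__main__":
    # Ten seconds after the third sample, the spike is still the view returned.
    print(history.rewind(130))
```

Keeping the samples sorted by time makes each lookup a binary search, which is why a fixed collection interval and an ordered store pair well for this kind of feature.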
By leveraging Apache Ambari’s capabilities, Teradata Viewpoint delivers a comprehensive monitoring solution for ALL the systems in your enterprise data architecture, including multiple Hadoop clusters. Teradata’s Java and web developers were able to focus on the tasks at which they excel: getting the data from the source system (Ambari), transforming it into easy-to-understand metrics, and displaying it in Viewpoint’s “Portlets”. No time was wasted trying to get up to speed on Ganglia, JMX, or many of the details of Hadoop’s inner workings. Ambari was a critical piece of technology in helping Viewpoint roll out this solution, and it enhances Teradata’s Unified Data Architecture.
Teradata Viewpoint offers the following benefits to enterprise users:
- Single Pane of Glass – Monitor and manage ALL systems in your architecture, including multiple Hadoop clusters, from one place. There is no need for separate monitoring and management systems.
- Completely Customizable User Interface – The Viewpoint UI can be configured with multiple portlets or dashboards to get a snapshot of the health and capacity of ALL systems at once, or to view deeper metrics on a single system.
- Integration into Existing Tools – No additional set-up or installation of packages is needed. Viewpoint leverages built-in capabilities of Apache Ambari to monitor Hadoop alongside existing Teradata systems, so you get more ROI from existing enterprise tools.
- Scalable System – Teradata Viewpoint is designed to scale with the customer’s needs in the data architecture. Viewpoint brings years of maturity in monitoring large-scale systems at our customers’ sites.
- Configurable Alerts – Users can fully configure the thresholds for multiple alert levels. Notifications can be sent via email or any other existing mechanism.
- Historical Data – Viewpoint lets you rewind metrics to a particular point in time to ease troubleshooting and better predict future capacity needs.
- Web Browser – Viewpoint is an intuitive and easy-to-use browser-based application that places minimal load on the systems being monitored.
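As a rough illustration of the configurable alert levels mentioned in the list above, the Python sketch below classifies a metric value against user-defined thresholds. It is a generic example rather than Viewpoint's alerting logic: the function `alert_level`, the level names and the threshold values are all assumptions chosen for the example.

```python
def alert_level(value, thresholds):
    """Return the most severe level whose threshold `value` meets, else "OK".

    `thresholds` maps a level name to the minimum value that triggers it;
    a higher threshold is treated as a more severe level.
    """
    level = "OK"
    # Walk the levels from lowest to highest threshold, keeping the last match.
    for name, limit in sorted(thresholds.items(), key=lambda kv: kv[1]):
        if value >= limit:
            level = name
    return level


# Hypothetical thresholds for disk usage, in percent.
DISK_USAGE_THRESHOLDS = {"WARNING": 80.0, "CRITICAL": 95.0}

if __name__ == "__main__":
    for used_pct in (50.0, 85.0, 97.0):
        print(used_pct, alert_level(used_pct, DISK_USAGE_THRESHOLDS))
```

Holding the thresholds in plain data rather than in code is what lets an administrator adjust them without touching the collector.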
The following are some of the dashboards, or portlets, developed specifically for Hadoop. For information on additional dashboards, please refer to the links at the bottom of this blog.
- Hadoop Services: Displays summary information for all Hadoop services. Typical user: a Hadoop admin or a central DBA.
- System Health: KPI indicator of system performance and system state for Hadoop and/or other systems. Typical user: a Hadoop admin or a central DBA.
- Alert Viewer: Monitors, configures and manages logged alerts for Hadoop systems. Typical user: a Hadoop admin or a central DBA.
- Node Monitor: Displays summary information about the nodes on a Hadoop system. Typical user: a Hadoop admin or a central DBA.
- Space Usage: Monitors and manages disk space usage (Perm/Temp/Spool). Typical user: a Hadoop admin or a central DBA.
- Metrics Analysis: Shows several systems and metrics over a period of time for trend analysis. Typical user: a Hadoop admin or a central DBA.
- Metrics Graph: Graphical representation of system metrics. Typical user: a Hadoop admin, a central DBA or a manager.
- Capacity Heatmap: Interactive visualization tool for analyzing hotspots of various system metrics over user-definable time periods. Typical user: a Hadoop admin, a central DBA or a manager.