Giving our users more tools to help diagnose and troubleshoot complex issues
Here on the Cloud & Operations team, our goal is to make HDP easier to install, manage, and monitor, and over the last four years we have made a lot of progress to help improve the Hadoop Operator’s day-to-day experience. We’ve made critical investments in new Apache projects, made strategic acquisitions, and created net-new services to achieve that goal.
Apache Ambari was created to be the one-stop shop for installation, management, and monitoring of Hadoop. Recognizing that customers needed help managing Hadoop in cloud environments, we acquired SequenceIQ and delivered Cloudbreak to extend our operational reach into cloud providers such as Amazon Web Services, Microsoft Azure, and Google Cloud Platform. With an ever-growing list of Apache components being brought into HDP and a rapidly expanding customer base with diverse workloads, performance objectives, and use cases, we realized more intelligence was needed to understand how customers are using the platform and to ensure their cluster configuration keeps pace with that use. For that, we created Hortonworks SmartSense. SmartSense analyzes how the cluster is configured and how it’s being used, and produces custom recommendations to help with performance, security, and operational improvements.
With these investments, we have enabled customers to more easily install, secure, and configure Apache Hadoop on-premises and in the cloud, all while providing the most up-to-date set of best practices and proactive cluster monitoring.
But…what happens when something goes wrong? With large-scale distributed systems, problems are inevitable, and the available tooling makes all the difference. Tools that help you quickly identify where the issue exists, what is wrong, and how it can be resolved separate an issue that operators can quickly fix themselves from one that requires working with Hortonworks Support.
It’s with that context that we’re rolling out a new initiative to continue streamlining operations by helping customers diagnose and troubleshoot complex issues. There are three (3) phases planned for these improvements, the first being delivered with the upcoming Ambari 2.2 maintenance release.
Phase 1: Advanced Metrics Visualization & Dashboarding
Apache Hadoop components produce a lot of metric data, and the Ambari Metrics System provides a scalable, low-latency storage system for those metrics. Understanding which specific metrics to look at for each of the core Hadoop components takes experience and an understanding of how the components work, both on their own and with each other. To help simplify the process of reviewing metrics, and to be more prescriptive about which metrics to look at, we’re including Grafana with Ambari Metrics as a part of Ambari.
Grafana will be deployed, managed, and pre-configured to work with the Ambari Metrics service. We are including a curated set of dashboards for core HDP components, giving operators at-a-glance views of the same metrics we consume when helping customers troubleshoot complex issues.
The metrics displayed on the dashboard can be filtered by time, component, and contextual information (for example, YARN queues).
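For readers who want to see what filtering by time and component looks like underneath a dashboard, here is a minimal sketch that builds a query URL for the Ambari Metrics Collector REST endpoint. The host name, metric name, and the helper `build_metrics_query` are illustrative assumptions, not part of the Grafana integration described above, and the default port should be checked against your own cluster.

```python
from urllib.parse import urlencode


def build_metrics_query(collector_host, metric_names, app_id,
                        start_ms, end_ms, hostname=None, port=6188):
    """Build a query URL for the Ambari Metrics Collector timeline endpoint.

    Hypothetical helper for illustration only. Times are epoch milliseconds.
    The port default (6188) is an assumption; verify it for your cluster.
    """
    params = {
        "metricNames": ",".join(metric_names),  # one or more metric names
        "appId": app_id,                        # component filter, e.g. "nodemanager"
        "startTime": start_ms,                  # start of the time window
        "endTime": end_ms,                      # end of the time window
    }
    if hostname is not None:
        params["hostname"] = hostname           # optional per-host filter
    return (f"http://{collector_host}:{port}"
            f"/ws/v1/timeline/metrics?{urlencode(params)}")


if __name__ == "__main__":
    # Example values below are placeholders, not real cluster settings.
    print(build_metrics_query(
        "ams.example.com",
        ["yarn.QueueMetrics.Queue=root.AppsRunning"],
        "resourcemanager",
        1432075838000,
        1432075959000,
    ))
```

The function only constructs the URL; fetching and charting the result is what the pre-built Grafana dashboards take care of for operators.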
Phase 2: Consolidated Cluster Activity Reporting
Hortonworks customers have come to rely on HDP for their mission-critical, multi-tenant workloads. With this comes the need to answer questions about how tenants are using the cluster, how jobs are performing, which queues are most active, which users are using the most resources, and so on. By combining the analytical capabilities of SmartSense with the interactive reporting capabilities of Apache Zeppelin, Hortonworks is providing consolidated utilization reporting across core HDP components in a single dynamic UI.
SmartSense will mine, consolidate, and store utilization data and provide a fully managed Apache Zeppelin instance with pre-built notebooks to analyze, query, and report off of that dataset. Existing notebooks can be extended, and custom notebooks can be created to support customer-specific reporting needs.
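This solution allows customers to easily and quickly answer questions like which queues are most active, which users are using the most resources, and how jobs are performing.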
Phase 3: Centralized & Contextual Log Search
Apache Hadoop components create a lot of log data. Accessing that log data to understand what the component is telling you, especially when issues arise, is critical. A new component of Apache Ambari has been created to collect, parse, and index these logs using Apache Solr. Apache Solr provides a scalable, low-latency search solution that will power a new custom user interface that operators will use to access and search component logs across the cluster.
Apache Ambari will include and manage a new Log Search component that provides agents for log collection, Apache Solr for log indexing, and a custom UI for searching those logs. These components are essential to providing a streamlined way to search for stack traces, exceptions, block IDs, and other information that needs to be seen across all nodes in the cluster.
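To give a rough feel for the kind of search a Solr-backed log index makes possible, the sketch below builds a standard Solr `select` query URL. The collection name and the field names (`log_message`, `level`, `host`) are assumptions made for illustration; the actual Log Search schema may differ, and operators are expected to use the custom UI rather than query Solr directly.

```python
from urllib.parse import urlencode


def build_log_search_url(solr_base, collection, text,
                         level=None, host=None, rows=50):
    """Build a Solr /select URL that searches indexed log messages.

    Hypothetical helper: the field names used here (log_message, level,
    host) are assumed for illustration and are not confirmed by the post.
    """
    # Escape backslashes and double quotes so the phrase query stays valid.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    params = [
        ("q", f'log_message:"{escaped}"'),  # phrase search on the message
        ("rows", rows),                     # maximum number of hits returned
        ("wt", "json"),                     # ask Solr for a JSON response
    ]
    if level:
        params.append(("fq", f"level:{level}"))  # filter by log level
    if host:
        params.append(("fq", f"host:{host}"))    # filter by originating host
    return f"{solr_base}/{collection}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder Solr address and collection name.
    print(build_log_search_url(
        "http://solr.example.com:8983/solr",
        "hadoop_logs",
        "NullPointerException",
        level="ERROR",
    ))
```

Filter queries (`fq`) narrow the result set without affecting relevance scoring, which suits the "search one exception across all nodes" use case described above.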
With this combined set of efforts, we’re excited to help customers streamline operations for HDP, bring new components to the table, and improve the day-to-day experience for our operators in the moments that matter most: when something goes wrong. We look forward to updating you on this journey and to seeing our customers use the first phase in the upcoming release of Ambari.