Today our guest blogger is Keith Manthey, CTO at EMC.
As part of my job, I regularly meet with clients about their Apache Hadoop journey. I often meet executives after they have encountered a catalytic event. In one meeting I vividly remember, the client had suffered over 24 hours of downtime on their Hadoop cluster. The second question out of their mouths was how they could provide four 9s (99.99%) availability for their Hadoop environment. This conversation track usually leads to a very interesting discussion about the operational sustainability of their analytics platforms, in which Hadoop is the bedrock.
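To put four 9s in perspective, a quick back-of-the-envelope calculation (a simple sketch, not tied to any particular SLA definition) shows how small the annual downtime budget actually is:

```python
# Back-of-the-envelope: downtime permitted per year at a given availability.
def allowed_downtime_minutes_per_year(availability: float) -> float:
    """Minutes of downtime per year permitted at the given availability level."""
    minutes_per_year = 365.25 * 24 * 60  # ~525,960 minutes in a year
    return (1.0 - availability) * minutes_per_year

# Four 9s (99.99%) permits roughly 52.6 minutes of downtime per year.
# A single 24-hour outage (1,440 minutes) exceeds that budget more than
# 27 times over.
print(round(allowed_downtime_minutes_per_year(0.9999), 1))  # -> 52.6
```

This is why a day-long outage is such a catalytic event: it consumes decades' worth of a four-9s downtime budget in one incident.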
Last week, I was very excited to read the announcement of general availability for Hortonworks Data Platform 2.4, which includes community innovations such as Apache Ambari 2.2 and Apache Spark 1.6. As I read through what is new in HDP 2.4, I was struck by how it continues to bring the features required for IT operational success in the enterprise. I usually refer to the features most IT and business executives are looking for in their analytics platforms as enterprise-ready Hadoop. While the definition of enterprise-ready Hadoop is loose and varies from executive to executive, there is a core set of features that defines operational success: uptime and availability, as well as operational sustainability, are what most business and IT executives want from their platforms.
Hortonworks is a valued partner of EMC. In addition to EMC’s ability to re-sell Hortonworks’ licenses, there is a joint engineering relationship between the companies. As both companies work with their respective client bases, the need for enterprise-ready Hadoop is well understood. It is with this common goal for enterprise-ready Hadoop that I welcome the features in HDP 2.4.
Hadoop is one of the newest entrants in most IT arsenals. As more business dependencies are built on top of their Hadoop platforms, most companies want Hadoop to adopt the operational features of their more mature systems.
One key message I hear regularly, and one often echoed in the media, is that Hadoop is too cumbersome to install and manage. The Ambari improvements aimed at making Hadoop installation more seamless go a long way toward addressing this. Blueprints for installation, express upgrades, and deeper integration with Apache Ranger are three features in Ambari 2.2 that will help smooth out the installation process. The goal of these features is to make the install less cumbersome and more repeatable for those early in their Hadoop journey.
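To illustrate why Blueprints make installs repeatable: a blueprint is a declarative JSON description of the cluster layout that can be registered once and reapplied. The sketch below builds a minimal, hypothetical blueprint (the host-group names, cardinalities, and component lists are illustrative assumptions, not a complete production layout):

```python
import json

# A minimal Ambari blueprint sketch. Host-group names and component lists
# here are hypothetical; a real blueprint enumerates every component and
# configuration for each host group.
blueprint = {
    "Blueprints": {
        "blueprint_name": "hdp-24-minimal",  # hypothetical name
        "stack_name": "HDP",
        "stack_version": "2.4",
    },
    "host_groups": [
        {
            "name": "master",
            "cardinality": "1",
            "components": [{"name": "NAMENODE"}, {"name": "RESOURCEMANAGER"}],
        },
        {
            "name": "workers",
            "cardinality": "3",
            "components": [{"name": "DATANODE"}, {"name": "NODEMANAGER"}],
        },
    ],
}

# In practice the blueprint is registered with the Ambari server via its
# REST API (e.g., POST /api/v1/blueprints/<name>), after which cluster
# creation requests simply map real hosts onto the host groups above.
print(json.dumps(blueprint, indent=2))
```

Because the entire layout lives in one document, the same blueprint can stamp out identical dev, test, and production clusters, which is precisely the repeatability the Ambari improvements are after.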
Another key theme regularly discussed in my meetings with executives is that the pace of change for Hadoop is unsustainable. This is not to be confused with the addition of rich features to the ecosystem; rather, clients have a difficult time ingesting new Hadoop distributions published every four months or at some similar frequency. The process of patching and certifying their environments becomes more onerous and leaves less time for ongoing operations. HDP 2.4 brings a welcome balance to that equation and will help organizations as they build operational sustainability. The new release cadence places the more stable “core” services, which conform to the ODPi specification, on a time-stabilized release pattern, so clients see less churn in their core Hadoop infrastructure, upgrade less frequently, and enjoy more stability in their environments. The “extended” services releases, where higher-velocity projects such as Apache Spark live, then allow clients to augment their certifications as needed.
A few weeks ago, I visited a client who runs Hortonworks as their distribution. Their environment is a great entry footprint on which they are building some very relevant use cases. It is still very much a batch environment, but the client has a clear vision for growing it into something much more significant in size, and for one day moving toward streaming; for the foreseeable future, however, the workloads remain batch. This client is a large power company that takes sensor data from power meters and uses the readings to calculate transformer failure conditions that are likely to occur given the energy flow on the meters. Uptime is one of their primary concerns for this environment. The velocity of Apache Spark releases compared to the rest of their Hadoop stack made for a lively discussion. From HDP 2.4 on, this client can tailor their release and certification processes to meet their business needs: they can maintain the “core” services of HDFS and YARN at the regular ODPi cadence and update the “extended” services, such as Apache Spark, as they see fit.
I mentioned earlier that Hadoop is becoming a core part of the modern data analytics platform. Businesses are placing dependencies on their analytics platforms to deliver outcomes. For power generation companies, there are a few very compelling outcomes from data and analytics on Hadoop. Detecting the “loss” of power for billing purposes is a very common scenario: “loss” is often attributed to malfunctioning meters and equipment, but in some distinct cases it is outright theft. While detecting loss doesn’t change the amount of power generated or consumed, it does allow proper accounting and revenue collection for all power used.
Another powerful use case is logistics analysis of the power grid. By carefully monitoring the peaks and valleys of power usage, patterns around transformer life expectancy and load capacity can be drawn. These patterns enable prescriptive maintenance: replacing components in advance of failure, or re-routing energy flow to prevent future outages. Given these very compelling use cases, it is no surprise that companies want solid operational sustainability and support wrapped around their analytics platforms. Hadoop is quickly becoming a core part of the infrastructure businesses need to deliver the outcomes described above, and this core infrastructure role is what is driving the need for enterprise-ready Hadoop.
In summary, the new HDP 2.4 release brings a level of maturity to the Hadoop ecosystem that enables clients to build enterprise-ready analytics practices around their Hadoop environments. At EMC, we welcome these changes and view them as very beneficial as we help our clients build enterprise-ready Hadoop environments.
For more information, please go to:
About the Author:
Keith is the CTO of Analytics for EMC’s Emerging Technology Division. He joined EMC in April of 2015, bringing more than 24 years of experience in identity fraud analytics, alternative and traditional data architectures, and financial systems data and analytics.