March 11, 2015

HDP for Insurance: Common Data Challenges

Changes in technology and customer expectations create new challenges for how insurers engage their customers, manage risk information and control the rising frequency and severity of claims.

Carriers need to rethink traditional models for customer engagement. Advances in technology and the adoption of retail engagement models are driving fundamental changes in how customers shop for and purchase insurance coverage. To engage their policyholders, our insurance customers seek “omni-channel” insight and the ability to confidently recommend the next best action (NBA) to each customer. These new engagement models require a single view of every customer relationship across multiple channels and data repositories.

Data volumes are growing quickly everywhere, and risk and underwriting information is no exception. The cost to generate and capture sensor data has fallen dramatically. More sensor data enables carriers to price innovative usage-based insurance (UBI) products for the “connected car” and the “connected home” using empirical data on the use and condition of the covered assets. Big data for UBI reduces moral hazard: safer customers pay lower premiums. But to take advantage of all this sensor data, insurers need to integrate it into their predictive analytics processes.

A third group of challenges facing the insurance industry is the rising frequency and severity of claims. Technological advances give fraudsters and other dishonest individuals new tools to game the system. New organic risks, such as changing weather patterns, also alter the risk landscape. These factors drive up the frequency and severity of claims, jeopardizing carriers’ profits. In response, companies are strengthening their special investigation unit (SIU) capabilities with better predictive models and natural language analysis of claims notes. Carriers want to analyze newer sources of unstructured information to uncover more subrogation opportunities, stop fraud and minimize claims leakage. This requires new capabilities for data discovery.

Our customers in telecom, retail, healthcare, manufacturing and financial services partner with Hortonworks for the same reason: to build advanced analytic applications for a single view of their business, predictive modeling and data discovery. We bring the lessons from these industries, along with our direct insurance experience, to every new carrier that subscribes with Hortonworks.

Challenges Building a Single View of the Customer – Fragmented Data

Insurance is a service business built on accurate risk assessment. A modern data architecture with HDP powers a single view of your customers, which helps you provide personalized customer experiences, acquire new customers that meet your risk appetite, grow existing relationships and retain valuable customers.

Why has this single view of customers been so hard to achieve with legacy data platforms, despite the hundreds of millions of dollars already invested in them? The main obstacle is data fragmentation, which affects most enterprises, whether they are multinational, multi-line carriers or regional, specialty-line carriers.

Different business units spent their separate IT budgets building their own data storage islands. Each island held data specific to a function or product line, and only employees within that group could access it. Even within a group, access was limited to a select few who had the technical expertise or authorization to retrieve the data; everyone else had to wait for those few to deliver it.

The challenges posed by this fragmentation are particularly steep for carriers that sell through independent agents. For example, agents selling products for one of our insurance customers used to build their own single view of the customer (by pulling data from six different systems) before they even picked up the phone. Obviously, this slowed productivity and hurt sales.
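To make the fragmentation problem concrete, here is a minimal sketch of what a “single view” amounts to at the data level: records scattered across separate systems are grouped under one customer key. It is plain Python, not an HDP API, and the three systems, the `customer_id` key and every field name are hypothetical stand-ins for whatever a real carrier’s policy, claims and billing systems hold.

```python
from collections import defaultdict

# Hypothetical extracts from three separate systems; field names are illustrative only.
policy_system = [
    {"customer_id": "C001", "policy": "AUTO-123", "premium": 1200},
    {"customer_id": "C002", "policy": "HOME-456", "premium": 900},
]
claims_system = [
    {"customer_id": "C001", "claim": "CLM-9", "amount": 3500},
]
billing_system = [
    {"customer_id": "C002", "balance_due": 150},
    {"customer_id": "C001", "balance_due": 0},
]


def single_view(*sources):
    """Group records from every (name, records) source under one customer key."""
    view = defaultdict(lambda: defaultdict(list))
    for name, records in sources:
        for record in records:
            cid = record["customer_id"]
            # Keep every record, tagged by the system it came from, minus the join key.
            view[cid][name].append(
                {k: v for k, v in record.items() if k != "customer_id"}
            )
    # Convert the nested defaultdicts into plain dicts for the caller.
    return {cid: dict(systems) for cid, systems in view.items()}


customers = single_view(
    ("policy", policy_system),
    ("claims", claims_system),
    ("billing", billing_system),
)
print(customers["C001"])
```

In practice the hard part is the join key itself: when the systems do not share a clean identifier, some form of entity resolution has to come before a merge like this one.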

Challenges with Predictive Modeling – Too Little Data

While carriers use a single view of the customer to improve customer service, more and different data has the potential to dramatically improve our underwriting processes with predictive modeling.

Throughout the history of our industry, we have used forensic evidence on events or losses that have already occurred to predict the likelihood of similar losses in the future. Sam L. Savage explains the inherent risks and limitations of this approach in his book, “The Flaw of Averages,” which identifies “…a series of consistent mistakes that we make by plugging single numbers into a model where there’s really an uncertainty.” To be fair, most insurance models use more than “a single number,” but Savage’s point is about generalizing from data when the specifics matter.

The classic example of this concept describes a statistician who’s told that the average depth of a river is three feet. “No problem,” he thinks, “three feet’s not too deep.” And then…

Source: Jeff Danziger, San Jose Mercury News, October 2000
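The river example can be turned into a small calculation. The sketch below assumes a made-up set of depth readings that average exactly three feet and a hypothetical rule that a crossing fails wherever the water is deeper than five feet. Plugging the single average into that rule predicts no failures, while applying the rule to every reading and then averaging shows a real chance of failure.

```python
from statistics import mean

# Hypothetical depth readings (feet) across a river; illustrative only, not real data.
depths = [1, 1, 2, 2, 3, 3, 4, 8]

# Hypothetical rule: the crossing fails wherever the water is deeper than 5 feet.
THRESHOLD_FT = 5


def fails(depth):
    """Return 1 if the crossing fails at this depth, else 0."""
    return 1 if depth > THRESHOLD_FT else 0


avg_depth = mean(depths)
# Flaw of averages: plug the single average into the model...
loss_at_average = fails(avg_depth)
# ...versus averaging the model's outcome over every individual reading.
expected_loss = mean(fails(d) for d in depths)

print(avg_depth, loss_at_average, expected_loss)
```

The gap between `loss_at_average` and `expected_loss` comes entirely from the one deep reading, which is exactly the kind of specific that a summary number hides.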

The flaw of averages has given insurance carriers an overly simple map of the risk they cover, largely because data has been expensive to acquire and store. Limited data visibility means that carriers insure fewer customers than they could if they had more, better and less expensive data.

Apache Hadoop makes that possible. We see our insurance customers use HDP to draw pictures of their “rivers of risk” that look more like this:


More and different data stored in HDP can replace incomplete sample data with abundant, accessible empirical data. Luminar, a Hortonworks subscriber in the advertising industry, overcame the same limitations of sample data when predicting customers’ likelihood to buy. Here’s how Franklin Rios, President of Luminar, sums it up:

“Let’s get rid of the days of sample data. Let’s have all the data. Let’s bring all the data from multiple sources.”

Challenges with Data Discovery – New Types of Data

Another structural challenge in our industry has been that our data storage platforms could store only structured, curated data. This “clean” data did not fully describe messy real-world phenomena like car accidents, fires, floods or premature deaths. Data scientists and actuaries could build world-class models, but those models were only as good as the data fed into them. Legacy data architectures are not optimized for newer types of data with variable structures, at least not without first passing them through extract-transform-load (ETL) processes to fit them into the fixed rows and columns of a database.

Now enterprise Hadoop makes it possible to capture and store vast amounts of new types of data without first transforming them. By exploring clickstream, sensor, social, location, voice, text, video or server log data—alone, or in combination with existing legacy datasets—we find new and unexpected relationships. Before, our hypotheses determined the data we captured. Now, we can capture far more data in its raw form and then let that data suggest more and better hypotheses.

Here’s the takeaway: with HDP’s schema-on-read architecture, leaving data behind now costs more than capturing and storing all the new types of data for further discovery.
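The idea behind schema-on-read can be illustrated without any Hadoop machinery. In this minimal sketch, raw events are kept exactly as they arrived, and each analysis applies its own schema only when it reads them. It is plain Python, not an HDP or Hive API; the JSON-lines feed, the event types and the field names are all hypothetical.

```python
import json

# Hypothetical raw feed: events are stored as they arrive, with no upfront ETL.
raw_lines = [
    '{"type": "telematics", "vehicle": "V1", "speed_mph": 72}',
    '{"type": "claim_note", "claim": "CLM-9", "text": "water damage in basement"}',
    '{"type": "telematics", "vehicle": "V2", "speed_mph": 41, "hard_brake": true}',
]


def read_with_schema(lines, schema):
    """Apply a schema at read time: project the requested fields, None if absent."""
    for line in lines:
        record = json.loads(line)
        yield {field: record.get(field) for field in schema}


# One analysis reads the raw data through a telematics schema...
telematics_view = [
    r for r in read_with_schema(raw_lines, ["type", "vehicle", "speed_mph", "hard_brake"])
    if r["type"] == "telematics"
]
# ...another reads the same raw lines through a claims-notes schema, with no reload.
notes_view = [
    r for r in read_with_schema(raw_lines, ["type", "claim", "text"])
    if r["type"] == "claim_note"
]
print(telematics_view)
print(notes_view)
```

Because nothing was discarded at write time, a field that one reading ignores (such as `hard_brake`) is still available to the next question someone thinks to ask.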

From Challenges to Solutions

Insurance carriers face the same three general challenges as they build advanced analytic applications for their big data assets in Hadoop. We’ve seen our customers meet those challenges with applications that provide a single view of customers, more predictive power or deeper insight through improved data discovery.

In my next post in the series, I’ll discuss some specific HDP solutions for common insurance use cases and describe how those have changed our customers’ insurance businesses.


About the Author

Cindy Maike is GM of Insurance at Hortonworks, where she is responsible for the insurance center of excellence and the go-to-market strategy for the industry. She has more than 25 years of finance, consulting and advisory services experience in the insurance industry, helping clients around the world with their business and IT strategies, with a specific focus on business strategy and the use of analytics to drive results.

