This is the seventh in our series on modern data architectures across industry verticals. Others in the series are:
Any financial services business cares about minimizing risk and maximizing opportunity. Banks weigh the risk of opening accounts against the opportunity to hold deposits. Insurance companies balance the risk of paying claims with the opportunity to collect premiums. Investment companies pursue long-term portfolio appreciation knowing that some securities will lose value.
Regulatory risk is present in all of these businesses, and there is always internal risk: a few rogue individuals can cause extraordinary losses if their malicious activities go unnoticed.
Banks, insurance companies and securities firms that store and process huge amounts of data in Apache Hadoop have better insight into both their risks and opportunities. Deeper analysis and insight can improve operational margins and protect against one-time events that might cause catastrophic losses.
The following reference architecture diagram represents a combination of approaches that we see our financial customers adopt in their banking, insurance and investment businesses.
Here are some use cases that describe specific ways that financial services companies use Apache Hadoop to make more money for customers and shareholders.
Every day, large retail banks take thousands of applications for new checking and savings accounts. Bankers who accept these applications consult third-party risk scoring services before opening an account. They can (and do) override do-not-open recommendations for applicants with poor banking histories.
Many of these high-risk accounts overdraw and are charged off due to mismanagement or fraud, costing banks millions of dollars in losses (and some of this cost is passed on to customers who responsibly manage their accounts).
Apache Hadoop can store and analyze multiple data streams and help regional bank managers control new account risk in their branches. They can match banker decisions with the risk information presented at the time of decision. This allows them to control risk by sanctioning individuals, updating policies, and identifying patterns of fraud.
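The matching step can be sketched as a point-in-time join: for each account-opening decision, find the most recent risk score the banker could have seen, then measure how often each banker opened accounts that the score advised against. This is a minimal in-memory sketch under stated assumptions: the record layouts (`RiskScore`, `Decision`) and the `"do-not-open"` label are hypothetical, and at Hadoop scale the same logic would run as a distributed join rather than Python loops.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


# Hypothetical record layouts; real feeds differ by bank and scoring vendor.
@dataclass
class RiskScore:
    applicant_id: str
    scored_at: int        # epoch seconds when the third-party score was returned
    recommendation: str   # "open" or "do-not-open"


@dataclass
class Decision:
    banker_id: str
    applicant_id: str
    decided_at: int       # epoch seconds when the banker decided
    opened: bool


def score_at_decision(scores: list[RiskScore], d: Decision) -> Optional[RiskScore]:
    """Latest score for the applicant issued at or before the decision (point-in-time join)."""
    seen = [s for s in scores
            if s.applicant_id == d.applicant_id and s.scored_at <= d.decided_at]
    return max(seen, key=lambda s: s.scored_at, default=None)


def override_rates(decisions: list[Decision], scores: list[RiskScore]) -> dict[str, float]:
    """Per banker: share of do-not-open recommendations that were overridden."""
    flagged: dict[str, int] = defaultdict(int)
    overridden: dict[str, int] = defaultdict(int)
    for d in decisions:
        s = score_at_decision(scores, d)
        if s is None or s.recommendation != "do-not-open":
            continue  # no score on record, or the score did not advise against opening
        flagged[d.banker_id] += 1
        if d.opened:
            overridden[d.banker_id] += 1
    return {b: overridden[b] / flagged[b] for b in flagged}


scores = [
    RiskScore("a1", 100, "do-not-open"),
    RiskScore("a2", 100, "do-not-open"),
    RiskScore("a2", 300, "open"),   # issued after the decision, so it must be ignored
    RiskScore("a3", 100, "open"),
]
decisions = [
    Decision("b1", "a1", 200, opened=True),
    Decision("b1", "a2", 200, opened=False),
    Decision("b2", "a3", 200, opened=True),
]
print(override_rates(decisions, scores))  # {'b1': 0.5}
```

The `scored_at <= decided_at` filter is the important design choice: judging a banker against a score that arrived after the decision would be unfair, which is why the later `a2` score is excluded.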
Over time, the accumulated data informs algorithms that may detect subtle, high-risk behavior patterns unseen by the bank’s risk analysts.
Banks possess massive amounts of operational, transactional and balance data that holds information about macro-economic trends. This information can be valuable for investors and policy-makers outside of the banks, but regulations and internal policies require that these uses strictly protect the anonymity of bank customers.
Retail banks have turned to Apache Hadoop as a common cross-company data lake for data from different lines of business (LOBs): mortgage, consumer banking, personal credit, wholesale and treasury banking. Both internal managers and consumers in the secondary market derive value from the data. A single point of data management allows the bank to operationalize security and privacy measures such as de-identification, masking, encryption, and user authentication.
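As one illustration of de-identification and masking, the sketch below replaces customer IDs with a keyed hash (HMAC-SHA256), so that records for the same customer from different LOBs still link together without exposing the real ID, and masks account numbers down to their last four digits. The field names and the choice of HMAC are assumptions for illustration, not a description of any particular bank's controls. Keyed hashing is pseudonymization, so the key itself must be protected, and data released outside the bank may still need aggregation.

```python
import hashlib
import hmac


def pseudonymize(customer_id: str, key: bytes) -> str:
    """Keyed hash: the same ID always maps to the same token, but cannot be reversed without the key."""
    return hmac.new(key, customer_id.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_account(account_number: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters with '*'."""
    hidden = max(len(account_number) - visible, 0)
    return "*" * hidden + account_number[hidden:]


def deidentify(record: dict, key: bytes) -> dict:
    """Return a copy for secondary use: direct identifiers dropped or transformed."""
    return {
        "customer_token": pseudonymize(record["customer_id"], key),
        "account": mask_account(record["account_number"]),
        "lob": record["lob"],
        "balance": record["balance"],
        # "name" and the raw "customer_id" are intentionally not copied
    }


key = b"example-secret-key"  # hypothetical; in practice fetched from a key management service
mortgage = {"customer_id": "C-1001", "name": "Jane Doe", "account_number": "1234567890",
            "lob": "mortgage", "balance": 250000.0}
card = {"customer_id": "C-1001", "name": "Jane Doe", "account_number": "5555444433331111",
        "lob": "personal credit", "balance": 1200.0}

mort_out, card_out = deidentify(mortgage, key), deidentify(card, key)
print(mort_out["account"])                                    # ******7890
print(mort_out["customer_token"] == card_out["customer_token"])  # True: still linkable across LOBs
```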
Traditional auto insurance attempts to identify and reward “safe” drivers based on their historical driving records: the accidents and traffic infractions that have (or have not) already happened.
Newer usage-based insurance (also called Pay as You Drive, or PAYD) attempts to align premiums with empirical risk, based on how policyholders actually drive.
Safer drivers pay less because the insurance company knows how they actually drive. Because policyholders know this too, PAYD insurance promotes a virtuous cycle: it improves overall safety and reduces moral hazard, the tendency of insured drivers to take more risk on the road because they know that they’re covered.
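To make pricing on how people actually drive concrete, here is a toy usage-based premium: a per-mile rate scaled up by telemetry-derived risk signals. The signals (hard-braking events per 100 miles, share of miles driven at night) and every rate and penalty value below are hypothetical placeholders, not an actuarial model or any insurer's rating plan.

```python
from dataclasses import dataclass


@dataclass
class Trip:
    miles: float
    hard_brakes: int     # hard-braking events detected by the telematics device
    night_miles: float   # miles driven at night


def usage_based_premium(trips: list[Trip], rate_per_mile: float = 0.05,
                        brake_penalty: float = 0.02, night_penalty: float = 0.5) -> float:
    """Toy PAYD premium: miles x rate, scaled by a multiplier built from driving behavior."""
    total_miles = sum(t.miles for t in trips)
    if total_miles == 0:
        return 0.0
    brakes_per_100 = 100 * sum(t.hard_brakes for t in trips) / total_miles
    night_share = sum(t.night_miles for t in trips) / total_miles
    multiplier = 1 + brake_penalty * brakes_per_100 + night_penalty * night_share
    return round(total_miles * rate_per_mile * multiplier, 2)


calm = [Trip(miles=100, hard_brakes=0, night_miles=0)]
risky = [Trip(miles=60, hard_brakes=3, night_miles=30),
         Trip(miles=40, hard_brakes=2, night_miles=20)]
print(usage_based_premium(calm))   # 5.0
print(usage_based_premium(risky))  # 6.75
```

Both drivers cover the same 100 miles; only their observed behavior separates their premiums, which is the point of PAYD as opposed to pricing on past claims alone.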
Advances in GPS and telemetry technologies have reduced the cost of capturing the driving data used to price PAYD policies, but the data streaming from vehicles grows very quickly, and it needs to be stored for analysis.
A major insurer was storing its PAYD data on an RDBMS platform, but storage costs were so high that the company retained only 25% of the available data. Processing even that subset took a full working week.
After adopting the Hortonworks Data Platform (HDP), the company retains 100% of policyholders’ PAYD geo-location data and processes that quadrupled data stream in three days or less. More data. Faster processing. Hadoop.
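Put as arithmetic, the gain is larger than “four times the data” suggests. Assuming a working week of five days, and that the 25% retained share reflects data volume, processing throughput rose by at least:

```latex
\frac{\text{throughput}_{\text{HDP}}}{\text{throughput}_{\text{RDBMS}}}
\;\ge\; \frac{1.00 / 3\ \text{days}}{0.25 / 5\ \text{days}}
\;=\; \frac{1.00 \times 5}{0.25 \times 3}
\;=\; \frac{20}{3} \;\approx\; 6.7
```

The inequality reflects the “three days or less” figure.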
One Hortonworks customer is a global property and casualty insurer that already had systems in place for analyzing structured data at scale. Analysis of less-structured data, such as claims notes or social media, was done claim by claim and did not scale easily. Combining all textual and social data with all structured data was not economically viable.
Apache Hadoop changed that. As a “schema on read” system, it permits ingest of a much wider range of data types. Data puddles that were previously scattered about are now unified in a data lake, giving a clearer, more holistic picture of actual risk.
This much larger data reservoir can still be analyzed with existing business intelligence tools and employee skills, thanks to close integration between HDP and Hortonworks partners SAS, Tableau and QlikView.
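Schema on read means raw records are stored as they arrive, and a structure is imposed only when a query reads them, so two analyses can apply different schemas to the same files. In Hive, for example, a table definition can be layered over files already sitting in HDFS. The in-memory sketch below mimics that idea with JSON lines; the claim fields and values are hypothetical.

```python
import json
from typing import Any, Callable, Iterable, Iterator

# Raw claims data kept exactly as ingested: fields vary from record to record.
RAW = [
    '{"claim_id": "C1", "amount": "1200.50", "notes": "rear-end collision, minor injury"}',
    '{"claim_id": "C2", "amount": 800, "adjuster_id": "A-17"}',
    '{"claim_id": "C3", "post": "my car was totaled last night"}',
]


def read_with_schema(lines: Iterable[str],
                     schema: dict[str, Callable[[Any], Any]]) -> Iterator[dict[str, Any]]:
    """Apply a schema at read time: project the named fields and cast them; missing -> None."""
    for line in lines:
        record = json.loads(line)
        yield {
            field: (cast(record[field]) if record.get(field) is not None else None)
            for field, cast in schema.items()
        }


# Two different "tables" over the same raw data, defined only at query time.
financial = list(read_with_schema(RAW, {"claim_id": str, "amount": float}))
textual = list(read_with_schema(RAW, {"claim_id": str, "notes": str, "post": str}))
print(financial[0])   # {'claim_id': 'C1', 'amount': 1200.5}
print(financial[2])   # {'claim_id': 'C3', 'amount': None}
```

A schema-on-write system would have to pick one structure before loading and reject or transform a record like `C3` up front.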
Ticker plants collect and process massive data streams, displaying prices for traders and feeding computerized trading systems fast enough to capture opportunities in seconds. This is useful for making real-time decisions, and years of historical market data can also be stored for long-term analysis of market trends.
One Hortonworks customer re-architected its ticker plant with HDP as the cornerstone. Before Hadoop, the ticker plant could not hold more than ten years of trading data. Now gigabytes of data flow in every day from thousands of server log feeds. This data is queried more than thirty thousand times per second, and Apache HBase delivers the fast queries needed to meet the client’s SLA targets. All of this, with a retention horizon now extended beyond ten years.
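This customer's HBase schema is not described here, but a common way to make “latest ticks for a symbol” fast in HBase is row-key design. HBase keeps rows sorted by row-key bytes, so a key made of the symbol followed by a reversed timestamp puts each symbol's newest ticks first and turns the query into a short prefix scan. The sketch below only simulates that ordering with a sorted Python list; it does not use the HBase client API, and the 8-byte symbol padding (`SYMBOL_WIDTH`) is an assumption.

```python
import bisect

MAX_TS = 2**63 - 1   # reversing against the max signed 64-bit value keeps keys non-negative
SYMBOL_WIDTH = 8     # hypothetical fixed width so "IBM" cannot prefix-match "IBMX"


def row_key(symbol: str, ts_millis: int) -> bytes:
    """Symbol (zero-padded) + reversed big-endian timestamp: newest ticks sort first."""
    sym = symbol.encode("ascii")
    assert len(sym) <= SYMBOL_WIDTH, "symbol too long for the fixed-width prefix"
    return sym.ljust(SYMBOL_WIDTH, b"\x00") + (MAX_TS - ts_millis).to_bytes(8, "big")


class TickTable:
    """In-memory stand-in for a table whose rows are kept sorted by row key."""

    def __init__(self) -> None:
        self.keys: list[bytes] = []
        self.values: dict[bytes, float] = {}

    def put(self, symbol: str, ts_millis: int, price: float) -> None:
        key = row_key(symbol, ts_millis)
        if key not in self.values:
            bisect.insort(self.keys, key)
        self.values[key] = price

    def latest(self, symbol: str, n: int) -> list[tuple[int, float]]:
        """Prefix scan: start at the symbol's first key and read at most n rows."""
        prefix = symbol.encode("ascii").ljust(SYMBOL_WIDTH, b"\x00")
        out = []
        i = bisect.bisect_left(self.keys, prefix)
        while i < len(self.keys) and self.keys[i].startswith(prefix) and len(out) < n:
            key = self.keys[i]
            ts = MAX_TS - int.from_bytes(key[SYMBOL_WIDTH:], "big")
            out.append((ts, self.values[key]))
            i += 1
        return out


table = TickTable()
for ts, px in [(1000, 101.0), (3000, 103.0), (2000, 102.0)]:
    table.put("IBM", ts, px)
table.put("AAPL", 1500, 190.0)
print(table.latest("IBM", 2))  # [(3000, 103.0), (2000, 102.0)]
```

Because the newest rows for a symbol are contiguous and come first, the read touches only the rows it returns, regardless of how many years of history the table retains.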
Another Hortonworks customer that provides investment services processes fifteen million transactions and three hundred thousand trades every day. Because of storage limitations, the company used to archive historical trading data, which limited that data’s availability. In the near term, each day’s trading data was not available for risk analysis until after close of business. This left a window of unacceptable exposure to money laundering and rogue trading.
Now the Hortonworks Data Platform accelerates the firm’s speed-to-analytics and also extends its data retention timeline. A shared data repository across multiple LOBs provides more visibility into all trading activities. The trading risk group accesses this shared data lake to process more position, execution and balance data. They can do this analysis on data from the current workday, and it is highly available for at least five years—much longer than before.
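Intraday analysis of this kind can be as simple as netting the day's executions into positions as they arrive and checking them against limits, rather than waiting for an end-of-day batch. The sketch below shows that shape under stated assumptions: the execution tuple layout, the per-trader limits, and the default limit are hypothetical, and a real trading risk system would also account for prices, balances, and instrument types.

```python
from collections import defaultdict

# (trader, symbol, side, quantity): hypothetical layout for the current workday's executions
Execution = tuple[str, str, str, int]


def intraday_positions(executions: list[Execution]) -> dict[tuple[str, str], int]:
    """Net quantity per (trader, symbol) from executions so far today."""
    positions: dict[tuple[str, str], int] = defaultdict(int)
    for trader, symbol, side, qty in executions:
        positions[(trader, symbol)] += qty if side == "BUY" else -qty
    return dict(positions)


def limit_breaches(positions: dict[tuple[str, str], int],
                   limits: dict[str, int], default_limit: int) -> list[tuple[str, str]]:
    """(trader, symbol) pairs whose absolute net position exceeds the trader's limit."""
    return sorted(key for key, qty in positions.items()
                  if abs(qty) > limits.get(key[0], default_limit))


todays_executions = [
    ("t1", "IBM", "BUY", 500),
    ("t1", "IBM", "SELL", 200),
    ("t2", "AAPL", "SELL", 1500),
    ("t3", "MSFT", "BUY", 4000),
]
positions = intraday_positions(todays_executions)
print(positions[("t1", "IBM")])                                      # 300
print(limit_breaches(positions, {"t3": 5000}, default_limit=1000))  # [('t2', 'AAPL')]
```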
Watch our blog in the coming weeks for reference architectures in other industry verticals.