Financial Companies Do Hadoop
Any financial services business cares about minimizing risk and maximizing opportunity. Banks weigh the risk of opening accounts versus the opportunity to hold deposits. Insurance companies balance the risk of paying claims with the opportunity to take premiums. Investment companies pursue long-term portfolio appreciation knowing that some securities will lose value.
Regulatory risk is present in all of these businesses, and so is internal risk. A few rogue individuals can cause extraordinary losses if their malicious activities go unnoticed.
Banks, insurance companies and securities firms that store and process huge amounts of data in Apache Hadoop have better insight into both their risks and opportunities. Deeper analysis and insight can improve operational margins and protect against one-time events that might cause catastrophic losses.
The following reference architecture diagram represents a combination of approaches that we see our financial customers adopt in their banking, insurance and investment businesses.
Here are some use cases that describe specific ways that financial services companies use Apache Hadoop to make more money for customers and shareholders.
Screen New Account Applications for Risk of Default
Every day, large retail banks take thousands of applications for new checking and savings accounts. Bankers who accept these applications consult third-party risk scoring services before opening an account. They can (and do) override do-not-open recommendations for applicants with poor banking histories.
Many of these high-risk accounts are overdrawn and charged off because of mismanagement or fraud, costing banks millions of dollars in losses. Some of this cost is passed on to customers who manage their accounts responsibly.
Apache Hadoop can store and analyze multiple data streams and help regional bank managers control new account risk in their branches. They can match banker decisions with the risk information presented at the time of decision. This allows them to control risk by sanctioning individuals, updating policies, and identifying patterns of fraud.
Over time, the accumulated data informs algorithms that may detect subtle, high-risk behavior patterns unseen by the bank’s risk analysts.
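The matching of banker decisions to the risk information shown at decision time can be illustrated with a small sketch. It is not taken from any bank's system: the record fields (`banker`, `recommendation`, `opened`) and the `outcomes` lookup are hypothetical, and a real deployment would run the same join over data stored in Hadoop rather than over in-memory lists.

```python
from collections import defaultdict

# Hypothetical application records; field names are illustrative only.
applications = [
    {"app_id": 1, "banker": "b01", "recommendation": "do_not_open", "opened": True},
    {"app_id": 2, "banker": "b01", "recommendation": "open", "opened": True},
    {"app_id": 3, "banker": "b01", "recommendation": "do_not_open", "opened": True},
    {"app_id": 4, "banker": "b02", "recommendation": "do_not_open", "opened": False},
    {"app_id": 5, "banker": "b02", "recommendation": "do_not_open", "opened": True},
]

# Hypothetical later account status, keyed by application id.
outcomes = {1: "charged_off", 2: "good", 3: "good", 5: "charged_off"}


def override_stats(applications, outcomes):
    """Count, per banker, overrides of do-not-open advice and resulting charge-offs."""
    stats = defaultdict(lambda: {"overrides": 0, "charged_off": 0})
    for app in applications:
        # An override is an account opened despite a do-not-open recommendation.
        is_override = app["recommendation"] == "do_not_open" and app["opened"]
        if not is_override:
            continue
        entry = stats[app["banker"]]
        entry["overrides"] += 1
        if outcomes.get(app["app_id"]) == "charged_off":
            entry["charged_off"] += 1
    return dict(stats)


if __name__ == "__main__":
    for banker, entry in sorted(override_stats(applications, outcomes).items()):
        print(banker, entry)
```

A regional manager could read such a summary to see which bankers override recommendations most often and how those accounts later perform.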
Monetize Anonymous Banking Data in Secondary Markets
Banks possess massive amounts of operational, transactional and balance data that holds information about macro-economic trends. This information can be valuable for investors and policy-makers outside of the banks, but regulations and internal policies require that these uses strictly protect the anonymity of bank customers.
Retail banks have turned to Apache Hadoop as a common cross-company data lake for data from different lines of business (LOBs): mortgage, consumer banking, personal credit, wholesale and treasury banking. Both internal managers and consumers in the secondary market derive value from the data. A single point of data management allows the bank to operationalize security and privacy measures such as de-identification, masking, encryption, and user authentication.
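As a rough illustration of two of the measures named above, de-identification and masking, the following minimal sketch replaces a customer identifier with a keyed hash and hides most of an account number. Everything in it is an assumption made for illustration: the record fields, the `SECRET_KEY` placeholder, and the choice of HMAC-SHA256 are not described in the source. Keyed hashing of this kind is pseudonymization, and on its own it may not satisfy a given regulation's standard for anonymity.

```python
import hashlib
import hmac

# Assumption: in practice the key would come from a managed secret store.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymize(customer_id: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a stable keyed hash (HMAC-SHA256, hex encoded)."""
    return hmac.new(key, customer_id.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_account(account_number: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters of an account number."""
    if len(account_number) <= visible:
        return "*" * len(account_number)
    return "*" * (len(account_number) - visible) + account_number[-visible:]


def deidentify(record: dict) -> dict:
    """Build a copy of a hypothetical record that is safer to share."""
    return {
        "customer_token": pseudonymize(record["customer_id"]),
        "account": mask_account(record["account_number"]),
        # Coarsen the balance to the thousand below it to reduce re-identification risk.
        "balance_band": (record["balance"] // 1000) * 1000,
    }


if __name__ == "__main__":
    sample = {"customer_id": "C-000123", "account_number": "1234567890", "balance": 12750}
    print(deidentify(sample))
```

Because the same identifier always maps to the same token under a given key, downstream consumers can still group records by customer without seeing who the customer is.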
Maintain Sub-Second SLAs with a Hadoop “Ticker Plant”
Ticker plants collect and process massive data streams, displaying prices for traders and feeding computerized trading systems fast enough to capture opportunities in seconds. This is useful for making real-time decisions, and years of historical market data can also be stored for long-term analysis of market trends.
One Hortonworks customer re-architected its ticker plant with HDP as its cornerstone. Before Hadoop, the ticker plant could not hold more than ten years of trading data. Now gigabytes of data flow in every day from thousands of server log feeds. This data is queried more than thirty thousand times per second, and Apache HBase enables fast queries that meet the client’s SLA targets. The retention horizon now also extends beyond ten years.
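The source does not describe this customer's HBase schema, so the following is only a sketch of one common row key pattern for time-series lookups, not the design actually used. HBase keeps rows sorted by the bytes of the row key, so a key made of a fixed-width symbol followed by a reversed timestamp places the newest tick for each symbol first. The sketch simulates that ordering with a sorted Python list instead of calling an HBase client, and the symbols and prices are made up.

```python
import bisect

MAX_TS = 2**63 - 1  # largest signed 64-bit value, used to reverse timestamps
SYMBOL_WIDTH = 8    # assumption: symbols are padded to a fixed width


def symbol_prefix(symbol: str) -> bytes:
    """Pad the symbol so every key has the same layout."""
    return symbol.encode("ascii").ljust(SYMBOL_WIDTH, b"\x00")


def row_key(symbol: str, ts_millis: int) -> bytes:
    """Symbol prefix plus reversed timestamp, so newer ticks sort first."""
    reversed_ts = MAX_TS - ts_millis
    return symbol_prefix(symbol) + reversed_ts.to_bytes(8, "big")


def latest_ticks(sorted_keys, table, symbol, limit):
    """Return up to `limit` most recent values for a symbol via a prefix scan."""
    prefix = symbol_prefix(symbol)
    start = bisect.bisect_left(sorted_keys, prefix)
    values = []
    for key in sorted_keys[start:start + limit]:
        if not key.startswith(prefix):
            break  # the scan has run into the next symbol's rows
        values.append(table[key])
    return values


# Made-up ticks: (symbol, timestamp in milliseconds, price).
ticks = [("AAA", 1000, 10.0), ("AAA", 2000, 10.5), ("AAA", 3000, 11.0), ("BBB", 1500, 20.0)]
table = {row_key(sym, ts): price for sym, ts, price in ticks}
sorted_keys = sorted(table)  # stands in for HBase's sorted row storage

if __name__ == "__main__":
    print(latest_ticks(sorted_keys, table, "AAA", 2))
```

With this layout, fetching the most recent prices for a symbol is a short scan from a known start key rather than a search across the whole table.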
Analyze Trading Logs to Detect Money Laundering
Another Hortonworks customer that provides investment services processes fifteen million transactions and three hundred thousand trades every day. Because of storage limitations, the company used to archive historical trading data, which limited that data’s availability. In the near term, each day’s trading data was not available for risk analysis until after close of business. This created a window of time with unacceptable risk exposure to money laundering or rogue trading.
Now the Hortonworks Data Platform accelerates the firm’s speed-to-analytics and also extends its data retention timeline. A shared data repository across multiple lines of business provides more visibility into all trading activities. The trading risk group accesses this shared data lake to process more position, execution and balance data. They can run this analysis on data from the current workday, and the data remains highly available for at least five years, much longer than before.