November 30, 2016

How Data Science and Predictive Analytics Transform AML Compliance in Banking & Payments (2/2)

The first blog in this two-part series (Deter Financial Crime by Creating an effective AML Program) described how the Money Laundering (ML) activities employed by nefarious actors (e.g., drug cartels, corrupt public figures & terrorist organizations) have grown more sophisticated over the years. Global and regional banks are falling short of their compliance goals despite huge investments in technology and process. Banks that fail to maintain effective compliance are typically fined hundreds of millions of dollars. In this second & final post, we examine why Big Data Analytics, as a second-generation effort, can become critical to shutting down the flow of illicit funds across the globe, thus helping financial organizations stay compliant with anti-money-laundering requirements.

Where current enterprise-wide AML programs fall short

Current AML programs fall short in five specific areas:

  1. Manual Data Collection & Risk Scoring: Banks' response to AML statutes has been to bring in more staff, typically numbering in the hundreds at large banks. These staff perform rote but key AML processes such as Customer Due Diligence (CDD) and Know Your Customer (KYC). They extensively scour external sources like LexisNexis, Thomson Reuters, D&B, etc. to manually score risky client entities, often pairing these sources with internal bank data. They also use AML watch-lists to verify individual and business customers so that AML Case Managers can review the results before filing Suspicious Activity Reports (SARs). On average, about 50% of the cost of an AML program comes from these large headcount requirements. At large global banks with more than 100 million customer accounts, data volumes grow very large very quickly, causing all kinds of headaches for AML programs from a data aggregation, storage, processing and accuracy standpoint. There is a crying need to automate AML programs end to end, not only to perform accurate risk scoring but also to keep costs down.
  2. Social Graph Analysis in areas such as Trade Finance helps model the complex transactions occurring between thousands of entities. Each of these entities may have a complex holding structure, with accounts that have been created using forged documents. Most fraud also happens in networks of fraudsters. The inability to dynamically understand the topology of the financial relationships among thousands of entities means that AML programs need to develop graph-based analysis capabilities.
  3. AML programs extensively deploy rule-based systems, or Transaction Monitoring Systems (TMS), which take an expert-system approach to setting up new rules. These rules span areas like monetary thresholds, specific patterns that connote money laundering & business scenarios that may violate these patterns. However, fraudster rings now learn (or know) these rules quickly & constantly change their methods to avoid detection. Thus there is a significant need to reduce the heavy dependence on traditional TMS, which are slow to adapt to the dynamic nature of money laundering.
  4. There is a need to perform extensive Behavioral Modeling & Customer Segmentation to discover transaction behavior, with a view to identifying the behavioral patterns of entities & the outlier behaviors that connote potential laundering.
  5. Real-time transaction monitoring in areas like Payment Cards presents unique challenges, since money laundering is hidden within mountains of transaction data. Every piece of data produced by bank operations needs to be combined with historical data sets spanning years (for customers under suspicion) before making a judgment call about filing a SAR.
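
Item 2 above calls for graph-based analysis of the relationships among entities. As a minimal sketch of the idea (a simple connected-components pass, not a full graph analytics engine and not a method prescribed in this post), the Python below treats each funds transfer or shared attribute between two entities as an edge, groups entities with a union-find structure, and surfaces clusters above a size threshold as candidate networks. The function name `find_networks`, the `min_size` parameter and all entity names are hypothetical.

```python
from collections import defaultdict


def find_networks(edges, min_size=3):
    """Group linked entities into connected components (candidate networks).

    edges: iterable of (entity_a, entity_b) pairs, e.g. a funds transfer
    between two accounts or a shared attribute such as a registered address.
    Returns the components with at least min_size members, largest first.
    """
    parent = {}

    def find(x):
        # Register unseen entities as their own root, then walk to the root,
        # halving the path as we go to keep later lookups fast.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in edges:
        union(a, b)

    # Collect every entity under its component root.
    groups = defaultdict(set)
    for node in parent:
        groups[find(node)].add(node)

    networks = [g for g in groups.values() if len(g) >= min_size]
    # Largest networks first; ties broken deterministically by name.
    return sorted(networks, key=lambda g: (-len(g), min(g)))


# Hypothetical relationships between entities.
edges = [
    ("ShellCo A", "ShellCo B"),  # wire transfer
    ("ShellCo B", "ShellCo C"),  # wire transfer
    ("ShellCo C", "ShellCo A"),  # funds cycle back to the origin
    ("ShellCo D", "ShellCo A"),  # same registered address
    ("Retail X", "Retail Y"),    # ordinary supplier payment
]

for network in find_networks(edges):
    print(sorted(network))  # prints ['ShellCo A', 'ShellCo B', 'ShellCo C', 'ShellCo D']
```

Connected components only show that entities are linked; a production AML program would likely add edge weights, amounts and timing, and run on a distributed graph engine rather than a single-process loop.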

How Big Data & Predictive Analytics can help across all these areas


  1. The first area where Big Data & Predictive Analytics have a massive impact is Customer Due Diligence and KYC (Know Your Customer) data. All of the data scraping from various sources discussed above can be easily automated using tools in a Big Data stack that ingest information automatically, by sending requests via an API to data providers (the exact same ones that banking institutions currently use). Once this data is obtained, Banks can use real-time processing tools (such as Apache Storm and Apache Spark) to apply sophisticated algorithms to the collected data and calculate a Risk Score or Rating. In Trade Finance, Text Analytics can be used to process a range of documents like invoices, bills of lading and certificates of shipment, enabling Banks to inspect a complex process spanning hundreds of entities operating across countries. This approach enables Banks to process massive amounts of diverse data quickly (even in seconds) and synthesize it into accurate risk scores. Implementing Big Data in this very important workstream can help increase efficiency and reduce costs.
  2. The second area where Big Data shines is in helping create a Single View of the Customer. This is made possible by advanced entity matching, built on the establishment and adoption of a lightweight entity ID service consisting of entity assignment and batch reconciliation. The goal is to get each business system to propagate the Entity ID back into its Core Banking, loan and payment systems; transaction data will then flow into the data lake with this ID attached, providing a way to build a Customer 360 view.
  3. To be clear, we are advocating a mix of both business rules and Data Science. Machine Learning is recommended because it enables a range of business analytics across AML programs, overcoming the limitations of a TMS. The first use case for Data Science can be summed up as: give me all transactions in one place, all the Case Management files in one place, all of the customer data in one place and all external data (TBD) in one place. The reason for wanting all of this is to perform exploratory, hypothesis-driven Data Science, with the goal of uncovering areas of risk that may have been missed before, finding areas that are less risky than previously thought so their risk scores can be lowered, and continuously discovering the real risk profile that the institution bears. For example, an institution might downgrade its investment in Trade Finance after finding a large number of fraudulent transactions based on scrap metal.
  4. The other important value driver in deploying Data Science is Advanced Transaction Monitoring Intelligence. The core idea is to get years' worth of banking data into one location (the data lake) & then apply unsupervised learning to glean patterns in those transactions. The goal is then to identify profiles of actors and feed them into downstream surveillance & TM systems. This knowledge can then be used to:
  • Constantly learn the transaction behavior of similar customers, which is very important in detecting laundering in areas like payment cards, where it is very common for retail businesses to be set up with the sole purpose of laundering money.
  • Discover transaction activity of trade finance customers with similar traits (types of businesses, nature of transfers, areas of operations etc.)
  • Segment customers by similar transaction behaviors
  • Understand common money laundering typologies and identify specific risks from a temporal and spatial/geographic standpoint
  • Learn and improve correlations between alert accuracy and suspicious activity report (SAR) filings
  • Keep the noise level down by weeding out false positives
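
The segmentation and outlier ideas in item 4 can be illustrated with a deliberately simple statistical stand-in for unsupervised learning: peer-group analysis. The Python sketch below assumes each customer has already been assigned a segment (for example, by business type) and a single monthly transaction volume. Within each segment it computes a modified z-score from the median and the median absolute deviation (MAD) and flags customers above a cutoff. The 0.6745 constant and 3.5 cutoff follow the common rule of thumb for modified z-scores; the tuple layout, the function name `flag_peer_outliers` and the sample figures are hypothetical.

```python
from collections import defaultdict
from statistics import median


def flag_peer_outliers(customers, threshold=3.5):
    """Flag customers whose volume deviates sharply from their peer segment.

    customers: iterable of (customer_id, segment, monthly_volume) tuples.
    Returns (customer_id, segment, modified_z_score) for flagged customers.
    """
    # Group customers into peer segments (e.g. by business type).
    by_segment = defaultdict(list)
    for customer_id, segment, volume in customers:
        by_segment[segment].append((customer_id, volume))

    flagged = []
    for segment, members in by_segment.items():
        volumes = [volume for _, volume in members]
        med = median(volumes)
        # Median absolute deviation: a spread measure one outlier cannot inflate.
        mad = median(abs(v - med) for v in volumes)
        for customer_id, volume in members:
            if mad == 0:
                # No spread in the segment: any deviation from the median stands out.
                score = 0.0 if volume == med else float("inf")
            else:
                score = 0.6745 * abs(volume - med) / mad
            if score > threshold:
                flagged.append((customer_id, segment, round(score, 2)))
    return flagged


# Hypothetical monthly cash-deposit volumes (USD) for two peer segments.
customers = [
    ("C1", "convenience_store", 10_000),
    ("C2", "convenience_store", 12_000),
    ("C3", "convenience_store", 11_000),
    ("C4", "convenience_store", 9_000),
    ("C5", "convenience_store", 95_000),
    ("T1", "scrap_metal_trader", 400_000),
    ("T2", "scrap_metal_trader", 420_000),
    ("T3", "scrap_metal_trader", 390_000),
]

print(flag_peer_outliers(customers))  # → [('C5', 'convenience_store', 56.66)]
```

The median and MAD are used instead of the mean and standard deviation because, in a small segment, a single extreme account can inflate the standard deviation enough to hide itself. A real program would segment on many features at once, which is where clustering and other unsupervised methods come in.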

Benefits of a forward-looking approach

We believe that we have a fresh approach that can help Banks with the following value drivers & metrics:

  • Detect AML violations on a proactive basis thus reducing the probability of massive fines
  • Save on staffing expenses for Customer Due Diligence (CDD)
  • Increase the accuracy of suspicious activity report (SAR) production
  • Decrease the percentage of corporate customers with AML-related account closures in the past year (by customer risk level and reason), thus reducing loss of revenue
  • Decrease the overall KYC profile update backlog across geographies
  • Help create Customer 360 views that can help accelerate CLV (Customer Lifetime Value) as well as Customer Segmentation from a cross-sell/up-sell perspective

Big Data shines in all the above areas


The AML landscape will change rapidly over the next few years to accommodate the business requirements highlighted above. Regulatory authorities should also lead the way in adopting a Hadoop/ML/Predictive Analytics based approach. There is no other way to tackle large & medium AML programs in a low-cost and highly automated manner.




