This article is the second installment in a three-part series covering one of the most critical issues facing the financial industry – trade surveillance. The first post discussed the global scope of the problem across multiple jurisdictions; this post discusses a candidate Big Data & Cloud Computing architecture that can help market participants & banks implement these capabilities in their applications & platforms.
The first article in this three-part series laid out the five business trends driving a rethink of existing Global & Cross Asset Surveillance systems.
To recap them below –
The key question then becomes – how do antiquated surveillance systems move into the era of Cloud & Big Data enabled innovation and overcome these business challenges?
Technology Requirements –
An intelligent surveillance system needs to store trade data, reference data, order data, and market data, along with all relevant communications from disparate internal and external systems, and then match these records appropriately. The system needs to support multiple levels of detection capability, starting with a) configurable business rules that describe a known fraud pattern, as well as b) dynamic capabilities based on machine learning models, typically thought of as more predictive. Such a system also needs to parallelize execution at scale to meet the demanding latency requirements of a market surveillance platform.
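The rules-based level of detection can be illustrated with a minimal sketch. The example below is hypothetical – the `OrderEvent` schema, the `cancel_ratio_alert` rule, and its thresholds are illustrative assumptions, not part of any specific surveillance product – but it shows the shape of a configurable business rule: flag traders whose cancel-to-order ratio exceeds a threshold, a crude proxy for a spoofing-style pattern.

```python
from dataclasses import dataclass

@dataclass
class OrderEvent:
    """A simplified order lifecycle event (hypothetical schema)."""
    trader_id: str
    action: str  # "new", "cancel", or "execute"

def cancel_ratio_alert(events, threshold=0.9, min_orders=10):
    """Rules-based detection: flag traders whose cancel-to-order ratio
    exceeds `threshold`, once they have placed at least `min_orders` orders.
    Thresholds would be configured per business rule in a real platform."""
    stats = {}  # trader_id -> (new_orders, cancels)
    for e in events:
        new, cancels = stats.get(e.trader_id, (0, 0))
        if e.action == "new":
            new += 1
        elif e.action == "cancel":
            cancels += 1
        stats[e.trader_id] = (new, cancels)
    return [t for t, (new, cancels) in stats.items()
            if new >= min_orders and cancels / new > threshold]
```

In production such rules would run over the shared data repository at scale; the machine-learning level of detection would complement them with models trained on historical behavior.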
The most important technical essentials for such a system are –
Application & Data Architecture –
The dramatic technology advances in Big Data & Cloud Computing make the above requirements achievable. Big Data is changing the traditional approach with advanced analytic solutions that are powerful and fast enough to detect fraud in real time, while also building models on historical data (including deep learning) to proactively identify risks.
To enumerate the various advantages of using Big Data –
a) Real time insights – Generate insights at a latency of a few milliseconds
b) A Single View of Customer/Trade/Transaction
c) Loosely coupled yet Cloud Ready Architecture
d) Highly Scalable yet Cost effective
There are strong technology reasons why Hadoop is emerging as the best choice for fraud detection. From a component perspective, Hadoop supports multiple ways of running the models and algorithms used to find patterns of fraud and anomalies in the data and to predict customer behavior. Examples include Bayesian filters, clustering, regression analysis, and neural networks. Data scientists & business analysts have a choice of MapReduce, Spark (via Java, Python, or R), Storm, and SAS, to name a few, for creating these models. Fraud model development, testing, and deployment on fresh & historical data become very straightforward to implement on Hadoop. The last few releases of enterprise Hadoop distributions (e.g. Hortonworks Data Platform) have seen huge advances from a governance, security, and monitoring perspective.
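To make the model-based side of detection concrete, here is a toy statistical sketch – not a Spark or SAS job, just the underlying idea expressed with the Python standard library. The `zscore_anomalies` function and its 3-sigma threshold are illustrative assumptions: it flags trade values that deviate sharply from the population, the same principle that clustering or regression models apply at scale on Hadoop.

```python
import statistics

def zscore_anomalies(values, threshold=3.0):
    """Model-based detection (toy version): flag values more than
    `threshold` standard deviations from the population mean.
    Real platforms would train richer models (clustering, regression,
    neural networks) on historical data in Spark or similar."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # no dispersion, nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]
```

A notional of 10,000 among fifty trades of 100 would be flagged, while uniform activity produces no alerts; in practice the same scoring logic would run distributed over the data lake.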
A shared data repository, called a Data Lake, is created to capture every order creation, modification, cancellation, and ultimate execution across all exchanges. This lake provides more visibility into all data related to intra-day trading activities. The trading risk group accesses this shared data lake to process more position, execution, and balance data. This analysis can be performed on fresh data from the current workday or on historical data, and it is available for at least five years – much longer than before. Moreover, Hadoop enables ingestion of data from recent acquisitions despite disparate data definitions and infrastructures. All data pertaining to trade decisions and the trade lifecycle needs to reside in a general enterprise storage pool running on HDFS (the Hadoop Distributed File System) or a similar cloud-based filesystem. This repository is augmented by incremental feeds of intra-day trading activity data, brought in using technologies like Sqoop for batch imports and Kafka and Storm for streaming.
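Ingesting every order creation, modification, cancellation, and execution into the lake implies a normalized event record. The sketch below is an illustrative assumption – the `OrderLifecycleEvent` schema and `to_lake_record` function are invented for this example – showing how a lifecycle event might be serialized to a JSON payload suitable for publishing to a Kafka topic that feeds the HDFS-backed lake.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class OrderLifecycleEvent:
    """Hypothetical normalized schema for one order lifecycle event,
    as it might land in the data lake from any exchange."""
    order_id: str
    event_type: str  # "create" | "modify" | "cancel" | "execute"
    exchange: str
    symbol: str
    quantity: int
    price: float
    timestamp: str   # ISO-8601, UTC

def to_lake_record(event: OrderLifecycleEvent) -> bytes:
    """Serialize an event to a stable JSON payload, e.g. the value of a
    Kafka message on an 'order-events' topic feeding HDFS."""
    return json.dumps(asdict(event), sort_keys=True).encode("utf-8")
```

Keeping the serialized form stable (sorted keys, explicit types) is what lets data from acquisitions with disparate definitions be reconciled once it lands in the lake.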
The above business requirements can be met by leveraging the many different technology paradigms in the Hadoop data platform. These include an enterprise-grade message broker (Kafka) and in-memory and stream data processing via Spark & Storm.
Illustration : Candidate Architecture for a Market Surveillance Platform
The final post will cover this architecture in detail and also discuss forward-looking approaches from a business & technology standpoint.