This article is the second installment in a three-part series covering one of the most critical issues facing the financial industry: Trade Surveillance. The first post discussed the scope of the problem across multiple global jurisdictions; this post discusses a candidate Big Data & Cloud Computing architecture that can help market participants and banks implement these capabilities in their applications and platforms.
Business Background –
The first article in this three-part series laid out the five business trends that are forcing a rethink of existing global, cross-asset surveillance systems.
To recap:
- Trade lifecycle automation is rising across the Capital Markets value chain, and technology is used ever more heavily throughout the lifecycle. Higher speeds and feeds mean that a huge number of securities change hands (in huge quantities) within milliseconds across 25+ global trading venues. Automation increases trading volumes, which in turn adds substantially to the risk of fraud.
- The presence of multiple avenues of trading (alternative trading facilities, or ATFs, and multilateral trading facilities, or MTFs) creates opportunities for information and price arbitrage that were never a major problem before. These opportunities now span multiple markets, products and geographies with different regulatory requirements.
- As a natural consequence of all of the above (the globalization of trading, with market participants spread across multiple geographies), it becomes all the more difficult to provide a consolidated audit report that views all activity under a single source of truth, and to trace orders across those venues. This is critical because fraud is becoming increasingly sophisticated, e.g. the rise of insider trading rings.
- Existing application architectures (e.g. ticker plants, surveillance systems, DevOps) are becoming brittle and are underperforming as data and transaction volumes continue to go up and data storage requirements keep rising every year. This leads to massive gaps in compliance data. Another significant gap appears in post-trade analytics, many of which go beyond the simple business rules in use today and increasingly need to move into the machine learning & predictive domain. Surveillance now needs to include non-traditional sources of data, e.g. trader email, chat and link analysis, that can point to under-the-radar rogue trading activity before it causes the financial system huge losses, as in the London Whale and LIBOR fixing scandals.
- Again as a consequence of increased automation, backtesting data and replaying it across historical intervals have become a challenge. Both are key to mining for patterns of suspicious activity, such as bursty spikes in trading and patterns that could indicate illegal insider selling.
The key issue becomes – how do antiquated surveillance systems move into the era of Cloud & Big Data enabled innovation as a way of overcoming these business challenges?
Technology Requirements –
An intelligent surveillance system needs to store trade data, reference data, order data, and market data, as well as all of the relevant communications from all the disparate systems, both internally and externally, and then match these things appropriately. The system needs to account for multiple levels of detection capabilities starting with a) configuring business rules (that describe a fraud pattern) as well as b) dynamic capabilities based on machine learning models (typically thought of as being more predictive). Such a system also needs to parallelize execution at scale to be able to meet demanding latency requirements for a market surveillance platform.
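The first detection level, configurable business rules, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `TraderStats` fields, the rule names and the thresholds are hypothetical and do not come from any particular surveillance product or regulation. The point is only that a rule is data (a name plus a predicate), so new rules can be added without touching the evaluation loop.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class TraderStats:
    """Hypothetical per-trader daily summary; field names are illustrative."""
    trader_id: str
    orders_placed: int
    orders_cancelled: int
    notional_traded: float


@dataclass(frozen=True)
class Rule:
    """A business rule: a name plus a predicate describing a suspicious pattern."""
    name: str
    predicate: Callable[[TraderStats], bool]


# Thresholds below are placeholders, not regulatory guidance.
RULES: List[Rule] = [
    Rule(
        "high_cancel_ratio",
        lambda s: s.orders_placed >= 100
        and s.orders_cancelled / s.orders_placed > 0.9,
    ),
    Rule("large_notional", lambda s: s.notional_traded > 50_000_000),
]


def evaluate(stats: List[TraderStats], rules: List[Rule] = RULES) -> Dict[str, List[str]]:
    """Return trader_id -> names of the rules that fired for that trader."""
    alerts: Dict[str, List[str]] = {}
    for s in stats:
        fired = [r.name for r in rules if r.predicate(s)]
        if fired:
            alerts[s.trader_id] = fired
    return alerts


sample = [
    TraderStats("T1", 1000, 950, 1_000_000.0),
    TraderStats("T2", 200, 20, 75_000_000.0),
    TraderStats("T3", 50, 49, 10_000.0),
]
alerts = evaluate(sample)
print(alerts)
```

In a real platform the same predicates would likely be evaluated in parallel over partitioned data rather than in a single loop; the sketch only shows the shape of the rule abstraction.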
The most important technical essentials for such a system are –
- Support end to end monitoring across a variety of financial instruments across multiple front office systems
- Provide a platform that can ingest from tens of millions to billions of market events (spanning a range of financial instruments – Equities, Bonds, Forex, Commodities and Derivatives etc) on a daily basis from tens of systems
- The ability to add new business rules (via either a business rules engine and/or a model based system that supports machine learning) is a key requirement.
- Provide advanced visualization techniques thus helping compliance and surveillance officers manage the information overload.
- The ability to perform deep cross-market analysis, i.e. to look at financial instruments & securities trading in multiple geographies and on multiple exchanges
- The ability to create views and correlate data that are both wide and deep. A wide view will look at related securities across multiple venues; a deep view will look for a range of illegal behaviors that threaten market integrity such as market manipulation, insider trading, watch/restricted list trading and unusual pricing.
- The ability to provide in-memory caches of data for rapid pre-trade compliance checks.
- Ability to create prebuilt analytical models and algorithms that pertain to trading strategy (pre-trade models, e.g. best execution and analysis).
- Provide Data Scientists and Quants with development interfaces using tools like SAS and R. The most popular way to link R and Hadoop is to use HDFS as the long-term store for all data, and use MapReduce jobs (potentially submitted from Hive or Pig) to encode, enrich, and sample data sets from HDFS into R.
- The results of the processing and queries need to be exportable in various data formats: simple CSV/txt, more optimized binary formats, JSON, or even custom formats. The results will be in the form of standard relational DB data types (e.g. String, Date, Numeric, Boolean).
- Based on back testing and simulation, analysts should be able to tweak the model, and subscribers of the platform (typically compliance personnel) should be able to customize their execution models.
- A wide range of analytical tools needs to be integrated to support rich dashboards and visualizations.
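Two of the requirements above, in-memory caches for pre-trade compliance checks and export of results in CSV or JSON, can be sketched together. This is a minimal illustration under stated assumptions: the `RESTRICTED` dictionary stands in for whatever in-memory cache or data grid a real platform would use, and the desk names and symbols are made up.

```python
import csv
import io
import json
from typing import Dict, List, Set

# Hypothetical in-memory cache: desk name -> restricted instrument symbols.
RESTRICTED: Dict[str, Set[str]] = {
    "equities_desk": {"ACME", "GLOBEX"},
    "fx_desk": set(),
}


def pre_trade_check(desk: str, symbol: str) -> bool:
    """Return True if the order may proceed, False if the symbol is restricted."""
    return symbol not in RESTRICTED.get(desk, set())


def to_csv(rows: List[dict]) -> str:
    """Serialize check results as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["desk", "symbol", "allowed"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_json(rows: List[dict]) -> str:
    """Serialize check results as a JSON array."""
    return json.dumps(rows, sort_keys=True)


orders = [
    ("equities_desk", "ACME"),
    ("equities_desk", "INITECH"),
    ("fx_desk", "ACME"),
]
results = [
    {"desk": d, "symbol": s, "allowed": pre_trade_check(d, s)} for d, s in orders
]
print(to_csv(results))
print(to_json(results))
```

A set lookup keeps each check constant-time, which is the property that makes an in-memory cache attractive on the pre-trade path.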
Application & Data Architecture –
The dramatic technology advances in Big Data & Cloud Computing enable the realization of the above requirements. Big Data is dramatically changing the traditional approach to surveillance with advanced analytic solutions that are powerful and fast enough to detect fraud in real time and to build models based on historical data (and deep learning) that proactively identify risks.
To enumerate the various advantages of using Big Data –
a) Real time insights – Generate insights at a latency of a few milliseconds
b) A Single View of Customer/Trade/Transaction
c) Loosely coupled yet Cloud Ready Architecture
d) Highly Scalable yet Cost effective
Hadoop is emerging as the best choice for fraud detection for several technology reasons. From a component perspective, Hadoop supports multiple ways of running the models and algorithms that are used to find patterns of fraud and anomalies in the data and to predict customer behavior. Examples include Bayesian filters, clustering, regression analysis and neural networks. Data Scientists & Business Analysts have a choice of MapReduce, Spark (via Java, Python, R), Storm and SAS, to name a few, for creating these models. Fraud model development, testing and deployment on fresh & historical data become very straightforward to implement on Hadoop. The last few releases of enterprise Hadoop distributions (e.g. Hortonworks Data Platform) have seen huge advances from a Governance, Security and Monitoring perspective.
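To make the idea of a model that finds anomalies concrete, here is a deliberately simple baseline: a trailing-window z-score that flags bursty spikes in traded volume. It is a stand-in chosen for brevity, not one of the techniques named above (Bayesian filters, clustering, regression, neural networks), and it runs in plain Python rather than on Spark or MapReduce. The function name, window size and threshold are assumptions for illustration.

```python
from statistics import mean, stdev
from typing import List


def spike_indices(
    volumes: List[float], window: int = 5, threshold: float = 3.0
) -> List[int]:
    """Flag indices whose volume exceeds the trailing-window mean by more
    than `threshold` sample standard deviations."""
    flagged = []
    for i in range(window, len(volumes)):
        history = volumes[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        # Skip flat windows: a z-score is undefined when sigma is zero.
        if sigma > 0 and (volumes[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical per-interval traded volumes with one obvious burst.
volumes = [100, 102, 98, 101, 99, 100, 500, 103, 97]
spikes = spike_indices(volumes)
print(spikes)
```

One known limitation of this baseline is that a spike inflates the mean and deviation of the windows that follow it, so closely spaced bursts can mask each other; a production model would need something more robust.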
A shared data repository called a Data Lake is created that can capture every order creation, modification, cancelation and ultimate execution across all exchanges. This lake provides more visibility into all data related to intra-day trading activities. The trading risk group accesses this shared data lake to process more position, execution and balance data. This analysis can be performed on fresh data from the current workday or on historical data, and it is available for at least five years, much longer than before. Moreover, Hadoop enables ingest of data from recent acquisitions despite disparate data definitions and infrastructures. All the data that pertains to trade decisions and the trade lifecycle needs to be made resident in a general enterprise storage pool that is run on HDFS (Hadoop Distributed Filesystem) or a similar Cloud based filesystem. This repository is augmented by incremental feeds of intra-day trading activity data that are streamed in using technologies like Sqoop, Kafka and Storm.
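Once every order event lands in one place, a consolidated audit trail is conceptually a group-and-sort over those events. The sketch below shows that idea in plain Python. The `OrderEvent` schema, the venue names and the assumption that `order_id` is unique across venues are all hypothetical; real feeds would need identifier mapping and clock reconciliation first.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class OrderEvent:
    """Hypothetical normalized order event as it might be stored in the lake."""
    ts: int        # event time, e.g. epoch milliseconds
    venue: str
    order_id: str  # assumed unique across venues for this sketch
    kind: str      # "create", "modify", "cancel" or "execute"


def lifecycles(events: List[OrderEvent]) -> Dict[str, List[OrderEvent]]:
    """Group events by order_id, in timestamp order, across all venues."""
    by_order: Dict[str, List[OrderEvent]] = defaultdict(list)
    for e in sorted(events, key=lambda e: e.ts):
        by_order[e.order_id].append(e)
    return dict(by_order)


def final_state(history: List[OrderEvent]) -> str:
    """Return the kind of the last event seen for an order."""
    return history[-1].kind


# Events arrive out of order, as they might from independent venue feeds.
events = [
    OrderEvent(3, "VENUE_B", "o2", "cancel"),
    OrderEvent(1, "VENUE_A", "o1", "create"),
    OrderEvent(2, "VENUE_B", "o2", "create"),
    OrderEvent(4, "VENUE_A", "o1", "modify"),
    OrderEvent(5, "VENUE_A", "o1", "execute"),
]
trail = lifecycles(events)
for order_id, history in trail.items():
    print(order_id, [e.kind for e in history], final_state(history))
```

The same grouping can be replayed over any historical interval by filtering on `ts` before calling `lifecycles`, which is the kind of replay the business requirements call for.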
The above business requirements can be accomplished by leveraging the many different technology paradigms in the Hadoop Data Platform. These include an enterprise-grade message broker (Kafka) and in-memory data processing via Spark & Storm.
Illustration : Candidate Architecture for a Market Surveillance Platform
The final post will cover this architecture and also discuss forward looking approaches from a business & technology standpoint.