March 14, 2017

Big Data as a platform for EU Market Regulation: MAR, MiFID II et al. (2/3)

The first post in this three-part series explored the evolution of capital markets regulation in the European financial markets over the last 15 years. We covered the important aspects of MAR (Market Abuse Regulation) and MiFID II. In this second post, we discuss the business and technology requirements that drive these implementations toward an open data architecture.

Key Business & Technology Requirements for MiFID II and MAR Platforms

The MAR and MiFID II regulations have broad ramifications for key capital markets business functions across the front, middle and back offices, including compliance, compensation policies, regulatory reporting and trade surveillance. However, as we examine below, the biggest obstacles are technological.

The key business requirements that can be distilled from the regulatory mandates include the following:

  • The ability to efficiently store enormous amounts of heterogeneous trade data – Both MiFID II and MAR mandate trade monitoring and analysis not just on real-time data but also on historical data spanning several years. This includes data feeds from a range of business systems: trade data, valuation and position data, reference data, rates, market data, client data, front-, middle- and back-office data, and voice, chat and other internal communications. In short, the ability to store cross-asset (almost all kinds of instruments), cross-format (structured and unstructured, including voice) and cross-venue (exchange, OTC, etc.) trading data at a higher degree of granularity is key.
  • The ability to perform data lineage and auditing – Stored data needs to be fully auditable for 5 years. This implies not just storing it but also putting strict governance and audit-trail capabilities in place.
  • The ability to manage a large increase in data storage volume (5+ years of history) driven by extensive record-keeping requirements.
  • The ability to perform real-time surveillance and monitoring of data – Once data is collected, normalized and segmented, it must support near-real-time monitoring (within around 5 seconds) so that every trade can be tracked through its lifecycle. Detecting patterns that indicate market abuse and monitoring for best execution are key.
  • The ability to create business rules – The core logic that identifies the trade patterns above is expressed as business rules. Business rules have been covered in various posts on this blog; they primarily work on an IF..THEN..ELSE construct.
  • Machine learning & predictive analytics – A variety of supervised and unsupervised learning approaches can be used for extensive behavioral modeling and segmentation of transaction behavior, with a view to identifying traders' behavioral patterns and any outlier behaviors that connote potential regulatory violations.
  • Provide a single view of an institutional client – From the firm's standpoint, it is very useful to have a single view of each client that shows all of their positions across multiple desks, their risk position, their KYC score, etc.
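To make the IF..THEN..ELSE idea concrete, here is a minimal Python sketch of a rule evaluator. It is not a production rules engine: the `Trade` fields, the two sample rules (a price-deviation check against a reference price and a large-order check) and their thresholds are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Trade:
    trade_id: str
    trader_id: str
    instrument: str
    price: float
    quantity: int
    reference_price: float  # hypothetical field: e.g. the prevailing market price


# A rule pairs a name with an IF-condition evaluated against a single trade.
Rule = Tuple[str, Callable[[Trade], bool]]


def price_deviation_rule(max_pct: float) -> Rule:
    """Hypothetical rule: trade price deviates from the reference by more than max_pct."""
    def condition(t: Trade) -> bool:
        return abs(t.price - t.reference_price) / t.reference_price > max_pct
    return (f"price_deviation>{max_pct:.0%}", condition)


def large_order_rule(max_qty: int) -> Rule:
    """Hypothetical rule: order quantity exceeds max_qty."""
    return (f"quantity>{max_qty}", lambda t: t.quantity > max_qty)


def evaluate(trades: List[Trade], rules: List[Rule]) -> List[Tuple[str, str]]:
    """Apply every rule to every trade and collect (trade_id, rule_name) alerts."""
    alerts = []
    for trade in trades:
        for name, condition in rules:
            if condition(trade):  # IF the pattern matches ...
                alerts.append((trade.trade_id, name))  # ... THEN raise an alert
            # ELSE: the trade passes this rule and nothing is recorded
    return alerts


rules = [price_deviation_rule(0.05), large_order_rule(100_000)]
trades = [
    Trade("T1", "TR7", "XYZ", 101.0, 500, 100.0),    # 1% deviation, small order
    Trade("T2", "TR7", "XYZ", 110.0, 500, 100.0),    # 10% deviation
    Trade("T3", "TR9", "ABC", 50.0, 250_000, 50.0),  # oversized order
]
alerts = evaluate(trades, rules)
print(alerts)  # [('T2', 'price_deviation>5%'), ('T3', 'quantity>100000')]
```

Because rules are plain data (a name plus a predicate), new ones can be added without touching the evaluation loop, which is the property a surveillance platform needs as manipulation patterns evolve.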


Logical Architecture of a Market Surveillance System

The ability to perform deep, multi-level analysis of trade activity implies not only storing heterogeneous data for years in one place but also performing forensic analytics (rules and machine learning) on that data in place, at very low latency. The platform must support querying that ranges from interactive (SQL-like) queries to deep forensics via data science. Further, quick and effective investigation of suspicious trader behavior requires compliance teams to access and visualize trade patterns and to drill into behavior to identify potential compliance violations. A Big Data platform is ideal for this complete range of requirements.
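As an illustration of interactive, SQL-like querying for cross-venue analysis, the sketch below uses Python's built-in `sqlite3` module as a stand-in for a SQL-on-Hadoop engine such as Hive; the `trades` table schema, venue names and sample rows are hypothetical. The query finds traders who dealt in the same instrument on two or more venues and ranks them by notional value.

```python
import sqlite3

# In-memory database used only to keep the example self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE trades (
        trade_id TEXT, trader_id TEXT, instrument TEXT,
        venue TEXT, price REAL, quantity INTEGER, ts TEXT
    )
""")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?, ?, ?, ?, ?)",
    [  # hypothetical sample rows
        ("T1", "TR7", "XYZ", "LSE", 100.0, 1000, "2017-03-01T09:00:01"),
        ("T2", "TR7", "XYZ", "OTC", 100.5, 2000, "2017-03-01T09:00:03"),
        ("T3", "TR7", "XYZ", "BATS", 101.0, 1500, "2017-03-01T09:00:04"),
        ("T4", "TR9", "ABC", "LSE", 50.0, 300, "2017-03-01T09:05:00"),
    ],
)

# Cross-venue view: same trader, same instrument, traded on >= 2 venues.
query = """
    SELECT trader_id, instrument,
           COUNT(DISTINCT venue) AS venues,
           SUM(price * quantity) AS notional
    FROM trades
    GROUP BY trader_id, instrument
    HAVING COUNT(DISTINCT venue) >= 2
    ORDER BY notional DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)  # ('TR7', 'XYZ', 3, 452500.0)
```

On a Hadoop cluster the same `GROUP BY ... HAVING` pattern would run over tables stored in HDFS; only the execution engine and storage differ.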

Key Design Requirements for a Market Surveillance System for MiFID II and MAR

The most important technical features for such a system are:

  1. Support end-to-end monitoring of a variety of financial instruments across multiple trading venues. Support a wide variety of analytics that discover interrelationships between customers, traders and trades; this is the next major advance in surveillance technology. HDFS is the ideal storage repository for this data.
  2. Provide a platform that can ingest tens of millions to billions of market events daily (spanning a range of financial instruments: equities, bonds, forex, commodities, derivatives, etc.) from thousands of institutional market participants. Data can be ingested using a range of tools such as Sqoop, Kafka, Flume and APIs.
  3. The ability to add new business rules (via a business rules engine and/or a model-based system that supports machine learning) is a key requirement, because market manipulation constantly pushes the boundaries in new and unforeseen ways. This can be met using open source languages like Python and R. Multifaceted projects such as Apache Spark let users perform exploratory data analysis (EDA) and data-science-based analysis, through language bindings for Python and R, across a range of investigative use cases.
  4. Provide advanced visualization techniques that help compliance and surveillance officers manage information overload.
  5. The ability to perform deep cross-market analysis, i.e. to look at financial instruments and securities trading across multiple geographies and exchanges.
  6. The ability to create views and correlate data that are both wide and deep. A wide view looks at related securities across multiple venues; a deep view looks for a range of illegal behaviors that threaten market integrity, such as market manipulation, insider trading, watch/restricted list trading and unusual pricing.
  7. The ability to provide in-memory caches of data for rapid pre-trade and post-trade compliance checks.
  8. The ability to create prebuilt analytical models and algorithms that pertain to trading strategy (pre-trade models, e.g. best execution analysis). A common way to link R and Hadoop is to use HDFS as the long-term store for all data and use MapReduce jobs (potentially submitted from Hive or Pig) to encode, enrich and sample data sets from HDFS into R.
  9. Provide Data Scientists and Quants with development interfaces using tools like SAS and R.
  10. The results of processing and queries need to be exportable in various data formats: simple CSV/text, more optimized binary formats, JSON, or even custom formats. The results will take the form of standard relational database data types (e.g. String, Date, Numeric, Boolean).
  11. Based on backtesting and simulation, analysts should be able to tweak the model, and the platform should allow its subscribers (typically compliance personnel) to customize their execution models.
  12. A wide range of analytical tools needs to be integrated to provide the best dashboards and visualizations. Platforms like Tableau, QlikView and SAS can support this.
  13. An intelligent surveillance system needs to store trade data, reference data, order data and market data, as well as all relevant communications from a range of disparate internal and external systems, and then match these records appropriately. The matching engine can be built using languages supported in Hadoop, such as Java, Scala, Python and R.
  14. Provide multiple layers of detection capability, starting with a) configurable business rules (each describing a trading pattern) and b) dynamic capabilities based on machine learning models (typically regarded as more predictive). Such a system can also parallelize execution at scale to meet the demanding latency requirements of a market surveillance platform.
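Items 7 and 14 can be sketched together as two detection layers. In the minimal Python example below, layer (a) is a business rule that checks an in-memory restricted list, and layer (b) flags order sizes more than three standard deviations from a trader's historical mean. This z-score test is a deliberately simple statistical stand-in for the machine learning models described above, not a substitute for them; the restricted list, threshold and order histories are hypothetical, and parallel execution is not shown.

```python
from statistics import mean, pstdev
from typing import Dict, List

# Hypothetical in-memory restricted/watch list; a set gives constant-time lookups.
RESTRICTED_INSTRUMENTS = {"ABC"}


def rule_layer(instrument: str) -> List[str]:
    """Layer (a): a configured business rule (restricted-list trading)."""
    return ["restricted_list"] if instrument in RESTRICTED_INSTRUMENTS else []


def outlier_layer(quantity: int, history: List[int], threshold: float = 3.0) -> List[str]:
    """Layer (b): z-score test on order size, a simple stand-in for an ML model."""
    if len(history) < 2:
        return []  # too little history to establish a baseline
    mu, sigma = mean(history), pstdev(history)
    if sigma == 0:
        return [] if quantity == mu else ["size_outlier"]
    z = (quantity - mu) / sigma
    return ["size_outlier"] if abs(z) > threshold else []


def detect(trader_id: str, instrument: str, quantity: int,
           histories: Dict[str, List[int]]) -> List[str]:
    """Run both layers and return the combined list of alert labels."""
    history = histories.get(trader_id, [])
    return rule_layer(instrument) + outlier_layer(quantity, history)


histories = {"TR7": [1000, 1100, 900, 1050, 950]}  # hypothetical past order sizes
print(detect("TR7", "XYZ", 1020, histories))  # []
print(detect("TR7", "XYZ", 5000, histories))  # ['size_outlier']
print(detect("TR7", "ABC", 5000, histories))  # ['restricted_list', 'size_outlier']
```

Keeping the layers as independent functions mirrors the split between configured rules and model-based detection: either layer can be replaced (for example, swapping the z-score test for a trained model) without changing the other.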

The next and final post will delve into this logical architecture and discuss the end-to-end flow from an open enterprise Hadoop design standpoint. We will use Hortonworks Data Platform (HDP) and Hortonworks DataFlow (HDF) as candidate technologies for the implementation.

Call to Action

The Hortonworks 100% open-source solution is at the heart of it all in financial services: Connected Data Platforms for data-at-rest and data-in-motion. Working together, Hortonworks Data Platform and Hortonworks DataFlow provide our retail banking and capital markets customers a crucial competitive advantage in their dynamic, competitive industries.

To learn more about how financial services enterprises use Hortonworks solutions to analyze Big Data, visit




