March 23, 2017

Spark Banking Analytics for Compliance with Fundamental Review of the Trading Book (FRTB)

This is a guest blog post by Paul Jones at IHS Markit. Headquartered in London, IHS Markit delivers next-generation information, analytics and solutions to customers in business, finance and government. Mr. Jones is the company’s Global Head of FRTB Solutions, and IHS Markit’s FRTB solution suite runs on Hortonworks Data Platform (HDP)—which includes Apache Spark™. The FRTB solution suite helps IHS Markit’s clients comply with new standards published by the Basel Committee on Banking Supervision on Minimum Capital Requirements for Market Risk. In this post, Mr. Jones explains the standards, known as the Fundamental Review of the Trading Book (FRTB) and also how banks can use Apache Spark to modernize their IT infrastructures for compliance with FRTB.

The introduction of the Basel Committee’s Fundamental Review of the Trading Book (FRTB) standards involves a comprehensive overhaul of banks’ market risk capital frameworks. The move from value-at-risk (VaR) to scaled expected shortfall (ES) will significantly increase the number and complexity of the capital calculations that banks need to undertake, as well as the sheer volume of data they must manage.
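The difference between the two measures is easy to see in code. The sketch below is purely illustrative (the function `var_and_es` and its 97.5% default confidence level are assumptions for this example, not anyone's production logic): VaR reads off a single tail quantile of the P&L distribution, while ES averages every loss beyond that point, making it more sensitive to tail severity.

```python
import statistics

def var_and_es(pnl, alpha=0.975):
    """Illustrative VaR and expected shortfall (ES) at confidence alpha,
    computed from a vector of simulated P&L outcomes (losses negative)."""
    losses = sorted(pnl)                            # worst outcomes first
    tail = max(int(len(losses) * (1 - alpha)), 1)   # scenarios in the tail
    var = -losses[tail - 1]                         # loss at the quantile boundary
    es = -statistics.mean(losses[:tail])            # average loss beyond it
    return var, es

# 200 equally likely scenarios from -100 to +99
var, es = var_and_es(list(range(-100, 100)))
# ES averages the whole tail, so here es (98.0) exceeds var (96)
```

Because ES averages the entire tail rather than reading one point on it, every additional scenario the regulation demands feeds straight into the calculation, which is part of what multiplies the compute load.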

From a computation perspective, FRTB means that P&L vectors need to be generated per risk class, per liquidity horizon and per risk set. Removing the redundant permutations brings the total number of P&L runs to 63 (some of which can be done weekly), compared to two (VaR and Stress VaR) in the current approach.
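The growth in runs is simple combinatorics. The sketch below uses placeholder dimension labels (the risk classes, horizons and risk factor sets shown are illustrative, not the exact regulatory decomposition, whose own pruning and additions yield the 63 runs cited above):

```python
from itertools import product

# Placeholder dimensions -- not the exact FRTB decomposition.
risk_classes = ["IR", "FX", "EQ", "CM", "CS", "ALL"]
liquidity_horizons = [10, 20, 40, 60, 120]          # days
risk_factor_sets = ["full", "reduced"]

# One P&L run per (risk class, liquidity horizon, risk factor set)
runs = list(product(risk_classes, liquidity_horizons, risk_factor_sets))
# 6 * 5 * 2 = 60 raw combinations with these illustrative dimensions
```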

That means that firms are faced with the challenge of performing about thirty times more FRTB capital calculations at scale while also managing their costs and risk. Banks’ current IT risk infrastructures are not up to the task ahead.

If banks want to achieve proactive and intraday risk management while also effectively managing their capital over the long-term, they will require high-performing IT infrastructures that can handle much more intensive calculations. However, many banks today rely on technologies such as relational databases and in-memory data grids (IMDGs) to conduct risk analytics, aggregation and capital calculations.

IMDGs work by replicating data or logging updates across machines. This requires copying large amounts of data over the cluster network, making them expensive to run for FRTB analytics.

In short, banks’ legacy IT architectures will need a significant overhaul when it comes to FRTB and firms are looking for alternative options. One of those options is Apache Spark, an open-source processing engine built around speed, in-memory processing, ease of use and sophisticated analytics.

Spark has a distributed programming model based on an in-memory data abstraction called Resilient Distributed Datasets (RDDs). RDDs are immutable, support coarse-grained transformations and keep track of which transformations have been applied to them. Immutability rules out a large class of problems caused by concurrent updates from multiple threads, while the recorded lineage lets Spark reconstruct a lost RDD rather than replicate its data. As a result, checkpointing requirements in Spark are low, and caching, sharing and replication become straightforward. These are significant design wins, and there are other advantages over IMDGs too:
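A toy sketch (our own illustration; `MiniRDD` is not Spark's API) shows the two properties at work: each transformation returns a new immutable dataset rather than mutating the old one, and the recorded lineage is enough to recompute results from the root data, which is why a lost partition can be rebuilt instead of replicated.

```python
class MiniRDD:
    """Toy illustration of two RDD ideas: immutability (every transformation
    returns a new dataset) and lineage (a record of coarse-grained
    transformations that allows recomputation instead of replication)."""

    def __init__(self, source, lineage=()):
        self._source = source      # root data, or the parent MiniRDD
        self._lineage = lineage    # chain of transformations from the root

    def map(self, fn):
        return MiniRDD(self, self._lineage + (("map", fn),))

    def filter(self, pred):
        return MiniRDD(self, self._lineage + (("filter", pred),))

    def collect(self):
        # Replay the lineage against the root data; the same replay is
        # what would reconstruct this dataset if it were lost.
        data = self._root_data()
        for op, fn in self._lineage:
            data = [fn(x) for x in data] if op == "map" \
                else [x for x in data if fn(x)]
        return data

    def _root_data(self):
        node = self
        while isinstance(node._source, MiniRDD):
            node = node._source
        return list(node._source)

pnl = MiniRDD([5, -3, 12, -8])
gains = pnl.map(lambda x: x * 2).filter(lambda x: x > 0)
# gains.collect() -> [10, 24]; pnl itself is untouched
```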

  • Memory optimisation: IMDGs require the entire working set to be held in memory and are limited by the physical memory available. Spark can spill to disk when portfolios do not fit into memory, making it far more scalable and resource-efficient.
  • Efficient joins: IMDGs have fixed cubes and cannot do joins across datasets. Spark supports joining of multiple datasets natively. This allows more flexible reporting without the need for new cubes and additional memory. Joins are very performant in Spark.
  • Polyglot analytics: Spark supports custom aggregations and analytics which can be implemented in a variety of languages: Python, Scala, Java or R. IMDGs allow only limited SQL or OLAP expressions.
  • Multi-tenant support: Spark supports dynamic resource allocation, resource management, queues and quotas, allowing multiple users and processes to share the same cluster. Workloads such as operations reporting, decision support, what-if analysis and back-testing can all run side by side.
  • Frugal hardware requirements: The immutable nature of RDDs enables Spark to scale and provide fault tolerance efficiently. A Spark cluster is highly available without the need for Active-Active hardware.
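To make the joins point concrete, here is a plain-Python analogue (the record layouts are hypothetical; Spark would express this as a native DataFrame join) of the kind of ad hoc cross-dataset report that a fixed IMDG cube cannot serve without building a new cube:

```python
from collections import defaultdict

# Hypothetical records -- illustrative field names, not a real schema.
trades = [
    {"trade_id": 1, "desk": "rates"},
    {"trade_id": 2, "desk": "fx"},
]
sensitivities = [
    {"trade_id": 1, "risk_factor": "USD.IR.5Y",  "value": 100.0},
    {"trade_id": 1, "risk_factor": "USD.IR.10Y", "value": -40.0},
    {"trade_id": 2, "risk_factor": "EURUSD.FX",  "value": 25.0},
]

# Join sensitivities to trades on trade_id, then aggregate by desk.
desk_of = {t["trade_id"]: t["desk"] for t in trades}
totals = defaultdict(float)
for s in sensitivities:
    totals[desk_of[s["trade_id"]]] += s["value"]
# totals -> {'rates': 60.0, 'fx': 25.0}
```

In Spark the same report is a join plus a grouped aggregation over existing datasets; no new cube and no additional resident copy of the data are needed.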

In fact, our own studies at IHS Markit have demonstrated many of these capabilities and shown the power of Spark in terms of performance, scalability and flexibility. For example, we recently completed a proof-of-concept with a European bank for our capital analytics and aggregation engine, FRTB Studio, which showed that the engine can compute the capital charges for the Internal Models Approach (IMA) and the Standardised Approach (SA) in single-digit seconds for a portfolio of one million trades with 9 million sensitivities and 18 million P&L vectors, on hardware costing just USD 20,000.

As one of the most active projects in the Apache Software Foundation, Spark benefits from thousands of contributors continuously enhancing the platform. In fact, we’ve seen a 20% improvement in Spark aggregation performance year-on-year since we started building our solutions on the platform in 2016. We’re excited to see the improvements that are bound to come in the year ahead!
