October 05, 2016

The Tortoise and the Hare in Wall Street

The Financial regulators are driving a Data Evolution

Traditionally, technology moves fast and regulators react slowly. When technology leaps forward, it enables financial firms to change the nature of their business, often into unregulated territory, and regulators then pass regulation to catch up. This model can work in slow-moving markets, but in today's fast, interconnected global markets, not reacting in time could mean another financial crisis.

I can imagine the moment in regulatory land when a light bulb “went on” and someone said: “It is the data, stupid!” In today’s world everything is electronic and connected, and everything leaves an electronic footprint behind: emails, phone calls, orders to buy and sell, prices in the market, executions. Everything leaves a trail! If the regulators have access to this data, they can not only perform surveillance under current regulations, but can also run analytics and pattern detection to determine whether new nefarious ways of conducting business are arising that could disadvantage retail investors or put the financial markets at risk. In essence, they can future-proof regulation. Brilliant!

So how do you do this? Easy: first, require that broker-dealers keep most of the data they generate, and second, have them give you all this data. This is exactly what a series of regulations has done, and it all comes down to the same end game: capture and store lots of data, analyze this data, and do so both historically and in real time.

The reactive financial markets regulators in the US are moving to a predictive regulatory model, and this will have a deep impact on the fast-moving financial industry. The industry will be contending with sweeping regulations, and will have to address them with technology that enables the analysis of big data.

The regulations

The latest regulations driving these changes are SEC Rule 613 and BCBS 239. Each tackles a piece of a consistent technology foundation for the whole industry, almost as if the regulators had designed the ideal market regulation architecture and then broken it up into multiple pieces of regulation.

SEC Rule 613, the Consolidated Audit Trail (CAT), aims to create a single, comprehensive database that would enable regulators to efficiently track all trading activity in the U.S. equity and options markets. According to SEC Chair Mary Jo White, “It will significantly increase the ability of regulators to conduct research, reconstruct market events, monitor market behavior, and identify and investigate misconduct.”

BCBS 239 goes even further: it defines four major areas of implementation, with a total of 14 principles, all related to data. This regulation was created in reaction to the financial crisis that began in 2007, which exposed the banks’ inadequate IT and data architectures. Many institutions could not aggregate risk exposures and identify risk concentrations quickly, and this had severe consequences for the stability of the financial system as a whole.

Additionally, financial firms must implement full chain of custody and provenance for all the data they report. This means full knowledge of where the data came from, who modified it along the way, and how each reported number is calculated.
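To make the idea of provenance concrete, here is a minimal sketch in plain Python of a reported number that carries its own audit trail. Everything in it is an assumption made for illustration: the `TrackedValue` and `LineageStep` names, the feed and service names, and the conversion rate are all hypothetical, and neither regulation prescribes any particular data structure.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List


@dataclass(frozen=True)
class LineageStep:
    """One entry in a value's audit trail (hypothetical structure)."""
    actor: str           # who or what modified the value
    action: str          # human-readable description of the change
    timestamp: datetime  # when the change happened (UTC)


@dataclass
class TrackedValue:
    """A number bundled with its origin and its modification history."""
    value: float
    source: str
    history: List[LineageStep] = field(default_factory=list)

    def apply(self, actor: str, action: str,
              fn: Callable[[float], float]) -> "TrackedValue":
        """Return a new TrackedValue with fn applied and the step recorded."""
        step = LineageStep(actor, action, datetime.now(timezone.utc))
        return TrackedValue(fn(self.value), self.source, self.history + [step])


# Illustrative figures only: a raw exposure is converted, then netted.
exposure = TrackedValue(1_000_000.0, source="trading_desk_feed")
reported = exposure.apply("fx_service", "convert at illustrative rate 1.25",
                          lambda v: v * 1.25)
reported = reported.apply("risk_engine", "apply 50% netting",
                          lambda v: v * 0.5)

print(reported.value, "from", reported.source)
for step in reported.history:
    print(step.actor, "-", step.action)
```

Because each transformation returns a new object instead of overwriting the old one, the original figure and every intermediate step remain available to answer the three questions above: where the number came from, who changed it, and how it was calculated.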

A June 2015 report by McKinsey & Company “A marathon, not a sprint: Capturing value from BCBS 239 and beyond” states: “Risk and finance data and technology should become—and already have become in many institutions—key strategic board-level topics.” And concludes that “Most institutions agree with the view that BCBS 239 compliance is not the end, but rather the beginning of a continuous journey of enhancing Risk and Finance data aggregation and reporting.”

The SEC makes it very clear that it will become a data-driven regulator, and banks must implement the proper data infrastructure to comply with the new regulatory environment.

A sea of data

Now that you have to collect ever-growing amounts of data, you need to store it and draw useful insights from the sea of electronic zeroes and ones. This is an extremely complex technical challenge: storing this much data is difficult, and extracting useful information from it in a timely manner is even harder. Doing so had previously been extremely expensive and difficult, and required a rare skill set. However, across the continent, just south of where only decades before counterculture mavericks were putting flowers in their hair, a quiet revolution was taking place, with far-reaching repercussions for the rest of the world.

In Silicon Valley, the internet had grown so fast that the Volume of data being produced far outstripped any existing ability to process it. The Variety of data defied structure and exploded to include images, videos, tweets, text messages and ephemeral Snapchat videos; the variety was mind-boggling. And finally, the Velocity at which the data was being produced accelerated dramatically. You could not stop to catch up; it had to be ON all the time.

So what do you do when faced with a new problem? You innovate! And that is exactly what a group of engineers at Yahoo did in 2006. Faced with an avalanche of data, they had to invent a new technology that could absorb all kinds of data at very fast rates, tolerate one or more servers going down (fault tolerance), scale endlessly by adding more hardware, allow them to draw value from this data, and do all this cost-effectively, since they were still trying to figure out how to make money. This frantic cauldron is where Hadoop was born. Not only did these engineers create a brilliant solution, but they went further and did something so odd and revolutionary that it could only happen in this little patch of land south of San Francisco: they shared the code of this innovation so that anyone and everyone across the world could download it and use it for free. They open sourced it!

I had the good fortune of meeting four of these founders recently when I was at the Santa Clara headquarters of Hortonworks, the leader in this field and my current employer. They were Alan Gates, Arun Murthy, Owen O’Malley and Sanjay Radia, all part of that original Yahoo team that created Hadoop and later spun off to create Hortonworks. To be honest, the experience was transformational. These were extremely smart engineers who were not chasing fame and fortune, but rather reveled in the wonder of elegantly solving new engineering problems. Despite being creators of a transformational technology with global impact, they possessed a humility that was both disarming and refreshing; such humility quickly disappears the farther you drift from this valley. Here they use a language unique to this technology way of life. They speak of “Contributors”, software developers all over the world who write code for these open projects; “Committers”, the small subset of the brightest contributors who are allowed to put code into the repository; and the “Ecosystem”, meaning all the complementary technologies that together solve a particular problem. They even speak of the “Community” as an ethereal collective of software developers around the globe who share a common vision and value: that what matters is the most efficient way of solving a technology problem. New features are “Voted on” before becoming part of each open project, and the developers are fiercely independent, even though they all work for companies with a specific agenda. This is as close to Engineering Nirvana as you can get.

So what does an open source, internet-driven transformational technology have to do with financial regulations on the other side of the continent? Very simply, here you have a proven solution that is tailor-made to store ALL the data generated in financial services, keep it forever, process it quickly and efficiently, draw actionable insights out of the data, and do this at a small fraction of the cost of any technology solution previously available. This is the perfect solution that every regulated entity must adopt, and it is exactly what the regulators are mandating. It is called a Modern Data Architecture.

Financial Services has become a data business, and regulating the markets has become a data surveillance and data analytics application.

The new regulations are not reacting to current widespread abuse, but are instead focused on preventing new abuses. They are not regulating a specific behavior; they are laying the technology foundation on top of which they can build numerous surveillance and enforcement systems, now and in the future. This is truly forward-looking architecture. It looks like the slow tortoise just passed the sleeping hare.

Epilog: The Modern Data Architecture

The explosion of new types of data in recent years – from inputs such as the web and connected devices, or just sheer volumes of records – has put tremendous pressure on traditional Data Storage solutions.  The data from these sources has a number of features that make it a challenge for a traditional data warehouse:

  • Exponential Growth. An estimated 2.8ZB of data in 2012 is expected to grow to 40ZB by 2020. 85% of this data growth is expected to come from new data types, with machine-generated data projected to increase 15x by 2020. (Source: IDC)
  • Varied Nature. The incoming data can have little or no structure, or structure that changes too frequently for reliable schema creation at time of ingest.
  • Value at High Volumes. The incoming data can have little or no value as individual records or small groups of records. But high volumes and longer historical perspectives can be inspected for patterns and used for advanced analytic applications.
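To put the growth figures above in perspective, the short calculation below derives the compound annual growth rate implied by going from 2.8ZB in 2012 to 40ZB in 2020. The two endpoints come from the IDC estimate quoted above; the assumption of smooth year-over-year compounding and the function name are mine.

```python
def implied_annual_growth(start: float, end: float, years: int) -> float:
    """Compound annual growth rate that takes `start` to `end` in `years`."""
    return (end / start) ** (1 / years) - 1


# Endpoints from the IDC estimate: 2.8ZB in 2012, 40ZB in 2020.
rate = implied_annual_growth(2.8, 40.0, 2020 - 2012)
print(f"Implied annual growth: {rate:.1%}")  # prints "Implied annual growth: 39.4%"
```

In other words, the estimate implies data volumes growing by roughly 39% every year for eight years in a row.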

The solution to these challenges is what is called the Modern Data Architecture. The basic components of this solution are:

First, Consolidate All Data. Store massive amounts of all types of data forever, regardless of the type or structure of the data, or lack thereof, in a cost-effective manner. Organizing data into a single data source allows a richer set of questions to be asked of the data. You cannot be data-driven if you do not have a view across all available data, and new data needs to drive evolution in business. Hadoop offers the best value proposition for storing massive amounts of data and is the base for creating the “Data Lake” where all the data resides.

Modern Data Architecture

Second, Universal Access to the Data. All applications and users of the data need to have easy access to it, so that new value can be extracted from the data and new applications can be implemented quickly. A schema-less nature, open standards, and open access from the analytics applications are the basis for ensuring easy access.
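The schema-less idea can be sketched in a few lines: records are stored exactly as they arrive, and each consumer applies the structure it needs only when it reads them. The example below is plain Python rather than any Hadoop API, and the sample events, field names and `read_with_schema` helper are all hypothetical.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List

# Raw events stored as-is; no common schema was enforced at ingest time.
RAW_EVENTS = [
    '{"type": "order", "symbol": "ABC", "qty": 100, "side": "buy"}',
    '{"type": "email", "from": "trader@example.com", "subject": "ABC"}',
    '{"type": "order", "symbol": "XYZ", "qty": 250}',
]


def read_with_schema(lines: Iterable[str], record_type: str,
                     fields: List[str]) -> Iterator[Dict[str, Any]]:
    """Project raw JSON lines onto the fields one reader cares about."""
    for line in lines:
        record = json.loads(line)
        if record.get("type") != record_type:
            continue  # other readers may want this record; this one does not
        # Fields missing from a record come back as None instead of failing.
        yield {name: record.get(name) for name in fields}


orders = list(read_with_schema(RAW_EVENTS, "order", ["symbol", "qty", "side"]))
for order in orders:
    print(order)
```

A surveillance application could read the same raw events with a different set of fields, for example the email records, without anyone having to change how the data was stored.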

Third, Multi-use, Multi-workload Data Processing. By supporting multiple access methods (batch, real-time, streaming, in-memory, etc.) to a common data set, Hadoop enables analysts to transform and view data in multiple ways (across various schemas) to obtain closed-loop analytics by bringing time-to-insight closer to real time than ever before.

Fourth, Intelligent Analytics. Run analysis on top of the data to extract value both in real time and from historical data.

Fifth, Governance. The enterprise tools to deploy, grow, administer and secure this infrastructure, including Administration, Authentication, Authorization, Audit and Data Protection.

Modern Data Applications are dynamic, with consistent and global views of assets, customers, activities, etc., pulled together with smart analytics that enable new levels of computing value.
