February 09, 2016

How Pioneering Banks Adopt Hadoop for Enterprise Data Management

Big Data and Apache™ Hadoop® are driving tectonic shifts in enterprise data management (EDM) within the financial services industry. Open Enterprise Hadoop and the vendor ecosystem growing up around it are consolidating and standardizing data architectures at the country’s largest banks, transforming expensive, inflexible, and proprietary data landscapes into economical, agile, open source data environments.

Regulatory Pressures Force Architectural Renovation

Banks are accustomed to investing in data solutions just to “keep the lights on.” As data volumes and variety increase, they pour money into legacy platforms without a commensurate improvement in functionality. This stalemate locks up IT budgets that would otherwise go to innovation, whether for defensive applications (for risk, fraud and compliance) or expansive projects to build modern data applications for customers, partners and employees.

Technologies such as Oracle databases, MPP systems, and Enterprise Data Warehouses (EDWs) cannot efficiently capture the newer types of data generated by consumers who have access to multiple touch points (cellphones, tablets, and PCs), or by channels that are outside the direct control of the banks (such as Facebook, Twitter, and third-party payment platforms).

Simply put, bank IT budgets can no longer sustain the same spending on specialized hardware, hosting, and services. Regulatory mandates that add risk and compliance costs only compound the problem. These mandates include Basel Committee guidelines on risk data reporting and aggregation (RDA), the Dodd-Frank Act, the Volcker Rule, and regulatory capital adequacy requirements such as Comprehensive Capital Analysis and Review (CCAR). Together, they force an urgent retooling of existing data architectures.

These forces are transforming Risk and Compliance from a set of “check box” activities into a unique and compelling opportunity for competitive advantage. The banks that build agile data architectures can navigate regulatory changes more quickly than their competition while also gaining deeper insight into the fabric of their business.

Hortonworks Connected Data Platforms Open the Path for Innovation

Since the financial crisis of 2008, the open source community has matured and hardened technologies such as Apache Hadoop and Apache NiFi, further enhancing enterprise-grade services for operations, security and data governance. Hortonworks Data Platform (HDP) is powered by 100% open source Apache Hadoop, and Hortonworks DataFlow (HDF) is powered by 100% open source Apache NiFi. These connected data platforms should form the backbone of any enterprise-grade data management system.

The overall goal of our banking customers that adopt HDP is to create a cross-company data lake containing all data in one place. That data lake is then fed and enriched by more data delivered by HDF. The diagram below shows how it works.

Hortonworks Stack Diagram

Here are the general steps in the process:

1) Data Ingestion: L1 loaders are created to take in Trade, Loan, Payment and Wire Transfer data. Historically, ingestion has been a complicated and expensive problem at most banking institutions. HDF simplifies this with an intuitive user interface and end-to-end security and traceability.

2) Data Governance: L2 loaders apply rules to the critical fields for Risk and Compliance. The goal here is to look for gaps in the data and any obvious quality problems, such as values that fall outside an expected range or that do not match a reference table, to facilitate reporting on data governance.

3) Entity Identification: A lightweight entity ID service consists of entity assignment and batch reconciliation. The goal here is to get each target bank to propagate the Entity ID back into their booking and payment systems. At that point, transaction data will flow into the data lake with this ID attached, facilitating a 360-degree view of the customer.

4) L3 Loader Development: Defining the transformation rules that are required in each risk, finance and compliance area to prepare the data for processing.

5) Analytic Definition: Defining the analytics for each risk and compliance area.

6) Report Definition: Defining the reports for each Risk and Compliance area.
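To make the data governance step more concrete, here is a minimal Python sketch of the kind of checks an L2 loader might apply: a gap check, a range check, and a table-driven check on critical fields, followed by a simple tally for governance reporting. It is an illustration only and not part of HDP or HDF. The field names (`amount`, `currency`), the reference table, the range limits, and the function names are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Hypothetical reference table for the table-driven check (assumption).
VALID_CURRENCIES = {"USD", "EUR", "GBP", "JPY"}


@dataclass(frozen=True)
class Rule:
    column: str                    # critical field the rule applies to
    kind: str                      # "range" or "table", used in findings
    check: Callable[[Any], bool]   # returns True when the value is acceptable


# Hypothetical rules for two critical fields (assumed names and limits).
RULES: List[Rule] = [
    Rule("amount", "range",
         lambda v: isinstance(v, (int, float)) and 0 < v <= 1_000_000_000),
    Rule("currency", "table", lambda v: v in VALID_CURRENCIES),
]


def validate(record: Dict[str, Any]) -> List[str]:
    """Return the findings for one record; an empty list means it is clean."""
    findings = []
    for rule in RULES:
        value = record.get(rule.column)
        if value is None or value == "":
            # Gap in the data: the critical field is absent or empty.
            findings.append(f"{rule.column}: missing")
        elif not rule.check(value):
            # Quality problem: value fails the range or table-driven rule.
            findings.append(f"{rule.column}: {rule.kind}")
    return findings


def governance_report(records: List[Dict[str, Any]]) -> Dict[str, int]:
    """Count each kind of finding across a batch of records."""
    counts: Dict[str, int] = {}
    for record in records:
        for finding in validate(record):
            counts[finding] = counts.get(finding, 0) + 1
    return counts
```

Keeping the rules in a plain list separates the rule definitions from the loader logic, so a risk or compliance team could extend the list without touching `validate`. In a real deployment the rules and reference tables would more likely live in a governed store than in code.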


Our banking customers that adopt connected data platforms as part of an integrated approach to Big Data realize the following benefits:

  • Improved insight and a higher degree of transparency in business operations and capital allocation
  • More rigorous governance procedures and policies that can help track risks below the summary level down to the individual transaction
  • Streamlined processes across the enterprise banking domains, including investment banking, retail and consumer, and private banking

At Hortonworks, we look forward to helping new customers obtain similar results.


