November 17, 2017

Building a Global Data Lake for International Banking

Financial institutions need to leverage all the information they can gather to guide future investments, reduce risk and detect fraud. These objectives directly influence an institution’s bottom line and have become more challenging with the rising volume and variety of Big Data. To keep up, financial institutions are continuously adapting their data architectures to support these evolving business needs, and many are adopting a Hadoop-based data lake to leverage distributed processing for faster data analysis.

As international financial institutions build enterprise data lakes on Hortonworks Data Platform (HDP), they need a data integration solution that ensures secure data movement across numerous disparate and diverse source systems. Let’s look at the Scotiabank use case as an example of a successful implementation on HDP, leveraging data integration from Hortonworks partner Diyotta.

Scotiabank – A Data-Driven Organization

Scotiabank is an international financial institution with a presence in over 55 countries and assets of over $900 billion. Their decision to build an enterprise data lake turned into a massive initiative with many complex parameters related to data extraction, data compression, secure data transfer over the network, privacy for personally identifiable data, and the data load into the data lake. Scotiabank chose to work with Hortonworks and Diyotta to help achieve the strategic goals of the project as they continue their focus as a data-driven organization.

Issues Facing Financial Services Institutions

International financial institutions in particular have very demanding requirements that are driving the adoption of Big Data lakes and a Big Data strategy. With increasing globalization, there is an urgent need to expand and integrate international locations and branches. With more branches to manage, data sets are getting larger and more unwieldy. Not only are the data sets large, there are also newer types of data, such as unstructured data, that are difficult to integrate into traditional relational databases. The advent of new technology and its potential benefits is pushing financial institutions to rely on data analytics and data visualization tools to guide business decisions and detect fraud.

But there are hurdles to overcome before a financial institution can become a fully data-driven organization. As they move into the digital era, financial institutions are struggling with the flexibility, cost and scalability of their existing systems. For example, many institutions have fragmented enterprise systems, including book-of-record transaction systems and enterprise risk systems, acquired over many years and built into vertical silos with very little integration. This lack of integration has made it difficult for financial institutions to adopt new technologies fast enough, often leaving them vulnerable to fraud and at a competitive disadvantage.

Secure Data-Sharing Environments

To overcome these challenges, Scotiabank set a goal to provide a secure data-sharing environment that enables quality self-guided business decisions across the entire organization. An ecosystem framework was created for their international division, but it had to meet local country, regional, divisional and corporate needs. Regulatory and policy compliance also had to be taken into consideration in the migration to a Big Data lake.

Scotiabank developed a business framework composed of four key components:

  • Security: Different security mechanisms need to be applied depending on where the data originates
  • Quality: Data quality is managed as part of the standard data management process
  • Self-Serve: Raw data sourced from multiple platforms can be processed to create custom data sets for each country, region or business unit
  • Toolkit: Allows for data mining for business insights using analytics and a variety of other tools

New Challenges

As this business framework matured, Scotiabank experienced many challenges including:

  • Security for data in transit: Moving data securely from a country data center over to the centralized enterprise data lake
  • Ingestion: Key elements to address when loading data into the data lake, such as the window available to move the data, the volume of data to move, and the frequency of changes at the source system
  • Network Latency: This latency impacts both the ingestion time and the response time
  • Process Request: Reducing the preparation time for small bursts of execution is key to a responsive environment
  • Response Time: This key challenge, impacted by both network latency and process request, is fundamental to user adoption
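The data-in-transit challenge above can be sketched at its simplest: before a country data center ships a batch file to the central data lake, the file is compressed and fingerprinted so the receiving side can verify integrity. A minimal sketch in Python (the function names and the use of gzip plus SHA-256 are illustrative assumptions, not Scotiabank's actual mechanism):

```python
import gzip
import hashlib

def prepare_for_transfer(payload: bytes) -> tuple[bytes, str]:
    """Hypothetical sender-side step: compress a batch payload and
    fingerprint the compressed bytes. The checksum travels with the file
    (e.g. in a manifest) so the data lake can verify it on arrival."""
    compressed = gzip.compress(payload)
    checksum = hashlib.sha256(compressed).hexdigest()
    return compressed, checksum

def verify_on_arrival(compressed: bytes, expected_checksum: str) -> bytes:
    """Hypothetical receiver-side step: re-hash, compare, then decompress."""
    if hashlib.sha256(compressed).hexdigest() != expected_checksum:
        raise ValueError("checksum mismatch: file corrupted in transit")
    return gzip.decompress(compressed)

# Round trip: what the country data center sends is what the lake receives.
batch = b"account_id,balance\n1001,250.00\n1002,310.50\n"
blob, digest = prepare_for_transfer(batch)
assert verify_on_arrival(blob, digest) == batch
```

Note that compression and checksumming address size and integrity only; in practice the transfer channel itself (for example TLS) would provide confidentiality for data in transit.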

Framework Optimization

Scotiabank had to optimize their framework to achieve economies of scale and to propel their business to the next level of profitability.

To solve the data ingestion latency problem, Scotiabank used a variety of ingestion toolkits and implemented both database ingestion and file ingestion patterns. Spark was used for real-time data streaming in conjunction with these database and file ingestion patterns.
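A database ingestion pattern like the one mentioned above is typically driven by a watermark: each load pulls only the rows that changed since the last successful run, which keeps the load window small regardless of table size. A hedged sketch, using SQLite as a stand-in for the source system (the table name, timestamp column, and watermark mechanism are all illustrative assumptions):

```python
import sqlite3

# Illustrative source system: a table of transactions with a change timestamp.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE txns (id INTEGER, amount REAL, updated_at INTEGER)")
src.executemany("INSERT INTO txns VALUES (?, ?, ?)",
                [(1, 100.0, 10), (2, 250.0, 20), (3, 75.0, 30)])

def pull_increment(conn, watermark: int):
    """Database ingestion pattern: fetch only rows changed since the last
    successful load, then advance the watermark to the newest change seen."""
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM txns WHERE updated_at > ? "
        "ORDER BY updated_at", (watermark,)).fetchall()
    new_watermark = rows[-1][2] if rows else watermark
    return rows, new_watermark

# The first load takes everything; subsequent loads only see the delta.
batch1, wm = pull_increment(src, 0)   # all 3 rows, watermark advances to 30
src.execute("INSERT INTO txns VALUES (4, 500.0, 40)")
batch2, wm = pull_increment(src, wm)  # only the newly inserted row
```

The same watermark idea carries over to the file pattern (track which extract files have already been landed) and to streaming, where Spark's micro-batches play the role of the increments.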

Scotiabank continued to support their data warehouse while migrating data to an enterprise data lake. The data lake enabled Scotiabank to centralize their data and serve all their regional and corporate divisions. There were regulatory and compliance issues with multiple data sources from different countries, and a lot of education was needed with local governance bodies in order to move data out of local countries of origin.

Scotiabank continues to implement their strategic plan for an enterprise data lake, with some significant portions already completed. The next goal is to onboard a complete original data set and then prioritize that data based on customer zoning.

Diyotta and Hortonworks

Diyotta is a leading modern data integration company that makes software to help organizations orchestrate data movements on Hortonworks Data Platform and Hadoop. Diyotta’s Modern Data Integration (MDI) Suite was designed based on five principles that proved to be strategic for Scotiabank’s goals. The five modern data integration principles are:

  • Take the processing to where the data lives
  • Fully leverage existing platforms based on what they were designed to do well
  • Move data point-to-point to eliminate single server bottlenecks
  • Manage all business rules and data logic centrally
  • Make changes using existing rules and logic
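The first principle, taking the processing to where the data lives, is the idea of pushing computation down to the source platform instead of hauling raw rows to a central server. A minimal contrast under illustrative assumptions (SQLite stands in for a source system; the table and query are hypothetical):

```python
import sqlite3

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE accounts (region TEXT, balance REAL)")
src.executemany("INSERT INTO accounts VALUES (?, ?)",
                [("EMEA", 100.0), ("EMEA", 200.0), ("APAC", 50.0)])

# Centralized approach: pull every row over the network, then aggregate.
all_rows = src.execute("SELECT region, balance FROM accounts").fetchall()
totals_central = {}
for region, balance in all_rows:
    totals_central[region] = totals_central.get(region, 0.0) + balance

# Pushdown approach: the source platform runs the aggregation it was
# designed to do well, and only the small result set crosses the network.
totals_pushed = dict(src.execute(
    "SELECT region, SUM(balance) FROM accounts GROUP BY region").fetchall())

assert totals_central == totals_pushed  # same answer, far less data moved
```

The payoff grows with scale: moving a few aggregated rows across an international link is far cheaper than moving millions of source rows, which is why this principle mattered for a global data lake.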

Diyotta’s MDI Suite on HDP enabled Scotiabank to begin implementing a secure global enterprise data lake and modernize their Big Data ecosystem. With the MDI Suite’s agent-based architecture, Scotiabank can orchestrate data movements into and out of the data lake, increase data throughput and maintain comprehensive data lineage. Only a few weeks into the architecture changes, Scotiabank saw 6x faster data extraction and 11x faster data movement across international borders.
