Financial institutions need to use all the information they can gather to guide future investments, reduce risk, and detect fraud. These objectives directly influence an institution’s bottom line and have become harder to meet as the volume and variety of Big Data grow. To keep up, financial institutions are continuously adapting their data architectures to support these evolving business needs. Many are adopting a Hadoop-based data lake so they can use distributed computing for faster data analysis.
International financial institutions building enterprise data lakes on Hortonworks Data Platform (HDP) need a data integration solution that ensures secure data movement across numerous disparate source systems. Let’s look at Scotiabank as an example of a successful implementation on HDP using data integration from Hortonworks partner Diyotta.
Scotiabank is an international financial institution with a presence in over 55 countries and assets of over 900 billion dollars. Their decision to build an enterprise data lake turned into a massive initiative with many complex requirements related to data extraction, data compression, secure data transfer over the network, privacy for personally identifiable data, and the data load into the data lake. Scotiabank chose to work with Hortonworks and Diyotta to help achieve the strategic goals of the project as they continue their focus on being a data-driven organization.
International financial institutions in particular have very demanding requirements that are driving the adoption of data lakes and Big Data strategies. With increasing globalization, there is an urgent need to expand and integrate international locations and branches. With more branches to manage, data sets are growing larger and more unwieldy. Not only are the data sets large, but there are also newer types of data, such as unstructured data, that are difficult to integrate into traditional relational databases. The advent of new technology and its potential benefits is pushing financial institutions to rely on data analytics and data visualization tools to guide business decisions and detect fraud.
But there are some hurdles to overcome before a financial institution can become a fully data-driven organization. As they move into the digital era, financial institutions are struggling with the flexibility, cost, and scalability of their existing systems. For example, many institutions have fragmented enterprise systems, including the Book of Record Transaction systems and Enterprise Risk systems. These systems have been acquired over many years and built into vertical silos with very little integration. This lack of integration has made it difficult for financial institutions to adopt new technologies fast enough, often leaving them exposed to fraud and at a competitive disadvantage.
To overcome these challenges, Scotiabank set a goal of providing a secure data-sharing environment that enables quality, self-guided business decisions across the entire organization. An ecosystem framework was created for their international division, but it had to meet local country, regional, divisional, and corporate needs. Regulatory and policy compliance also had to be taken into account in the migration to a data lake.
Scotiabank developed a business framework comprising four key components:
As this business framework matured, Scotiabank experienced many challenges including:
Scotiabank had to optimize their framework to achieve economies of scale and to propel their business to the next level of profitability.
To solve the data ingestion latency problem, Scotiabank used a variety of ingestion toolkits and implemented both database ingestion and file ingestion patterns. Spark was used for real-time data streaming alongside the database and file ingestion patterns.
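To make the two batch patterns concrete, here is a minimal sketch using only the Python standard library. It is an illustration under stated assumptions, not Scotiabank’s or Diyotta’s implementation: the function names, the `landing` directory, the sample tables, and the choice of gzip-compressed JSON lines as the landing format are all hypothetical, and an in-memory SQLite database stands in for a real source system. Spark streaming is not covered.

```python
import csv
import gzip
import json
import sqlite3
import tempfile
from pathlib import Path


def land_records(records, landing_dir, name):
    """Write an iterable of dict records as gzip-compressed JSON lines."""
    target = Path(landing_dir) / f"{name}.jsonl.gz"
    count = 0
    with gzip.open(target, "wt", encoding="utf-8") as out:
        for record in records:
            out.write(json.dumps(record) + "\n")
            count += 1
    return target, count


def ingest_table(conn, table, landing_dir):
    """Database ingestion pattern (hypothetical): land every row of one table."""
    # The table name is interpolated directly, so it must come from trusted config.
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [col[0] for col in cursor.description]
    rows = (dict(zip(columns, row)) for row in cursor)
    return land_records(rows, landing_dir, table)


def ingest_csv(path, landing_dir):
    """File ingestion pattern (hypothetical): land the rows of a delimited file."""
    with open(path, newline="", encoding="utf-8") as handle:
        return land_records(csv.DictReader(handle), landing_dir, Path(path).stem)


def read_landed(path):
    """Read a landed file back into a list of dicts, for verification."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return [json.loads(line) for line in handle]


def demo():
    """Run both patterns against throwaway sample data."""
    with tempfile.TemporaryDirectory() as tmp:
        landing = Path(tmp) / "landing"
        landing.mkdir()

        # Stand-in source database with made-up sample rows.
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE accounts (id INTEGER, country TEXT)")
        conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, "CA"), (2, "MX")])
        db_path, db_count = ingest_table(conn, "accounts", landing)
        conn.close()

        # Stand-in source file with made-up sample rows.
        source = Path(tmp) / "branches.csv"
        source.write_text("branch,country\nToronto,CA\nLima,PE\n", encoding="utf-8")
        file_path, file_count = ingest_csv(source, landing)

        return db_count, file_count, read_landed(db_path), read_landed(file_path)
```

Both patterns funnel into the same `land_records` step, so compression and the landing format are handled in one place regardless of where the data came from. In a real deployment the landing target would be the data lake’s storage layer rather than a local directory.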
Scotiabank continued to support their data warehouse while migrating data to an enterprise data lake. The data lake enabled Scotiabank to centralize their data and serve all their regional and corporate divisions. Having multiple data sources in different countries raised regulatory and compliance issues, and considerable education of local governance bodies was needed before data could be moved out of its country of origin.
Scotiabank continues to implement their strategic plan for an enterprise data lake with some significant portions already completed. The next goal is to on-board a complete original data set and then to prioritize that data based on customer zoning.
Diyotta is a leading modern data integration company that makes software to help organizations orchestrate data movements on Hortonworks Data Platform and Hadoop. Diyotta’s Modern Data Integration (MDI) Suite was designed based on five principles that proved to be strategic for Scotiabank’s goals. The five modern data integration principles are:
Diyotta’s MDI Suite on HDP enabled Scotiabank to begin implementing a secure global enterprise data lake (EDL) and to modernize their Big Data ecosystem. With the MDI Suite’s agent-based architecture, Scotiabank can orchestrate data movements into and out of their data lake, increase data throughput, and maintain comprehensive data lineage. Only a few weeks into the architecture changes, Scotiabank achieved 6x faster data extraction and 11x faster data movement across international borders.