May 12, 2017

Freddie Mac & KPMG: Advanced Analytics & Insights of Semi-Structured Data

Freddie Mac makes home possible for millions of families and individuals by providing mortgage capital to lenders. Since our creation in 1970, we’ve made housing more accessible and affordable for homebuyers and renters in communities nationwide. We are building a better housing finance system for homebuyers, renters, lenders and taxpayers.

KPMG, headquartered in the Netherlands, is a global network of professional firms providing audit, tax, and advisory services.

Technology leaders at both Freddie Mac and KPMG have developed a framework to accelerate the “data wrangling” process so our businesses can draw insights and provide feedback to customers and internal stakeholders as soon as a product is launched.

Next month, at the San Jose DataWorks Summit (June 13-15), a colleague from KPMG and I will present our Big Data achievements and results.

Please join me, Lakshmi Purushothaman, Senior Risk Analytics Director at Freddie Mac, and Kevin Martelli, Director of Technology Enablement at KPMG, on Thursday, June 15th, as we present:

A Freddie Mac and KPMG Case Study: PySpark for Advanced Analytics and Insights over Semi-Structured Data

Freddie Mac and KPMG have developed a common, generic, self-learning data engineering framework that addresses data integration challenges from multiple sources with a single solution. The reusable, extensible program executes against multi-dimensional, semi-structured XML data sets and leverages Jupyter Notebook as well as core Apache components of Hortonworks Data Platform (such as Spark, Hive, Oozie, and Zeppelin). By leveraging PySpark and other tools, the modular architecture delivers faster, easier data processing at lower development cost.

The resulting analytics allow Freddie Mac to extract the knowledge and insights needed to roll out new product capabilities, risk monitoring, quality-control sampling, and fraud analytics. The application runs its processes in a highly distributed, memory-intensive framework to reduce processing time. This high-level overview of the Freddie Mac Big Data solution will share best practices for generically processing semi-structured data while retaining the complex structures needed by data scientists and teams focused on advanced analytics.
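The talk's abstract does not include code, but the core idea of generically flattening multi-dimensional, semi-structured XML into tabular rows while tolerating missing fields can be sketched in plain Python. This is a hypothetical illustration of the technique, not the Freddie Mac/KPMG framework itself; the sample `<loans>` document, the `flatten` helper, and the dotted-path column naming are all assumptions made for the example (a production version would run the same logic over PySpark RDDs or DataFrames):

```python
import xml.etree.ElementTree as ET

# Toy semi-structured input: records share a shape but may omit fields.
SAMPLE = """
<loans>
  <loan id="L-1">
    <borrower><name>Alice</name><score>720</score></borrower>
    <amount>250000</amount>
  </loan>
  <loan id="L-2">
    <borrower><name>Bob</name></borrower>
    <amount>310000</amount>
  </loan>
</loans>
"""

def flatten(elem, prefix=""):
    """Recursively flatten an XML element into a dict of dotted-path keys."""
    row = {prefix + k: v for k, v in elem.attrib.items()}
    for child in elem:
        key = prefix + child.tag
        if len(child):                      # nested children -> recurse
            row.update(flatten(child, key + "."))
        else:
            row[key] = child.text
    return row

# One flat dict per record; absent fields (e.g. Bob's score) are simply
# missing keys, mirroring schema-on-read handling of semi-structured data.
rows = [flatten(loan) for loan in ET.fromstring(SAMPLE)]
```

Because the schema is discovered from the data rather than declared up front, the same routine can be reused across different XML feeds, which is the reusability property the abstract emphasizes.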

Be sure to register for the DataWorks Summit to catch this presentation and many others!
