Argyle Data, a Hortonworks Technology Partner, was recently certified on the Hortonworks Data Platform (HDP) and awarded the Operations Ready (OPS Ready) badge for its integration with Apache Ambari. Here, Dr. Ian Howells talks about how Argyle Data is helping customers detect fraud faster with its native Hadoop application.
We believe that the world is moving to a new generation of native Apache Hadoop applications. When you build your application from the ground up on Hadoop, it is critical to make it simple for any organization to provision, manage and monitor at scale. We are excited to be integrated with the Hortonworks Data Platform (HDP) and to announce that Argyle Data has achieved the Operations Ready Certification. In our view, a "native Hadoop application" is one built on native Hadoop services and optimized for those services. Simply put, without Hadoop there is no application: "No Hadoop cluster, no application." When you take this approach, tight integration with Apache Ambari is a key part of that vision.
In the last several years, fraud has moved from a back-office subject to front-page news. The Association of Certified Fraud Examiners reported that the typical organization loses 5% of revenue each year to fraud, a global loss totaling $3.7 trillion. Communications Service Providers, for example, lose $46 billion per year to fraud, according to the Communications Fraud Control Association. Because the rewards are so large, criminals are often out-innovating enterprises across all industries.
Criminals are innovating rapidly while carriers defend themselves with the same approaches they used three to five years ago. Fraudsters combine attack vectors and patterns in real time, while enterprises rely on data siloed across multiple systems, batch processing via ETL, and rules based on old types of fraud. Siloed data and rules allow criminals to "fly between the systems and under the rules radar," evading detection. This older technology delivers only hindsight on what happened, without offering insight into future attacks. Just as you can't drive a car by looking only in the rear-view mirror, you can't rely on a retroactive model to fight fraud.
2014 will be remembered as the year that the fraud and security cyber dam broke. Pre-Big Data approaches are losing the battle. They fail to discover the fraud at all, overwhelm users with false positives, discover fraud only after the criminal has gone, or rely on dated rules that catch last year's fraud rather than today's or tomorrow's. Fraudsters evolve their attacks as technology evolves; why shouldn't our fraud systems do the same and exploit the power of Hadoop and machine learning at scale?
Argyle Data is a real-time fraud analytics application, built natively on Hadoop via HDP, that uses the latest technology in deep packet inspection, machine learning and anomaly detection to identify fraud at scale. Argyle's system is currently live at some of the world's largest mobile operators, where its machine learning algorithm has already been proven to detect fraud that previous systems had never caught, detect fraud in minutes rather than days, discover both new and old attack techniques, and dramatically reduce false positives, saving these operators millions of dollars.
HDP is a core component of Argyle Data's application, giving the system access to a data lake of both operational packet data and business data instead of isolated data silos. Ingesting and analyzing data then becomes automated and happens in real time. HDP also provides the underlying data storage that enables a shift from rules to machine learning, and from proprietary hardware to low-cost commodity Hadoop hardware.
Going into more detail, there are several core components in this native Hadoop architecture. Apache Flume handles file ingestion, and deep packet inspection (DPI) converts packets into key-value pairs, which are stored at petabyte scale in Apache Accumulo. Argyle Data performs real-time feature enrichment and real-time, machine-learning-based fraud detection as data is ingested. Presto, Facebook's distributed SQL query engine, is integrated with Accumulo using both primary and secondary indexing, offering interactive querying with ANSI SQL support. All of this is integrated with Ambari to simplify the provisioning, management and monitoring of the Hadoop cluster. As can be seen below, Argyle Data and Presto are integrated as native components alongside Flume, ZooKeeper, HDFS, MapReduce, YARN and other familiar services.
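To make the DPI-to-Accumulo step a little more concrete, here is a minimal sketch, not Argyle Data's implementation, of how one parsed packet record might be flattened into Accumulo-style entries. It assumes Accumulo's key layout (row ID, column family, column qualifier, visibility, timestamp, plus a value) and models entries as plain Python objects rather than calling any Accumulo client API. The record fields (`imsi`, `dest_number`, `bytes`), the `"dpi"` column family, and the inverted-index layout (indexed value as the row ID, pointing back to the data row) are hypothetical, chosen only to illustrate the kind of secondary index a SQL engine such as Presto can use to avoid full-table scans.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Entry:
    """One Accumulo-style key/value: (row, family, qualifier, visibility, timestamp) -> value."""
    row: str
    family: str
    qualifier: str
    visibility: str
    timestamp: int
    value: str


def record_to_entries(record_id: str, fields: Dict[str, str], ts: int,
                      visibility: str = "") -> List[Entry]:
    """Flatten one DPI-parsed record (hypothetical field names) into one entry per field.

    Every field shares the record's row ID, so a single row scan returns the whole record.
    """
    return [Entry(record_id, "dpi", name, visibility, ts, value)
            for name, value in sorted(fields.items())]


def index_entries(record_id: str, fields: Dict[str, str], indexed: List[str], ts: int,
                  visibility: str = "") -> List[Entry]:
    """Build inverted-index entries for the chosen fields (hypothetical layout).

    row = the indexed value, family = the field name, qualifier = the data row it points to.
    """
    return [Entry(fields[name], name, record_id, visibility, ts, "")
            for name in indexed if name in fields]


def lookup(index: List[Entry], field: str, value: str) -> List[str]:
    """Return data-table row IDs whose `field` equals `value`, consulting only the index."""
    return sorted(e.qualifier for e in index if e.row == value and e.family == field)


if __name__ == "__main__":
    # Illustrative record with made-up test values.
    record = {"imsi": "001010123456789", "dest_number": "+15550100", "bytes": "512"}
    data = record_to_entries("rec-0001", record, ts=1425000000000)
    idx = index_entries("rec-0001", record, ["imsi", "dest_number"], ts=1425000000000)
    print(len(data))                                 # one entry per field
    print(lookup(idx, "dest_number", "+15550100"))   # data rows that called this number
```

The design choice the sketch illustrates is the trade-off behind secondary indexing: each indexed field costs one extra write at ingest time, but a query filtering on that field can jump straight to matching row IDs instead of scanning every record.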
The combined solution enables mobile communications operators to lead the innovation battle against criminals using the power of HDP and Hadoop. It saves mobile operators millions of dollars, protects their brands from reputational damage, and protects subscribers from the damaging effects of fraud. Mobile communication is permeating every part of our lives and every industry. If you were lucky enough to attend Mobile World Congress this year, you saw firsthand how quickly the world is moving into a mobile-first era, from handsets to connected cars, connected homes, mobile payments, fitness and the Internet of Things (IoT). Hortonworks and Argyle Data will be able to protect not only handsets from fraud but also all of these connected (and vulnerable) devices.