February 05, 2018

Combat Modern Cybersecurity Challenges with Big Data and Machine Learning

With great power comes great responsibility¹.

As organizations harness the power in data to grow their business through initiatives like IoT, predictive analytics, single view of customers and more, now is the time for them to further exercise their corporate responsibility. This means that organizations need to safeguard data to protect their business continuity, brand name, and most importantly, the people: employees and customers.

Cyberattacks have repeatedly and severely damaged businesses and the global economy over the years, as informed consumers are well aware. Take one of last year’s most critical attacks, the Equifax data breach². The company suffered a 27% decline in earnings in the third quarter compared to the previous year and reported $87.5 million of pretax expenses in the same period, with an estimated additional expense of $75 million in the subsequent quarter. The leak of sensitive private information affected 145.5 million consumers, nearly half of the U.S. population. The aftershocks of a data breach can go on and on, from tangible monetary loss to intangible harm such as loss of credibility and damage to brand equity, from which a company can take years, or even decades, to recover.


Unfortunately, building a sound enterprise-wide cybersecurity defense is easier said than done. With the proliferation of connected devices, cloud, and IoT deployments, opportunities for exploitation and attack keep expanding, and threat levels are rising at a rate that outpaces traditional security tools and defense capabilities, particularly in the following three areas:

  • Length of an unnoticed breach: on average, an advanced security breach can remain hidden in an enterprise system for 8 months. Some, such as the infamous Yahoo breach, were found to have been in place for years. The shocking reality is that most organizations don’t even have the storage capacity and scalability to retain 3 months’ worth of cybersecurity data. Intruders can lie dormant in a company’s system for months without being noticed, covering their tracks and acquiring the credentials of customers and employees in order to break into their personal accounts, such as email, bank accounts, and healthcare records.
  • Speed of occurrence: 82% of breaches happen within minutes. Most of the perimeter and encryption defenses that organizations have spent millions of dollars on can be penetrated within minutes. To put things into perspective, an average large company generates over 100,000 alerts per day; suppose there is one intruder in that pool. Detecting that single attack is like finding a needle in a haystack, let alone doing so in minutes or even seconds. Before you know it, the intruder has been sitting in the system for 8 months and is much harder to dig out.
  • Shortage of security personnel: the market is seeing a staggering level of demand for cybersecurity-related roles. According to ISACA, a non-profit information security advocacy group, there will be an estimated global shortage of 2 million cybersecurity professionals by 2019. The personnel shortage leaves organizations less able to detect, deter, and defend against evolving, surging cyber threats.
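
To make the needle-in-a-haystack point concrete, here is a small back-of-the-envelope sketch. The alert volume and the single intruder come from the scenario above; the detector's false positive rate and detection rate are purely illustrative assumptions, and `triage_load` is a hypothetical helper, not part of any product.

```python
# Illustrative arithmetic only. The two rates passed to triage_load() are
# assumptions made for this example, not figures from the article.
ALERTS_PER_DAY = 100_000   # from the scenario: alerts at a large company
TRUE_INTRUSIONS = 1        # from the scenario: one real intruder in the pool


def triage_load(false_positive_rate: float, detection_rate: float) -> dict:
    """Expected number of flagged alerts per day and the share that are real."""
    benign = ALERTS_PER_DAY - TRUE_INTRUSIONS
    false_alarms = benign * false_positive_rate
    true_hits = TRUE_INTRUSIONS * detection_rate
    flagged = false_alarms + true_hits
    return {
        "flagged_per_day": flagged,
        "precision": true_hits / flagged if flagged else 0.0,
    }


# Assumed detector: wrong on only 1% of benign alerts, catches 99% of attacks.
result = triage_load(false_positive_rate=0.01, detection_rate=0.99)
print(result)
```

Even under these generous assumptions, analysts would face roughly 1,000 flagged alerts a day, of which about one is the real intrusion. That gap is why prioritization and automation matter as much as raw detection.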


Because the hyper-connected digital world produces cybersecurity data at a volume and rate that companies can’t keep up with using manual processes and traditional security tools, they are now turning to the most advanced solutions of the Information Age: Big Data and Machine Learning.


The hallmark of big data is the ability to ingest, process, aggregate, and manage vast amounts of data arriving from a variety of sources at different speeds. This capability makes it possible to normalize and analyze massive security and related data sets so that anomalies are easier to detect during investigations. Because machine learning models can be trained to recognize patterns and adapt independently to new data, the combination of the two, Big Data and Machine Learning, creates a powerful automated cybersecurity mechanism that shines a light on the dark age of widespread cybercrime for enterprises.
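
As a rough illustration of what detecting anomalies against learned patterns can mean in practice, the sketch below compares each user's current activity count with that user's own history using a simple z-score. This is a deliberately minimal statistical baseline, not a description of any particular product's models; the function name, the sample data, the threshold of 3, and the floor of 1.0 on the standard deviation are all assumptions made for the example.

```python
from statistics import mean, stdev


def flag_anomalies(history, current, threshold=3.0):
    """Return {entity: z_score} for entities far above their own baseline.

    history: entity -> list of past hourly event counts (the learned pattern)
    current: entity -> event count observed in the latest hour
    """
    flagged = {}
    for entity, past in history.items():
        if len(past) < 2 or entity not in current:
            continue  # not enough history to form a baseline
        baseline = mean(past)
        # Assumed floor of 1.0 so a perfectly flat history cannot
        # produce a division by zero or an exaggerated score.
        spread = max(stdev(past), 1.0)
        z = (current[entity] - baseline) / spread
        if z > threshold:
            flagged[entity] = z
    return flagged


# Hypothetical hourly login counts per user.
history = {"alice": [4, 5, 6, 5, 4], "bob": [20, 22, 19, 21, 20]}
current = {"alice": 40, "bob": 21}

print(flag_anomalies(history, current))
```

Here only `alice` is flagged: 40 logins is far outside her usual range, while 21 is normal for `bob` even though it is a larger absolute number than anything in her history. Profiling each entity against its own behavior, rather than against a global threshold, is the design choice being illustrated.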

Sitting at the prime intersection of Big Data and Machine Learning, Hortonworks Cybersecurity Platform (HCP™), powered by Apache Metron, employs a data-science-based approach to visualize diverse, streaming security data at scale to aid Security Operations Centers (SOC) in real-time detection and response to threats. This open source platform is built on top of the unmatched scalability and governance of data in Hortonworks Data Platform (HDP™) and the real-time ingest and processing capability in Hortonworks DataFlow (HDF™). Core features³ of HCP include:

  • Real-time ingest and enrichment of security data sources at millions of events per second.
  • Real-time behavior profiling at scale.
  • Petabyte-scale storage that allows larger training sets and detailed forensic replay when a cyber threat is detected.
  • Rapid productionization of machine learning, allowing data scientists to work in real time and monitor environments faster.
  • “Single view of risk” user interfaces make SOC analysts more productive, and dashboard and notebook interfaces make data scientists more effective.

With HCP, users are able to streamline their operational efforts and focus on high-value, urgent items based on alert prioritization. Additionally, the application of advanced analytics and Model as a Service provides a platform for cutting-edge machine learning models using technologies like Spark, GPUs and deep learning. These features bring efficiency and effectiveness to SOC operators, as well as better detection of unknown threats.

The Hortonworks product offering includes support for the Cybersecurity Platform as well as our industry-leading professional services to install and harden platforms and build your security data lake. Our delivery teams integrate and implement common data sources such as Active Directory, NetFlow, DNS logs, proxy logs, firewall logs, application logs, and others, and implement alert and anomaly detection. We can provide solutions for use cases such as personalized monitoring of user behavior, password attacks, geo-improbable activity, changes in server and client behavior³’, and many others.
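
Of the use cases listed above, geo-improbable activity is the easiest to sketch: two logins by the same user are suspicious if getting from one location to the other in the elapsed time would require an implausible travel speed. The example below is a minimal illustration under stated assumptions, not how any specific platform implements it. The `Login` type, the 900 km/h speed limit, and the sample coordinates are hypothetical, and real deployments would also have to account for VPNs and inaccurate IP geolocation.

```python
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0
MAX_PLAUSIBLE_KMH = 900.0  # assumption: roughly airliner cruising speed


@dataclass
class Login:
    user: str
    when: datetime
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = p2 - p1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def is_geo_improbable(prev: Login, curr: Login, max_kmh=MAX_PLAUSIBLE_KMH):
    """True if the implied travel speed between two logins exceeds max_kmh."""
    hours = (curr.when - prev.when).total_seconds() / 3600
    hours = max(hours, 1 / 3600)  # treat anything under one second as one second
    distance = haversine_km(prev.lat, prev.lon, curr.lat, curr.lon)
    return distance / hours > max_kmh


# Hypothetical events: New York at 09:00, then London or Boston later that day.
new_york = Login("alice", datetime(2018, 2, 5, 9, 0), 40.7128, -74.0060)
london = Login("alice", datetime(2018, 2, 5, 10, 0), 51.5074, -0.1278)
boston = Login("alice", datetime(2018, 2, 5, 14, 0), 42.3601, -71.0589)

print(is_geo_improbable(new_york, london))  # one hour apart across the Atlantic
print(is_geo_improbable(new_york, boston))  # five hours apart, short distance
```

The first pair is flagged because no one crosses the Atlantic in an hour; the second is not, since New York to Boston in five hours is ordinary travel.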




¹Origin, “With Great Power Comes Great Responsibility”:
²Cyberattack Casts a Long Shadow on Equifax’s Earnings, New York Times, 2017:
³Hortonworks Introduces Real-Time Cybersecurity Threat Detection With Extensible Open Data Models, Press Release, 2017:
³’Hortonworks Cybersecurity Platform – Big Data Cybersecurity Solution, Simon Elliston Ball, 2017:

