July 27, 2017

Solving Cyber at Scale: Cybersecurity is a Long Tail Big Data Problem

Modern cybersecurity solutions now need to scale to an unprecedented level because of the growing volume and variety of data. Scale is no longer only about the sheer volume of traffic, such as that from IoT-led DDoS attacks, or the diversity of traffic that comes from a wide variety of device types. It is also about the need for speed: catching threats before they become a problem.

This creates a massive long tail problem: a huge amount of data that we’re all drowning in, including every file anyone has ever accessed, every packet that has crossed your switches, or even just the metadata of every packet on every switch. Within that data, we are looking for very weak signals from attackers who are smart enough to stay under your radar and remain undetected.

Cybersecurity is a Long Tail problem

Dissonant, disparate sources allow attackers to hide

From all this data, it’s not easy to extract weak signals while avoiding the false positives that appear on many of the dashboards available today. To combat cybersecurity threats, enterprises use SIEMs, augmented by simple search and log management systems like ELK or Splunk, a network packet store, perhaps Wireshark for packet analysis, and many additional tools: forensics platforms, threat intel platforms, plus some sort of endpoint agent installed on every employee laptop. The list goes on and on. There’s a reason why the walls of SOCs are covered in banks of monitors: analysts have ~30 different dashboards to watch and manually correlate, and each dashboard often represents a siloed system that doesn’t interoperate or integrate well with the others.

And then there is the complexity of the attacks themselves. Hackers know how to exploit the weakness of siloed approaches. They weave their attacks in between the silos, or keep them just under the alert detection thresholds, so that they remain undetected by the static rules in place today. This complexity is compounded by innovation, which generates even more siloed systems and dashboards.

Cybersecurity Silos

Need for a real-time, single view

A single view is needed to bring all of this together, combined with machine learning for tasks such as triage automation, detection of unknown threats and correlation automation, in order to process and analyze the deluge of data.
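To make the idea of correlation automation concrete, here is a minimal sketch in Python. It is not how Apache Metron or any particular product works. The silo names, the scores and both thresholds are hypothetical, chosen only to show how a host that stays under every per-silo threshold can still stand out once the silos are viewed together.

```python
from collections import defaultdict


def correlate_weak_signals(alerts, per_silo_threshold, combined_threshold):
    """Return hosts that no single silo would flag but the combined view does.

    `alerts` is an iterable of (silo, host, score) tuples. The scoring scale
    and both thresholds are illustrative assumptions.
    """
    scores_by_host = defaultdict(dict)
    for silo, host, score in alerts:
        # Keep the strongest signal each silo has seen for each host.
        previous = scores_by_host[host].get(silo, 0.0)
        scores_by_host[host][silo] = max(previous, score)

    flagged = []
    for host, silo_scores in scores_by_host.items():
        below_every_silo = all(s < per_silo_threshold for s in silo_scores.values())
        combined = sum(silo_scores.values())
        if below_every_silo and combined >= combined_threshold:
            flagged.append(host)
    return sorted(flagged)


# Hypothetical alerts: host-a is quiet in each silo but noisy overall.
alerts = [
    ("proxy", "host-a", 0.4),
    ("endpoint", "host-a", 0.4),
    ("netflow", "host-a", 0.4),
    ("proxy", "host-b", 0.4),
]
flagged = correlate_weak_signals(alerts, per_silo_threshold=0.5, combined_threshold=1.0)
print(flagged)  # ['host-a']
```

Summing scores is the simplest possible combination rule. A real deployment would more likely use a learned model, but the point is the same: the signal only appears in the single view.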

Alongside this is the need for real-time enrichment of all this data. Correlating seven different consoles to create context around a threat is currently a manual, time-consuming exercise that does not scale to modern attack vectors. The world two hours later, when you get around to investigating a problem, is not the same as it was when the alert fired. Enriching data in real time, at the source and as it arrives, is therefore crucial to an accurate picture of the context in which the alert occurred. (Read more on real-time enrichment)
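A small Python sketch of why enrichment time matters. Everything in it is an assumption made for illustration: the lookup tables, the field names and the IP addresses are hypothetical, and the timestamps are arbitrary epoch seconds. It simply attaches context to an event when the event arrives, then shows that the same lookup two hours later gives a different answer.

```python
def enrich_at_ingest(event, asset_inventory, threat_intel, ingest_time):
    """Attach the context that is true right now to a copy of the event."""
    enriched = dict(event)
    enriched["src_asset"] = asset_inventory.get(event["src_ip"], "unknown")
    enriched["dst_reputation"] = threat_intel.get(event["dst_ip"], "unlisted")
    enriched["enriched_at"] = ingest_time
    return enriched


# Hypothetical reference data at the moment the event arrives.
asset_inventory = {"10.0.0.5": "finance-laptop-17"}
threat_intel = {"203.0.113.9": "suspected-c2"}
event = {"src_ip": "10.0.0.5", "dst_ip": "203.0.113.9", "bytes": 512}

stored = enrich_at_ingest(event, asset_inventory, threat_intel, ingest_time=1501156800)

# Two hours later the address has been reassigned and the intel entry has aged out.
asset_inventory["10.0.0.5"] = "guest-tablet-02"
del threat_intel["203.0.113.9"]
late = enrich_at_ingest(event, asset_inventory, threat_intel, ingest_time=1501156800 + 7200)

print(stored["src_asset"], stored["dst_reputation"])
print(late["src_asset"], late["dst_reputation"])
```

The record enriched at ingest still points at the finance laptop and the suspected command-and-control address. The late lookup points at an unrelated device and an unlisted destination, which is the wrong context for the investigation.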

DIY or leverage an open source framework?

To meet cybersecurity needs today, many choose the do-it-yourself approach. They write scripts to normalize each of the data feeds coming into their system, build logic on top of that to analyze the data, and then write their own streaming programs to join, enrich, analyze and index it. Plumbing all of this together would likely take the next two years.

Or you could use Apache Metron, which pulls together a framework and a reference implementation that give you a leg up. This means you can hire a team of data scientists who focus on your business and solve problems for you without having to build everything from scratch. Apache Metron provides a head start, backed by a community that works together to collectively combat the bad guys.
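As a rough idea of what the first step of the DIY route looks like, here is a minimal Python sketch that normalizes two feeds into one common schema. Both input formats, the field names and the common schema are hypothetical, invented for this example. Real feeds are messier, and there are usually dozens of them.

```python
import json


def normalize_firewall(line):
    """Parse a hypothetical firewall line: '<epoch> <src_ip> <dst_ip> <ACTION>'."""
    timestamp, src_ip, dst_ip, action = line.split()
    return {
        "timestamp": int(timestamp),
        "src_ip": src_ip,
        "dst_ip": dst_ip,
        "action": action.lower(),
        "source": "firewall",
    }


def normalize_proxy(line):
    """Parse a hypothetical JSON proxy record with time/client/server/status keys."""
    record = json.loads(line)
    return {
        "timestamp": int(record["time"]),
        "src_ip": record["client"],
        "dst_ip": record["server"],
        # Assumption for this sketch: HTTP status below 400 means the request was allowed.
        "action": "allow" if record["status"] < 400 else "deny",
        "source": "proxy",
    }


firewall_line = "1501156800 10.0.0.5 203.0.113.9 DENY"
proxy_line = '{"time": 1501156805, "client": "10.0.0.5", "server": "198.51.100.7", "status": 200}'

events = [normalize_firewall(firewall_line), normalize_proxy(proxy_line)]

# Once every feed shares a schema, events can be ordered and joined on common fields.
events.sort(key=lambda e: e["timestamp"])
for e in events:
    print(e["source"], e["src_ip"], e["dst_ip"], e["action"])
```

Each new feed needs its own parser like these, and the analysis, enrichment and indexing steps described above still have to be built on top of them.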

To learn more, view these slides or watch the video “Solving Cyber at Scale” from DataWorks Summit.
