Do you like looking for the needle in a field of haystacks? Do I have a job for you: security operations center (SOC) analyst. You will spend your days looking at hundreds of thousands of alerts – created by rules engines – of which only a very few each week actually matter. Your job is to manually review all of them, filtering out the noise to find the few that matter. Yes, it will take hours to review each one and there won’t be enough time in the day to review them all; but what can you do?
Maybe we could learn something from the past. During World War II, Allied bombers were being shot down in huge numbers, and just putting more planes in the air wasn’t solving the problem; a new approach was needed. The already overloaded bombers needed better armor – armor that weighed too much. Seeking a new answer, the Allied effort turned to analytics to find the best balance of weight to armor coverage. Fortunately, they asked Abraham Wald. Abraham analyzed the bullet hole patterns on the surviving aircraft to determine where to place the armor.
You would think the answer would be to apply armor to the areas with the most bullet holes, right? After all, that is how our poor SOC analyst’s security information and event management (SIEM) system works. Responding to the events seen in an efficient manner makes sense, right?
Wrong. Abraham’s lightbulb moment was understanding that his data came from the surviving aircraft and not from those that had been shot down. The data being collected wasn’t the correct data to take action on. The answer was to put armor on the aircraft where the bullet holes weren’t because, absent the additional data lying in fields behind enemy lines, those areas were probably where the shot-down planes had been hit.
Lightbulb Moment: Look where the bullet holes aren’t
As SOC analysts, we must realize that we are like Abraham: our SIEM’s alerts are like the bullet holes on the surviving aircraft, a massive amount of noise and ultimately the wrong data to be reviewing. Each day, ask yourself the following question: Did my company just go out of business from a cyberattack? No? Then I’m probably looking at the wrong data.
What we need is a way, like Abraham, to look where the bullet holes aren’t. That requires moving away from rules engines and using analytics to find what matters.
At its most basic, a rule is a little bit of Boolean logic expressed in the form of an if, then, else expression. If I see this bad thing, generate that alert; otherwise keep looking. This fundamental principle has been the basis of most point security solutions. Whether they call them rules or signatures, firewalls, intrusion detection systems, intrusion prevention systems, data loss prevention systems, and antivirus all leverage a rules engine in some fashion to generate their actions or alerts. If I detect this bad thing, then do something.
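To make the if, then, else shape concrete, here is a minimal sketch of a single rule in Python. The event fields (`src`, `dest_port`) and the `KNOWN_BAD_PORTS` set are hypothetical, invented for illustration; they do not come from any particular product.

```python
from typing import Optional

# Hypothetical "signature": destination ports we have already decided are bad.
KNOWN_BAD_PORTS = {23, 4444}


def port_rule(event: dict) -> Optional[str]:
    """If the event matches the known-bad condition, return an alert; else None."""
    if event.get("dest_port") in KNOWN_BAD_PORTS:
        return f"ALERT: {event.get('src')} connected to suspicious port {event['dest_port']}"
    # Otherwise keep looking: the rule has nothing to say about this event.
    return None


# Made-up sample events.
events = [
    {"src": "10.0.0.5", "dest_port": 443},
    {"src": "10.0.0.9", "dest_port": 4444},
]
alerts = [alert for alert in (port_rule(e) for e in events) if alert is not None]
print(alerts)
```

Notice that the rule can only fire on what someone already wrote into `KNOWN_BAD_PORTS`. Anything outside that set passes silently.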
Except, how do you know it is really a bad thing? This is where the arrogance implicit in this approach causes our poor SOC analyst problems. The security solutions all ship with nicely packaged rules ready to detect all that bad stuff. Millions are spent advertising how great those rules are and how many of them ship, and how many more rules product X has versus product Y. The arrogance is a failure to understand the difference between the known and the unknown. A rule defines what is known; by its very nature it can’t define what is unknown. Like the planes reviewed by Abraham, it is the unknown that will harm your company.
Hopefully it is clear that the most precious resource in the incident response process is the SOC analyst’s time. Reviewing data regarding what is known is a waste: if it’s known bad, automate a response; if it’s known normal, use that baseline to filter out the noise. The role of the SOC analyst is to take the unknown, make it known, and move on. A mature process then takes that new known and automates the appropriate response.
To efficiently grapple with the unknown we need to separate the known from the unknown. Let’s introduce the concept of lists to filter the known good (white) and the known bad (black) from the unknown (grey). To be clear, these lists aren’t made up of rules in an engine. They are behavioral patterns discovered as part of an analytic process. Patterns are discovered via machine learning (grey) and the SOC process categorizes the pattern as normal (white) or abnormal (black).
Grey: This is the place where the use of the SOC analyst’s time is the most valuable. Patterns of behavior are discovered. They may be good or bad, and review is required. Regardless, this shouldn’t be viewed as the old rules-based paradigm of false positive tuning. This critical step consists of taking what was an unknown behavior and classifying it into the white or black list. Both have value if leveraged by the enterprise as a whole.
White: This list consists of known normal behavior. The incident response process leverages this to filter out the noise and focus the SOC analyst’s attention on what matters. In an integrated platform, this list is a gold mine for predictive business analytics. From predicting system failure to capacity planning to customer behavior, predicting the normal is a key value in its own right. Spending time figuring out how the enterprise at large can leverage this list can help justify the CBA/ROI of the solution, including additional SOC analyst staffing. Done right, this shifts the SOC from a cost center to a revenue opportunity center.
Black: Ugh, the SOC analyst reviewed the pattern and it’s not good. This needs to be forwarded into the case management system for incident response. Wouldn’t it be great if all the data collected to date was captured – with full chain of custody – and sent into your case management system to help make your remediation and forensic activities more efficient? Wouldn’t it be great if this was automated so the SOC analyst can do something more than click the big red respond button? That’s the goal for this list: automate both the detection and response escalation so the SOC analyst’s time is spent on reviewing new patterns.
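The three lists can be sketched as a small triage structure. This is only an illustration under assumptions: the `PatternLists` class, the `Verdict` enum, and the idea of identifying a learned pattern by a string ID are all hypothetical, and a real platform would store far richer pattern descriptions.

```python
from enum import Enum


class Verdict(Enum):
    WHITE = "known normal"
    BLACK = "known bad"
    GREY = "unknown"


class PatternLists:
    """Hypothetical store of analyst-classified behavioral patterns."""

    def __init__(self) -> None:
        self.white: set = set()
        self.black: set = set()

    def triage(self, pattern_id: str) -> Verdict:
        # Known bad goes to automated response, known normal is filtered out,
        # and everything else is grey and needs an analyst's review.
        if pattern_id in self.black:
            return Verdict.BLACK
        if pattern_id in self.white:
            return Verdict.WHITE
        return Verdict.GREY

    def classify(self, pattern_id: str, verdict: Verdict) -> None:
        # Record the analyst's decision: the unknown becomes known.
        if verdict is Verdict.GREY:
            raise ValueError("an analyst decision must be WHITE or BLACK")
        self.white.discard(pattern_id)
        self.black.discard(pattern_id)
        (self.white if verdict is Verdict.WHITE else self.black).add(pattern_id)


lists = PatternLists()
first_seen = lists.triage("nightly-backup-burst")   # grey: never reviewed
lists.classify("nightly-backup-burst", Verdict.WHITE)
second_seen = lists.triage("nightly-backup-burst")  # white: filtered as noise
print(first_seen.name, second_seen.name)
```

The point of the sketch is that an analyst touches a given pattern once. After `classify`, the same pattern never reaches the grey queue again.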
Rules are great; rules engines are mature and valuable to the incident response process. So what’s the best way to leverage them? There are two great use cases: detection rules and response rules.
So, you’ve made a huge investment in point security solutions and you’re worried I’m saying that you should scrap them and start over? Nope, there are great solutions available in every space and I’m a fan of a great many of them. All I’m saying is they shouldn’t be in the middle of your SOC analyst triage process – they find the bullet holes so your SOC analyst can look elsewhere. Think of all those tools as being part of your sensor network generating or enriching the data feeding into the machine learning process. Seeking solutions that are open and able to integrate into the larger process may be a consideration when choosing between product X and product Y; but existing tools definitely have their place.
So what do I mean by a sensor network? Just like Abraham above, we need data to leverage analytics to find what matters – to look where the bullet holes aren’t. Unlike Abraham, we don’t have to accept the lack of data, we can seek out new forms of data to look where the rules engines can’t. Logs, netflow, packet capture, etc., can all be leveraged to provide a running feed of the activity on your networks and systems.
Lastly, full analytic modeling is expensive, and if our goal is maximizing value through efficient use of resources, then sometimes a simple detection rule is the right answer. Once a pattern has been categorized in the black list, a rules engine using a simple regular expression may be the more efficient way of detecting it.
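As a sketch of what such a regular-expression detection rule might look like, suppose the black-listed pattern is a failed SSH password attempt. The regular expression below follows the usual OpenSSH "Failed password for ..." log message; the pattern name `ssh-failed-login`, the `detect` function, and the sample log lines are made up for illustration.

```python
import re
from typing import Optional

# Matches lines such as:
#   "Failed password for invalid user admin from 203.0.113.7 port 52144 ssh2"
FAILED_LOGIN = re.compile(
    r"Failed password for (?:invalid user )?(?P<user>\S+) "
    r"from (?P<ip>\d{1,3}(?:\.\d{1,3}){3})"
)


def detect(line: str) -> Optional[dict]:
    """Return a small detection record if the line matches, else None."""
    match = FAILED_LOGIN.search(line)
    if match is None:
        return None
    return {
        "pattern": "ssh-failed-login",  # hypothetical black-list pattern name
        "user": match.group("user"),
        "ip": match.group("ip"),
    }


# Made-up sample log lines.
hit = detect("sshd[811]: Failed password for invalid user admin from 203.0.113.7 port 52144 ssh2")
miss = detect("sshd[812]: Accepted password for alice from 198.51.100.4 port 40022 ssh2")
print(hit, miss)
```

The output of a rule like this is a record for the analytic platform or the response process to consume, not an alert for a person to read.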
The key is that these detection rules either provide data for the analytic platform to do its thing or feed the automated response process – they don’t belong in front of your SOC analyst’s eyeballs.
Automated response is where a rules engine really shines. The sensor network has provided a high-fidelity feed of all activity on your networks and systems, detection rules have detected known anomalous activity, patterns have been learned and categorized by our SOC analysts, and now we need to do something about it. This is the major difference between cybersecurity analytics and other analytic systems. This isn’t a simple job to generate a pretty report full of graphs and trend lines. We need an automated way to take action; the faster we can respond, the less damage the bad actions can cause. Either we leverage the existing point security solution to respond and remediate, which works if we focus on products that are good neighbors and open to integration, or we leverage the rules engine in our manual case management and workflow process.
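A response rule can be sketched as a lookup from a black-listed pattern to the actions it should trigger. Everything named here is hypothetical: `block_ip` and `open_case` are stand-ins that only record what they would do, where a real deployment would call the integration interface of a firewall or case management product.

```python
from typing import Callable, Dict, List

actions_taken: List[str] = []  # stand-in for side effects in real tools


def block_ip(detection: dict) -> None:
    # Hypothetical: a real handler would call the firewall's integration API.
    actions_taken.append(f"block {detection['ip']}")


def open_case(detection: dict) -> None:
    # Hypothetical: a real handler would create a case with the collected evidence.
    actions_taken.append(f"case {detection['pattern']}")


# Response rules: if we see this known-bad pattern, then run these actions.
RESPONSE_RULES: Dict[str, List[Callable[[dict], None]]] = {
    "known-bad-login-burst": [block_ip, open_case],
}


def respond(detection: dict) -> int:
    """Run every action registered for the pattern; return how many ran."""
    # Patterns with no automated remediation fall back to the manual
    # case management workflow.
    handlers = RESPONSE_RULES.get(detection["pattern"], [open_case])
    for handler in handlers:
        handler(detection)
    return len(handlers)


ran = respond({"pattern": "known-bad-login-burst", "ip": "203.0.113.7"})
print(ran, actions_taken)
```

Keeping the rules in a plain table means that adding an automated response for a newly black-listed pattern is a one-line change, with no analyst clicking involved.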
In my last article, >echo “Hello, world.”, I shared my lightbulb moment on why I feel an analytic approach to cybersecurity matters. Beginning with this article, I’ve started to show you a day in the life of a SOC analyst, explaining why alerts generated by rules engines in point security solutions are the wrong data for a SOC analyst to review. Next, I’ll continue the journey through the day in the life of our intrepid SOC analyst, where I’ll highlight why context matters, showing why leveraging Hortonworks DataFlow can make the SOC analyst’s life easier and how you can build the value proposition for your business case.