February 17, 2016

Cybersecurity: Why context matters and how do we find it?

Welcome back to my blogging adventure. If you've been reading along, you're aware of the lightbulb moments from my article, "echo: hello world", that allowed me to discover the benefits of an analytic approach to cybersecurity. Next, I offered a slice of the life of our intrepid SOC analyst in "Cybersecurity: the end of rules are nigh", where I explained my belief that we need to move away from a rules-based detection approach to cybersecurity monitoring. Today, we will spend some more time with our SOC analyst living the life of event triage. My hope is that we come away with a greater understanding of why context matters as I walk through a high-level process for efficient incident response triage.

The context conundrum

To understand why context is so critically important, we need to forget about technology for a moment and focus on people and process. A hard lesson I've learned in my career is that when we focus on the technology, we end up creating solutions that make the person work for the machine instead of the machine enabling the person. So let's take a moment, step into the shoes of our intrepid SOC folks, and walk through a day in their lives.


Triage Analyst

Typically, the first line is the SOC analyst focused on responding to alerts and determining whether each is a false positive or something that requires escalation. This is usually a junior shift-level person in the security equivalent of the help desk call specialist role. They have job guides, run books, or knowledge trees that they follow as they gain experience. The process they follow is probably documented as easy as 1-2-3:

1. Look in the SIEM and select the top alert

2. Review

3. Decide to escalate or filter




Pretty simple process, right? If only the real world worked that way. What actually happens is probably more like steps 1 through 14, give or take another one or fourteen more:

  1. Look at the SOC dashboards to get an overall feel of what’s going on
  2. Look at the SIEM alert containing two IP addresses and an obscure alert name
  3. Go into several other consoles looking up what system owns each IP address
  4. Web search if IP address is external to see who owns it and if it has a bad reputation
  5. Take the looked up system names and enter yet another console to look up the asset inventory information such as what should be running on the machine and who owns it
  6. Send emails to the owners for details since the asset inventory information is probably out of date or incomplete
  7. Look in yet another console for details regarding the alert such as what it does and what vulnerabilities it targets
  8. Look in yet another console to see if the asset has been patched or fixed as of the last scan, since the scan is probably a few weeks out of date
  9. Send email for a one-off vulnerability scan to verify
  10. Look in yet another console to see if the systems have been backed up and when the last time the backup has actually been tested
  11. Start feeling nervous since most DR tests are yearly
  12. Look in yet another console to see if the assets are one of the systems logging to the consolidated repository or if more emails need to be sent to get the logs
  13. Realize that not every application log on the system is actually logging to the repository
  14. Send emails to the applications teams to get those logs…


  15. Two hours later, after all the data has finally been reviewed: false alarm
  16. Time to send emails escalating all the broken things, like log forwarding failing without anyone noticing, backup jobs showing as incomplete or failed, patches marked as installed turning up missing in the one-off vulnerability scan, etc.
  17. Done?
  18. Time for lunch, then on to the next alert review.
  19. End of shift: hand over to the next group of analysts the emails still awaiting responses and the other few hundred thousand events you didn't have time to get to.
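The dozen-plus manual lookups above are exactly the kind of context gathering a machine should be doing for the analyst. As a minimal sketch of that idea, the per-IP lookups could be collapsed into one automated enrichment step; every helper function and field name here is a hypothetical placeholder, not any real console's API:

```python
# Sketch: consolidate the manual triage lookups into one enrichment call.
# lookup_asset, lookup_reputation, and lookup_patch_status are hypothetical
# stand-ins for the asset inventory, reputation feed, and scanner consoles.

def lookup_asset(ip):
    """Placeholder: query the asset inventory for owner and expected services."""
    return {"owner": "unknown", "services": []}

def lookup_reputation(ip):
    """Placeholder: query an external reputation feed for this IP."""
    return {"score": 0}

def lookup_patch_status(ip):
    """Placeholder: query the vulnerability scanner's last known state."""
    return {"patched": None, "last_scan_days_ago": None}

def enrich_alert(alert):
    """Attach asset, reputation, and patch context to a raw SIEM alert."""
    context = {}
    for ip in (alert["src_ip"], alert["dst_ip"]):
        context[ip] = {
            "asset": lookup_asset(ip),
            "reputation": lookup_reputation(ip),
            "patch": lookup_patch_status(ip),
        }
    return {**alert, "context": context}

alert = {"name": "obscure-alert-42", "src_ip": "10.0.0.5", "dst_ip": "203.0.113.9"}
enriched = enrich_alert(alert)
print(sorted(enriched["context"]))  # ['10.0.0.5', '203.0.113.9']
```

The point of the sketch is that the analyst sees one enriched alert instead of opening five consoles and sending three emails.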


Security Engineer

The short-term goal of any analyst is to get away from the triage cycle and move up to the role of the security engineer. Typically, the security engineer's primary responsibility is the care and feeding of one or more point security solutions. They do capacity planning, perform system maintenance and upgrades, and are available to assist if their technology is part of an incident response escalation. The promise is a standard work day with the occasional off-hours call if incident response is required. The reality is that change management requires all maintenance be done outside of business hours, and incident response means helping the triage specialists at all hours several days every week. Since the point security solution is probably a rules or signature engine, the security engineer spends many hours dialing down the rules generating most of the false positives.

Forensic Investigator

Do you get excited watching paint dry? Great! You have the mental fortitude to be a forensic investigator. Your job is to get involved days after the fact to collect evidence and figure out what actually happened. Chain of custody is a big deal, whether the business needs to engage law enforcement, you need to help prepare the lawyers' response, or regulations require a forensic response. Making your job difficult is all the activity of the IT and SOC folks accessing the systems during the triage response and cleanup, which you now have to painstakingly separate from the malicious activity. Since the incident response process probably didn't ensure chain of custody procedures were followed, you can't rely on their work and have to recreate it from the point where you could establish custody procedures. Yes, it will take weeks. Yes, you have a several-month backlog, but hey, job security.


IT and business folks

Yes, you have a job to do, and these emails and tickets from the SOC folks are all marked most urgent. Don't they realize that those systems are having performance issues and/or an outage, and that getting them back up and working is the priority? Take the server offline and restore from backup? Are you kidding? That would break the SLA; can't they follow change control and schedule that during non-business hours? Really, we need to spend time troubleshooting, not collecting logs, running some scan, or updating some asset inventory no one looks at.

Yes, context matters

Hopefully, everyone got through that heavy dose of sarcasm and realized that not once did I point the finger at technology products and say our solution's better. If you think I was being hard on the people involved, ask your people and see if they haven't felt or thought that way at least once in their careers. They may laugh, or they may cry, but we've all been there in the trenches trying to understand what's going on when we know minutes matter.

Yes, we have a real problem with incident response, and new technology will be required to help solve it; however, the fundamental problems are in the people and process interacting with complex, non-integrated technologies, where the people and process are made to work around the technology. Until we as a security community fix that, any new technology such as security analytics is just going to create more alerts that fall on the floor without review.

So, how do we make the technology work for us, enabling us to do our jobs efficiently and effectively? Let's spend a minute talking about the high-level goals and objectives this ideal business process solution should meet.

End-to-end chain of custody of data

The investigators spend weeks rebuilding a chain of custody as they determine the root cause. Wouldn't it be great if they didn't have to? That is weeks of effort by highly skilled individuals, or third-party consultants charging hundreds of dollars an hour, that could be saved. The solution needs to be inexpensive enough to hold all the data for several years.
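To make the idea concrete, here is a minimal sketch of a tamper-evident evidence log: each record's hash incorporates the previous record's hash, so altering any link breaks verification of everything after it. The record fields and function names are illustrative assumptions, not a prescribed evidence format:

```python
# Sketch: a hash-chained chain-of-custody log. Any edit to a stored record
# changes its recomputed hash and invalidates every later link.
import hashlib
import json

GENESIS = "0" * 64  # sentinel "previous hash" for the first record

def add_record(chain, record):
    """Append a record, hashing it together with the previous link's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    payload = json.dumps(record, sort_keys=True) + prev_hash
    chain.append({
        "record": record,
        "prev": prev_hash,
        "hash": hashlib.sha256(payload.encode()).hexdigest(),
    })
    return chain

def verify(chain):
    """Recompute every link in order; return False if anything was altered."""
    prev_hash = GENESIS
    for link in chain:
        payload = json.dumps(link["record"], sort_keys=True) + prev_hash
        if hashlib.sha256(payload.encode()).hexdigest() != link["hash"]:
            return False
        prev_hash = link["hash"]
    return True

chain = []
add_record(chain, {"analyst": "a.smith", "action": "collected pcap", "ts": 1})
add_record(chain, {"analyst": "b.jones", "action": "pulled disk image", "ts": 2})
print(verify(chain))  # True on an unmodified chain
```

A repository that records collection events this way gives the investigator a verifiable trail instead of weeks of reconstruction.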

Single repository of data for full understanding of the problem

I gave a hint above that this data has value beyond security. IT and business processes could be made more efficient if the full context were made available to everybody. This is core customer-facing experience improvement that drives impact to the bottom line. It requires a highly scalable data repository that enables open integration with every vendor technology, which means open standards without proprietary extensions.

A visualization and workflow user interface

No, I'm not talking about a SIEM. I'm talking about a user interface that allows you to automate your response workflow and visualization needs. Why should you bend your process and people around someone else's dashboard solution? I'm talking about an open and extensible user interface that acts as the single console for the SOC triage analyst.

Just in time, automated evidence collection and response

As I keep saying, the value isn't in the pretty dashboards; it's in enabling timely action. That means open integration with technology products to trigger automated responses. Why wait days to pull a forensic image of the machine after it's been trampled by layers of IT folks, when network forensic tools can collect the image minutes after the malicious activity occurs? Why wait for the SOC triage analyst to click the big red button when you can automate network and system blocking, or even DevOps container reloads from known-good builds? Yes, it is a hard political battle to get automated response approved, but with an analytic platform you can simulate the response over the last three years of data and show it wouldn't have caused an outage.
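The simulation argument above can be sketched in a few lines: define a response policy, then dry-run it over historical alerts to count what would have been blocked. The thresholds and field names are illustrative assumptions, not a recommended policy:

```python
# Sketch: an automated-response policy plus a dry-run "simulation" over
# historical alert data, to build the case for enabling auto-response.

def should_block(alert, threshold=0.9):
    """Auto-block only high-confidence alerts against non-critical assets."""
    return alert["confidence"] >= threshold and not alert["asset_critical"]

def simulate(history, threshold=0.9):
    """Replay historical alerts; count how many would have been auto-blocked."""
    return sum(1 for a in history if should_block(a, threshold))

history = [
    {"confidence": 0.95, "asset_critical": False},  # would auto-block
    {"confidence": 0.95, "asset_critical": True},   # critical asset: escalate
    {"confidence": 0.40, "asset_critical": False},  # low confidence: review
]
print(simulate(history))  # 1
```

Running the same policy over years of real data, rather than three toy alerts, is what turns the political argument into an evidence-backed one.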

A look ahead

In my next article, we will start to move away from why we should build an analytics-first cybersecurity solution and the value it can provide, toward actually describing what the solution should look like and how Hortonworks can be part of it. In a future article, I plan on spending a day in the life of the CISO as they struggle to develop their security strategy and justify their budget. Who knows, maybe analytics can help.
