Welcome back to my blogging adventure. When we left off in my Cybersecurity Architecture series, Cybersecurity Architecture: All about Sensors, we started to touch on the requirements our sensors place on the modern data plane. Today, we are going to dive deep into those requirements and walk through how we can leverage Hortonworks DataFlow to address our modern data plane needs.
Before we dive in, let’s review the high-level conceptual architecture and see how this modern data plane interacts with the rest of the design. Looking at the diagram below, today’s focus is on the red arrow labelled Data Plane.
The great thing about a conceptual architecture design is that we can clearly see the separation of concerns and the interfaces between the conceptual components. Looking at the red arrow, we see it has five interface points:
Great! Let’s dive in, see how these components interact, and continue to detail the requirements for each component.
As we discussed in my last article, All about sensors, our sensor network is deployed in a distributed manner, as close as possible to the applications, data, and systems it monitors. From the perspective of the data plane, we want the sensors to connect safely and securely so as to maintain the chain of custody for this activity data. Unlike designs on the drawing board, in real life things change. The data plane needs the ability to reach back and send messages, such as reconfiguration requests, to the sensor so it can adjust to those changes. Connectivity between the sensors and the data plane may be intermittent or limited in bandwidth, so a queuing and data-priority forwarding strategy embedded in the sensor is required.
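To make the queuing and priority-forwarding requirement concrete, here is a sketch of how an edge agent built with Apache MiNiFi (the edge component of Hortonworks DataFlow) might bound and prioritize its local queue. The file path, URL, and component names are my own illustrative choices, and the exact properties available depend on your MiNiFi version:

```yaml
# Sketch of a MiNiFi config.yml fragment: buffer sensor events locally
# and forward them by priority when connectivity returns. Illustrative only.
Processors:
  - name: TailSensorLog
    class: org.apache.nifi.processors.standard.TailFile
    scheduling strategy: TIMER_DRIVEN
    scheduling period: 10 sec
    Properties:
      File to Tail: /var/log/sensor/events.log

Connections:
  - name: TailSensorLog/success/RemoteDataPlane
    source name: TailSensorLog
    source relationship names:
      - success
    destination name: RemoteDataPlane
    max work queue size: 10000        # back-pressure threshold (flowfiles)
    max work queue data size: 100 MB  # back-pressure threshold (bytes)
    flowfile expiration: 6 hours      # drop stale data if the link stays down
    queue prioritizer class: org.apache.nifi.prioritizer.PriorityAttributePrioritizer

Remote Process Groups:
  - name: RemoteDataPlane
    url: https://nifi.dataplane.example.com:9090/nifi
    timeout: 30 secs
```

The back-pressure thresholds and expiration give the agent a bounded queue during an outage, and the prioritizer forwards the highest-priority flowfiles first once the link recovers.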
The data lake is already a well documented concept with mature architectures available. Let’s focus on the interface between the data plane and the data lake as this creates a significant departure from the existing data lake architectures.
Existing platform-based Hadoop architectures make several implicit assumptions about how users interact with the platform, such as developmental research versus production applications. While this was perfectly acceptable in a research mode, as we move to a modern data application architecture we need to bring modern application concepts back to the Hadoop ecosystem. For example, existing Hadoop architectures tightly couple the user interface with the source of data. This is done for good reasons that apply in a data discovery research context, but it causes significant issues when developing and maintaining a production application. We see this in some of the popular user interfaces such as Kibana, Banana, and Grafana. Each user interface is directly tied to a specific type of data lake and imposes schema choices on that data.
The reason modern application architectures evolved to the basic Model-View-Controller (MVC) pattern is to address this issue of tight coupling and maintain separation of concerns. In a scalable application, the user interface has no concept of a data source; it requests data from a service and renders the response without knowing or caring where that data came from. The data could be delivered from storage, compiled from multiple data sources, pulled from a live stream, or even computed on demand – the user interface doesn’t and shouldn’t care. The great thing about leveraging Hortonworks DataFlow to enable a modern data plane is that we can take a hybrid approach, combining the stability and scalability of the MVC approach with the immediacy of data access of the Hadoop architecture.
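To illustrate the decoupling, here is a minimal Python sketch (all class and field names are my own invention, not part of any Hortonworks API) in which the view talks only to a service interface and is oblivious to whether the data comes from a lake or a live stream:

```python
from abc import ABC, abstractmethod

class AlertSource(ABC):
    """The view layer depends only on this interface, never on a concrete store."""
    @abstractmethod
    def recent_alerts(self, limit):
        ...

class LakeAlertSource(AlertSource):
    """Backed by a data lake query (stubbed here with a static record list)."""
    def __init__(self, records):
        self._records = records
    def recent_alerts(self, limit):
        return self._records[:limit]

class StreamAlertSource(AlertSource):
    """Backed by a live stream (stubbed here with an iterator)."""
    def __init__(self, stream):
        self._stream = stream
    def recent_alerts(self, limit):
        return [next(self._stream) for _ in range(limit)]

def render_dashboard(source: AlertSource) -> str:
    # The "view": it asks the service for data and never knows where it came from.
    rows = source.recent_alerts(limit=2)
    return "\n".join(f"{r['host']}: {r['score']}" for r in rows)
```

Swapping `LakeAlertSource` for `StreamAlertSource` changes nothing in the view; that is exactly the loose coupling the hybrid data plane is meant to preserve.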
The interface between the data plane and the analytics engine is complex, with many different bidirectional data flows. These data flows change frequently as the system adapts to the enterprise’s environment, new analytic use cases, new workflows, and new response rules. This pushes for a loosely coupled interface that can be changed quickly without code development. While the individual data flow interfaces between the analytic engine and the data plane will be measured in the hundreds in a mature cybersecurity analytic platform, the flows can be categorized into three main types:
The interface between the data plane and analytic engine shares the same requirements expressed above in the sensor and data lake sections and adds the following.
The response rules engine is the middleware between the analytics engine, automated response, and workflow components. The data plane maintains the principles of loose coupling and separation of concerns between these components. While the analytic engine’s scoring models produce the predictive analytic result, it is the combination of the workflow and rules engine that determines the prescriptive response. Let’s walk through the lifecycle of an automated response use case to see how these four components interact.
Like our sensor network, automated response represents all the different security controls and application integration points available. They are deployed in a distributed manner, as close as possible to the applications, data, and systems they act on. From the perspective of the data plane, we want the automated response to connect safely and securely so as to maintain the chain of custody for this activity data. Bidirectional communication is necessary to validate that the response request was received and to provide a new feed of follow-on activity as the automated response triggers.
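The acknowledge-and-feed-back pattern above can be sketched in a few lines of Python. This is an in-process simulation, not a real integration: the class and method names are hypothetical, and in a real deployment the request and acknowledgement would travel over a secured channel such as NiFi site-to-site:

```python
import uuid

class ResponseChannel:
    """Minimal sketch of a bidirectional automated-response channel.
    The data plane issues a request, the control point acknowledges it,
    and follow-on activity carries the originating request id so the
    chain of custody is preserved. All names here are illustrative."""
    def __init__(self):
        self.acks = {}           # request_id -> delivery status
        self.activity_feed = []  # follow-on activity emitted by the control

    def send_request(self, action, target):
        # Tag every request with a unique id for end-to-end traceability.
        request_id = str(uuid.uuid4())
        self._control_point(request_id, action, target)
        return request_id

    def _control_point(self, request_id, action, target):
        # 1. Acknowledge receipt so the data plane can verify delivery.
        self.acks[request_id] = "received"
        # 2. Emit follow-on activity tagged with the originating request.
        self.activity_feed.append(
            {"request_id": request_id, "event": f"{action} applied to {target}"}
        )
```

Because every follow-on event carries the id of the request that caused it, the analytics engine can correlate cause and effect without the components knowing about each other directly.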
Okay, after that long-winded walk through the middle of our conceptual architecture for cybersecurity, it should be clear that the modern data plane is the critical core element that glues our architecture together. It provides the separation of concerns and loosely coupled architectural principles that help with maintainability and scaling of the platform, it insulates the platform from the constantly changing data and mess that is the outside world, and it acts as a security barrier between the platform and the user interfaces that consume that data. Choosing the right technologies to enable the modern data plane is critical, and Hortonworks DataFlow is well positioned to meet these requirements. For the curious and diligent readers who caught that there are two additional interfaces, workflow and dashboards: we will be covering them in a future article as part of user interfaces.
I’m going to take a break from my cybersecurity architecture series and focus on the life of the CISO. I am kicking off this new blog series with my webinar “A look through the CISO’s eyes”. After that, the new series will go in depth on the basics of what a CISO does and move on to how analytics can make the CISO’s life easier in building and maintaining a security budget and program.
Michael Schiebel is Hortonworks’ Cybersecurity Strategist, where he leverages his more than 15 years of cybersecurity experience in financial services and healthcare companies to help customers build cybersecurity analytic solutions. He has led incident response and computer forensic teams, designed and built security solutions, and created security roadmaps and strategies, learning how to position security projects based on delivering bottom-line value to the enterprise.
Hortonworks Cybersecurity Strategist