Apache Metron, at its core, is a streaming analytics solution. Yes, it is a streaming analytics solution aimed at detecting and prioritizing cyber threats at your network’s doorstep. Yes, it’s built atop Hadoop. Putting the end-game aside, it’s easy to see that the challenges that such a system faces are precisely the challenges faced in any streaming analytics solution.
Put short, the job of any analytics solution, streaming or otherwise, is to provide insights. Better insights generally follow from more context. For example, consider a logon event: how might we decide whether this event is by an attacker or by a trusted user? Well, if we knew more about the behavior of users across the system or, perhaps, if we knew details about the source IP, such as country of origin, we might build up an evidentiary store and provide that evidentiary context to our downstream analysts. Each of these contextual enrichments can be used to build up a risk profile and decide, ultimately and in an automated manner, whether this event is legitimate or malicious. Indeed, the following are almost axiomatic in systems such as these:
Furthermore, because we are operating in a streaming environment, we need to achieve the enrichments above just-in-time and without downtime. Indeed, the challenge is to offer a host of enrichments that bring forth relevant context to data as it streams by without impacting the throughput of the system. We must also acknowledge that we will not think of every type of enrichment. As someone who has spoken to many customers and potential customers, my trust in my ability to completely predict user requests is low, at best.
It was clear that we needed a solution that had a few characteristics:
The bitter truth for practitioners of our field is that dirty and malformed data is the rule rather than the exception. Any system that has a hope of being acceptable must be capable of doing some scrubbing and transformation on-the-fly. Consider the use-case where we want to determine if the top-level domain of a hostname is in a blacklist and adjust the risk profile accordingly. The problem, of course, is that top-level domains need to be extracted, and the hostnames may arrive untrimmed or corrupted in some routine way. The options at our disposal are to push all of the data preprocessing and scrubbing into the blacklist functionality or to allow the user to scrub the dirty input data on the way in. I strongly prefer the latter. Having small, composable units of work is a mainstay of a productive and workable system. Forcing enrichment authors to predict the complete range of intermediate transformations required to sanitize their own inputs is asking for trouble.
Currently, Metron would best be described as a kappa architecture. That being said, as Metron grows, we will want to execute the streamed enrichments in batch. Furthermore, it turns out enrichments are addictive. As we added more capabilities and subsystems, such as bulk import into HBase, it quickly became apparent that we might want to transform and enrich data as it goes into our enrichment store!
What is or is not simple depends very much upon the beholder. I have found that role and exposure build up a cross-hatched, complex tapestry of experiences that distinguish between “obvious”, “non-obvious” and everything in between. Therefore, it is important to consider the audience when deciding how to provide such a complex bit of functionality to our users.
If you are a software engineer, at this point you may be screaming at the screen that we should just use a programming language. If you are a data analyst, you are likely screaming that the obvious choice is SQL. Metron is a system for the security analyst, who is neither of these roles exactly. Indeed, we needed something not quite as complex as, and slightly different from, SQL. On the other side, though, it was clear that embedding a general purpose programming language wasn’t quite the right fit either. We found that, by and large, we needed something closer to single line transformations that you could compose.
Considering existing solutions in the wild, I’d say the strongest and most relevant motivating example is Microsoft Excel functions. Excel gives you the ability to compose simple functions to transform the values of cells based on the context of a spreadsheet.
Using these constraints and motivations, we constructed Stellar as a scripting environment with the following capabilities. A more complete discussion can be found here, but the highlights include:
Some aspects of Stellar are like programming environments (e.g. the REPL) and some are very much not. It’s worth considering the limitations that we have chosen to include for simplicity:
Let’s consider a situation where we have a message with field ip_src_addr and we want to determine if the src address is one of a few subnet ranges and we want to store that in a variable called is_local:
is_local := IN_SUBNET( ip_src_addr, '192.168.0.0/16', '18.104.22.168/16')
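For readers who think more easily in a general purpose language, here is a rough Python analogue of what the statement above computes. It is only an illustrative sketch: the helper name in_subnet and the sample message are my own assumptions, and it says nothing about how Metron implements IN_SUBNET internally.

```python
import ipaddress

def in_subnet(ip, *cidrs):
    """Return True if ip falls inside at least one of the given CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    # strict=False tolerates CIDR strings whose host bits are set.
    return any(addr in ipaddress.ip_network(cidr, strict=False) for cidr in cidrs)

# Hypothetical message; only the ip_src_addr field name comes from the text.
message = {"ip_src_addr": "192.168.10.7"}
message["is_local"] = in_subnet(
    message["ip_src_addr"], "192.168.0.0/16", "18.104.22.168/16"
)
```

The Stellar one-liner expresses the same idea without any of the surrounding scaffolding, which is much of its appeal.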
Now, let’s consider a situation where we want to determine if the top level domain of a domain name, stored in a field called domain, is from a specific set of whitelisted TLDs:
is_government := DOMAIN_TO_TLD(domain) in [ 'mil', 'gov' ]
Let’s assume further that the data coming in is known to be spotty with possible spaces and a dot at the end periodically due to a known upstream data ingest mistake. We can do that with 3 Stellar statements, the first two sanitizing the domain field and the final doing the whitelist check:
sanitized_domain := TRIM(domain)
sanitized_domain := if ENDS_WITH(sanitized_domain, '.') then CHOP(sanitized_domain) else sanitized_domain
is_government := DOMAIN_TO_TLD( sanitized_domain ) in [ 'mil', 'gov' ]
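The same three steps can be sketched in Python to make the composition explicit. This is an analogue under stated assumptions, not Metron code: sanitize_domain, naive_tld and is_government are hypothetical names, and naive_tld simply takes the last dot-separated label, which is cruder than a real top-level domain extractor.

```python
def sanitize_domain(domain):
    """Mirror the two sanitizing statements: trim, then drop one trailing dot."""
    sanitized = domain.strip()        # analogue of TRIM
    if sanitized.endswith("."):       # analogue of ENDS_WITH
        sanitized = sanitized[:-1]    # analogue of CHOP
    return sanitized

def naive_tld(domain):
    # ASSUMPTION: the TLD is the last dot-separated label. A real extractor
    # would also handle multi-label suffixes such as 'co.uk'.
    return domain.rsplit(".", 1)[-1]

def is_government(domain):
    """Whitelist check applied to the sanitized domain."""
    return naive_tld(sanitize_domain(domain)) in ("mil", "gov")
```

Each function does one small thing, and the whitelist check never has to know that the input was dirty. That separation is the point of letting users scrub data on the way in.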
Now, let’s consider a situation where we have a blacklist of known malicious domains. We have used the Metron data importer (also here) to store this data in HBase under the enrichment type ‘malicious_domains’. As data streams by, we’ll want to indicate whether a domain is malicious or not. Further, as before, we still have some ingestion cruft to adjust:
sanitized_domain := TRIM(domain)
sanitized_domain := if ENDS_WITH(sanitized_domain, '.') then CHOP(sanitized_domain) else sanitized_domain
in_blacklist := ENRICHMENT_EXISTS('malicious_domains', sanitized_domain, 'enrichments', 't')
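As a mental model only, the lookup can be pictured as a key test against a store indexed by enrichment type and indicator. The Python sketch below uses a plain dictionary as a hypothetical stand-in for HBase; the store contents, the enrich helper and the sample domains are all assumptions for illustration, and the sketch ignores the last two arguments of the Stellar call.

```python
# Hypothetical in-memory stand-in for the HBase enrichment store,
# keyed by (enrichment type, indicator). The entry is made up.
ENRICHMENT_STORE = {
    ("malicious_domains", "evil.example"): {"source": "hypothetical feed"},
}

def enrichment_exists(enrichment_type, indicator, store=ENRICHMENT_STORE):
    """Return True if the indicator is present under the enrichment type."""
    return (enrichment_type, indicator) in store

def enrich(message):
    """Return a copy of the message with the sanitized domain and blacklist flag."""
    domain = message["domain"].strip()
    if domain.endswith("."):
        domain = domain[:-1]
    enriched = dict(message)  # leave the incoming message untouched
    enriched["sanitized_domain"] = domain
    enriched["in_blacklist"] = enrichment_exists("malicious_domains", domain)
    return enriched
```

Note that the lookup only succeeds because the domain was sanitized first; the raw value with its stray spaces and trailing dot would never match the stored indicator.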
Within Metron, every place where we foresee the need for some degree of user modification or transformation integrates with Stellar directly. Specifically, you can use Stellar to:
Further, for those capabilities that involve enriching or transforming data in the stream, Stellar statements are stored in ZooKeeper and thus require no restart of topologies to pick up a new enrichment or field transformation. All that is needed is an update of ZooKeeper from the web interface, the CLI ZooKeeper config management tools, or the REPL.
Stellar is the prime mechanism that we use for interacting with various subsystems of Metron.
As you can see, Stellar provides the glue that gives a consistent user experience for interacting with the various subsystems of Metron.
Within Metron, we strive to cover as many use-cases as we can foresee with the default sets of functions, but we understand that we will not be able to anticipate every edge case of enrichment or transformation function. Thus, we want to make it extremely simple to add new functions for specific needs. You can find complete instructions here, but the general approach is to implement your Stellar function in Java by
On the whole, I believe that Stellar has scratched an important itch within Metron. It provides a consistent glue that fits the various subsystems of Metron, each with its own unique capabilities, together into a whole solution. Walking the tight-rope of providing power without overwhelming possibly non-technical users with complexity is an interesting challenge.
I think that if we had adopted a general purpose programming language, the experience would have been needlessly complex for the tasks that our users need to perform. On the other end, with hard-coded enrichments and transformations, we’d be in a perpetual arms race to provide more and more esoteric enrichments as part of the main project. Creating a simple language that we control ensures that we can focus on the capabilities that are most generally useful while also controlling the complexity dramatically.
That is not to say that we do not have more ahead. One downside of having an adaptable language like Stellar is that it can be challenging to provide a useful graphical user interface abstraction. I have high hopes that we will adopt a solution similar to Blockly to make the creation of Stellar statements even more visual and less scary.
We are at the beginning of the path of this technology and, frankly, I like how the road ahead looks. If you want to know more about Stellar such as the language capabilities or the core functions, then you can find all of that detailed in the Metron documentation.
To watch Casey’s presentation from DataWorks Summit click here: