Last week, we hosted a webinar, "Combating Phishing Attacks: How Big Data Helps Detect Impersonators", where our audience confirmed that it really can take months, or even a year, to investigate the repercussions of a breach such as a phishing attack. Given the complex and dynamic nature of modern attack vectors, we discussed how much effort is involved in assessing the risk and damage that hackers can inflict upon enterprises today. You can find more information on how to leverage big data and machine learning to detect hackers and impersonators in this blog.
During the webinar, we also covered the recently announced top-level Apache project, Apache Metron. Apache Metron is an open source big data cybersecurity analytics platform that supports real-time ingest and analytics to discover information security threats and build out a high-value security data lake. Apache Metron helps security operations teams be more efficient by reducing the amount of “DIY” big data and data science tooling necessary to detect threats in real time.
There was plenty of discussion, so we’ve done our best to answer the questions below. If you have more questions at any time, we encourage you to check out the Cybersecurity track of Hortonworks Community Connection, where an entire community of folks is monitoring and responding to questions. For those who may have missed the session, you can check out the on-demand webinar and slideshare.
Handling encrypted streams can be a tricky, though not impossible, problem. If your network has, for example, SSL interception based on re-encryption using keys under your control, then you can do things like content inspection on encrypted mail. Of course, if you have the keys used to encrypt, the decryption can be handled on the ingest path, for instance through Apache NiFi processors.
GPS is a very useful complement to traditional network and GeoIP data. This is certainly the sort of telemetry that would work well with the streaming enrichment capability to provide context to, for example, NetFlow, proxy or other application log telemetry.
There are enrichment loaders for a wide range of data sources and the ability to transform inbound enrichments with a simple DSL called Stellar. We also have native support for GeoIP enrichment using the MaxMind binary API for speed.
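To make the enrichment idea concrete, here is a minimal sketch in plain Python. It is not Metron code and does not use Stellar or the MaxMind API: the lookup tables, the `enrich` function and the field names are all hypothetical stand-ins, chosen only to show how context such as GeoIP or GPS data could be joined onto a telemetry event as it streams through.

```python
# Hypothetical in-memory stand-ins for enrichment stores (not Metron's API).
GEO_LOOKUP = {"203.0.113.7": {"country": "NL", "city": "Amsterdam"}}
GPS_LOOKUP = {"device-42": {"lat": 52.37, "lon": 4.90}}

# Maps an event field to (prefix for the new fields, lookup table to consult).
LOOKUPS = {
    "ip_src_addr": ("geo", GEO_LOOKUP),
    "device_id": ("gps", GPS_LOOKUP),
}


def enrich(event, lookups=LOOKUPS):
    """Return a copy of the event with matching enrichment fields added."""
    enriched = dict(event)
    for field, (prefix, table) in lookups.items():
        match = table.get(event.get(field))
        if match:
            for key, value in match.items():
                enriched[f"{prefix}.{key}"] = value
    return enriched
```

An event that carries both a source address and a device identifier would come out with `geo.*` and `gps.*` fields attached, while events with no match pass through unchanged.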
We provide a parser for Bro data, and a plugin for Bro to post data directly to Kafka for high-performance Bro ingest.
Absolutely, Hortonworks can provide services and support for Metron and the underlying platforms.
Metron provides a high speed route to load PCAP data into Sequence files in HDFS, to ensure split-ability and large scale processing. It then provides a range of means to query and process the raw PCAPs. Metron provides jobs to query PCAP by basic headers as well as mechanisms to do deep pattern searches over large scale PCAP.
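As a rough illustration of what querying PCAP by basic headers involves, the sketch below reads a classic little-endian pcap buffer in plain Python and filters packets on IPv4 addresses, ports and protocol. It is a single-machine toy, not Metron's distributed job: the function names and the field names used as filter keys are our own assumptions, and it only handles Ethernet frames carrying IPv4 TCP or UDP.

```python
import socket
import struct

PCAP_GLOBAL_HEADER = struct.Struct("<IHHiIII")  # classic pcap file header, 24 bytes
PCAP_RECORD_HEADER = struct.Struct("<IIII")     # per-packet header, 16 bytes


def iter_packets(data):
    """Yield (timestamp_seconds, frame_bytes) for each record in the buffer."""
    if struct.unpack_from("<I", data, 0)[0] != 0xA1B2C3D4:
        raise ValueError("this sketch only handles little-endian classic pcap")
    offset = PCAP_GLOBAL_HEADER.size
    while offset + PCAP_RECORD_HEADER.size <= len(data):
        ts_sec, _usec, incl_len, _orig = PCAP_RECORD_HEADER.unpack_from(data, offset)
        offset += PCAP_RECORD_HEADER.size
        yield ts_sec, data[offset:offset + incl_len]
        offset += incl_len


def basic_headers(frame):
    """Extract addresses, ports and protocol, or None if not IPv4 TCP/UDP."""
    if len(frame) < 34 or frame[12:14] != b"\x08\x00":  # EtherType must be IPv4
        return None
    protocol = frame[23]
    if protocol not in (6, 17):  # 6 = TCP, 17 = UDP
        return None
    transport = 14 + (frame[14] & 0x0F) * 4  # Ethernet header + IPv4 header length
    if len(frame) < transport + 4:
        return None
    src_port, dst_port = struct.unpack_from("!HH", frame, transport)
    return {
        "ip_src_addr": socket.inet_ntoa(frame[26:30]),
        "ip_dst_addr": socket.inet_ntoa(frame[30:34]),
        "ip_src_port": src_port,
        "ip_dst_port": dst_port,
        "protocol": protocol,
    }


def query_pcap(data, **criteria):
    """Return headers of packets that equal every given criterion."""
    hits = []
    for ts_sec, frame in iter_packets(data):
        headers = basic_headers(frame)
        if headers and all(headers.get(k) == v for k, v in criteria.items()):
            hits.append({"timestamp": ts_sec, **headers})
    return hits
```

For example, `query_pcap(data, ip_dst_port=53)` would return only the packets addressed to port 53. At scale the same filter would run in parallel over splits of the stored files rather than over one buffer in memory.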
The core Metron engine is built on Storm to provide low-latency, real-time task parallelism. Spark, by contrast, excels at data-parallel tasks, so Metron also makes extensive use of Spark for building machine learning models with a variety of libraries. Many of the models have been built using PySpark, for example.
Much of the install and management is simplified by Ambari through MPacks which are used to install the Metron application. We are constantly working to provide simpler means to operate the platform. The SmartSense offering from Hortonworks also provides advice and tuning suggestions based on real use of the cluster, making operations easier.
Metron provides a number of visualisation options out of the box, including Kibana dashboards and some example Zeppelin dashboards for typical use cases around data sources like NetFlow.
We currently have some roadmap items around integration of graph databases and an ontology mapper.
Right now (April 2017) we have a management UI within the platform which provides access to configure parsers, enrichments and transformations. The management UI also provides an interface to edit triage rules and tune scores to prioritise the output for analysts. That output is mainly in the form of Kibana and Zeppelin dashboards at present. A complete investigator focused UI experience is in the works.
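The idea behind triage rules and scores can be sketched in a few lines of Python. This is purely illustrative and is not how rules are written in the management UI: the rule structure, the `triage_score` function and the example conditions are hypothetical, and we assume here that the scores of all matching rules are combined by taking the maximum.

```python
def triage_score(event, rules, aggregator=max):
    """Combine the scores of every rule that matches the event (0 if none match)."""
    scores = [rule["score"] for rule in rules if rule["matches"](event)]
    return aggregator(scores) if scores else 0


# Hypothetical rules: each pairs a predicate over the event with a score.
RULES = [
    {"name": "external sender", "score": 10,
     "matches": lambda e: not e.get("sender", "").endswith("@example.com")},
    {"name": "lookalike domain", "score": 80,
     "matches": lambda e: e.get("is_lookalike", False)},
]
```

Tuning then amounts to adjusting the scores, or swapping the aggregator (for example `sum` instead of `max`), until the events analysts care most about rise to the top.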
Hortonworks and the Apache Software Foundation are completely separate entities. Hortonworks provides distributions of code developed in Apache projects. We also provide support subscriptions and services around those distributions, something we can do because we employ many of the Apache committers on the projects. We commit our contributions back to the Apache open source projects.
We currently provide deployment mechanisms which deploy Metron directly into AWS and Azure as part of the solution offering, and many of our customers choose to run Metron in the cloud. However, our solution is cloud agnostic: we do not tightly couple it to services available in only one cloud, so it works across all vendor clouds.
Metron is installed and managed through Ambari MPacks. The underlying components and metrics are all monitored through Ambari Metrics. We are working to publish more Metron specific metrics to Ambari as well.
Metron is usually installed on the HDP platform, which has full support for Zeppelin. In fact, we use Zeppelin to produce a number of dashboards. Metron also uses Zeppelin to provide active runbooks for SOC staff, which use the notebook approach to integrate process documentation and live data with visualisation to guide analysts through the investigation process.
The model as a service component allows you to write and extend models in any language that will run in a YARN container, and provide a REST interface. What this means is that languages such as Python and R are an excellent fit to run models, either directly on their own, or for instance through Spark.
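To give a feel for what a model behind a REST interface looks like, here is a minimal sketch using only the Python standard library. It does not show how a model is registered or deployed into a YARN container, and nothing in it is Metron's API: the `lookalike_score` toy model, the protected domain, the JSON request shape and the `make_server` helper are all assumptions made for illustration.

```python
import json
from difflib import SequenceMatcher
from http.server import BaseHTTPRequestHandler, HTTPServer

PROTECTED_DOMAIN = "example.com"  # hypothetical brand domain to protect


def lookalike_score(domain):
    """Toy model: similarity to the protected domain, 0.0 for the domain itself."""
    domain = domain.lower()
    if domain == PROTECTED_DOMAIN:
        return 0.0
    return round(SequenceMatcher(None, domain, PROTECTED_DOMAIN).ratio(), 3)


class ModelHandler(BaseHTTPRequestHandler):
    """Accepts POSTed JSON such as {"domain": "..."} and returns {"score": ...}."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps({"score": lookalike_score(payload.get("domain", ""))}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the sketch quiet


def make_server(port=0):
    """Bind to localhost only; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), ModelHandler)
```

Calling `make_server(8080).serve_forever()` would expose the toy model locally. The same pattern applies to a model written in R or run through Spark, since the only contract is the REST endpoint.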
The webinar is available on demand here: https://hortonworks.com/webinar/combating-phishing-attacks-big-data-helps-detect-impersonators/