September 19, 2018

Q&A to Demystify the Power of Apache Metron in Real-Time Threat Detection

How is Apache Metron utilized in Hortonworks’ product portfolio?

Hortonworks Cybersecurity Platform (HCP) is powered by Apache Metron and other open-source big data technologies. At the intersection of Big Data and Machine Learning, HCP employs a data-science-based approach to visualize diverse, streaming security data at scale to aid Security Operations Centers (SOC) in real-time detection and response to threats. This open source platform is built on top of the scalability and data governance of Hortonworks Data Platform (HDP) and the real-time ingest and processing capability of Hortonworks DataFlow (HDF). Core features of HCP include:

  • Real-time ingest and enrichment of security data sources at millions of events per second.
  • Real-time behavior profiling at scale.
  • Petabyte-scale storage platform allows larger training sets and detailed forensic replay when a cyber threat is detected.
  • Rapid productionization of machine learning, allowing data scientists to work in real-time and monitor environments faster.
  • “Single view of risk” user interfaces make SOC analysts more productive, and dashboard and notebook interfaces make data scientists more effective.

Please describe the modules and functionality of Apache Metron

Apache Metron is a real-time security solution focused heavily on streaming data sources and fast data processing. It consists of modules for parsing, normalising and enriching data with internal and third-party threat intelligence, including STIX feeds. The smart modules include the behaviour profiler, which provides a number of algorithms for modeling typical behaviour and detecting anomalies, and Model as a Service, which allows machine learning models to be plugged directly into the real-time pipelines.
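To make the parse, normalise and enrich stages concrete, here is a minimal Python sketch of one log line passing through all three. It is an illustration only, not Metron code: the log layout, the lookup tables, the output field names and every function name are assumptions made for the example.

```python
import json
from datetime import datetime, timezone

# Hypothetical lookups; a real deployment would load these from feeds and asset stores.
THREAT_INTEL_IPS = {"203.0.113.7"}
ASSET_OWNERS = {"10.0.0.5": "finance-laptop-17"}


def parse(raw_line):
    """Split a made-up 'timestamp src dst action' log line into named fields."""
    ts, src, dst, action = raw_line.split()
    return {"ts": ts, "src": src, "dst": dst, "action": action}


def normalise(event):
    """Map source-specific fields onto one common schema with an epoch-millis timestamp."""
    parsed_ts = datetime.strptime(event["ts"], "%Y-%m-%dT%H:%M:%S")
    epoch_ms = int(parsed_ts.replace(tzinfo=timezone.utc).timestamp() * 1000)
    return {
        "timestamp": epoch_ms,
        "ip_src_addr": event["src"],
        "ip_dst_addr": event["dst"],
        "action": event["action"].lower(),
    }


def enrich(event):
    """Add internal context and a threat-intel flag without mutating the input."""
    enriched = dict(event)
    enriched["src_asset"] = ASSET_OWNERS.get(event["ip_src_addr"], "unknown")
    enriched["is_alert"] = event["ip_dst_addr"] in THREAT_INTEL_IPS
    return enriched


def process(raw_line):
    """Run the three stages in order and return a single combined message."""
    return enrich(normalise(parse(raw_line)))


message = process("2018-09-19T12:00:00 10.0.0.5 203.0.113.7 ALLOW")
print(json.dumps(message, indent=2))
```

Each stage takes a dictionary and returns a new one, so stages can be tested and swapped independently. That mirrors the modular idea described above, although the real pipeline runs as distributed streaming jobs rather than as function calls.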

There are also user interface and presentation modules aimed at different users, including a front-line alert triage dashboard and a highly flexible investigation notebook interface. The notebook allows experienced users to deploy the full power of components in the big data stack, such as Apache Spark, for everything from large-scale SQL to advanced machine learning.

How does Apache Metron ensure real-time protection? What makes Metron so safe?

The Metron architecture is based on a real-time streaming platform but abstracts that platform from the end user with a simple extensible configuration language. The Metron project focuses significant effort on optimizing the streaming pipeline as much as possible. We also rely on Apache Kafka for resilience of input, output and intermediate staging, which ensures effective buffering and prevents data loss from equipment failure.

Another key element is Metron’s ability to push configuration changes to the pipeline in real-time, so we don’t need to restart streaming applications to change behaviours. This makes it easy for operators to adjust things like thresholds and alerting rules without compromising throughput, which can be a real help during a DDoS attack.
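The idea of changing a rule while events keep flowing can be sketched in a few lines of Python. This is a hedged illustration, not Metron's mechanism: `LiveConfig`, `should_alert` and the `rps_threshold` setting are hypothetical names, and an in-process object stands in for whatever shared configuration store a real cluster would use.

```python
import threading


class LiveConfig:
    """Thread-safe holder for rule settings that can be swapped while events flow."""

    def __init__(self, **settings):
        self._lock = threading.Lock()
        self._settings = dict(settings)

    def update(self, **changes):
        # Build a new dict and swap it in, so readers never see a half-applied change.
        with self._lock:
            self._settings = {**self._settings, **changes}

    def get(self, key):
        with self._lock:
            return self._settings[key]


def should_alert(event, config):
    # The threshold is read on every event, so an update applies to the very next one.
    return event["requests_per_second"] > config.get("rps_threshold")


config = LiveConfig(rps_threshold=1000)
event = {"src": "198.51.100.9", "requests_per_second": 600}

before = should_alert(event, config)   # 600 is below the initial threshold of 1000
config.update(rps_threshold=500)       # operator tightens the rule; nothing restarts
after = should_alert(event, config)    # the same traffic now exceeds the threshold
print(before, after)
```

The design point is that the rule logic never caches the setting. Reading it per event costs a lock acquisition but removes any need to stop and redeploy the processing code.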

How does the provisioning and scaling of applications with Metron work?

Metron is an application built on top of the Hortonworks Data Platform and Hortonworks DataFlow. The application itself uses many of the highly scalable components of these platforms. Metron can also be deployed on a cloud-based platform using Cloudbreak to allow for rapid scaling to meet changing demand. This can be particularly important in the case of high-volume attacks, or in environments with a very cyclical day, where demand for real-time processing is higher during office hours.

All provisioning and configuration for Metron is handled via Apache Ambari, which provides a single interface for cluster installation, management, and configuration.

Which authentication methods does Metron use, how do different modules/users/systems recognize each other?

Metron uses Apache Knox to front authentication, and so can integrate with a wide range of single sign-on and enterprise authentication methods, including platforms like Active Directory and Kerberos, and modern web authentication methods like OAuth.

The Metron infrastructure mainly uses a combination of managed config through Apache Ambari and Apache Zookeeper service discovery to do things like discover instances of Machine Learning based models spread across a cluster.

Which Threat Intelligence modules does Metron use? What does machine learning look like in this context?

Metron accepts threat intelligence from a variety of sources, ranging from simple flat-file blacklists to STIX-formatted indicators of compromise. The platform provides a high-performance engine to match threat intel against incoming data. Unsupervised machine learning algorithms can be particularly useful for this kind of automatic correlation of events to intelligence.
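As a rough illustration of matching a flat-file blacklist against incoming events, the Python sketch below loads indicators into a set and checks selected event fields against it. The file format, the field names and both function names are assumptions made for the example, and a set lookup is simply one straightforward way to do this, not a description of Metron's matching engine.

```python
import io


def load_blacklist(lines):
    """Read one indicator per line, skipping blank lines and '#' comments."""
    indicators = set()
    for line in lines:
        entry = line.strip().lower()
        if entry and not entry.startswith("#"):
            indicators.add(entry)
    return indicators


def match_threat_intel(event, indicators, fields=("ip_dst_addr", "domain")):
    """Return the event fields whose values appear in the indicator set."""
    return {
        field: event[field]
        for field in fields
        if str(event.get(field, "")).lower() in indicators
    }


# An in-memory stand-in for a flat file; any iterable of lines works the same way.
feed = io.StringIO("# sample feed\n203.0.113.7\nevil.example.com\n")
indicators = load_blacklist(feed)

hits = match_threat_intel(
    {"ip_dst_addr": "10.0.0.8", "domain": "Evil.Example.com"}, indicators
)
print(hits)
```

Set membership is constant time on average, so the cost per event depends on the number of fields checked rather than on the size of the feed.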

Machine learning can benefit from threat intel in a number of ways. Metron for example uses clustering and similarity techniques to find zero-day events which might look like other events threat intel managed to catch. The threat intelligence and enrichment feeds also create good features to boost the power of machine learning algorithms, as well as labelled examples of known bad events which can feed into supervised algorithms.

Which typical fields of application do you see for Metron?

Metron is purpose-built for cyber security at scale. The following are common use cases:

  • Managed Security Service Provider (MSSP): Metron’s high throughput ingestion pipeline and cost effective storage and distributed analytics are an ideal fit for MSSPs requiring a scalable solution to process and triage security event data from multiple customers.   
  • SIEM augmentation or migration:  Metron can replace a SIEM or work together with one.  Metron offloads data from a SIEM to increase retention and provide faster search or analytics.  Metron can also preprocess or filter high-throughput logs that are not feasible to handle in the SIEM, including PCAP, firewall, DNS, Windows events, and audit logs.
  • Real-time automated responses:  Pairing real time data ingest with automated response orchestration enables organizations to minimize exposure and improve SOC efficiency.
  • Machine learning and threat hunting: Metron prepares historical context that is ready for training machine learning or threat hunting.
  • User Entity Behavior Analysis (UEBA): The Metron configurable profiler captures baseline behavior using efficient algorithms.  Profile retrieval supports time aggregations as well as seasonal trends offering many options for detecting anomalous behaviour.
  • Custom solutions:  Metron is ideal for organizations that want to go beyond out of the box solutions and incorporate custom dashboards, notebooks, triaging, and data storage to optimize their SOC processes.
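The profiler use case in the list above, capturing a baseline and then flagging departures from it, can be illustrated with a small Python sketch. Welford's online algorithm is used here purely as one example of an efficient, constant-memory technique; the source does not say which algorithms the Metron profiler uses, and the class name, sample data and z-score cut-off of 3 are all assumptions.

```python
import math


class RunningProfile:
    """Welford's online algorithm: running mean and variance in constant memory."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    def stddev(self):
        # Sample standard deviation; zero until there are at least two observations.
        return math.sqrt(self._m2 / (self.count - 1)) if self.count > 1 else 0.0

    def z_score(self, value):
        # Distance from the baseline mean, measured in standard deviations.
        sd = self.stddev()
        return (value - self.mean) / sd if sd > 0 else 0.0


# Build a baseline from hypothetical hourly login counts for one user.
profile = RunningProfile()
for logins_per_hour in [4, 5, 6, 5, 4, 6, 5, 5]:
    profile.add(logins_per_hour)

# Flag a new observation that sits far outside the learned baseline.
is_anomalous = abs(profile.z_score(40)) > 3
print(profile.mean, is_anomalous)
```

Because the profile keeps only three numbers per entity, one such object per user or host stays cheap even across a very large population.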

What phase is Metron currently in? What are the next steps?

Like any open source project, Metron is always growing. With production-ready releases and deployments across the globe, it is definitely starting to grow into a strong platform. As this continues, and as the community around the project goes from strength to strength, we expect to see more complex use cases emerging, a sharing platform for behaviour profiles and machine learning models, and data structures emerging from real use cases instead of hypothetical standards.

For which types of users is Metron particularly attractive? Why?

Metron’s strength is in sheer scale and performance. It is designed for medium to large enterprise use cases and teams with a SOC and Security Data Science capability. To date it has also appealed strongly to Managed Security Service Providers who run multi-tenant versions of the Metron platform, often bringing their own models, extensions and service expertise to the platform. This MSSP sector makes the platform far more accessible to small and medium-sized enterprises, which can also benefit from a kind of herd immunity thanks to the massive scale of data and machine learning that these multi-tenant Metron platforms present.

How is information exchanged in the event of malware attacks? What mechanism does Metron use to obtain information?

Metron supports a number of means of sharing threat intelligence. The primary means is the industry standard STIX format, though numerous other sources can be supported with pluggable parsers and a broad range of ingest methods available in the underlying big data platform.

What role does the IoT play in the concept of Apache Metron?

IoT opens up a huge range of opportunities for businesses to make sense of the physical world. However, with the range of sensors and the extension of networks to broader environments comes the unfortunate danger of a large attack surface. Many IoT devices are, by definition, low-powered devices intended to run for extended periods on batteries with partial connectivity. They are slimmed down for speed and cost, so traditional endpoint agents and protection running on the devices just aren’t in the battery budget. Metron takes a more network-centric approach and, in combination with intelligent edge collection tools such as Apache NiFi, allows security teams to tap into IoT networks without disrupting the devices and to ensure the environment they operate in is managed and secure. Detecting the kind of botnets that emerge from IoT environments, and the spread of infections, is also a key strength of Metron, allowing administrators to catch and contain infections before they become epidemics.

To learn more about how the partnership between Hortonworks and Zoomdata can be a game changer for your security operations, please register for the upcoming webinar on September 26, 2018:
To Learn More about Hortonworks Cybersecurity Platform powered by Apache Metron:


abraham says:

In the Apache Metron logical architecture, there are modules that are processed by Apache Storm. I understand how Apache Storm parses and normalizes log data received from Kafka, but I am not clear about how Apache Storm tags and validates it.

Simon Elliston Ball says:

Hi, and thanks for your question. Apache Metron consists of a series of DataFlow topologies, currently written on Storm, which handle parsing, transformation, validation and tagging of metadata to produce a single message representing the combined context added to a log message. Validation and transformation are very extensible and mostly implemented using a simple expression language to construct rules and transformations.

Nitin says:


Can the HDP Cybersecurity platform replace SOC/SIEM tools? Can it provide the rules, alerts and correlation?
We are looking for a use case or a video of it in action, as we are looking to build a next-gen SIEM/SOC.

Let us know if this can be a good replacement, and how the storage is handled.
