May 04, 2017

20 Questions with Big Data Cybersecurity Experts on Apache Metron: Webinar Recap

Last week, we hosted a webinar, Combating Phishing Attacks: How Big Data Helps Detect Impersonators, where our audience confirmed that it really can take months, or even a year, to investigate the repercussions of a breach such as a phishing attack. Given the complex and dynamic nature of modern attack vectors, we discussed how much effort is involved in assessing the risk and damage that hackers can inflict upon enterprises today. For more on how to leverage big data and machine learning to detect hackers and impersonators, see this blog.

 

During the webinar, we also covered the recently announced top-level Apache project, Apache Metron. Apache Metron is an open source big data cybersecurity analytics platform supporting real-time ingest and analytics to discover information security threats and build out a high-value security data lake. Apache Metron helps security operations teams be more efficient by reducing the amount of "DIY" big data and data science tooling necessary to detect threats in real time.

[Image: Apache Metron big data cybersecurity platform]

There was plenty of discussion, so we've done our best to answer the questions below. If you have more questions at any time, we encourage you to check out the Cybersecurity track of Hortonworks Community Connection, where an entire community of folks is monitoring and responding to questions. If you missed the session, you can check out the on-demand webinar and SlideShare.

Questions

  • Can Apache Metron take in log data from applications (finance apps) and figure out anomalies too?

Absolutely.
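
To make that concrete, here is a minimal sketch of the kind of anomaly detection that could run over application-log metrics. The metric and data are hypothetical; this uses a robust median-based (MAD) score rather than any specific Metron component:

```python
import statistics

def mad_anomalies(values, threshold=3.5):
    """Flag values whose modified (median-based) z-score exceeds the threshold."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    # A MAD of zero means no spread; nothing can be flagged in that case.
    return [v for v in values if mad and 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical finance-app metric: transaction amounts per minute.
amounts = [102, 98, 105, 99, 101, 97, 103, 100, 5000]
print(mad_anomalies(amounts))  # the 5000 outlier is flagged
```

A median-based score is used here because a single extreme value inflates the mean and standard deviation enough to mask itself in a plain z-score.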

  • How about encrypted attachments, typically stock statements, etc.? Do these get handled too?

Handling encrypted streams is a tricky, though not impossible, problem. If your network has, for example, SSL interception based on re-encryption using keys under your control, then you can do things like content inspection on encrypted mail. And if you have the keys used to encrypt, decryption can be handled through Apache NiFi processors on the ingest path, for instance.

  • In the case of telcos, where GPS co-ordinates are also important, can Metron ingest such data sources directly and provide analytics in geographic terms as well?

GPS is a very useful complement to traditional network and GeoIP data. This is certainly the sort of telemetry that would work well with the streaming enrichment capability, providing context to, for example, NetFlow, proxy, or other application log telemetry.
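
One classic security use of GPS context is "impossible travel" detection. This sketch is not a Metron feature per se, just an illustration of the kind of geographic analytic such telemetry enables; the coordinates and the 30-minute gap are hypothetical:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical: same subscriber seen in Sydney, then London, 30 minutes later.
dist = haversine_km(-33.87, 151.21, 51.51, -0.13)
implied_speed_kmh = dist / 0.5
print(implied_speed_kmh > 1000)  # prints True: far beyond plausible travel
```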

  • Do you have an API or standardized input/query method to integrate other enrichment sources such as geolocation databases?

There are enrichment loaders for a wide range of data sources and the ability to transform inbound enrichments with a simple DSL called Stellar. We also have native support for GeoIP enrichment using the MaxMind binary API for speed.
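
Conceptually, an enrichment is a join between a streaming event and a reference table. This in-memory Python stand-in illustrates the idea only; Metron's actual loaders, Stellar transforms, and HBase-backed tables are far more capable, and the table contents here are hypothetical:

```python
# GeoIP-style reference table keyed by IP (TEST-NET address, illustrative only).
GEO_TABLE = {
    "203.0.113.7": {"country": "AU", "city": "Sydney"},
}

def enrich(event, table=GEO_TABLE):
    """Return a copy of `event` with geo fields attached when the source IP is known."""
    enriched = dict(event)
    geo = table.get(event.get("ip_src_addr"))
    if geo:
        # Namespace the enrichment fields, mirroring Metron's dotted-field style.
        enriched.update({"geo." + k: v for k, v in geo.items()})
    return enriched

print(enrich({"ip_src_addr": "203.0.113.7", "protocol": "TCP"}))
```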

  • Do you support other input sources like Bro or Pyshark?

We provide a parser for Bro data, and a plugin for Bro to post data directly to Kafka for high-performance Bro ingest.
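
As a rough sketch of what that parsing step does, here is a simplified normalization of a Bro conn record (as it might arrive on a Kafka topic) onto Metron-style field names. The sample line is hypothetical, though the field names follow Bro's JSON conn.log output:

```python
import json

# Hypothetical Bro conn-log line as delivered to a Kafka topic.
raw_lines = [
    '{"ts": 1493900000.1, "id.orig_h": "10.0.0.5", '
    '"id.resp_h": "198.51.100.2", "id.resp_p": 443, "proto": "tcp"}',
]

def normalize(line):
    """Map a Bro conn record onto Metron-style field names (a simplified sketch)."""
    rec = json.loads(line)
    return {
        "timestamp": round(rec["ts"] * 1000),  # epoch millis
        "ip_src_addr": rec["id.orig_h"],
        "ip_dst_addr": rec["id.resp_h"],
        "ip_dst_port": rec["id.resp_p"],
        "protocol": rec["proto"],
    }

events = [normalize(l) for l in raw_lines]
print(events[0]["ip_src_addr"])
```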

  • Does Hortonworks provide professional services to help deploy and manage Metron on an ongoing basis?

Absolutely, Hortonworks can provide services and support for Metron and the underlying platforms.

  • Can you provide more detail regarding how to search for PCAP data and how Metron archives the information?

Metron provides a high-speed route to load PCAP data into sequence files in HDFS, ensuring splittability and large-scale processing. It then provides a range of ways to query and process the raw PCAPs: jobs to query PCAP by basic headers, as well as mechanisms for deep pattern searches over large-scale PCAP.
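
To show what header-based PCAP querying means in miniature, here is a minimal libpcap-format reader that filters packet records by capture time. It is analogous in spirit to Metron's fixed PCAP query jobs, not their implementation, and the two-packet capture is synthetic:

```python
import struct

# Synthetic 24-byte libpcap global header (magic, version 2.4, snaplen, linktype).
PCAP_GLOBAL = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)

def packets_between(pcap_bytes, t_start, t_end):
    """Yield (timestamp, payload) for records captured within [t_start, t_end]."""
    off = 24  # skip the global header
    while off + 16 <= len(pcap_bytes):
        ts_sec, _ts_usec, incl_len, _orig_len = struct.unpack_from("<IIII", pcap_bytes, off)
        payload = pcap_bytes[off + 16: off + 16 + incl_len]
        if t_start <= ts_sec <= t_end:
            yield ts_sec, payload
        off += 16 + incl_len

# Hypothetical capture with two records at t=100 and t=200.
rec = lambda ts, data: struct.pack("<IIII", ts, 0, len(data), len(data)) + data
blob = PCAP_GLOBAL + rec(100, b"\x01\x02") + rec(200, b"\x03")
print([ts for ts, _ in packets_between(blob, 150, 250)])  # prints [200]
```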

  • Is the Metron engine based on Spark for real-time processing? Can it apply Spark SQL and MLlib for machine learning? Is Python supported for customization?

The core Metron engine is built on Storm, which provides low-latency, real-time task parallelism, whereas Spark excels at data-parallel tasks. Metron does, however, make extensive use of Spark for building machine learning models with a variety of libraries; many of the models have been built using pyspark, for example.
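
As a toy stand-in for the kind of model pyspark would train at scale, here is a from-scratch logistic regression over two hypothetical phishing features (link count and sender-domain age, pre-scaled). The features, data, and hyperparameters are all illustrative assumptions:

```python
import math

def train(rows, labels, lr=0.5, epochs=2000):
    """Stochastic gradient descent for logistic regression; w[0] is the bias."""
    w = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the log loss w.r.t. z
            w[0] -= lr * g
            for i, xi in enumerate(x):
                w[i + 1] -= lr * g * xi
    return w

def predict(w, x):
    z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

# [many links, young domain] -> phish (1); [few links, old domain] -> benign (0).
X = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
y = [1, 1, 0, 0]
w = train(X, y)
print(predict(w, [0.95, 0.05]) > 0.5)  # prints True
```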

  • What were the companies cited in the articles about leveraging open source again?

Telstra and Capital One. (Related article here.)

  • What plans do you have to reduce the complexity of deployment and long-term support of a full-blown Hadoop stack?

Much of the installation and management is simplified by Ambari through MPacks, which are used to install the Metron application, and we are constantly working to provide simpler means of operating the platform. The SmartSense offering from Hortonworks also provides advice and tuning suggestions based on real use of the cluster, making operations easier.

  • For the presentation layer, do we again need Tableau-type software?

Metron provides a number of visualisation options out of the box, including Kibana dashboards and some example Zeppelin dashboards for typical use cases around data sources like NetFlow.

  • Do you support integration with graph databases or Spark's built-in graph functionality? I think graph databases and graph analysis can help with sporadic emails and with analysis of dense sources and destinations.

We currently have some roadmap items around integration of graph databases and an ontology mapper.

  • Any input on Metron UI availability?

Right now (April 2017) we have a management UI within the platform which provides access to configure parsers, enrichments, and transformations. The management UI also provides an interface to edit triage rules and tune scores to prioritise the output for analysts. That output is mainly in the form of Kibana and Zeppelin dashboards at present. A complete, investigator-focused UI experience is in the works.
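
Metron expresses triage rules in Stellar, but the underlying idea, scoring each alert so analysts see the riskiest first, can be sketched in plain Python. The rules, weights, and alert fields below are hypothetical:

```python
# Each rule is a (predicate, weight) pair; matching rules contribute their weight.
RULES = [
    (lambda a: a.get("is_alert"), 10),
    (lambda a: a.get("ip_dst_port") == 22, 25),            # inbound SSH
    (lambda a: a.get("geo.country") not in ("AU",), 15),   # unexpected geography
]

def triage_score(alert, rules=RULES):
    """Sum the weights of all matching rules (max-score aggregation also works)."""
    return sum(weight for pred, weight in rules if pred(alert))

alerts = [
    {"is_alert": True, "ip_dst_port": 443, "geo.country": "AU"},
    {"is_alert": True, "ip_dst_port": 22, "geo.country": "RU"},
]
# Highest-scoring alert first, so analysts triage the riskiest item at the top.
print(sorted(alerts, key=triage_score, reverse=True)[0]["ip_dst_port"])
```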

  • How are Apache and Hortonworks related and integrated from a business standpoint? I am always confused by Apache's huge set of offerings on the Big Data front, since I think of Apache as an open source governance organisation. Please correct my misunderstanding.

Hortonworks and the Apache Software Foundation are completely separate entities. Hortonworks provides distributions of code developed in Apache projects, along with support subscriptions and services around those distributions, something we can do by employing many of the Apache committers on those projects. We commit our contributions back to the Apache open source projects.

  • Where do I find more info about Apache Metron?

There is a brief overview here. You can also join the community at metron.apache.org.

  • Can we use AWS big data and machine learning solutions (Spark ML, Amazon Machine Learning) to speed up the setup of Metron in the cloud?

We currently provide deployment mechanisms that deploy Metron directly into AWS and Azure as part of the solution offering, and many of our customers choose to run Metron in the cloud. However, our solution is cloud-agnostic: we do not tightly couple it to services available in only one cloud, so it works across all vendor clouds.

  • Can Metron components be monitored using Ambari?

Metron is installed and managed through Ambari MPacks, and the underlying components and metrics are all monitored through Ambari Metrics. We are working to publish more Metron-specific metrics to Ambari as well.

  • Does Metron support Zeppelin?

Metron is usually installed on the HDP platform, which has full support for Zeppelin. In fact, we use Zeppelin to produce a number of dashboards. Metron also uses Zeppelin to provide active runbooks for SOC staff, which use the notebook approach to integrate process documentation and live data with visualisation to guide analysts through the investigation process.

  • For any extensions to the model, does it also allow Python / R?

The model-as-a-service component allows you to write and extend models in any language that will run in a YARN container and provide a REST interface. This means languages such as Python and R are an excellent fit for running models, either directly on their own or, for instance, through Spark.
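
To illustrate the contract, here is a hedged sketch of a model served over REST using only the Python standard library. The endpoint path, payload shape, and scoring rule are assumptions for illustration, not Metron's actual model-as-a-service API:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def score(features):
    """Hypothetical model: flag messages with too many links as phishing."""
    return {"is_phish": features.get("link_count", 0) > 10}

class ModelHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON feature vector, score it, and reply with a JSON verdict.
        body = self.rfile.read(int(self.headers["Content-Length"]))
        payload = json.dumps(score(json.loads(body))).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

print(score({"link_count": 42}))  # prints {'is_phish': True}
# To serve: HTTPServer(("127.0.0.1", 8080), ModelHandler).serve_forever()
```

Any process answering such calls, whatever language it is written in, can plug into this pattern as long as it runs in a YARN container.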

  • Where can I find the recording of the webinar?

The webinar is available on demand here: https://hortonworks.com/webinar/combating-phishing-attacks-big-data-helps-detect-impersonators/
