Posts by Michael Baker:


Big Data Security Part Three: PacketPig Finding Zero Day Attacks

Introduction

This is part three of a Big Data Security blog series. You can read the previous two posts here: Part One / Part Two.

When Russell Jurney and I first teamed up to write these posts, we wanted to do something no one had done before to demonstrate the power of Big Data, the simplicity of Pig and the kind of Big Data Security Analytics we perform at Packetloop. Packetpig was modified to support Amazon’s Elastic MapReduce (EMR) so that we could process a 600GB set of full packet captures. All we needed was a canonical Zero Day attack to analyse. We were in luck!

In August 2012 a vulnerability in Oracle JRE 1.7 created huge publicity when it was disclosed that a number of Zero Day attacks had been reported to Oracle in April but had still not been addressed by late August 2012. To make matters worse, Oracle’s next scheduled JRE patch was months away (October 16). This position subsequently changed, and on the 30th of August a number of out-of-band JRE patches were released for what became known as CVE-2012-4681.

The vulnerability exposed around 1 billion systems to exploitation, and the exploit was 100% effective on Windows, Mac OS X and Linux. A number of security researchers were already seeing the exploit in the wild, as it had been incorporated into exploit packs used to deliver malware.

What is a Zero Day?

Put simply, it’s any vulnerability that can be exploited while no mitigation is available. The mitigation most people measure Zero Days against is a patch from the software vendor (in this case Oracle).

If we look at the timeline of this exploit, you can see how long it remained a Zero Day:

  • The Bug was introduced to JRE on July 28th 2011.
  • It was Disclosed to the public on April 2nd 2012.
  • The Exploit was available in the Metasploit Framework on August 26th 2012, with other PoCs publicly available around the same time.
  • Detection was available via Snort IDS/IPS on August 28th 2012.
  • Lastly a Patch was available from Oracle on 30th August 2012.

If you compare the date the Bug was introduced with the date of the Patch, the Zero Day window is 399 days. Comparing the date of Disclosure with the Patch date still gives a staggering 150 days. To put this in perspective, a software bug affecting around 1 billion devices could be exploited for well over a year, and it was certainly being seen in the wild. Whether you take the view that the Zero Day period is around 150 days (from disclosure) or over a year (from introduction), both are extremely scary.
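Both Zero Day windows are plain date differences. A minimal check using Python’s standard datetime module, with the dates copied from the timeline above:

```python
from datetime import date

# Key dates from the CVE-2012-4681 timeline.
bug_introduced = date(2011, 7, 28)
disclosed = date(2012, 4, 2)
patched = date(2012, 8, 30)

# Subtracting two dates yields a timedelta; .days is the whole-day count.
window_from_introduction = (patched - bug_introduced).days
window_from_disclosure = (patched - disclosed).days

print(window_from_introduction)  # 399
print(window_from_disclosure)    # 150
```

The introduction-to-patch window spans 29 February 2012, which datetime accounts for automatically.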

So how can you tell whether you were exploited using this JRE bug in the last six months or the last year? How can you prove your network or important systems haven’t been exploited using this vulnerability?

Finding Zero Day attacks

Packetpig provides you with the ability to search vast amounts of network packet captures for Zero Day attacks. To demonstrate this, I ran the Metasploit Exploit for the JRE bug against a Windows XP workstation and recorded the packet capture. I then hid this 500KB capture amongst 600GB of Full Packet Captures from a system we monitor on the Internet. Every packet is captured to an S3 bucket, so we can quickly scan the bucket for Zero Days using Amazon’s Elastic MapReduce.

For the purposes of this demonstration, I downloaded the updated Snort Signatures as soon as they were released on the 28th of August. This allowed me to scan the 600GB of packet captures with the old signatures (in this case 2905) and then again with the new signatures (in this case 2931).

Let’s run through the Packetpig job ‘snort_comparison.pig’ to see how this was done. The key to understanding the job is that we use the Packetpig SnortLoader() to scan the network packet captures once with the old signatures and again with the new signatures. Any signature that fires in the old scan is removed from the new scan’s results, leaving only alerts from signatures that did not exist in the old set: the Zero Day attacks.

As in our last post, we set up a number of variables using an include.pig file. After that we define old_snort_conf and new_snort_conf:

%DEFAULT includepath pig/include.pig
RUN $includepath;
 
%DEFAULT time 60
 
-- for local mode: uncomment the next line and comment the one after that
--%DEFAULT old_snort_conf 'lib/snort-2905/etc/snort.conf'
%DEFAULT old_snort_conf '/mnt/var/lib/snort-2905/etc/snort.conf'
 
-- for local mode: uncomment the next line and comment the one after that
--%DEFAULT new_snort_conf 'lib/snort-2931/etc/snort.conf'
%DEFAULT new_snort_conf '/mnt/var/lib/snort-2931/etc/snort.conf'

The SnortLoader() is used with the old snort.conf and the new snort.conf to scan the packet captures:

snort_old_alerts =
    LOAD '$pcap'
    USING com.packetloop.packetpig.loaders.pcap.detection.SnortLoader('$old_snort_conf')
    AS (
        ts:long,
        sig:chararray,
        priority:int,
        message:chararray,
        proto:chararray,
        src:chararray,
        sport:int,
        dst:chararray,
        dport:int
);
 
snort_new_alerts =
    LOAD '$pcap'
    USING com.packetloop.packetpig.loaders.pcap.detection.SnortLoader('$new_snort_conf')
    AS (
        ts:long,
        sig:chararray,
        priority:int,
        message:chararray,
        proto:chararray,
        src:chararray,
        sport:int,
        dst:chararray,
        dport:int
);

Next we group (COGROUP) the old and new Snort alerts by signature (sig) and keep only the groups with no alerts from the old scan, i.e. signatures that fire with the new rules but not with the old ones:

snort_joined = COGROUP snort_old_alerts BY sig, snort_new_alerts BY sig;
new_only_filtered = FILTER snort_joined BY (COUNT(snort_old_alerts) == 0);
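To make the COGROUP/FILTER step concrete, here is a rough Python model of what it computes. It illustrates the semantics only, not how Pig executes the job: alerts from both scans are grouped by sig, and a group survives only when the old scan produced no alerts for that signature. The signature IDs 1_900001 and 1_800001 are hypothetical placeholders, not real Snort SIDs.

```python
from collections import defaultdict

def cogroup(old_alerts, new_alerts, key):
    """Mimic Pig's COGROUP: one entry per key holding the matching
    tuples from each input (either bag may be empty)."""
    groups = defaultdict(lambda: ([], []))
    for alert in old_alerts:
        groups[key(alert)][0].append(alert)
    for alert in new_alerts:
        groups[key(alert)][1].append(alert)
    return groups

def new_only(old_alerts, new_alerts):
    """Keep only groups where COUNT(snort_old_alerts) == 0."""
    grouped = cogroup(old_alerts, new_alerts, key=lambda a: a["sig"])
    return {sig: bags for sig, bags in grouped.items() if len(bags[0]) == 0}

# Hypothetical alerts: 120_3 fires under both rule sets, 1_800001 only under
# the old set, and 1_900001 stands in for a signature added in the new set.
old = [{"sig": "120_3", "src": "10.0.0.5"},
       {"sig": "1_800001", "src": "10.0.0.7"}]
new = [{"sig": "120_3", "src": "10.0.0.5"},
       {"sig": "1_900001", "src": "10.0.0.9"}]

print(sorted(new_only(old, new)))  # ['1_900001']
```

Note that groups containing only old alerts are dropped as well, so the filter keeps exactly the signatures that are new to the second scan.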

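Lastly we re-project the data and store it. new_only_flattened contains every individual alert from the surviving groups, while new_only_summary contains one line per signature with its alert count, so the snort_comparison_new/part-r-00000 file is a verbose version of snort_comparison_summary/part-r-00000.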

new_only_flattened = FOREACH new_only_filtered GENERATE FLATTEN(snort_new_alerts);
new_only_summary = FOREACH new_only_filtered GENERATE group, COUNT(snort_new_alerts);
 
STORE new_only_flattened INTO '$output/snort_comparison_new';
STORE new_only_summary INTO '$output/snort_comparison_summary';
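The two projections can be modelled the same way. This sketch assumes filtered groups shaped like COGROUP output (a signature mapped to its old and new alert bags); FLATTEN emits every individual new alert, while the summary emits one (signature, count) pair per group. The sample data is hypothetical.

```python
def flatten_new(filtered):
    """Like FLATTEN(snort_new_alerts): one output row per alert in each bag.
    Sorted only to make the output deterministic; Pig guarantees no order."""
    return [alert for _sig, (_old, new) in sorted(filtered.items()) for alert in new]

def summarise(filtered):
    """Like GENERATE group, COUNT(snort_new_alerts): one row per signature."""
    return [(sig, len(new)) for sig, (_old, new) in sorted(filtered.items())]

# Hypothetical filtered groups: the old-scan bag is always empty after the FILTER.
filtered = {
    "1_900001": ([], [{"sig": "1_900001", "src": "10.0.0.9", "dst": "192.168.0.19"},
                      {"sig": "1_900001", "src": "10.0.0.9", "dst": "192.168.0.20"}]),
}

print(len(flatten_new(filtered)))  # 2
print(summarise(filtered))         # [('1_900001', 2)]
```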

To demonstrate this in practice I test the job on a small number of packet captures on my local development laptop. Watch the video to see how to do it.

Next I take it to the cloud and use 80 x m2.4xlarge instances to process 600GB of full packet captures to find the Oracle JRE 1.7 attack. The 80 nodes spin up, install all the Packetpig software (bootstrap) and then go to work crunching the network packet captures. Check out the video to see the full process.

Big Data Security Part Two: Introduction to PacketPig

Introduction

Packetpig is the tool behind Packetloop. In Part One of the Introduction to Packetpig I discussed the background and motivation behind the Packetpig project and problems Big Data Security Analytics can solve. In this post I want to focus on the code and teach you how to use our building blocks to start writing your own jobs.

The ‘building blocks’ are the Packetpig custom loaders that allow you to access specific information in packet captures. There are a number of them, but the two I will focus on in this post are:

  • PacketLoader() allows you to access protocol information (Layer-3 and Layer-4) from packet captures.
  • SnortLoader() inspects traffic using Snort Intrusion Detection software.

Calculating Bandwidth and Binning Time

The PacketLoader() provides access to IP, TCP and UDP headers for each packet in the capture. A great example of its use is the ‘binning.pig’ script. This script calculates the bandwidth used by TCP and UDP packets, as well as total bandwidth, over any time period you define. You might want to calculate these totals every minute, hour, day, week or month to produce a graph.

First, run the binning script using the following command:

./pigrun.py -x local -r data/web.pcap -f pig/examples/binning.pig

Then open up output/binning/part-r-00000 in a text editor to see the output.

Now let’s walk through the script. First, let’s include all the jars required for Packetpig and binning.pig to run:

%DEFAULT includepath pig/include.pig
RUN $includepath;

Next, set the amount of time (in seconds) you want to bin your values into. In this case I want to output the values every minute (60 seconds), but I could easily change this to an hour (3600 seconds) by commenting and uncommenting the following lines:

%DEFAULT time 60
--%DEFAULT time 3600

Then I load the data out of the packet captures into quite a large schema using the PacketLoader():

packets = load '$pcap' using com.packetloop.packetpig.loaders.pcap.packet.PacketLoader() AS (
    ts,
    ip_version:int,
    ip_header_length:int,
    ip_tos:int,
    ip_total_length:int,
    ip_id:int,
    ip_flags:int,
    ip_frag_offset:int,
    ip_ttl:int,
    ip_proto:int,
    ip_checksum:int,
    ip_src:chararray,
    ip_dst:chararray,
    tcp_sport:int,
    tcp_dport:int,
    tcp_seq_id:long,
    tcp_ack_id:long,
    tcp_offset:int,
    tcp_ns:int,
    tcp_cwr:int,
    tcp_ece:int,
    tcp_urg:int,
    tcp_ack:int,
    tcp_psh:int,
    tcp_rst:int,
    tcp_syn:int,
    tcp_fin:int,
    tcp_window:int,
    tcp_len:int,
    udp_sport:int,
    udp_dport:int,
    udp_len:int,
    udp_checksum:chararray
);

This is a very rich data model. By using the timestamp (ts), the size of the IP packet (ip_total_length) and the TCP (tcp_len) and UDP (udp_len) lengths, we can calculate total and per-protocol bandwidth at any interval. The beauty of Pig is that I could easily home in on specific hosts by grouping on the Source IP, Destination IP and Destination Port, but let’s keep things simple in this post.

The ip_proto field allows me to filter all packets based on protocol. TCP is IP protocol 6 and UDP is IP protocol 17.

tcp = FILTER packets BY ip_proto == 6;
udp = FILTER packets BY ip_proto == 17;

Once filtered, we can bin each packet into a time period and then project a summary of the data with the sizes of all TCP packets in that time period (bin) summed.

tcp_grouped = GROUP tcp BY (ts / $time * $time);
tcp_summary = FOREACH tcp_grouped GENERATE group, SUM(tcp.tcp_len) AS tcp_len;
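The grouping key ts / $time * $time relies on integer division: dividing and then multiplying by the bin width drops the remainder, so every timestamp is rounded down to the start of its bin. A small Python sketch of the same TCP binning, using // to stand in for Pig’s integer division (Python’s / would produce a float); the (ts, tcp_len) pairs are made up.

```python
from collections import defaultdict

def bin_start(ts, width=60):
    """Round a Unix timestamp down to the start of its bin (Pig: ts / $time * $time)."""
    return ts // width * width

def sum_by_bin(packets, width=60):
    """Like GROUP tcp BY bin followed by SUM(tcp.tcp_len) per group."""
    totals = defaultdict(int)
    for ts, length in packets:
        totals[bin_start(ts, width)] += length
    return dict(totals)

# Hypothetical (ts, tcp_len) pairs: the first two fall in the same minute.
tcp_packets = [(1322644980, 1500), (1322645039, 580), (1322645040, 40)]

print(sum_by_bin(tcp_packets))
# {1322644980: 2080, 1322645040: 40}
```

Passing width=3600 reproduces the hourly variant, and all three packets then land in the single bin starting at 1322643600.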

And then the same for UDP.

udp_grouped = GROUP udp BY (ts / $time * $time);
udp_summary = FOREACH udp_grouped GENERATE group, SUM(udp.udp_len) AS udp_len;

To calculate the total bandwidth of all IP packets, we bin all packets using the same time period and then sum ip_total_length.

bw_grouped = GROUP packets BY (ts / $time * $time);
bw_summary = FOREACH bw_grouped GENERATE group, SUM(packets.ip_total_length) AS bw;

The output we are looking for is comma-separated values for timestamp, TCP bandwidth, UDP bandwidth and total bandwidth. This is produced by a final join and projection:

joined = JOIN tcp_summary BY group, udp_summary BY group, bw_summary BY group;
summary = FOREACH joined GENERATE tcp_summary::group, tcp_len, udp_len, bw;

It may seem a little cryptic, but the JOIN statement simply joins the three summaries on the group they all share, which is the time period. If you ILLUSTRATE the joined relation, you will see the data is there but not in the format we are looking for.

| joined | tcp_summary::group:int | tcp_summary::tcp_len:long | udp_summary::group:int | udp_summary::udp_len:long | bw_summary::group:int | bw_summary::bw:long |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| | 1322644980 | 2080 | 1322644980 | 81 | 1322644980 | 2305 |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

However, the summary projection generates the output the way we want it, and we store it in CSV format using PigStorage(',').

STORE summary INTO '$output/binning' USING PigStorage(',');
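The final JOIN and STORE can also be sketched in Python, assuming per-bin dictionaries like the ones the three summaries produce. One detail worth knowing: Pig’s JOIN is an inner join by default, so a bin only appears in the output if it has TCP traffic, UDP traffic and total traffic; a minute with no UDP packets is dropped rather than written with a zero. The sketch mirrors that behaviour. The first bin’s figures are taken from the ILLUSTRATE output above and the second bin is made up.

```python
def join_bins(tcp, udp, bw):
    """Inner join on the bin key, like
    JOIN tcp_summary BY group, udp_summary BY group, bw_summary BY group."""
    shared = sorted(tcp.keys() & udp.keys() & bw.keys())
    return [(b, tcp[b], udp[b], bw[b]) for b in shared]

def to_pigstorage(rows, delimiter=","):
    """Join fields with the delimiter, like STORE ... USING PigStorage(',').
    No quoting is applied."""
    return "".join(delimiter.join(str(f) for f in row) + "\n" for row in rows)

tcp_summary = {1322644980: 2080, 1322645040: 40}
udp_summary = {1322644980: 81}                  # no UDP in the second minute
bw_summary = {1322644980: 2305, 1322645040: 92}

print(to_pigstorage(join_bins(tcp_summary, udp_summary, bw_summary)), end="")
# 1322644980,2080,81,2305
```

If empty bins should be kept with zeros instead, an outer join (and a default value for missing counts) would be needed.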

Threat Detection

The SnortLoader() can be used to replay all conversations through Snort IDS and output attacks that it finds. The SnortLoader() can also take a snort.conf as a parameter so you can scan packet captures with specific Snort versions.

Run the basic snort.pig script to get an idea of the output.

./pigrun.py -x local -r data/web.pcap -f pig/examples/snort.pig

Now let’s run through the snort.pig script. Again, we include all the jars we need for Packetpig.

%DEFAULT includepath pig/include.pig
RUN $includepath;

The script is constructed so that you can pass parameters to either scan all traffic for attacks or zero in on specific source and destination IP addresses and ports. By leaving most of these as null, we inspect all traffic. Also note that we are again binning time every 60 seconds. Lastly, Packetpig includes a number of versions of Snort, and the default snort.conf we include ensures you use the latest one.

%DEFAULT time 60
%DEFAULT src null
%DEFAULT dst null
%DEFAULT sport null
%DEFAULT dport null
%DEFAULT snortconfig 'lib/snort/etc/snort.conf'

The SnortLoader() receives the snortconfig parameter, inspects the packet capture for attacks and returns them to you in a defined schema:

snort_alerts =
  LOAD '$pcap'
  USING com.packetloop.packetpig.loaders.pcap.detection.SnortLoader('$snortconfig')
  AS (
    ts:long,
    sig:chararray,
    priority:int,
    message:chararray,
    proto:chararray,
    src:chararray,
    sport:int,
    dst:chararray,
    dport:int
  );

Using this schema you can access the timestamp (ts), Snort Signature ID (sig), Severity/Priority (priority), Description of the attack (message) and the Source (src), Source Port (sport), Destination (dst) and Destination port (dport) of the attack.

If you run the script and open output/snort/part-m-00000, you will see a number of attacks matching the output schema of the SnortLoader(). One thing to note is that Snort uses Priority 1 for the highest severity, Priority 2 for the next highest and so on.

1322645240 120_3 3 (http_inspect) NO CONTENT-LENGTH OR TRANSFER-ENCODING IN HTTP RESPONSE TCP 184.84.221.18 80 192.168.0.19 34299
1322645387 139_1 2 (spp_sdf) SDF Combination Alert DIVERT 184.84.221.18 0 192.168.0.19 0
1322645603 120_3 3 (http_inspect) NO CONTENT-LENGTH OR TRANSFER-ENCODING IN HTTP RESPONSE TCP 74.125.237.27 80 192.168.0.19 41791
1322645907 120_3 3 (http_inspect) NO CONTENT-LENGTH OR TRANSFER-ENCODING IN HTTP RESPONSE TCP 199.181.254.21 80 192.168.0.19 54222
1322645689 120_3 3 (http_inspect) NO CONTENT-LENGTH OR TRANSFER-ENCODING IN HTTP RESPONSE TCP 74.125.237.123 80 192.168.0.19 42514
1322645739 138_5 2 SENSITIVE-DATA Email Addresses TCP 74.125.237.123 80 192.168.0.19 42514
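To post-process this file outside Pig, each line can be parsed back into the SnortLoader() schema. The sketch below assumes the file was written with PigStorage’s default tab delimiter (the listing above shows spaces, and the message field itself contains spaces, so splitting on whitespace would not work). SnortAlert is a hypothetical helper mirroring the schema, not part of Packetpig.

```python
from typing import NamedTuple

class SnortAlert(NamedTuple):
    """Hypothetical record mirroring the SnortLoader() schema."""
    ts: int
    sig: str
    priority: int
    message: str
    proto: str
    src: str
    sport: int
    dst: str
    dport: int

def parse_alert(line):
    """Parse one output line, assuming PigStorage's default tab delimiter."""
    ts, sig, priority, message, proto, src, sport, dst, dport = line.rstrip("\n").split("\t")
    return SnortAlert(int(ts), sig, int(priority), message, proto,
                      src, int(sport), dst, int(dport))

line = "\t".join(["1322645739", "138_5", "2", "SENSITIVE-DATA Email Addresses",
                  "TCP", "74.125.237.123", "80", "192.168.0.19", "42514"])
alert = parse_alert(line)
print(alert.sig, alert.priority, alert.dport)  # 138_5 2 42514
```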

The snort.pig script is our most basic example, but hopefully you are already thinking about what you could filter on (e.g. Severity), as well as re-projecting the data you access out of SnortLoader() to find the top ten attackers and top ten victims.
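As a starting point for that kind of re-projection, here is a rough Python sketch that keeps only Priority 1 and 2 alerts and counts the most frequent sources (attackers) and destinations (victims) with collections.Counter. The tuples reuse fields from the sample output above in schema order (ts, sig, priority, message, proto, src, sport, dst, dport); in Pig itself the equivalent would be a FILTER followed by GROUP, COUNT, ORDER and LIMIT. The top_talkers function name is hypothetical.

```python
from collections import Counter

def top_talkers(alerts, max_priority=2, n=10):
    """Return the n most common sources and destinations among alerts whose
    priority is max_priority or more severe (in Snort, 1 is the most severe)."""
    severe = [a for a in alerts if a[2] <= max_priority]
    attackers = Counter(a[5] for a in severe).most_common(n)
    victims = Counter(a[7] for a in severe).most_common(n)
    return attackers, victims

alerts = [
    (1322645240, "120_3", 3, "(http_inspect) NO CONTENT-LENGTH OR TRANSFER-ENCODING IN HTTP RESPONSE",
     "TCP", "184.84.221.18", 80, "192.168.0.19", 34299),
    (1322645387, "139_1", 2, "(spp_sdf) SDF Combination Alert",
     "DIVERT", "184.84.221.18", 0, "192.168.0.19", 0),
    (1322645739, "138_5", 2, "SENSITIVE-DATA Email Addresses",
     "TCP", "74.125.237.123", 80, "192.168.0.19", 42514),
]

attackers, victims = top_talkers(alerts)
print(attackers)  # [('184.84.221.18', 1), ('74.125.237.123', 1)]
print(victims)    # [('192.168.0.19', 2)]
```

Counter.most_common orders ties by first occurrence, which is why the two sources with one alert each appear in input order.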

In my next post I will show you how to find Zero Day attacks in past network packet captures.

Big Data Security Part One: Introducing PacketPig

Series Introduction

Packetloop CTO Michael Baker (@cloudjunky) made a big splash when he presented ‘Finding Needles in Haystacks (the Size of Countries)’ at Black Hat Europe earlier this year. The paper outlines a toolkit based on Apache Pig, Packetpig (@packetpig, available on GitHub), for doing network security monitoring and intrusion detection analysis on full packet captures using Hadoop.

In this series of posts, we’re going to introduce Big Data Security and explore using Packetpig on real full packet captures to understand and analyze networks. In this post, Michael will introduce big data security in the form of full data capture, Packetpig and Packetloop.

Introducing Packetpig

Intrusion detection is the analysis of network traffic to detect intruders on your network. Most intrusion detection systems (IDS) look for signatures of known attacks and identify them in real-time. Packetpig is different. Packetpig analyzes full packet captures – that is, logs of every single packet sent across your network – after the fact. In contrast to existing IDS systems, this means that by running on Hadoop over full packet captures, Packetpig can detect ‘zero day’ or unknown exploits in historical data as new exploits are discovered. In other words, Packetpig can determine whether intruders are already in your network, for how long, and what they’ve stolen or abused.

Packetpig is a Network Security Monitoring (NSM) Toolset where the ‘Big Data’ is full packet captures. Like a TiVo for your network, through its integration with Snort, p0f and custom Java loaders, Packetpig does deep packet inspection, file extraction, feature extraction, operating system detection, and other deep network analysis. Packetpig’s analysis of full packet captures focuses on providing as much context as possible to the analyst. Context they have never had before. This is a ‘Big Data’ opportunity.

Full Packet Capture: A Big Data Opportunity

What makes full packet capture possible is cheap storage, the driving factor behind ‘big data.’ A standard 100Mbps internet connection can be cheaply logged for months with a 3TB disk, provided its average utilisation is low. Apache Hadoop is optimized around cheap storage and data locality: putting spindles next to processor cores. So what better way to analyze full packet captures than with Apache Pig, a dataflow scripting interface on top of Hadoop?
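How long a 3TB disk lasts depends on how busy the link is. A back-of-the-envelope Python calculation, assuming decimal units (1TB = 10^12 bytes) and ignoring pcap per-packet overhead, shows that a saturated 100Mbps link fills 3TB in under three days, so months of retention assume an average utilisation of only a few percent. The 3TB, six-week customer data set described later in this post works out to roughly 6.6Mbps on average.

```python
def capture_days(disk_tb, link_mbps, utilisation):
    """Days of full packet capture a disk holds at a given average link utilisation."""
    disk_bytes = disk_tb * 1e12
    bytes_per_second = link_mbps * 1e6 / 8 * utilisation
    return disk_bytes / bytes_per_second / 86400

def average_mbps(disk_tb, days):
    """Average link rate implied by filling disk_tb over the given number of days."""
    return disk_tb * 1e12 * 8 / (days * 86400) / 1e6

print(round(capture_days(3, 100, 1.0), 1))   # 2.8  (saturated link)
print(round(capture_days(3, 100, 0.05)))     # 56   (5% average utilisation)
print(round(average_mbps(3, 6 * 7), 1))      # 6.6  (3TB over six weeks)
```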

In the enterprise today, there is no single location or system to provide a comprehensive view of a network in terms of threats, sessions, protocols and files. This information is generally distributed across domain-specific systems such as IDS Correlation Engines and data stores, Netflow repositories, Bandwidth optimisation systems or Data Loss Prevention tools. Security Information and Event Management (SIEM) systems offer to consolidate this information, but they operate on logs: a digest or snippet of the original information. They don’t provide full-fidelity information that can be queried using an exact copy of the original incident.

Packet captures are a standard binary format for storing network data. They are cheap to perform and the data can be stored in the cloud or on low-cost disk in the Enterprise network. The length of retention can be based on the amount of data flowing through the network each day and the window of time you want to be able to peer into the past.

Pig, Packetpig and Open Source Tools

In developing Packetpig, Packetloop wanted to provide free tools for the analysis of network packet captures spanning weeks, months or even years. The simple questions of capture and storage of network data had been solved, but no one had addressed the fundamental problem of analysis. Packetpig uses the Hadoop stack for analysis, which solves this problem.

For us, wrapping Snort and p0f was a bit of a homage to how much security professionals value and rely on open source tools. We felt that if we didn’t offer an open source way of analysing full packet captures, we would have missed a real opportunity to pioneer in this area. We wanted it to be simple, turnkey and easy for people to take our work and expand on it. This is why Apache Pig was selected for the project.

Understanding your Network

One of the first data sets we were given to analyse was a 3TB data set from a customer. It was every packet in and out of their 100Mbps internet connection for 6 weeks, and it contained approximately 500,000 attacks. Making sense of this volume of information is incredibly difficult with current tooling; even Network Security Monitoring (NSM) tools have difficulty with data of this size. However, it’s not just size and scale. No existing toolset provided the same level of context. Packetpig allows you to join together information related to threats, sessions, protocols (deep packet inspection) and files, as well as Geolocation and Operating System detection information.

We are currently logging all packets for a website for six months. This data set is currently around 0.6TB, and because all the packet captures are stored in S3 we can quickly scan through it. More importantly, we can run a job nightly or every 15 minutes to correlate attack information with other data from Packetpig, providing as much context as possible around security events.

Items of interest include:

  • Detecting anomalies and intrusion signatures
  • Learning the timeframe and identity of an attacker
  • Triaging incidents
  • “Show me packet captures I’ve never seen before.”

“Never before seen” is a powerful filter and isn’t limited to attack information. First introduced by Marcus Ranum, “never before seen” can be used to rule out normal network behaviour and only show sources, attacks, and traffic flows that are truly anomalous. For example, think in terms of the outbound communications from a Web Server. What attacks, clients and outbound communications are new or have never been seen before? In an instant you get an understanding that you don’t need to look for the normal, you are straight away looking for the abnormal or signs of misuse.
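At its core the idea is a set difference against a historical baseline. A minimal Python sketch, assuming each observation is reduced to a (source, destination, destination port) key; that key choice, the function name and the documentation-range addresses are illustrative only, not how Packetpig or Packetloop implements the filter:

```python
def never_before_seen(baseline, today):
    """Return observations in today that do not appear in the historical
    baseline, in the order they were first observed."""
    seen = set(baseline)
    new = []
    for flow in today:
        if flow not in seen:
            new.append(flow)
            seen.add(flow)  # report each new flow only once
    return new

# Hypothetical outbound flows from a web server: (src, dst, dport).
baseline = [("192.0.2.10", "198.51.100.5", 443),
            ("192.0.2.10", "198.51.100.7", 53)]
today = [("192.0.2.10", "198.51.100.5", 443),
         ("192.0.2.10", "203.0.113.66", 6667),
         ("192.0.2.10", "203.0.113.66", 6667)]

print(never_before_seen(baseline, today))
# [('192.0.2.10', '203.0.113.66', 6667)]
```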

Agile Data

Packetloop uses the stack and iterative prototyping techniques outlined in the forthcoming book by Hortonworks’ own Russell Jurney, Agile Data (O’Reilly, March 2013). We use Hadoop, Pig, Mongo and Cassandra to explore datasets and help us encode important information into d3 visualisations. Currently we use all of these tools to aid in our research before we add functionality to Packetloop. These prototypes become the palette our product is built from.