The Packetloader() provides access to IP, TCP and UDP headers for each packet in the capture. A great example of it’s use is the ‘binning.pig‘ script. This script allows you to calculate the bandwidth used by TCP and UDP packets as well as total bandwidth at any period you define. You might want to calculate these totals every minute, hour, day, week or month to produce a graph.
Firstly run the binning script using the following command.
Then open up output/binning/part-r-00000 in a text editor to see the output.
Now let’s walk through the script. Firstly let’s include all the jar’s required for Packetpig and binning.pig to run;
Then the amount of time you want to bin your values into. In this case I want to output the values every minute (60 seconds) but I could easily change this to an hour (3600 seconds) by commenting and uncommenting the following lines;
Then I load the data out of the packet captures into quite a large schema using the Packetloader();
packets = load '$pcap' using com.packetloop.packetpig.loaders.pcap.packet.PacketLoader() AS (
ts,
ip_version:int,
ip_header_length:int,
ip_tos:int,
ip_total_length:int,
ip_id:int,
ip_flags:int,
ip_frag_offset:int,
ip_ttl:int,
ip_proto:int,
ip_checksum:int,
ip_src:chararray,
ip_dst:chararray,
tcp_sport:int,
tcp_dport:int,
tcp_seq_id:long,
tcp_ack_id:long,
tcp_offset:int,
tcp_ns:int,
tcp_cwr:int,
tcp_ece:int,
tcp_urg:int,
tcp_ack:int,
tcp_psh:int,
tcp_rst:int,
tcp_syn:int,
tcp_fin:int,
tcp_window:int,
tcp_len:int,
udp_sport:int,
udp_dport:int,
udp_len:int,
udp_checksum:chararray
);
This is a very rich data model and through leveraging the timestamp (ts), size of the IP packet (ip_total_length), and size of the TCP (tcp_len) and UDP (udp_len) we can calculate total and respective bandwidths at any interval. The beauty of pig is that I could easily hone in on specific hosts by grouping on the Source IP, Destination IP and Destination Port – but let’s keep things simple in this post.
The ip_proto field allows be to filter all packets based on protocol. TCP is IP protocol 6 and UDP is IP protocol 17.
tcp = FILTER packets BY ip_proto == 6;
udp = FILTER packets BY ip_proto == 17;
Once filtered we can bin each packet into a time period and then project a summary of the data with the size of all TCP packets in that time period (bin) summed.
tcp_grouped = GROUP tcp BY (ts / $time * $time);
tcp_summary = FOREACH tcp_grouped GENERATE group, SUM(tcp.tcp_len) AS tcp_len;
And then the same for UDP.
udp_grouped = GROUP udp BY (ts / $time * $time);
udp_summary = FOREACH udp_grouped GENERATE group, SUM(udp.udp_len) AS udp_len;
To get calculate total bandwidth of all IP packets we bin all packets using the same time period and then sum ip_total_length.
bw_grouped = GROUP packets BY (ts / $time * $time);
bw_summary = FOREACH bw_grouped GENERATE group, SUM(packets.ip_total_length) AS bw;
The output we were looking for is basically comma separated values for timestamp, tcp bandwidth, udp bandwidth and total bandwidth. This is produced by a final join and projection.
joined = JOIN tcp_summary BY group, udp_summary BY group, bw_summary BY group;
summary = FOREACH joined GENERATE tcp_summary::group, tcp_len, udp_len, bw;
It may seem a little cryptic but basically the JOIN statement is joining using the group that all the summaries share which is the time period. If you ILLUSTRATE the joined variable you will see the data is there but not in the format we are looking for.
| joined | tcp_summary::group:int | tcp_summary::tcp_len:long | udp_summary::group:int | udp_summary::udp_len:long | bw_summary::group:int | bw_summary::bw:long |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| | 1322644980 | 2080 | 1322644980 | 81 | 1322644980 | 2305 |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
However the summary projection generates the output the way we want it and we store that in a CSV format using PigStorage(‘,’).
STORE summary INTO '$output/binning' USING PigStorage(',');
Threat Detection
The SnortLoader() can be used to replay all conversations through Snort IDS and output attacks that it finds. The SnortLoader() can also take a snort.conf as a parameter so you can scan packet captures with specific Snort versions.
Run the basic snort.pig script to get an idea of the output.
./pigrun.py -x local -r data/web.pcap -f pig/examples/snort.pig
Now let’s run through the snort.pig script. Again we include all the jar’s we need for Packetpig.
%DEFAULT includepath pig/include.pig
RUN $includepath;
The script is constructed so that you can pass parameters to either scan all traffic for attacks or zero in on specific source and destination IP addresses. By leaving most of these null we inspect all traffic. Also note we are again binning time every 60 seconds. Lastly the Packetpig includes a number of versions of Snort. The default snort.conf we include ensures you use the latest one.
%DEFAULT time 60
%DEFAULT src null
%DEFAULT dst null
%DEFAULT sport null
%DEFAULT dport null
%DEFAULT snortconfig 'lib/snort/etc/snort.conf'
The SnortLoader() receives the snortconfig paramter and starts inspection the packet capture for attacks and provides them back to you in defined schema.
snort_alerts =
LOAD '$pcap'
USING com.packetloop.packetpig.loaders.pcap.detection.SnortLoader('$snortconfig')
AS (
ts:long,
sig:chararray,
priority:int,
message:chararray,
proto:chararray,
src:chararray,
sport:int,
dst:chararray,
dport:int
);
Using this schema you can access the timestamp (ts), Snort Signature ID (sig), Severity/Priority (priority), Description of the attack (message) and the Source (src), Source Port (sport), Destination (dst) and Destination port (dport) of the attack.
If you ran the script and opened up output/snort/part-m-00000 you will see a number of attacks matching the schema output of the SnortLoader(). One thing to note is Snort using Priority 1 for the highest severity, Priority 2 for the next highest etc.
1322645240 120_3 3 (http_inspect) NO CONTENT-LENGTH OR TRANSFER-ENCODING IN HTTP RESPONSE TCP 184.84.221.18 80 192.168.0.19 34299
1322645387 139_1 2 (spp_sdf) SDF Combination Alert DIVERT 184.84.221.18 0 192.168.0.19 0
1322645603 120_3 3 (http_inspect) NO CONTENT-LENGTH OR TRANSFER-ENCODING IN HTTP RESPONSE TCP 74.125.237.27 80 192.168.0.19 41791
1322645907 120_3 3 (http_inspect) NO CONTENT-LENGTH OR TRANSFER-ENCODING IN HTTP RESPONSE TCP 199.181.254.21 80 192.168.0.19 54222
1322645689 120_3 3 (http_inspect) NO CONTENT-LENGTH OR TRANSFER-ENCODING IN HTTP RESPONSE TCP 74.125.237.123 80 192.168.0.19 42514
1322645739 138_5 2 SENSITIVE-DATA Email Addresses TCP 74.125.237.123 80 192.168.0.19 42514
The snort.pig script is our most basic example but hopefully you are already thinking about what you could filter on (e.g. Severity) as well as re projecting the data you access out of SnortLoader() to find the top ten attackers and top ten victims.
In my next post I will show you how to find Zero Day attacks in past network packet captures.