UFCFFY-15-M Cyber Security Analytics

Practical Lab 5: Yet More Network Traffic Analysis


You have been asked to examine a sample of network traffic to investigate suspicious activity on some of the company workstations. The company directors need to be able to understand this data, so they have asked for further visualisation of the PCAP to help explain what happened during the capture period.

  • Can you analyse the provided packet capture (PCAP) file and produce visualisations (e.g., a node-link diagram or a parallel coordinates chart) that help explain the observed activity?
In [44]:
### Load in the libraries and the data
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

def load_csv_data():
    data = pd.read_csv('./example_data/example_pcap.csv')
    return data

data = load_csv_data()
data
Out[44]:
No. Time Source Destination Protocol Length Info
0 1 01:05:49.468757 172.16.1.4 172.16.1.255 BROWSER 243 Host Announcement CARLFORCE-DC1, Workstation, ...
1 2 01:05:50.279222 172.16.1.4 172.16.1.255 BROWSER 243 Host Announcement CARLFORCE-DC1, Workstation, ...
2 3 01:06:10.328524 172.16.1.201 224.0.0.252 LLMNR 66 Standard query 0x229b A isatap
3 4 01:06:10.390913 172.16.1.201 172.16.1.4 DNS 76 Standard query 0x6ef6 A www.msftncsi.com
4 5 01:06:10.391325 172.16.1.201 172.16.1.4 DNS 76 Standard query 0x6ef6 A www.msftncsi.com
... ... ... ... ... ... ... ...
8154 8155 01:43:36.828784 172.16.1.141 174.127.99.158 TCP 66 [TCP Retransmission] 49211 > 2017 [SYN] Seq=...
8155 8156 01:43:36.946258 174.127.99.158 172.16.1.141 TCP 54 2017 > 49211 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
8156 8157 01:43:37.452810 172.16.1.141 174.127.99.158 TCP 62 [TCP Retransmission] 49211 > 2017 [SYN] Seq=...
8157 8158 01:43:37.563033 174.127.99.158 172.16.1.141 TCP 54 2017 > 49211 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0
8158 8159 01:43:38.578617 172.16.1.141 174.127.99.158 TCP 66 49212 > 2017 [SYN] Seq=0 Win=8192 Len=0 MSS=...

8159 rows × 7 columns

In [84]:
# How might you create a network graph (node-link diagram) using Python?

# This short code extract should help you get started.

import numpy as np

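# Collect every IP address that appears as a source or a destination
srcs = data['Source']
dsts = data['Destination']
nodes = np.concatenate((srcs, dsts))
# Keep one entry per distinct address; these become the graph nodes
nodes = np.unique(nodes)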

## Hint: NetworkX is a useful node-link library in Python - https://networkx.org/documentation/latest/tutorial.html
## You could try having each IP address as a node, and each edge defined by the source/destination pair
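One possible way to follow the hint is sketched here. It is not the model answer, only a minimal illustration that assumes NetworkX and matplotlib are installed. The `sample` frame is a hypothetical stand-in for `data`, built from a few Source/Destination pairs shown in the capture preview, and the names `build_traffic_graph`, `draw_traffic_graph`, and the `packets` edge attribute are made up for this example.

```python
import matplotlib.pyplot as plt
import networkx as nx
import pandas as pd

# Hypothetical stand-in for `data`: Source/Destination pairs taken from the capture preview.
sample = pd.DataFrame({
    "Source": ["172.16.1.4", "172.16.1.4", "172.16.1.201", "172.16.1.201",
               "172.16.1.201", "172.16.1.141", "174.127.99.158"],
    "Destination": ["172.16.1.255", "172.16.1.255", "224.0.0.252", "172.16.1.4",
                    "172.16.1.4", "174.127.99.158", "172.16.1.141"],
})


def build_traffic_graph(df):
    """Return a directed graph with one edge per Source/Destination pair."""
    # Count how many packets were sent along each (Source, Destination) pair.
    edges = df.groupby(["Source", "Destination"]).size().reset_index(name="packets")
    # Each IP address becomes a node; the packet count is stored on the edge.
    return nx.from_pandas_edgelist(
        edges, source="Source", target="Destination",
        edge_attr="packets", create_using=nx.DiGraph,
    )


def draw_traffic_graph(graph, ax=None):
    """Draw the graph, using thicker lines for busier edges."""
    if ax is None:
        _, ax = plt.subplots(figsize=(8, 6))
    pos = nx.spring_layout(graph, seed=42)  # fixed seed gives a repeatable layout
    counts = [graph[u][v]["packets"] for u, v in graph.edges()]
    widths = [1 + 3 * c / max(counts) for c in counts]
    nx.draw_networkx(graph, pos=pos, ax=ax, width=widths,
                     node_color="lightsteelblue", font_size=8)
    ax.set_axis_off()
    return ax


graph = build_traffic_graph(sample)
draw_traffic_graph(graph)
```

To try this on the full capture, pass `data` instead of `sample`. In a notebook the figure renders inline; in a script, call `plt.show()` afterwards. A directed graph is used here so that requests and replies between the same two hosts appear as separate edges, which is a design choice rather than a requirement.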
In [85]:
# How could you create a parallel coordinates chart using Python?

# This short code extract should help you get started.

import pandas as pd  # same alias as in the first cell
import matplotlib.pyplot as plt
import seaborn as sns

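# Protocols to keep in the parallel coordinates chart
protocol_list = ['DNS', 'HTTP', 'SMB2']

# Keep only the columns to plot (note: astype(str) also turns Length into text)
pc_data = data[['Source', 'Destination', 'Protocol', 'Length']]
pc_data = pc_data.astype(str)
# Filter the rows down to the chosen protocols
pc_data = pc_data[pc_data['Protocol'].isin(protocol_list)]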
 
# Hint: Pandas actually has a parallel coordinates capability built-in - https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.plotting.parallel_coordinates.html
# How would you take the subset of data above, plot this, and colour the lines based on the Protocol being used?
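One way to approach the question above is sketched here, as an illustration rather than the expected answer. It departs from the starter code in one respect: instead of `astype(str)`, it encodes the IP addresses as integer category codes and rescales every axis to the range 0 to 1, so that all three axes share a numeric scale. That encoding is an assumption of this sketch. The `sample` frame is a hypothetical stand-in for `data`, built from rows in the capture preview, and it filters on protocols visible in that preview (`BROWSER`, `DNS`, `TCP`) rather than the lab's `protocol_list`. The helper name `prepare_parallel_data` is also made up.

```python
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import parallel_coordinates

# Hypothetical stand-in for `data`: rows taken from the capture preview.
sample = pd.DataFrame({
    "Source": ["172.16.1.4", "172.16.1.4", "172.16.1.201", "172.16.1.201",
               "172.16.1.201", "172.16.1.141", "174.127.99.158", "172.16.1.141"],
    "Destination": ["172.16.1.255", "172.16.1.255", "224.0.0.252", "172.16.1.4",
                    "172.16.1.4", "174.127.99.158", "172.16.1.141", "174.127.99.158"],
    "Protocol": ["BROWSER", "BROWSER", "LLMNR", "DNS", "DNS", "TCP", "TCP", "TCP"],
    "Length": [243, 243, 66, 76, 76, 66, 54, 62],
})


def prepare_parallel_data(df, protocols):
    """Filter to the chosen protocols and put every axis on a 0 to 1 scale."""
    columns = ["Source", "Destination", "Protocol", "Length"]
    subset = df.loc[df["Protocol"].isin(protocols), columns].copy()
    # Encode each IP address as an integer code so it can sit on a numeric axis.
    for column in ["Source", "Destination"]:
        subset[column] = subset[column].astype("category").cat.codes
    # Rescale each axis so that no single column dominates the shared y-axis.
    for column in ["Source", "Destination", "Length"]:
        low = subset[column].min()
        span = subset[column].max() - low
        subset[column] = (subset[column] - low) / span if span else 0.0
    return subset


pc_data = prepare_parallel_data(sample, ["BROWSER", "DNS", "TCP"])

# One line per packet, coloured by the value in the Protocol column.
ax = parallel_coordinates(pc_data, class_column="Protocol", colormap="tab10", alpha=0.6)
ax.set_ylabel("Scaled value (0 to 1)")
```

To use the lab's own selection, call `prepare_parallel_data(data, protocol_list)`. Because the IP addresses are replaced by codes, the axis ticks no longer show the addresses themselves, so a lookup table from code to address may be worth printing alongside the chart.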