You have been asked to examine a sample of network traffic to investigate suspicious activity on some of the company workstations. The company directors need to be able to understand this data, so they have asked for further visualisation of the PCAP to help explain what is happening during the capture period.
### Load in the libraries and the data
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
def load_csv_data():
    # Read the exported packet capture into a DataFrame
    data = pd.read_csv('./example_data/example_pcap.csv')
    return data
data = load_csv_data()
data
| | No. | Time | Source | Destination | Protocol | Length | Info |
---|---|---|---|---|---|---|---|
0 | 1 | 01:05:49.468757 | 172.16.1.4 | 172.16.1.255 | BROWSER | 243 | Host Announcement CARLFORCE-DC1, Workstation, ... |
1 | 2 | 01:05:50.279222 | 172.16.1.4 | 172.16.1.255 | BROWSER | 243 | Host Announcement CARLFORCE-DC1, Workstation, ... |
2 | 3 | 01:06:10.328524 | 172.16.1.201 | 224.0.0.252 | LLMNR | 66 | Standard query 0x229b A isatap |
3 | 4 | 01:06:10.390913 | 172.16.1.201 | 172.16.1.4 | DNS | 76 | Standard query 0x6ef6 A www.msftncsi.com |
4 | 5 | 01:06:10.391325 | 172.16.1.201 | 172.16.1.4 | DNS | 76 | Standard query 0x6ef6 A www.msftncsi.com |
... | ... | ... | ... | ... | ... | ... | ... |
8154 | 8155 | 01:43:36.828784 | 172.16.1.141 | 174.127.99.158 | TCP | 66 | [TCP Retransmission] 49211 > 2017 [SYN] Seq=... |
8155 | 8156 | 01:43:36.946258 | 174.127.99.158 | 172.16.1.141 | TCP | 54 | 2017 > 49211 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0 |
8156 | 8157 | 01:43:37.452810 | 172.16.1.141 | 174.127.99.158 | TCP | 62 | [TCP Retransmission] 49211 > 2017 [SYN] Seq=... |
8157 | 8158 | 01:43:37.563033 | 174.127.99.158 | 172.16.1.141 | TCP | 54 | 2017 > 49211 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0 |
8158 | 8159 | 01:43:38.578617 | 172.16.1.141 | 174.127.99.158 | TCP | 66 | 49212 > 2017 [SYN] Seq=0 Win=8192 Len=0 MSS=... |
8159 rows × 7 columns
# How might you create a network graph (node-link diagram) using Python?
# This small code extract should help get you started.
import numpy as np
srcs = data['Source']
dsts = data['Destination']
nodes = np.concatenate((srcs, dsts))
nodes = np.unique(nodes)
## Hint: NetworkX is a useful node-link library in Python - https://networkx.org/documentation/latest/tutorial.html
## You could try having each IP address as a node, and each edge defined by the source/destination pair
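Following the hint above, one possible starting point is sketched below. It assumes NetworkX is installed, and it uses a small hypothetical `sample` DataFrame (Source/Destination pairs copied from the table above) in place of the full `data`. The `packets` edge attribute, the layout seed, and the styling values are illustrative choices, not part of the exercise.

```python
import matplotlib.pyplot as plt
import networkx as nx
import pandas as pd

# Hypothetical stand-in for `data`: Source/Destination pairs copied from the table above.
sample = pd.DataFrame(
    [
        ("172.16.1.4", "172.16.1.255"),
        ("172.16.1.4", "172.16.1.255"),
        ("172.16.1.201", "224.0.0.252"),
        ("172.16.1.201", "172.16.1.4"),
        ("172.16.1.201", "172.16.1.4"),
        ("172.16.1.141", "174.127.99.158"),
        ("174.127.99.158", "172.16.1.141"),
        ("172.16.1.141", "174.127.99.158"),
        ("174.127.99.158", "172.16.1.141"),
        ("172.16.1.141", "174.127.99.158"),
    ],
    columns=["Source", "Destination"],
)

# Count how many packets were seen for each (source, destination) pair.
edge_counts = (
    sample.groupby(["Source", "Destination"]).size().reset_index(name="packets")
)

# Each IP address becomes a node; each source/destination pair becomes a directed edge.
graph = nx.from_pandas_edgelist(
    edge_counts,
    source="Source",
    target="Destination",
    edge_attr="packets",
    create_using=nx.DiGraph,
)

# Draw the graph, using the packet count as the edge width.
pos = nx.spring_layout(graph, seed=42)  # fixed seed so the layout is repeatable
widths = [graph[u][v]["packets"] for u, v in graph.edges()]
fig, ax = plt.subplots(figsize=(8, 6))
nx.draw_networkx(
    graph,
    pos=pos,
    ax=ax,
    node_color="lightsteelblue",
    node_size=1200,
    font_size=8,
    width=widths,
)
ax.set_title("Packets exchanged between IP addresses")
ax.axis("off")
```

To try this on the real capture, replace `sample` with `data`. With thousands of packets the raw counts will probably need scaling before they are used as edge widths.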
# How could you create a parallel coordinates chart using Python?
# This small code extract should help get you started.
import pandas
import matplotlib.pyplot as plt
import seaborn as sns
protocol_list = ['DNS', 'HTTP', 'SMB2']
pc_data = data[['Source', 'Destination', 'Protocol', 'Length']]
pc_data = pc_data.astype(str)
pc_data = pc_data[pc_data['Protocol'].isin(protocol_list)]
# Hint: Pandas actually has a parallel coordinates capability built-in - https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.plotting.parallel_coordinates.html
# How would you take the subset of data above, plot this, and colour the lines based on the Protocol being used?
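One way to approach the question above is sketched here, using `pandas.plotting.parallel_coordinates` with `Protocol` as the class column so that each protocol gets its own line colour. The `sample` DataFrame is a hypothetical stand-in for `data`: the DNS and LLMNR rows are copied from the table above, while the HTTP and SMB2 rows are invented purely for illustration. The `tab10` colormap and figure size are also assumptions.

```python
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import parallel_coordinates

# Hypothetical stand-in for `data`. The DNS and LLMNR rows come from the table
# above; the HTTP and SMB2 rows are made up so that every protocol appears.
sample = pd.DataFrame(
    [
        ("172.16.1.201", "224.0.0.252", "LLMNR", 66),
        ("172.16.1.201", "172.16.1.4", "DNS", 76),
        ("172.16.1.201", "172.16.1.4", "DNS", 76),
        ("172.16.1.141", "172.16.1.4", "HTTP", 200),   # invented row
        ("172.16.1.201", "172.16.1.4", "SMB2", 162),   # invented row
    ],
    columns=["Source", "Destination", "Protocol", "Length"],
)

# Same subset as in the extract above: every value is treated as a category.
protocol_list = ['DNS', 'HTTP', 'SMB2']
pc_data = sample[['Source', 'Destination', 'Protocol', 'Length']].astype(str)
pc_data = pc_data[pc_data['Protocol'].isin(protocol_list)]

# One line per packet, one vertical axis per column, coloured by Protocol.
fig, ax = plt.subplots(figsize=(10, 5))
parallel_coordinates(
    pc_data,
    class_column='Protocol',
    cols=['Source', 'Destination', 'Length'],
    colormap='tab10',
    ax=ax,
)
ax.set_title("Packets by source, destination and length")
```

Because every column is a string, all three axes share a single categorical scale. On the full capture it may read more clearly if each column is first encoded to numeric codes and scaled separately.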