publications | Prof. Phil Legg

2026

Heterogeneity-Aware Poisoning Attacks and Mitigation in Federated Learning: A Comprehensive Survey and Taxonomy

Aimen Djemaa, Djamel Djenouri, and Phil Legg

Electronics, 2026

Federated learning (FL) enables collaborative model training without sharing raw data, but remains vulnerable to poisoning attacks in which malicious participants manipulate local data, model updates, gradients, or learned behaviours to degrade performance or introduce targeted failures. These threats become harder to assess and mitigate in heterogeneous federated learning (HFL), where clients may differ in data distributions, model architectures, task objectives, resource availability, communication reliability, participation patterns, privacy constraints, and deployment environments. Existing surveys provide valuable coverage of FL security, poisoning attacks, robust aggregation, privacy-preserving mechanisms, and heterogeneity, but they do not sufficiently analyse how heterogeneity changes both poisoning behaviour and mitigation reliability. This survey addresses that gap by examining how statistical, model, task, device, communication, and participation heterogeneity affect poisoning feasibility, stealth, persistence, impact, transferability, attribution, and detectability. It then proposes a heterogeneity-aware taxonomy of poisoning mitigation mechanisms and compares existing strategies using operational criteria centred on attack–defence alignment, evidence validity, server visibility, privacy compatibility, scalability, deployment feasibility, and benign-client preservation. The central argument is that poisoning mitigation in HFL should not be evaluated only by attack type, defence family, clean accuracy, or attack success rate but also by whether defences observe and protect the channels through which heterogeneity-shaped attacks are expressed. The survey further identifies open challenges for developing channel-aware, privacy-compatible, scalable, adaptive, and false-positive-aware defences that preserve useful benign diversity under realistic HFL conditions.
Risk-based OPC UA Network Anomaly Detection for Interconnected Cyber-Physical Systems

Carol Lo, Thu Yein Win, Zeinab Rezaeifar, Zaheer Khan, and Phil Legg

In 1st International Conference on Synergies in Next-Generation Cyber-Physical Systems (SNGC-2026), September 16-18, Cardiff, UK, 2026

Open Platform Communications Unified Architecture (OPC UA) improves interoperability in industrial Cyber-Physical Systems (CPS), but adversaries can abuse trusted OPC UA clients to perform unauthorised process manipulation. Detecting misuse of legitimate protocols remains challenging. This paper presents a risk-based unsupervised anomaly detection approach that prioritises high-consequence OPC UA write operations rather than modelling all OPC UA features uniformly. An Isolation Forest model is trained exclusively on normal OPC UA traffic generated by industrial software components within a simulation-based testbed. Three write-centric features are engineered to capture message size deviations, write frequency anomalies, and disproportionate write-to-traffic ratios. Experimental evaluation demonstrates reliable anomaly detection, achieving 93% accuracy and 96% F1-score. Inference results on a multi-stage attack chain spanning Purdue Levels 0 to 3 further show precise detection with minimal false alarms during OPC UA-enabled process manipulation stages. The findings highlight both the effectiveness and the inherent limitation of network-only detection, which provides strong confirmation of process manipulation but offers limited visibility into earlier intrusion stages targeting host systems. These results motivate the need for layered detection strategies for interconnected CPS.
DC-FL: A Server-Centric Distilled Clustered Federated Learning Framework for Heterogeneous Networks

Aimen Djemma, Djamel Djenouri, and Phil Legg

In 31st IEEE Symposium on Computers and Communications (ISCC) - DistInSys Workshop, June 23-26, Vilamoura, Portugal, 2026

Most federated learning (FL) approaches assume homogeneous model architectures or address only statistical heterogeneity, which limits their applicability in heterogeneous environments such as Internet-of-Things (IoT) networks, where clients differ in both data distributions and model capacity. To address this challenge in wireless intrusion detection systems, we propose Distilled Clustered Federated Learning (DC-FL), a server-centric framework that groups clients by architecture and performs knowledge distillation (KD) strictly on the server after per-cluster aggregation. DC-FL uses only server-side proxy data with no client logits or features exchanged, which reduces client computation and potential privacy exposure. To further stabilise training under architectural diversity, an adaptive no-harm distillation mechanism is introduced: it dynamically selects the best-performing model as the teacher and commits the KD update only when the update does not degrade cluster accuracy. Case-study experiments on two recent IoT IDS datasets (IDSIoT2024 and XCANIDS) show that DC-FL consistently outperforms FedAvg and FedMD. In data-heterogeneous settings, DC-FL achieves up to 98.8% accuracy, improving over FedAvg (respectively FedMD) by approximately 14% (respectively 20%) at the convergence round. Moreover, it provides much larger early-round gains, and thus faster convergence, and narrows the gap between small and deep models, indicating improved cross-architecture transfer.
Security Analysis of Bluetooth Low Energy Medical Device Telemetry

Mia Smith, Thomas Draper, and Phil Legg

In IEEE Cyber Security and Resilience, August 3-5, Lisbon, Portugal, 2026

Implantable and wearable medical devices increasingly rely on wireless communication to enable continuous monitoring and automated therapy delivery. In closed-loop medical systems, such as artificial pancreases, the integrity and confidentiality of transmitted data are crucial to patient safety. Bluetooth Low Energy (BLE) is widely used in these devices due to its low power consumption and suitability for battery-constrained environments. However, misconfigured BLE security settings may expose medical device telemetry to interception or manipulation. This work presents a preliminary security analysis of BLE communications in a prototype continuous glucose monitoring (CGM) system. Three BLE security configurations were evaluated, demonstrating that unencrypted BLE telemetry can be passively intercepted and replayed, while encrypted configurations prevent extraction of measurement data and reject replayed packets. These findings demonstrate the impact of BLE security configuration on telemetry protection in a prototype system, and we infer that similar risks may extend to closed-loop healthcare systems.
EAD-FL: Ensemble Anomaly Detection to Mitigate Poisoning Attacks in Federated Learning

Aimen Djemma, Djamel Djenouri, and Phil Legg

In IEEE Cyber Security and Resilience, August 3-5, Lisbon, Portugal, 2026

This paper proposes EAD-FL, an Ensemble Anomaly Detection framework to mitigate untargeted poisoning attacks in Federated Learning (FL) for Intrusion Detection Systems (IDS) in Internet of Things (IoT) networks. Unlike existing anomaly-based defences that rely on a single detector, fixed contamination assumptions, or shared validation data, EAD-FL combines heterogeneous detectors through voting and adapts contamination dynamically based on detector disagreement. This design not only improves robustness and fairness but also preserves privacy, since no additional data needs to be exchanged to assess detector behaviour. Furthermore, EAD-FL is adaptable: as a pre-aggregation step it can be paired with any aggregation rule to enhance robustness without modifying the aggregation logic. Evaluation on an IoT intrusion detection dataset shows that EAD-FL achieves perfect client-level detection with an F1 score of 1.00 and a false positive rate (FPR) of 0.00, even when the assumed contamination level was lower than the true proportion of malicious clients. When combined with Median aggregation, EAD-FL reached 98% accuracy under 30% malicious participation, surpassing classical robust rules. These results confirm that EAD-FL offers stronger protection against poisoning while maintaining fairness to honest clients and high model utility.
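The voting step described in the abstract above can be sketched roughly as follows. This is a minimal illustration and not the EAD-FL implementation: the three detectors (z-score, median absolute deviation, nearest-neighbour distance), the use of per-client update norms as input, and all function names are assumptions chosen for the example. The paper's adaptive contamination rule is not reproduced; the sketch only reports how often detectors disagree, which is the signal such a rule would act on.

```python
import math
import statistics
from typing import Callable, List, Sequence, Tuple

Detector = Callable[[Sequence[float]], List[float]]  # one anomaly score per client


def zscore_scores(x: Sequence[float]) -> List[float]:
    """Distance from the mean in units of standard deviation."""
    mu = statistics.fmean(x)
    sd = statistics.pstdev(x) or 1.0  # guard against zero spread
    return [abs(v - mu) / sd for v in x]


def mad_scores(x: Sequence[float]) -> List[float]:
    """Distance from the median in units of median absolute deviation."""
    med = statistics.median(x)
    mad = statistics.median([abs(v - med) for v in x]) or 1.0
    return [abs(v - med) / mad for v in x]


def knn_scores(x: Sequence[float], k: int = 2) -> List[float]:
    """Mean distance to the k nearest other clients."""
    scores = []
    for i, v in enumerate(x):
        dists = sorted(abs(v - w) for j, w in enumerate(x) if j != i)
        scores.append(statistics.fmean(dists[:k] or [0.0]))
    return scores


def top_k_flags(scores: Sequence[float], contamination: float) -> List[bool]:
    """Flag the highest-scoring fraction of clients (at least one)."""
    k = max(1, math.ceil(contamination * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:k])
    return [i in flagged for i in range(len(scores))]


def vote(
    update_norms: Sequence[float],
    detectors: Sequence[Detector],
    contamination: float = 0.2,  # illustrative value, not from the paper
) -> Tuple[List[int], float]:
    """Return majority-flagged client indices and the fraction of split votes."""
    votes = [0] * len(update_norms)
    for detect in detectors:
        flags = top_k_flags(detect(update_norms), contamination)
        for i, is_flagged in enumerate(flags):
            votes[i] += int(is_flagged)
    majority = len(detectors) // 2 + 1
    flagged = [i for i, v in enumerate(votes) if v >= majority]
    split = sum(1 for v in votes if 0 < v < len(detectors))
    return flagged, split / len(votes)


# Hypothetical update norms for ten clients; clients 6 and 7 are outliers.
norms = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 5.0, 6.0, 1.02, 0.98]
flagged_clients, disagreement = vote(norms, [zscore_scores, mad_scores, knn_scores])
```

Because the vote runs before aggregation and only returns client indices, the surviving updates could then be passed to any aggregation rule, which matches the pre-aggregation role the abstract describes.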
Automatic bypassing of image-based puzzle CAPTCHA using edge detection and template matching

Michael Tchuindjang, Hamza Attak, Ian Johnson, and Phil Legg

In IEEE Cyber Security and Resilience, August 3-5, Lisbon, Portugal, 2026

Completely Automated Public Turing tests to tell Computers and Humans Apart (CAPTCHAs) are widely used to protect online services from automated abuse. Puzzle-based CAPTCHAs improve resistance to automated attacks by requiring spatial reasoning tasks such as image reconstruction. cutCAPTCHA is a typical example of such schemes, requiring users to reposition three puzzle fragments to reconstruct the original image and gain access to a web service. However, its resilience under practical adversarial conditions remains underexplored. To the best of our knowledge, prior work reports limited success rates for automated attacks on similar schemes, with no existing evaluation of cutCAPTCHA. This paper investigates the security of cutCAPTCHA under a realistic side-channel threat model and proposes a lightweight computer vision attack based on Canny edge detection and template matching from a computer vision library to solve the fragment alignment task without machine learning. Experimental evaluation on 260 real-world challenges shows a 53% attack success rate with an average execution time of 2.83 seconds per challenge. Further analysis reveals that structural factors, including fragment distinctiveness and background complexity, significantly impact performance. These results demonstrate that cutCAPTCHA is vulnerable to classical vision-based attacks and highlight key considerations for improving the resilience of puzzle-based CAPTCHA schemes.
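The general idea of matching a fragment against an edge map can be sketched on a toy grid as follows. This is not the paper's attack: a simple forward-difference threshold stands in for Canny edge detection, a brute-force sum of squared differences stands in for a library template matcher, and the 8x8 scene and all function names are invented for illustration.

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]


def edge_map(img: Grid, thresh: int = 0) -> Grid:
    """Crude edge detector: mark pixels whose right or lower neighbour differs."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = img[y][min(x + 1, w - 1)] - img[y][x]
            gy = img[min(y + 1, h - 1)][x] - img[y][x]
            out[y][x] = 1 if abs(gx) + abs(gy) > thresh else 0
    return out


def crop(img: Grid, x: int, y: int, w: int, h: int) -> Grid:
    """Return the w-by-h window whose top-left corner is (x, y)."""
    return [row[x:x + w] for row in img[y:y + h]]


def match_template(image: Grid, template: Grid) -> Tuple[Tuple[int, int], Optional[int]]:
    """Slide the template over the image; return the (x, y) with the lowest SSD."""
    H, W = len(image), len(image[0])
    h, w = len(template), len(template[0])
    best_pos, best_ssd = (0, 0), None  # best_ssd stays None if the template does not fit
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            ssd = sum(
                (image[y + dy][x + dx] - template[dy][dx]) ** 2
                for dy in range(h)
                for dx in range(w)
            )
            if best_ssd is None or ssd < best_ssd:
                best_pos, best_ssd = (x, y), ssd
    return best_pos, best_ssd


# Toy scene: an L-shaped blob on an empty background.
scene = [[0] * 8 for _ in range(8)]
for y, x in [(2, 3), (3, 3), (4, 3), (4, 4), (4, 5)]:
    scene[y][x] = 9

edges = edge_map(scene)
fragment = crop(edges, x=2, y=1, w=4, h=4)  # stands in for a puzzle fragment
position, score = match_template(edges, fragment)
```

Matching on edge maps rather than raw pixels makes the comparison depend on shape outlines instead of colour, which is consistent with the abstract's observation that fragment distinctiveness and background complexity affect performance.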
Advancing fuzzing with unbiased random generator and Feistel network-based mutations

Sadegh Bamohabbat Chafjiri, Phil Legg, Jun Hong, and Michail-Antisthenis Tsompanas

Information and Software Technology, 2026

Context: This research tackles challenges in traditional fuzzing, such as limited coverage, instability, and inefficiency in bug discovery. We propose two novel models and their combination to enhance mutation processes and improve their reliability through unbiased randomisation, building on cryptographic techniques from our prior work. To our knowledge, we are the first to apply this approach to AFL++, extending Feistel-inspired mutation and high-performance randomisation to generate high-quality test cases, with potential to attract attention in the fuzzing community.
Objectives: Integrate Feistel-inspired mutations into AFL++ and assess their impact on performance, focusing on code coverage and stability. Integrate the Permuted Congruential Generator (PCG) into AFL++ and evaluate its performance compared to traditional random number generators (RNGs). Evaluate a hybrid model combining Feistel and PCG randomness for better stability and coverage.
Methods: We enhance AFL++ with algorithmic improvements and RNG modifications. Our models include: CAFL++ (Cryptographic-AFL++), which integrates Feistel-inspired transformations for improved coverage; PCGAFL++, which refines the AFL RNG with PCG to reduce bias; and CPCGAFL++, which combines Feistel-inspired swaps and a PCG-based RNG for a robust fuzzing approach. Performance was analysed using metrics such as code coverage and the Vargha-Delaney A12 statistic across 20 FuzzBench targets, and bug discovery on three targets.
Results: Our models showed significant improvements over AFL++. CAFL++ outperformed AFL++ in 75% of test targets, offering better code coverage and stability. PCGAFL++ surpassed AFL++ in 60% of targets by enhancing randomness, resulting in more efficient fuzzing. CPCGAFL++ demonstrated improved stability and enhanced bug discovery performance, while achieving code coverage comparable to AFL++. These results highlight the key improvements introduced by our two models for fuzz testing.
Conclusion: Our models advance fuzzing by improving code coverage and stability. Integrating Feistel-inspired swaps and PCG-based RNG overcomes traditional fuzzing limitations, offering a more efficient and reliable method. These models represent a step forward in fuzzing techniques, influencing both academic research and industrial practices.
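As a rough illustration of what a Feistel-style transformation of a test case could look like, the sketch below applies a few rounds of a balanced Feistel network to a byte buffer. It is not the CAFL++ mutator: the SHA-256 based round function, the integer round keys, and the function names are all assumptions made for this example. The property it demonstrates is that a Feistel structure is invertible whatever round function is used, so distinct inputs always map to distinct outputs.

```python
import hashlib
from typing import Sequence


def _round_fn(half: bytes, round_key: int) -> bytes:
    """Hypothetical round function: keyed SHA-256 stretched to len(half) bytes."""
    out = b""
    counter = 0
    while len(out) < len(half):
        prefix = bytes([round_key & 0xFF, counter & 0xFF])
        out += hashlib.sha256(prefix + half).digest()
        counter += 1
    return out[: len(half)]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _split(data: bytes):
    """Split into two equal halves; an odd trailing byte is kept aside unchanged."""
    half = len(data) // 2
    return data[:half], data[half:2 * half], data[2 * half:]


def feistel_mutate(data: bytes, keys: Sequence[int]) -> bytes:
    """One Feistel round per key: (L, R) -> (R, L xor F(R, key))."""
    left, right, tail = _split(data)
    for key in keys:
        left, right = right, _xor(left, _round_fn(right, key))
    return left + right + tail


def feistel_unmutate(data: bytes, keys: Sequence[int]) -> bytes:
    """Undo feistel_mutate by running the rounds in reverse order."""
    left, right, tail = _split(data)
    for key in reversed(keys):
        left, right = _xor(right, _round_fn(left, key)), left
    return left + right + tail


seed_input = b"GET /index.html"   # hypothetical seed test case
round_keys = [3, 1, 4]            # in a fuzzer these would come from the RNG
mutated = feistel_mutate(seed_input, round_keys)
restored = feistel_unmutate(mutated, round_keys)
```

In a fuzzer the round keys would be drawn from the random number generator, which is where an unbiased generator such as PCG would interact with a mutation of this kind.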
Adversarially Resilient Federated Learning for Heterogeneous Edge Nodes in 5G Networks with Non-IID Data

Saniya Zafar, Phil Legg, Jonathan White, and Ahmad Salman

Internet of Things, 2026

The rapid deployment of 5G and Beyond-5G (B5G) edge networks introduces unique challenges for federated learning (FL) frameworks deployed at the edge, primarily due to heterogeneous non-IID data distributions and adversarial vulnerabilities. This paper proposes an adversarially robust federated learning (ARFL) mechanism that integrates hybrid feature selection and adversarial optimization to jointly enhance robustness against adversarial perturbations and improve computational efficiency under heterogeneous data distributions. The proposed methodology jointly optimizes classifier and adversary in a min–max formulation to enable robustness against perturbations of varying strengths. Experimental results on the real-world 5G-NIDD intrusion detection dataset demonstrate that standard FL suffers drastic deterioration under adversarial conditions, with accuracy, precision, recall, and F1-scores dropping to 20%–30% at ϵ=0.3. In contrast, the proposed ARFL framework consistently sustains performance above 92% across these metrics under all non-IID distributions, highlighting its robustness and reliability. Overall, ARFL achieves absolute adversarial accuracy improvements of 20–70 percentage points over standard FL while incurring only a marginal reduction in clean performance. Scalability experiments demonstrate the stability and efficiency of the ARFL framework, underscoring its suitability for real-world 5G edge deployments where robustness and efficiency are paramount.
LOTL-hunter: Detecting multi-stage living-off-the-land attacks in cyber-physical systems using decision fusion techniques with digital twins

Carol Lo, Thu Yein Win, Zeinab Rezaeifar, Zaheer Khan, and Phil Legg

Future Generation Computer Systems, 2026

The integration of smart sensors and actuators in industrial environments has expanded the cyber-physical attack surface, making it increasingly difficult to distinguish anomalies caused by cyberattacks from those due to mechanical or electrical faults. This challenge is exacerbated by stealthy, multi-stage attacks leveraging Living off the Land (LOTL) techniques, which often evade conventional anomaly detection or intrusion detection systems (IDS). This study presents a Digital Twin-based testbed for safe, repeatable simulation of multi-stage cyber-physical attacks targeting Cyber-Physical Systems (CPS) and Industrial Control Systems (ICS). We propose a two-level decision fusion method that aggregates and aligns anomalies across network, process, and host domains in synchronized 1-minute intervals. The first-level fusion improves OT-layer detection by applying confidence-aware decision logic to outputs combined from (a) a supervised deep learning model (LSTM-FCN) for process anomalies, (b) an unsupervised model (Isolation Forest) for OPC UA network anomalies, and (c) process alarm signals. The second-level fusion integrates these results with host-based anomalies, computed through point-based scoring of Wazuh alerts, to provide comprehensive IT/OT situational awareness. Experimental results demonstrate improved detection of stealthy, multi-stage APT attack behaviours. Additionally, Large Language Models (LLMs) summarise the integrated IT/OT anomaly logs into human-readable insights, enhancing interpretability and supporting cyber threat hunting.
Advances in Teaching and Learning for Cyber Security Education

Phil Legg, Natalie Coull, Charles Clarke, Harjinder Lallie, and Hany Atlam

2026

This book provides reflections on and accounts of innovative practice in how we approach teaching and learning for cyber security education. Cyber security continues to play a crucial role in our modern society, with a need to defend against supply chain attacks and to secure critical infrastructure. How we educate in this area, and how we encourage new ways of thinking about and addressing these challenges, therefore requires a community effort across academia, industry, and government to continually reflect on and enhance our practice. The variety of works in this book includes the development of cyber escape rooms, gamification approaches for cyber security education, and AI-based learning platforms. The topics span how we rethink the insider threat in the age of AI, how social media interplays with cyber security education, and how we teach geopolitical aspects of cyber security. As a rapidly growing area of education, there are many fascinating examples of innovative teaching and assessment taking place; however, as a community we can do more to share best practice and enhance collaboration across the education sector. CSE Connect is a UK-based community group that promotes sharing and collaboration in cyber security education so that we can upskill and innovate the community together both nationally and internationally. The chapters of this book were presented at the 5th Annual Advances in Teaching and Learning for Cyber Security Education conference, hosted by CSE Connect at the University of Warwick, UK, on July 22, 2025. The book is of interest to educators, students, and practitioners in cyber security, both for those looking to upskill in cyber security education, as well as those aspiring to work within the cyber security sector.

2025

An Explainable Ensemble-based Intrusion Detection System for Software-Defined Vehicle Ad-hoc Networks

Shakil Ibne Ahsan, Phil Legg, and S M Iftekharul Alam

Cyber Security and Applications, 2025

Intrusion Detection Systems (IDS) are widely employed to detect and mitigate external network security events. Vehicle Ad-hoc Networks (VANETs) continue to evolve, especially with developments related to Connected Autonomous Vehicles (CAVs). In this study, we explore the detection of cyber threats in vehicle networks through ensemble-based machine learning, to strengthen the performance of the learnt model compared to relying on a single model. We propose a model that uses Random Forest and CatBoost as our main ‘investigators’, with Logistic Regression then used to reason on their outputs to make a final decision. To further aid analysis, we use SHAP (SHapley Additive exPlanations) analysis to examine feature importance towards the final decision stage. We use the Vehicular Reference Misbehavior (VeReMi) dataset for our experimentation and observe that our approach improves classification accuracy and results in fewer misclassifications compared to previous works. Overall, this layered approach to decision-making, which combines teamwork among models with an explainable view of why they act as they do, can help to achieve a more reliable and easy-to-understand cyber security solution for smart transportation networks.
Leveraging activation and optimisation layers as dynamic strategies in the multi-task fuzzing scheme

Sadegh Bamohabbat Chafjiri, Phil Legg, Michail-Antisthenis Tsompanas, and Jun Hong

Computer Standards & Interfaces, 2025

Fuzzing is a common technique for identifying vulnerabilities in software. Recent approaches, like She et al.’s Multi-Task Fuzzing (MTFuzz), use neural networks to improve fuzzing efficiency. However, key elements like network architecture and hyperparameter tuning are still not well-explored. Factors like activation layers, optimisation function design, and vanishing gradient strategies can significantly impact fuzzing results by improving test case selection. This paper delves into these aspects to improve neural network-driven fuzz testing. We focus on three key neural network parameters to improve fuzz testing: the Leaky Rectified Linear Unit (LReLU) activation, Nesterov-accelerated Adaptive Moment Estimation (Nadam) optimisation, and sensitivity analysis. LReLU adds non-linearity, aiding feature extraction, while Nadam helps to improve weight updates by considering both current and future gradient directions. Sensitivity analysis optimises layer selection for gradient calculation, enhancing fuzzing efficiency. Based on these insights, we propose LMTFuzz, a novel fuzzing scheme optimised for these Machine Learning (ML) strategies. We explore the individual and combined effects of LReLU, Nadam, and sensitivity analysis, as well as their hybrid configurations, across six different software targets. Experimental results demonstrate that LReLU, individually or when paired with sensitivity analysis, significantly enhances fuzz testing performance. However, when combined with Nadam, LReLU shows improvement on some targets, though less pronounced than its combination with sensitivity analysis. This combination improves accuracy, reduces loss, and increases edge coverage, with improvements of up to 23.8%. Furthermore, it leads to a significant increase in unique bug detection, with some targets detecting up to 2.66 times more bugs than baseline methods.
TRIST: Towards a Container-Based ICS Testbed for Cyber Threat Simulation and Anomaly Detection

C. Lo, J. Christie, T.Y. Win, Z. Rezaeifar, Z. Khan, and P. Legg

In Proceedings of the International Conference on Cybersecurity, Situational Awareness and Social Media (Cyber Science 2024), 2025

Cyber-attacks on Industrial Control Systems (ICS), as exemplified by the incidents at the Maroochy water treatment plant and Ukraine’s electric power grid, have demonstrated that cyber threats can inflict significant physical impacts. These incidents caused widespread service disruptions and substantial economic losses, underscoring the urgent need for an in-depth understanding of cyber threats in industrial environments. Industrial security research is usually conducted on physical testbeds to avoid safety issues, production interruptions and other operational constraints in industrial processes. Nevertheless, security defenders often encounter obstacles in developing or accessing physical testbeds due to associated costs and complexities. These factors hinder research progress on early detection mechanisms for cyber threats, which are essential for effective incident response. To overcome these obstacles, this paper presents a container-based virtual testbed. Its lightweight architecture enables replicable and efficient deployment of testbeds at low cost for simulating cyber threats on Cyber-Physical Systems (CPS), the cornerstone of industrial automation and control systems. The container-based virtual testbed also provides a cost-effective option for producing datasets for training, testing and optimization of unsupervised anomaly detection models. In addition, an evaluation of resource consumption is conducted. The paper also discusses the benefits and limitations of the proposed container-based ICS testbed and suggests future research areas.
Machine Learning and Data Analytics for Cyber Security

Phil Legg and Giorgio Giacinto

2025

The aim of this reprint is to provide an overview of the current challenges within the cyber security community today, as recognized by our contributors. This reprint provides 16 papers from the Topical Collection on “Machine Learning and Data Analytics for Cyber Security” that cover topics of large language models for cybersecurity claim classification, adversarial machine learning attacks against intrusion detection systems, the detection of PLC process control anomalies, identifying session-replay bots compared to human users, and mitigating against side-channel attacks.
gh0stEdit: Exploiting Layer-Based Access Vulnerability Within Docker Container Images

Alan Mills, Jonathan White, and Phil Legg

arXiv, 2025

Containerisation is a popular deployment process for application-level virtualisation using a layer-based approach. Docker is a leading provider of containerisation, and through the Docker Hub, users can supply Docker images for sharing and repurposing popular software application containers. Using a combination of in-built inspection commands, publicly displayed image layer content, and static image scanning, Docker images are designed to ensure end users can clearly assess the content of an image before running it. In this paper we present gh0stEdit, a vulnerability that fundamentally undermines the integrity of Docker images and subverts the assumed trust and transparency they utilise. The use of gh0stEdit allows an attacker to maliciously edit Docker images, in a way that is not shown within the image history, hierarchy or commands. This attack can also be carried out against signed images (Docker Content Trust) without invalidating the image signature. We present two use case studies for this vulnerability, and showcase how gh0stEdit is able to poison an image in a way that is not picked up through static or dynamic scanning tools. Our attack case studies highlight the issues in the current approach to Docker image security and trust, and expose an attack method which could potentially be exploited in the wild without being detected. To the best of our knowledge we are the first to provide detailed discussion on the exploit of this vulnerability.
Examining the Role of Online Communities in the Prevention of Child Sexual Abuse

Nick Addis, Ella Rees, Ian Johnson, Phil Legg, and Maggie Brennan

2025

2024 witnessed the highest levels of online child sexual abuse material ever observed since records began, with the Internet Watch Foundation (IWF) identifying 291,273 webpages containing child sexual abuse imagery, links to imagery, or the advertising of this (IWF, 2025), facilitated through online communities accessible via the surface web (i.e. social media platforms, and online gaming communities). The sheer scale and harms caused by these crimes have led to calls for this to be viewed as a global public health issue, in need of an immediate and coordinated response. However, research into the nature of such online communities is largely limited in scope, with focus predominantly targeting potential perpetrators and their access to online communities (rather than the safeguarding activities of specific online platforms). This chapter seeks to bridge this gap, presenting work exploring the role played by online communities (i.e. through social media and gaming platforms) in preventing online child sexual abuse. Specifically, the chapter examines current practices utilised by such platforms (as well as challenges faced) in the fight against online child sexual abuse. This includes the impact of the ever-growing use of generative AI, as well as the need to hold technology companies to account for content hosted on their platforms, particularly considering the recently implemented Online Safety Act 2023.
Jailbreaking LLMs Through Tense Manipulation in Multi-Turn Dialogues

Michael Tchuindjang, Nathan Duran, Phil Legg, and Faiza Medjek

In Advances in Computational Intelligence Systems: Contributions Presented at the 24th UK Workshop on Computational Intelligence, September 3-5, 2025, Edinburgh, UK, 2025

Large Language Models (LLMs) have demonstrated great potential across many domains; however, their susceptibility to jailbreak attacks presents opportunities for malicious actors. These attacks manipulate LLMs to divulge sensitive information or generate harmful content that could further be utilised in nefarious ways, including as part of cyber-enabled crime, hence presenting a crucial cyber security challenge for the safe and secure interaction of AI-enabled systems. Previous studies highlight the security challenges posed by multi-turn interactions, in which attackers strategically conceal their malicious intent through extended conversations. However, the influence of tense variations on the efficacy of such attacks has received limited attention. This study addresses this gap by systematically examining the role of tense manipulation in multi-turn jailbreak attacks on LLMs. We introduce a novel multi-turn jailbreak attack that specifically exploits past tense reformulation, along with a multi-turn dialogue dataset designed for cyber-related attacks. Experiments conducted on both open-source LLMs (Llama 2-7B, Qwen 2-7B) and closed-source LLMs (GPT-4o-mini, Gemini 2-flash) demonstrate that past tense reformulation significantly enhances attack performance, yielding an average increase of 25.30% with a larger effect on closed-source models. These findings highlight the urgent need to strengthen LLM defence strategies against tense variations in multi-turn dialogues. The dataset and jailbreak artefacts are available at: https://github.com/Micdejc/llm_multiturn_attacks
Federated Learning with Adversarial Optimisation for Secure and Efficient 5G Edge Computing Networks

Saniya Zafar, Jonathan White, and Phil Legg

Big Data and Cognitive Computing, 2025

With the evolution of 5G edge computing networks, privacy-aware applications are gaining significant attention due to their decentralised processing capabilities. However, these networks face substantial challenges to ensure privacy and security, specifically in a Federated Learning (FL) setup, where adversarial attacks can potentially influence the model integrity. Conventional privacy-preserving FL mechanisms are often susceptible to such attacks, leading to degraded model performance and severe security vulnerabilities. To address this issue, we propose an FL framework with adversarial optimisation to improve adversarial robustness in 5G edge computing networks while ensuring privacy preservation. The proposed framework considers two models: a classifier model and an adversary model, where the classifier model is integrated with the adversary model and trained jointly, using the Fast Gradient Sign Method (FGSM) to generate adversarial perturbations. This adversarial optimisation enhances the classifier’s resilience to attacks, thereby improving both privacy preservation and model accuracy. Experimental analysis reveals that the proposed model achieves up to 99.44% accuracy on adversarial test data, while improving robustness and sustaining high precision and recall across varying client scenarios. The experimental results further confirm the effectiveness of the proposed model in terms of communication efficiency and computational efficiency while reducing inference time and FLOPs, making it ideal for secure 5G edge computing applications.
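The FGSM perturbation step mentioned in the abstract above can be sketched for a plain logistic-regression classifier as follows. This is a minimal illustration and not the paper's framework: the weights, the input, the value of epsilon, and the function names are invented for the example, and the joint classifier and adversary training is not shown. For the logistic loss, the gradient with respect to the input is (p - y) * w, so FGSM moves each feature by epsilon in the direction of that gradient's sign.

```python
import math
from typing import List, Sequence


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Probability of the positive class under logistic regression."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def fgsm(w: Sequence[float], b: float, x: Sequence[float], y: int, eps: float) -> List[float]:
    """Return x + eps * sign(dLoss/dx) for the binary cross-entropy loss."""
    p = predict(w, b, x)
    grad_x = [(p - y) * wi for wi in w]  # closed-form input gradient
    return [xi + eps * sign(g) for xi, g in zip(x, grad_x)]


# Hypothetical model and sample; eps = 0.3 is an illustrative strength.
weights, bias = [2.0, -1.0], 0.0
sample, label = [0.5, 0.2], 1

clean_prob = predict(weights, bias, sample)      # above 0.5: classified correctly
adversarial = fgsm(weights, bias, sample, label, eps=0.3)
adv_prob = predict(weights, bias, adversarial)   # below 0.5: prediction flips
```

Adversarial training in this style would feed such perturbed samples back into the classifier's updates, so that it learns to keep its prediction stable within the epsilon ball.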

2024

Cyber Security Unplugged: Teaching Security Management Through Immersive Gameplay

Alan Mills, Ian Caple, Phil Legg, Sophie Fenn, and Aida Abzhaparova

In Advances in Teaching and Learning for Cyber Security Education, 2024

Cyber security education and the online safety of young people are key on the government agenda. A variety of initiatives are designed to raise awareness about the threats and safeguards related to children and their use of technology. Furthermore, cyber security incidents are frequently impacting young people by directly disrupting the services they rely on, from schools to social media. As a result of a vast array of national, regional and local government funded programmes, young people are becoming increasingly aware of the importance of cyber security, both as a societal need and as a potential career path. However, a gap still remains between the need for cyber education and the diverse range of careers in this field. This chapter presents an innovative pedagogic and gamified experience for teaching fundamental security management concepts that differs from typical learning environments that focus on technical skills. The game was co-developed with school teachers and designed for teachers and educators to use in a variety of settings, such as schools, HE, and business. This game has been successful in demystifying the field of cyber security for new players, explaining the entry points into the field, and demonstrating the wide range of skills and behaviours required to pursue careers in cyber security. By drawing from the successful application of this game, the chapter highlights the importance of using immersive, offline, and non-technical (“unplugged”) pedagogies. Furthermore, it demonstrates how educators can explain the complexities of cyber security and entice young people to further explore a multitude of roles and careers in this field.
Advances in Teaching and Learning for Cyber Security Education

Phil Legg, Natalie Coull, and Charles Clarke

2024

Abs DOI HTML

This book showcases the latest trends and innovations in how we teach and approach cyber security education. Cyber security underpins the technological advances of the 21st century and is a fundamental requirement in today’s society. Therefore, how we teach and educate on topics of cyber security and how we overcome challenges in this space require a collective effort between academia, industry and government. The variety of works in this book includes AI and LLMs for cyber security, digital forensics and how teaching cases can be generated at scale, events and initiatives to inspire the younger generations to pursue cyber pathways, assessment methods that provoke and develop adversarial cyber security mindsets and innovative approaches for teaching cyber management concepts. As a rapidly growing area of education, there are many fascinating examples of innovative teaching and assessment taking place; however, as a community we can do more to share best practice and enhance collaboration across the education sector. CSE Connect is a community group that aims to promote sharing and collaboration in cyber security education so that we can upskill and innovate the community together. The chapters of this book were presented at the 4th Annual Advances in Teaching and Learning for Cyber Security Education conference, hosted by CSE Connect at the University of the West of England, Bristol, UK, on July 2, 2024. The book is of interest to educators, students and practitioners in cyber security, both those looking to upskill in cyber security education and those aspiring to work within the cyber security sector.
Privacy-Based Triage of Suspicious Activity Reports Using Offline Large Language Models

Phil Legg, Nicholas Ryder, Samantha Bourton, Diana Johnson, and Reuben Walker

In Advancements in Cyber Crime Investigations and Modern Data Analytics, 2024

Abs HTML

Suspicious Activity Reports (SARs) form a vital part of incident response and case management for the investigation of known or suspected money laundering. However, those submitting SARs, and those tasked with analysing SARs, often find the task overwhelming due to the complexity of reporting, the incompleteness of information available, and the difficulty of classifying reports effectively for further processing. We explore the use of Natural Language Processing to facilitate this process. Specifically, we utilise the recent advances of Large Language Models to understand and classify SARs against the glossary code terms set out by the UK National Crime Agency. We also explore the privacy concerns of handling confidential and sensitive data with recent AI advancements and propose the use of offline open-source models, coupled with bespoke fine-tuning, to improve task-specific performance using a model that can be deployed locally without requiring data to be shared with external third parties. Our results show that this approach can yield effective classification accuracy on our test cases, offering a way to develop bespoke, smaller, offline models that maintain privacy and confidentiality, in contrast to online models that would compromise data privacy.
Privacy-preserving intrusion detection in software-defined VANET using federated learning with BERT

Shakil Ibne Ahsan, Phil Legg, and SM Alam

arXiv preprint arXiv:2401.07343, 2024

Abs HTML

The absence of robust security protocols leaves Vehicular Ad-hoc Networks (VANETs) open to cyber threats that compromise passenger and road safety. Intrusion Detection Systems (IDS) are widely employed to detect network security threats. With vehicles’ high mobility on the road and diverse environments, VANETs have ever-changing network topologies, lack privacy and security, and have limited bandwidth efficiency. The absence of privacy precautions, End-to-End Encryption methods, and Local Data Processing systems in VANETs also presents many privacy and security difficulties. So, assessing whether a novel real-time processing IDS approach can be utilized for this emerging technology is crucial. The present study introduces a novel approach for intrusion detection using Federated Learning (FL) capabilities in conjunction with the BERT model for sequence classification (FL-BERT). The significance of data privacy is duly recognized. According to FL methodology, each client has its own local model and dataset. Clients train their models locally and then send the models’ weights to the server. The server then aggregates the weights from all clients to update a global model. After aggregation, the global model’s weights are shared with the clients. This practice keeps sensitive raw data on individual clients’ devices, protecting privacy. After conducting the federated learning procedure, we assessed our models’ performance using a separate test dataset. The FL-BERT technique has yielded promising results, opening avenues for further investigation in this particular area of research. We evaluated our approach by comparing it with existing research works and found that FL-BERT is more effective for privacy and security concerns. Our results suggest that FL-BERT is a promising technique for enhancing attack detection.
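
The server-side aggregation step described in the abstract above can be sketched as follows. This is an illustrative example rather than the authors' code: it assumes FedAvg-style averaging weighted by each client's number of training samples (the abstract does not specify the weighting), and the names `federated_average`, `client_weights` and `client_sizes` are hypothetical. Each client's model is represented as a flat list of floats.

```python
def federated_average(client_weights, client_sizes):
    """Combine client weight vectors into global weights.

    Assumption: each client's contribution is proportional to its
    local dataset size (FedAvg-style weighting).
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Two hypothetical clients holding 1 and 3 local samples respectively.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
# global_weights == [2.5, 3.5]; the server would then send these back to the clients.
```

Only the weight vectors cross the network in this scheme; the raw client data never leaves the device.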
Cyber Funfair: Creating Immersive and Educational Experiences for Teaching Cyber Physical Systems Security

Alan Mills, Jonathan White, and Phil Legg

In SIGCSE 2024: Proceedings of the 55th ACM Technical Symposium on Computer Science Education, 2024

Abs HTML

Delivering meaningful and inspiring cyber security education for younger audiences can often be a challenge due to limited expertise and resources. Key to any outreach activity is that it both develops a learner’s curiosity and meets educational objectives. To address this need, we developed a novel learning and awareness activity that addresses the Cyber Physical Systems (CPS) Security knowledge area as mapped by the Cyber Security Body of Knowledge (CyBOK). At the core of our activity is the integration of the Raspberry Pi device with LEGO SPIKE kits. LEGO SPIKE is part of the LEGO Education system that combines colourful LEGO building blocks with motors and sensors, creating an adaptable and engaging learning environment. This hands-on approach allows participants to witness the tangible consequences of cyber and network actions in a physical and engaging format. To evaluate its effectiveness, we ran the activity as part of an outreach day attended by approximately 300 students aged 12 to 14 from schools across the West of England. Participants were surveyed and the results showed an increase in understanding of CPS-specific and wider cyber security for over 90% of respondents. Activity engagement was also well received with no negative feedback. We report on our survey findings and discuss best practices to support other practitioners in developing hands-on interactive experiences for engaging and educational cyber security activities.
Vulnerability detection through machine learning-based fuzzing: A systematic review

Sadegh Bamohabbat Chafjiri, Phil Legg, Jun Hong, and Michail-Antisthenis Tsompanas

Computers & Security, 2024

Abs HTML

Modern software and networks underpin our digital society, yet the rapid growth of vulnerabilities uncovered within them threatens our cyber security posture. Addressing these issues at scale requires automated proactive approaches that can identify and mitigate these vulnerabilities in a suitable time frame. Fuzzing techniques have emerged as crucial methods to preemptively tackle these risks. However, traditional fuzzing methods encounter various challenges, such as a lack of strategy for deep bug identification, time-intensive bug analysis, quality of inputs, seed scheduling and others. To overcome these challenges, diverse Machine Learning (ML) models and optimisation techniques have been employed, including advanced feature engineering, optimised seed selection, refined predictive/fitness models, and gradient-based optimisation. Furthermore, ML architectures such as Long Short-Term Memory (LSTM), Generative Adversarial Network (GAN), Sequence-to-Sequence (Seq2Seq), and Gated Recurrent Unit (GRU) have demonstrated greater effectiveness within ML-based fuzzing. In this paper, we delve into this paradigm shift, aiming to address fundamental challenges across different ML categories. We survey popular ML categories such as Traditional Machine Learning (TML), Deep Learning (DL), Reinforcement Learning (RL), and Deep Reinforcement Learning (DRL), to investigate their potential for enhancing traditional fuzzing approaches. We explore the respective advantages in each category of ML-based fuzzing, while also analysing the challenges unique to each category. Our work provides a comprehensive survey of the fuzzing domain and of how machine learning techniques have been utilised, which we believe will be of use to future researchers in this domain.
Improving Search Space Analysis of Fuzzing Mutators Using Cryptographic Structures

Sadegh Bamohabbat Chafjiri, Phil Legg, Michail-Antisthenis Tsompanas, and Jun Hong

In Proceedings of Ninth International Conference on Cyber Security, Privacy in Communication Networks (ICCS 2023), 2024

Abs HTML

This paper introduces a novel approach to enhance the performance of software fuzzing mutator tools, by leveraging cryptographic structures known as substitution-permutation networks and Feistel networks. By integrating these structures into the existing HonggFuzz fuzzing library, we propose HonggFuzz+ and demonstrate its effectiveness over other leading fuzzers, such as how the method can uncover bugs and edges earlier due to enhanced search space optimisation. By introducing these two structures, we can diversify memory region relationships that can ultimately improve the performance of HonggFuzz. We demonstrate our approach on a range of common software examples from previous software fuzzing literature. Our results show better or as good performance across a range of software targets when compared to other leading fuzzing techniques. We discuss the relevance of the findings and consider future directions for improving software fuzzing search space analysis.
GoibhniUWE: A Lightweight and Modular Container-Based Cyber Range

Alan Mills, Jonathan White, and Phil Legg

Journal of Cybersecurity and Privacy, 2024

Abs HTML PDF

Cyberattacks are rapidly evolving both in terms of techniques and frequency, from low-level attacks through to sophisticated Advanced Persistent Threats (APTs). There is a need to consider how testbed environments such as cyber ranges can be readily deployed to improve the examination of attack characteristics, as well as the assessment of defences. Whilst cyber ranges are not new, they can often be computationally expensive, require an extensive setup and configuration, or may not provide full support for areas such as logging or ongoing learning. In this paper, we propose GoibhniUWE, a container-based cyber range that provides a flexible platform for investigating the full lifecycle of a cyberattack. Adopting a modular approach, users can seamlessly switch out existing, containerised vulnerable services and deploy multiple different services at once, allowing for the creation of complex and realistic deployments. The range is fully instrumented with logging capabilities from a variety of sources including Intrusion Detection Systems (IDSs), service logging, and network traffic captures. To demonstrate the effectiveness of our approach, we deploy the GoibhniUWE range under multiple conditions to simulate various vulnerable environments, reporting on and comparing key metrics such as CPU and memory usage. We simulate complex attacks which span multiple services and networks, with logging at multiple levels, modelling an APT and its associated Tactics, Techniques, and Procedures (TTPs). We find that even under continuous, active, and targeted deployment, GoibhniUWE averaged a CPU usage of less than 50%, in an environment using four single-core processors, and memory usage of less than 4.5 GB.
Digital twins of cyber physical systems in smart manufacturing for threat simulation and detection with deep learning for time series classification

Carol Lo, Thu Yein Win, Zeinab Rezaeifar, Zaheer Khan, and Phil Legg

In 29th International Conference on Automation and Computing (ICAC2024), University of Sunderland, UK, 2024

Abs HTML

With increasing reliance on Cyber Physical Systems (CPS) for automation and control in Industry 4.0 and 5.0, ensuring their security against cyber threats has become paramount. Traditional security mechanisms, constrained by operational continuity and safety requirements, offer limited proactive threat detection capabilities against sophisticated Advanced Persistent Threats (APT). This research introduces the use of a Digital Twin testbed for repeatable simulation of diverse threat scenarios, generation of rich and varied datasets that depict a cyber incident, and training of time-series classification models for attack recognition. Our research aims to overcome the limitations of physical testbeds and the challenges of data scarcity for Machine Learning (ML) or Deep Learning (DL) model development. By leveraging Digital Twins for data-driven analysis, this study proposes the use of supervised DL for accurate threat detection and classification in CPS within smart manufacturing. This paper demonstrates that a Digital Twin testbed provides a cost-effective option for generating datasets to train and test supervised deep learning-based time series classification models for threat detection in CPS. It also discusses the benefits and limitations of the proposed testbed and suggests future research areas.
Analyst-driven XAI for time series forecasting: Analytics for telecoms maintenance

James Barrett, Phil Legg, Jim Smith, and Chip Boyle

In ACM 9th International Conference on Machine Learning Technologies (ICMLT) 2024, 2024

Abs HTML

Time series forecasting facilitates real-time anomaly detection in telecom networks, predicting events that disrupt security and service. Current research efforts have been found to focus on new forecasting libraries, more rigorous data cleaning methods, and model hyperparameter tuning, although we believe human-in-the-loop system approaches are not well applied to the domain of time series forecasting. We explore the usage of a model investigation tool to enable an interactive machine learning process that allows the interrogation of modern forecasting models to enable effective model management techniques. This research aims to demonstrate the usage of an interactive forecasting ensemble tool that enables a user to interrogate time series data, uncover insights in data predictions to make choices, and adjust a model accordingly. Through comparative testing and an analysis of existing model management strategies, we propose that enabling greater levels of human-machine teaming via our tool promotes the ability to “catch” mistakes and oversights based on the assumptions of existing time series forecasting methods.
Evaluating data distribution strategies in federated learning: A trade-off analysis between privacy and performance for IoT security

Jonathan White and Phil Legg

In Proceedings of Ninth International Conference on Cyber Security, Privacy in Communication Networks (ICCS 2023), 2024

Abs HTML

Federated learning is an effective approach for training a global machine learning model. It uses locally acquired data without having to share local data with the centralised server. This method provides a machine learning model beneficial for all parties. It ensures that individual parties do not compromise their privacy or disclose sensitive or personal data. From a cyber security perspective, machine learning with federated learning can highlight intrusions or anomalous activity on a device, without the individual device owner having to reveal characteristics of their own personal usage that would then breach their own privacy. In this paper, we conduct an exploratory investigation into two public datasets, Edge-IIoTset, and CICIoT2023, and we highlight the strengths and limitations of these datasets as currently presented. We then conduct further experimentation on the CICIoT2023 dataset, that previously has only been used for developing centralised learning models. We investigate machine learning performance under various distributions of the data across a set of federated clients, including stratified, leave-one-out, one-class, and half-benign strategies. Specifically, we examine whether a comparable model can be developed using federated learning, and how little data is required by each client to maintain privacy whilst also offering comparable performance against a centralised model.
Trends and Challenges in Data Analytics and Machine Learning Using Call Detail Records in Telecom Systems

James Barrett, Phil Legg, Jim Smith, Tim Barnes, and Charles Boyle

Preprints, 2024

Abs HTML PDF


2023

Improving Search Space Analysis of Fuzzing Mutators Using Cryptographic Structures

Sadegh Bamohabbat Chafjiri, Phil Legg, Michail-Antisthenis Tsompanas, and Jun Hong

In International Conference on Cyber Security, Privacy in Communication Networks, 2023

Abs DOI HTML

This paper introduces a novel approach to enhance the performance of software fuzzing mutator tools, by leveraging cryptographic structures known as substitution-permutation networks and Feistel networks. By integrating these structures into the existing HonggFuzz fuzzing library, we propose HonggFuzz+ and demonstrate its effectiveness over other leading fuzzers, such as how the method can uncover bugs and edges earlier due to enhanced search space optimisation. By introducing these two structures, we can diversify memory region relationships that can ultimately improve the performance of HonggFuzz. We demonstrate our approach on a range of common software examples from previous software fuzzing literature. Our results show better or as good performance across a range of software targets when compared to other leading fuzzing techniques. We discuss the relevance of the findings and consider future directions for improving software fuzzing search space analysis.
Federated Learning: Data Privacy and Cyber Security in Edge-Based Machine Learning

Jonathan White and Phil Legg

In Data Protection in a Post-Pandemic Society, 2023

Abs HTML

Machine learning is now a key component of many applications for understanding trends and characteristics within the wealth of data that may be processed, whether this be learning about customer preferences and travel preferences, forecasting future behaviour of stock markets, weather, or crime rates, classifying and recognising images and text content, or a whole host of other technologies that are becoming integrated as part of our daily lives. The raft of applications is broad and continues to grow daily. At the same time, there are growing concerns about the data protection, security and data privacy of such applications, as smart devices are embedded deeper in our daily activity. How can we ensure that the data gathered and utilised about our daily interactions is best protected, in terms of ensuring systems are truly secure and that users’ privacy is maintained and assured? In this chapter, we explore the recent developments of federated learning, introduced by Google in 2016. This approach mandates that data remains at the place where it was collected, and that it is only data models that pass over the network. In this way, there is no centralised data storage, and no personal data leaves the point where it was generated. We present the recent works of this growing area of research, and we posit the challenges posed from both the data privacy and cyber security standpoints. We show how federated learning can be applied to a cyber security case study of distributed monitoring for Intrusion Detection. We also consider the wider implications of data privacy in machine learning and federated learning systems.
Longitudinal Risk-based Security Assessment of Docker Software Container Images

Alan Mills, Jonathan White, and Phil Legg

Computers & Security, 2023

Abs HTML

As the use of software containerisation has increased, so too has the need for security research on its usage, with various surveys and studies conducted to assess the overall security posture of software container images. To date, there has been very little work that has taken a longitudinal view of container security to observe whether vulnerabilities are being resolved over time, as well as understanding the real-world implications of reported vulnerabilities, to assess the evolving security posture. In this work, we study the evolution of 380 software container images across 3 analysis periods between July 2022 and January 2023 to analyse maintenance and vulnerability factors over time. We sample across the 3 DockerHub categories: Official, Verified and Sponsored OSS (Open Source Software). We found that the number of vulnerabilities present increased over time despite many containers receiving regular updates by providers. We also found that the choice of container OS can dramatically impact the number of reported vulnerabilities present over time, with Debian-based images typically having many more vulnerabilities than other Linux distributions, and with some containers still reporting vulnerabilities that date back as far as 1999. However, when taking into account additional reported attributes such as the attack vector required and the existence of a public exploit rated higher than negligible, we found that for each analysis period, fewer than 1% of all vulnerabilities present what we would consider a high-risk real-world impact. Through our investigation, we aim to improve the understanding of the threat landscape posed by software containerisation, which is further complicated by the discrepancies between different vulnerability reporting tools.

2022

Investigating Malware Propagation and Behaviour Using System and Network Pixel-Based Visualisation

Jacob Williams and Phil Legg

SN Computer Science, 2022

Abs HTML

Malicious software, known as malware, is a perpetual game of cat and mouse between malicious software developers and security professionals. Recent years have seen many high profile cyber attacks, including the WannaCry and NotPetya ransomware attacks that resulted in major financial damages to many businesses and institutions. Understanding the characteristics of such malware, including how malware can propagate and interact between systems and networks is key for mitigating these threats and containing the infection to avoid further damage. In this study, we present visualisation techniques for understanding the propagation characteristics in dynamic malware analysis. We propose the use of pixel-based visualisations to convey large-scale complex information about network hosts in a scalable and informative manner. We demonstrate our approach using a virtualised network environment, whereby we can deploy malware variants and observe their propagation behaviours. As a novel form of visualising system and network activity data across a complex environment, we can begin to understand visual signatures that can help analysts identify key characteristics of the malicious behaviours, and, therefore, provoke response and mitigation against such attacks.
Functionality-preserving adversarial machine learning for robust classification in cybersecurity and intrusion detection domains: A survey

Andrew McCarthy, Essam Ghadafi, Panagiotis Andriotis, and Phil Legg

Journal of Cybersecurity and Privacy, 2022

Abs HTML PDF

Machine learning has become widely adopted as a strategy for dealing with a variety of cybersecurity issues, ranging from insider threat detection to intrusion and malware detection. However, by their very nature, machine learning systems can introduce vulnerabilities to a security defence whereby a learnt model is unaware of so-called adversarial examples that may intentionally result in mis-classification and therefore bypass a system. Adversarial machine learning has been a research topic for over a decade and is now an accepted but open problem. Much of the early research on adversarial examples has addressed issues related to computer vision, yet as machine learning continues to be adopted in other domains, then likewise it is important to assess the potential vulnerabilities that may occur. A key part of transferring to new domains relates to functionality-preservation, such that any crafted attack can still execute the original intended functionality when inspected by a human and/or a machine. In this literature survey, our main objective is to address the domain of adversarial machine learning attacks and examine the robustness of machine learning models in the cybersecurity and intrusion detection domains. We identify the key trends in current work observed in the literature, and explore how these relate to the research challenges that remain open for future works. Inclusion criteria were: articles related to functionality-preservation in adversarial machine learning for cybersecurity or intrusion detection with insight into robust classification. Generally, we excluded works that are not yet peer-reviewed; however, we included some significant papers that make a clear contribution to the domain. There is a risk of subjective bias in the selection of non-peer reviewed articles; however, this was mitigated by co-author review. 
We selected the following databases with a sizeable computer science element to search and retrieve literature: IEEE Xplore, ACM Digital Library, ScienceDirect, Scopus, SpringerLink, and Google Scholar. The literature search was conducted up to January 2022. We have striven to ensure a comprehensive coverage of the domain to the best of our knowledge. We have performed systematic searches of the literature, noting our search terms and results, and following up on all materials that appear relevant and fit within the topic domains of this review. This research was funded by the Partnership PhD scheme at the University of the West of England in collaboration with Techmodal Ltd.
OGMA: visualisation for software container security analysis and automated remediation

Alan Mills, Jonathan White, and Phil Legg

In 2022 IEEE International Conference on Cyber Security and Resilience (CSR), 2022

Abs HTML

The use of software containerisation has rapidly increased in academia and industry, which has led to the production of several container security scanning tools for assessing the security posture and threat of a container image. These tools often differ in their coverage of vulnerabilities, their assessed severity and their output formats. It is also common to find duplicate Common Vulnerabilities and Exposures (CVEs) in their reporting, which can often skew the risk assessment of a container. These issues, along with the lack of automated solutions for maintaining up-to-date patching of container images, are currently open issues identified by the research community that we address in this paper. We present OGMA, a visualisation tool for improved analysis and assessment of container security issues across multiple, often conflicting, scanning tools. In addition to severity, our approach helps to examine attack vector and exploit availability, while also removing duplicated CVEs, therefore providing a clearer picture for risk analysts to understand the threat posed by container deployment. Furthermore, we couple this with a novel remediation scheme for updating vulnerable containers whilst ensuring that functionality is preserved, and show how our visualisation system can highlight the improved security posture of the fixed container. Our results highlight the existing security issues in pre-built container images and the inconsistencies between scanning tools, whilst our proposed approach helps to identify and mitigate such threats to improve container security as part of the wider challenges of software supply chain security.
Teaching Offensive and Defensive Cyber Security in Schools using a Raspberry Pi Cyber Range

Phil Legg, Alan Mills, and Ian Johnson

In Colloquium on Information Systems Security Education (CISSE), 2022

Abs HTML

Computer Science as a subject is now appearing in more school curricula for GCSE and A level, with a growing demand for cyber security to be embedded within this teaching. Yet, teachers face challenges with limited time and resource for preparing practical materials to effectively convey the subject matter. We hosted a series of workshops designed to understand the challenges that teachers face in delivering cyber security education. We then worked with teachers to co-create practical learning resources that could be further developed as tailored lesson plans, as required for their students. In this paper, we report on the challenges highlighted by teachers, and we present a portable and isolated infrastructure for teaching the basics of offensive and defensive cyber security, as a co-created activity based on the teacher workshops. Whilst we present an example case study for red and blue team student engagement, we also reflect on the wide scope of topics and tools that students would be exposed to through this activity, and how this platform could then be generalised for further cyber security teaching.
Defending against adversarial machine learning attacks using hierarchical learning: A case study on network traffic attack classification

Andrew McCarthy, Essam Ghadafi, Panagiotis Andriotis, and Phil Legg

Journal of Information Security and Applications, 2022

Abs HTML

Machine learning is key for automated detection of malicious network activity to ensure that computer networks and organizations are protected against cyber security attacks. Recently, there has been growing interest in the domain of adversarial machine learning, which explores how a machine learning model can be compromised by an adversary, resulting in misclassified output. Whilst to date, most focus has been given to visual domains, the challenge is present in all applications of machine learning where a malicious attacker would want to cause unintended functionality, including cyber security and network traffic analysis. We first present a study on conducting adversarial attacks against a well-trained network traffic classification model. We show how well-crafted adversarial examples can be constructed so that known attack types are misclassified by the model as benign activity. To combat this, we present a novel defensive strategy based on hierarchical learning to help reduce the attack surface that an adversarial example can exploit within the constraints of the parameter space of the intended attack. Our results show that our defensive learning model can withstand crafted adversarial attacks and can achieve classification accuracy in line with our original model when not under attack.
Interactive cyber-physical system hacking: Engaging students early using Scalextric

Jonathan White, Phil Legg, and Alan Mills

In Colloquium on Information Systems Security Education (CISSE), 2022

Cyber Security as an education discipline covers a variety of topics that can be challenging and complex for students who are new to the subject domain. With this in mind, it is crucial that new students are motivated by understanding both the technical aspects of computing and networking, and the real-world implications of compromising these systems. In this paper we approach this task to create an engaging outreach experience, on the concept of cyber-physical systems, using a Scalextric slot-car racetrack. In the activity, students seek to compromise the underlying computer system that is linked to the track and updates the scoreboard system, in order to inflate their own score and to sabotage their opponent. Our investigation with this technique shows high levels of engagement whilst providing an excellent platform for teaching basic concepts of enumeration, brute forcing, and privilege escalation. It also provokes discussion on how this activity relates to real-world cases of cyber-physical systems security in the sports domain and beyond.

2021

Deep Learning-Based Security Behaviour Analysis in IoT Environments: A Survey

Yawei Yue, Shancang Li, Phil Legg, and Fuzhong Li

Security and Communication Networks, 2021

Internet of Things (IoT) applications have been used in a wide variety of domains, including smart homes, healthcare, smart energy, and Industry 4.0. While IoT brings a number of benefits including convenience and efficiency, it also introduces a number of emerging threats. The number of IoT devices that may be connected, along with the ad hoc nature of such systems, often exacerbates the situation. Security and privacy have emerged as significant challenges for managing IoT. Recent work has demonstrated that deep learning algorithms are very efficient for conducting security analysis of IoT systems and have many advantages compared with the other methods. This paper aims to provide a thorough survey related to deep learning applications in IoT for security and privacy concerns. Our primary focus is on deep learning enhanced IoT security. First, from the view of system architecture and the methodologies used, we investigate applications of deep learning in IoT security. Second, from the security perspective of IoT systems, we analyse the suitability of deep learning to improve security. Finally, we evaluate the performance of deep learning in IoT system security.
Feature vulnerability and robustness assessment against adversarial machine learning attacks

Andrew McCarthy, Panagiotis Andriotis, Essam Ghadafi, and Phil Legg

In 2021 International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA), 2021

Whilst machine learning has been widely adopted for various domains, it is important to consider how such techniques may be susceptible to malicious users through adversarial attacks. Given a trained classifier, a malicious attack may attempt to craft a data observation whereby the data features purposefully trigger the classifier to yield incorrect responses. This has been observed in various image classification tasks, including falsifying road sign detection and facial recognition, which could have severe consequences in real-world deployment. In this work, we investigate how these attacks could impact on network traffic analysis, and how a system could perform misclassification of common network attacks such as DDoS attacks. Using the CICIDS2017 data, we examine how vulnerable the data features used for intrusion detection are to perturbation attacks using FGSM adversarial examples. As a result, our method provides a defensive approach for assessing feature robustness that seeks to balance between classification accuracy whilst minimising the attack surface of the feature space.
“Hacking an IoT Home”: New opportunities for cyber security education combining remote learning with cyber-physical systems

Phil Legg, Thomas Higgs, Pennie Spruhan, Jonathan White, and Ian Johnson

In 2021 International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA), 2021

In March 2020, the COVID-19 pandemic led to a dramatic shift in educational practice, whereby home-schooling and remote working became the norm. Many typical schools outreach projects to encourage uptake of learning cyber security skills therefore were put on hold, due to the inability to physically attend and inspire. In this short paper, we describe a new approach to teaching cyber security with a view of inspiring a new generation of learners to the subject. Traditional Capture-The-Flag exercises are widely used in cyber security education, whereby a series of challenges are completed to gain access and obtain a passphrase from a computer system. We couple this approach with interactive sessions made possible via video conferencing platforms such as Microsoft Teams and Zoom, along with the very nature of being in the home environment, where home IoT devices are now commonplace. We develop an integrated CTF for the home IoT environment, where students can observe the impact of submitting flags via online video, to physically adjust the home environment - ranging from switching off lights, playing music, or controlling an IoT-enabled robot. The result is a highly interactive and engaging experience that benefits from the very nature of remote working, inspiring the notion of "hacking an IoT home".
Unsupervised one-class learning for anomaly detection on home IoT network devices

Jonathan White and Phil Legg

In 2021 International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA), 2021

In this paper we study anomaly detection methods for home IoT devices. Specifically, we address unsupervised one-class learning methods due to their ability to learn deviations from a single normal class. In a home IoT environment, this consideration is crucial as supervised methods would result in a burden on many non-technical consumers which could hinder their effectiveness. For our study, we develop a home IoT network monitoring tool, and we illustrate network attacks against a variety of typical home IoT devices. As a result, we propose measures that could aid home consumers in defending ever-increasing home IoT networks.

2020

The visual design of network data to enhance cyber security awareness of the everyday internet user

Fiona Carroll, Phil Legg, and Bastian Bønkel

In 2020 International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA), 2020

Technology and the use of online services are very prevalent across much of our everyday lives. As our digital interactions continue to grow, there is a need to improve public awareness of the risks to our personal online privacy and security. Designing for cyber security awareness has never been so important. In this work, we consider people’s current impressions towards their privacy and security online. We also explore how abnormal network activity data can be visually conveyed to afford a heightened cyber security awareness. In detail, the paper documents the different effects of visual variables in an edge and node DoS visualisation to depict abnormally high volumes of traffic. The results from two studies show that people are generally becoming more concerned about their privacy and security online. Moreover, we have found that the more focus-based visual techniques (i.e. blur) and geometry-based techniques (i.e. jaggedness and sketchiness) afford stronger impressions of uncertainty from abnormally high volumes of network traffic. In terms of security, these impressions and feelings alert the end-user that something is not quite as it should be, and hence develop a heightened cyber security awareness.
Shouting Through Letterboxes: A study on attack susceptibility of voice assistants

Andrew McCarthy, Benedict R Gaster, and Phil Legg

In 2020 International Conference on Cyber Security and Protection of Digital Services (Cyber Security), 2020

Voice assistants such as Amazon Echo and Google Home have become increasingly popular for many home users, for home automation, entertainment, and convenience. These devices process speech commands from a user to execute some action, such as playing music, making online purchases, or triggering home automation such as lights or security locks. The process of mapping speech input to a text command is performed using a machine learning model. In this study, we explore the concept of how voice assistants could be exploited, where genuine audio commands are manipulated such that an attacker could trigger alternative responses from the voice assistant. We present a small-scale study to examine mis-interpretations made by voice assistants. We also study user perception of how secure their voice devices are, and their approach to security and privacy.
“What did you say?”: Extracting unintentional secrets from predictive text learning systems

Gwyn Wilkinson and Phil Legg

In 2020 International Conference on Cyber Security and Protection of Digital Services (Cyber Security), 2020

As a primary form of communication, text is used widely for online communications, including e-mail conversations, mobile text messaging, chatroom and forum discussions. Modern systems include facilities such as predictive text, recently implemented using deep learning algorithms, to estimate the next word to be written based on previous historical entries. However, we often enter sensitive information such as passwords using the same input devices - namely, smartphone soft keyboards. In this paper, we explore the problem of deep learning models which memorise sensitive training data, and how secrets can be extracted from predictive text models. We propose a general black-box attack algorithm to accomplish this for all kinds of memorised sequences, discuss mitigations and countermeasures, and explore how this attack vector could be deployed on Android or iOS mobile device platforms as part of target reconnaissance.
Investigating anti-evasion malware triggers using automated sandbox reconfiguration techniques

Alan Mills and Phil Legg

Journal of Cybersecurity and Privacy, 2020

Malware analysis is fundamental for defending against prevalent cyber security threats and requires a means to deploy and study behavioural software traits as more sophisticated malware is developed. Traditionally, virtual machines are used to provide an environment that is isolated from production systems so as to not cause any adverse impact on existing infrastructure. Malware developers are fully aware of this and so will often develop evasion techniques to avoid detection within sandbox environments. In this paper, we conduct an investigation of anti-evasion malware triggers for uncovering malware that may attempt to conceal itself when deployed in a traditional sandbox environment. To facilitate our investigation, we developed a tool called MORRIGU that couples together both automated and human-driven analysis for systematic testing of anti-evasion methods using dynamic sandbox reconfiguration techniques. This is further supported by visualisation methods for performing comparative analysis of system activity when malware is deployed under different sandbox configurations. Our study reveals a variety of anti-evasion traits that are shared amongst different malware families, such as sandbox “wear-and-tear”, and Reverse Turing Tests (RTT), as well as more sophisticated malware samples that require multiple anti-evasion checks to be deployed. We also perform a comparative study using Cuckoo sandbox to demonstrate the limitations of adopting only automated analysis tools, to justify the exploratory analysis provided by MORRIGU. By adopting a clearer systematic process for uncovering anti-evasion malware triggers, as supported by tools like MORRIGU, this study helps to further the research of evasive malware analysis so that we can better defend against such future attacks.

2019

What makes for effective visualisation in cyber situational awareness for non-expert users?

Fiona Carroll, Phil Legg, and Adam Chakof

In International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA), 2019

As cyber threats continue to become more prevalent, there is a need to consider how best we can understand the cyber landscape when acting online, especially so for non-expert users. Satellite navigation systems provide the de facto standard for many modern day navigation tasks in the physical domain, so we consider the question of how one could navigate the online domain using similar concepts. In this paper, we study the design of a cyber sat nav for improving situational awareness of non-expert users. We focus on three core tasks: understanding where we are in cyber space, understanding how we got there, and understanding future states that we may traverse to. To support understanding, we explore the use of visualisation techniques to portray complex online activities in clear and engaging formats for non-expert users.
Venue2vec: An efficient embedding model for fine-grained user location prediction in geo-social networks

Shuai Xu, Jiuxin Cao, Phil Legg, Bo Liu, and Shancang Li

IEEE Systems Journal, 2019

Geo-Social Networks (GSN) significantly improve location-aware capability of services by offering geo-located content based on the huge volumes of data generated in the GSN. The problem of user location prediction based on user-generated data in GSN has been extensively studied. However, existing studies concern either predicting users’ next check-in location or predicting their future check-in location at a given time with coarse granularity. A unified model that can predict both scenarios with fine granularity is quite rare. Also, due to the heterogeneity of multiple factors associated with both locations and users, how to efficiently incorporate this information still remains challenging. Inspired by the recent success of word embedding in natural language processing, in this paper, we propose a novel embedding model called Venue2Vec which automatically incorporates temporal-spatial context, semantic information, and sequential relations for fine-grained user location prediction. Locations of the same type, and those that are geographically close or often visited successively by users will be situated closer within the embedding space. Based on our proposed Venue2Vec model, we design techniques that allow for predicting a user’s next check-in location, and also their future check-in location at a given time. We conduct experiments on three real-world GSN datasets to verify the performance of the proposed model. Experimental results on both tasks show that the Venue2Vec model outperforms several state-of-the-art models on various evaluation metrics. Furthermore, we show how the Venue2Vec model can be more time-efficient due to being parallelizable.
Tools and techniques for improving cyber situational awareness of targeted phishing attacks

Phil Legg and Tim Blackman

In International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA), 2019

Phishing attacks continue to be one of the most common attack vectors used online today to deceive users, such that attackers can obtain unauthorised access or steal sensitive information. Phishing campaigns often vary in their level of sophistication, from mass distribution of generic content, such as delivery notifications, online purchase orders, and claims of winning the lottery, through to bespoke and highly-personalised messages that convincingly impersonate genuine communications (e.g., spearphishing attacks). There is a distinct trade-off here between the scale of an attack versus the effort required to curate content that is likely to convince an individual to carry out an action (typically, clicking a malicious hyperlink). In this short paper, we conduct a preliminary study on a recent real-world incident that strikes a balance between attacking at scale and personalised content. We adopt different visualisation tools and techniques for better assessing the scale and impact of the attack, that can be used by security professionals to analyse the security incident, and could also be used to inform employees as a form of security awareness and training. We pitched the approach to IT professionals working in information security, who believe this may provide improved awareness of how targeted phishing campaigns can impact an organisation, and could contribute towards a pro-active step of how analysts will examine and mitigate the impact of future attacks across the organisation.
Efficient and interpretable real-time malware detection using random-forest

Alan Mills, Theodoros Spyridopoulos, and Phil Legg

In International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA), 2019

Malicious software, often described as malware, is one of the greatest threats to modern computer systems, and attackers continue to develop more sophisticated methods to access and compromise data and resources. Machine learning methods have the potential to improve malware detection both in terms of accuracy and detection runtime, and this is an active area within academic research and commercial development. Whilst the majority of research has focused on improving accuracy and runtime of these systems, to date there has been little focus on the interpretability of detection results. In this paper, we propose a lightweight malware detection system called NODENS that can be deployed on affordable hardware such as a Raspberry Pi. Crucially, NODENS provides transparency of output results so that an end-user can begin to examine why the classifier believes a software sample to be either malicious or benign. Using an efficient Random-Forest approach, our system provides interpretability whilst not sacrificing accuracy or detection runtime, with an average detection speed of between 3–8 seconds, allowing for early remedial action to be taken before damage is caused.
Visual analytics for collaborative human-machine confidence in human-centric active learning tasks

Phil Legg, Jim Smith, and Alexander Downing

Human-centric Computing and Information Sciences, 2019

Active machine learning is a human-centric paradigm that leverages a small labelled dataset to build an initial weak classifier, that can then be improved over time through human-machine collaboration. As new unlabelled samples are observed, the machine can either provide a prediction, or query a human ‘oracle’ when the machine is not confident in its prediction. Of course, just as the machine may lack confidence, the same can also be true of a human ‘oracle’: humans are not all-knowing, untiring oracles. A human’s ability to provide an accurate and confident response will often vary between queries, according to the duration of the current interaction, their level of engagement with the system, and the difficulty of the labelling task. This poses an important question of how uncertainty can be expressed and accounted for in a human-machine collaboration. In short, how can we facilitate a mutually-transparent collaboration between two uncertain actors—a person and a machine—that leads to an improved outcome? In this work, we demonstrate the benefit of human-machine collaboration within the process of active learning, where limited data samples are available or where labelling costs are high. To achieve this, we developed a visual analytics tool for active learning that promotes transparency, inspection, understanding and trust, of the learning process through human-machine collaboration. Fundamental to the notion of confidence, both parties can report their level of confidence during active learning tasks using the tool, such that this can be used to inform learning. Human confidence of labels can be accounted for by the machine, the machine can query for samples based on confidence measures, and the machine can report confidence of current predictions to the human, to further the trust and transparency between the collaborative parties.
In particular, we find that this can improve the robustness of the classifier when incorrect sample labels are provided, due to a lack of confidence or fatigue. Reported confidences can also better inform human-machine sample selection in collaborative sampling. Our experimentation compares the impact of different selection strategies for acquiring samples: machine-driven, human-driven, and collaborative selection. We demonstrate how a collaborative approach can improve trust in the model robustness, achieving high accuracy and low user correction, with only limited data sample selections.

2018

Predicting the occurrence of world news events using recurrent neural networks and auto-regressive moving average models

Emmanuel M Smith, Jim Smith, Phil Legg, and Simon Francis

In Advances in Computational Intelligence Systems: Contributions Presented at the 17th UK Workshop on Computational Intelligence, September 6-8, 2017, Cardiff, UK, 2018

The ability to predict future states is fundamental for a wide variety of applications, from weather forecasting to stock market analysis. Understanding the related data attributes that can influence changes in time series is a challenging task that is critical for making accurate predictions. One particular application of key interest is understanding the factors that relate to the occurrence of global activities from online world news reports. Being able to understand why particular types of events may occur, such as violence and peace, could play a vital role in better protecting and understanding our global society. In this work, we explore the concept of predicting the occurrence of world news events, making use of the Global Database of Events, Language and Tone online news aggregation source. We compare traditional Auto-Regressive Moving Average models with more recent deep learning strategies using Long Short-Term Memory Recurrent Neural Networks. Our results show that the latter are capable of achieving lower error rates. We also discuss how deep learning methods such as Recurrent Neural Networks have the potential for greater capability to incorporate complex associations of data attributes that may impact the occurrence of future events.
Visualising state space representations of LSTM networks

Emmanuel M Smith, Jim Smith, Phil Legg, and Simon Francis

In Workshop on Visualization for AI Explainability (VISxAI), 2018

Long Short-Term Memory (LSTM) networks have proven to be one of the most effective models for making predictions on sequence-based tasks. These models work by capturing, remembering, and forgetting information relevant to their future predictions. The non-linear complexity of the mechanisms involved in this process means we currently lack tools for achieving interpretability. Ideally, we want these models to provide an explanation of why they make a particular prediction, given a specific input. Researchers have explored the idea of interpreting LSTMs in specific contexts such as natural language processing or classification, but they put minimal focus on approaches which are generalisable across different applications. To alleviate this, in this work, we demonstrate a method which enables the interpretation and comparison of LSTM states during time series predictions. We show that by reducing the dimensionality of network states one can scalably visualise patterns and explain model behaviours.
Predicting user confidence during visual decision making

Jim Smith, Phil Legg, Milos Matovic, and Kristofer Kinsey

ACM Transactions on Interactive Intelligent Systems (TiiS), 2018

People are not infallible consistent “oracles”: their confidence in decision-making may vary significantly between tasks and over time. We have previously reported the benefits of using an interface and algorithms that explicitly captured and exploited users’ confidence: error rates were reduced by up to 50% for an industrial multi-class learning problem; and the number of interactions required in a design-optimisation context was reduced by 33%. Having access to users’ confidence judgements could significantly benefit intelligent interactive systems in industry, in areas such as intelligent tutoring systems and in health care. There are many reasons for wanting to capture information about confidence implicitly. Some are ergonomic, but others are more “social”—such as wishing to understand (and possibly take account of) users’ cognitive state without interrupting them. We investigate the hypothesis that users’ confidence can be accurately predicted from measurements of their behaviour. Eye-tracking systems were used to capture users’ gaze patterns as they undertook a series of visual decision tasks, after each of which they reported their confidence on a 5-point Likert scale. Subsequently, predictive models were built using “conventional” machine learning approaches for numerical summary features derived from users’ behaviour. We also investigate the extent to which the deep learning paradigm can reduce the need to design features specific to each application by creating “gaze maps”—visual representations of the trajectories and durations of users’ gaze fixations—and then training deep convolutional networks on these images. Treating the prediction of user confidence as a two-class problem (confident/not confident), we attained classification accuracy of 88% for the scenario of new users on known tasks, and 87% for known users on new tasks.
Considering the confidence as an ordinal variable, we produced regression models with a mean absolute error of ≈0.7 in both cases. Capturing just a simple subset of non-task-specific numerical features gave slightly worse, but still quite high accuracy (e.g., MAE ≈ 1.0). Results obtained with gaze maps and convolutional networks are competitive, despite not having access to longer-term information about users and tasks, which was vital for the “summary” feature sets. This suggests that the gaze-map-based approach forms a viable, transferable alternative to handcrafting features for each different application. These results provide significant evidence to confirm our hypothesis, and offer a way of substantially improving many interactive artificial intelligence applications via the addition of cheap non-intrusive hardware and computationally cheap prediction algorithms.

2017

Human-machine decision support systems for insider threat detection

Philip A Legg

Data Analytics and Decision Support for Cybersecurity: Trends, Methodologies and Applications, 2017

Insider threats are recognised to be quite possibly the most damaging attacks that an organisation could experience. Those on the inside, who have privileged access and knowledge, are already in a position of great responsibility for contributing towards the security and operations of the organisation. Should an individual choose to exploit this privilege, perhaps due to disgruntlement or external coercion from a competitor, then the potential impact to the organisation can be extremely damaging. There are many proposals of using machine learning and anomaly detection techniques as a means of automated decision-making about which insiders are acting in a suspicious or malicious manner, as a form of large scale data analytics. However, it is well recognised that this poses many challenges, for example, how do we capture an accurate representation of normality to assess insiders against, within a dynamic and ever-changing organisation? More recently, there has been interest in how visual analytics can be incorporated with machine-based approaches, to alleviate the data analytics challenges of anomaly detection and to support human reasoning through visual interactive interfaces. Furthermore, by combining visual analytics and active machine learning, there is potential capability for the analysts to impart their domain expert knowledge back to the system, so as to iteratively improve the machine-based decisions based on the human analyst preferences. With this combined human-machine approach to decision-making about potential threats, the system can begin to more accurately capture human rationale for the decision process, and reduce the false positives that are flagged by the system. In this work, I reflect on the challenges of insider threat detection, and look to how human-machine decision support systems can offer solutions towards this.
RicherPicture: Semi-automated cyber defence using context-aware data analytics

Arnau Erola, Ioannis Agrafiotis, Jassim Happa, Michael Goldsmith, Sadie Creese, and Philip A Legg

In 2017 International Conference On Cyber Situational Awareness, Data Analytics And Assessment (Cyber SA), 2017

In a continually evolving cyber-threat landscape, the detection and prevention of cyber attacks has become a complex task. Technological developments have led organisations to digitise the majority of their operations. This practice, however, has its perils, since cyberspace offers a new attack-surface. Institutions which are tasked to protect organisations from these threats utilise mainly network data and their incident response strategy remains oblivious to the needs of the organisation when it comes to protecting operational aspects. This paper presents a system able to combine threat intelligence data, attack-trend data and organisational data (along with other data sources available) in order to achieve automated network-defence actions. Our approach combines machine learning, visual analytics and information from business processes to guide through a decision-making process for a Security Operation Centre environment. We test our system on two synthetic scenarios and show that correlating network data with non-network data for automated network defences is possible and worth investigating further.

2016

Automated registration of multimodal optic disc images: clinical assessment of alignment accuracy

Wai Siene Ng, Phil Legg, Venkat Avadhanam, Kyaw Aye, Steffan HP Evans, Rachel V North, Andrew D Marshall, Paul Rosin, and James E Morgan

Journal of Glaucoma, 2016

Purpose: To determine the accuracy of automated alignment algorithms for the registration of optic disc images obtained by 2 different modalities: fundus photography and scanning laser tomography. Materials and methods: Images obtained with the Heidelberg Retina Tomograph II and paired photographic optic disc images of 135 eyes were analyzed. Three state-of-the-art automated registration techniques, Regional Mutual Information, rigid Feature Neighbourhood Mutual Information (FNMI), and nonrigid FNMI (NRFNMI), were used to align these image pairs. Alignment of each composite picture was assessed on a 5-point grading scale: "Fail" (no alignment of vessels with no vessel contact), "Weak" (vessels have slight contact), "Good" (vessels with <50% contact), "Very Good" (vessels with >50% contact), and "Excellent" (complete alignment). Custom software generated an image mosaic in which the modalities were interleaved as a series of alternate 5×5-pixel blocks. These were graded independently by 3 clinically experienced observers. Results: A total of 810 image pairs were assessed. All 3 registration techniques achieved a score of "Good" or better in >95% of the image sets. NRFNMI had the highest percentage of "Excellent" (mean: 99.6%; range, 95.2% to 99.6%), followed by Regional Mutual Information (mean: 81.6%; range, 86.3% to 78.5%) and FNMI (mean: 73.1%; range, 85.2% to 54.4%). Conclusions: Automated registration of optic disc images by different modalities is a feasible option for clinical application. All 3 methods provided useful levels of alignment, but the NRFNMI technique consistently outperformed the others and is recommended as a practical approach to the automated registration of multimodal disc images.
Enhancing cyber situation awareness for non-expert users using visual analytics

Philip A Legg

In 2016 International Conference On Cyber Situational Awareness, Data Analytics And Assessment (CyberSA), 2016

Situation awareness is often described as the perception and comprehension of the current situation, and the projection of future status. Whilst this may be understood in an organisational cybersecurity context, there is a strong case to be made for effective cybersecurity situation awareness that is tailored to the needs of the Non-Expert User (NEU). Our online usage habits are rapidly evolving with smartphones and tablets being widely used to access resources online. In order for NEUs to remain safe online, there is a need to enhance awareness and understanding of cybersecurity concerns, such as how devices may be acting online, and what data is being shared between devices. In this paper, we explore the notion of personal situation awareness for NEUs. We conduct a small-scale study to understand how NEUs perceive cybersecurity. We also propose how visual analytics could be used to help encourage NEUs to actively monitor and observe their activity for greater online awareness. The guidance developed through the course of this work can help practitioners develop tools that could help NEUs better understand their online actions, with the aim to result in safer experiences when acting online.
Glyph visualization: A fail-safe design scheme based on quasi-Hamming distances

Philip A Legg, Eamonn Maguire, Simon Walton, and Min Chen

IEEE Computer Graphics and Applications, 2016

In many spatial and temporal visualization applications, glyphs provide an effective means for encoding multivariate data. However, because glyphs are typically small, they are vulnerable to various perceptual errors. This article introduces the concept of a quasi-Hamming distance in the context of glyph design and examines the feasibility of estimating the quasi-Hamming distance between a pair of glyphs and the minimal Hamming distance for a glyph set. The authors demonstrate the design concept by developing a file-system event visualization that can depict the activities of multiple users.
Visual Analytics for Non-Expert Users in Cyber Situation Awareness

Philip A Legg

Int. J. Cyber Situational Aware., 2016

Situation awareness is often described as the perception and comprehension of the current situation, and the projection of future status. Whilst this may be well understood in an organisational cybersecurity context, there is a strong case to be made for effective cybersecurity situation awareness that is tailored to the needs of the Non-Expert User (NEU). Our online usage habits are rapidly evolving with smartphones and tablets being widely used to access resources online. In order for NEUs to remain safe online, there is a need to enhance awareness and understanding of cybersecurity concerns, such as how devices may be acting online, and what data is being shared between devices. In this paper, we extend our proposal of the Enhanced Personal Situation Awareness (ePSA) framework to consider the key details of cyber situation awareness that would be of concern to NEUs, and we consider how such information can be effectively conveyed using a visual analytic approach. We present the design of our visual analytics approach to show how this can represent the key details of cyber situation awareness whilst maintaining a simple and clean design scheme so as to not result in information-overload for the user. The guidance developed through the course of this work can help practitioners develop tools that could help NEUs better understand their online actions, with the aim of giving users greater control and safer experiences when their personal devices are acting online.

2015

Glyph sorting: Interactive visualization for multi-dimensional data

David HS Chung, Philip A Legg, Matthew L Parry, Rhodri Bown, Iwan W Griffiths, Robert S Laramee, and Min Chen

Information Visualization, 2015

Glyph-based visualization is an effective tool for depicting multivariate information. Since sorting is one of the most common analytical tasks performed on individual attributes of a multi-dimensional dataset, this motivates the hypothesis that introducing glyph sorting would significantly enhance the usability of glyph-based visualization. In this article, we present a glyph-based conceptual framework as part of a visualization process for interactive sorting of multivariate data. We examine several technical aspects of glyph sorting and provide design principles for developing effective, visually sortable glyphs. Glyphs that are visually sortable provide two key benefits: (1) performing comparative analysis of multiple attributes between glyphs and (2) supporting multi-dimensional visual search. We describe a system that incorporates focus and context glyphs to control sorting in a visually intuitive manner and for viewing sorted results in an interactive, multi-dimensional glyph plot that enables users to perform high-dimensional sorting, analyse and examine data trends in detail. To demonstrate the usability of glyph sorting, we present a case study in rugby event analysis for comparing and analysing trends within matches. This work is undertaken in conjunction with a national rugby team. From using glyph sorting, analysts have reported the discovery of new insight beyond traditional match analysis.
Identifying attack patterns for insider threat detection

Ioannis Agrafiotis, Jason RC Nurse, Oliver Buckley, Phil Legg, Sadie Creese, and Michael Goldsmith

Computer Fraud & Security, 2015

The threat that insiders pose to businesses, institutions and governmental organisations continues to be of serious concern. Recent industry surveys provide unequivocal evidence to support the significance of this threat and its prevalence in enterprises today. In an attempt to address this challenge, several approaches and systems have been proposed by practitioners and researchers. These focus on defining the insider threat and exploring the human and psychological factors involved, through to the detection and deterrence of these threats via technological and behavioural theories. Insider threats pose major concerns to businesses, institutions and governmental organisations. Few solutions to this problem consider all the technical, organisational and behavioural aspects. In new research, Ioannis Agrafiotis, Jason RC Nurse, Oliver Buckley, Phil Legg, Sadie Creese and Michael Goldsmith define attack patterns that could be key in assisting insider-threat detection, based on 120 real-world case studies. They present their findings, representing each case study as a series of attack steps and identify common trends between different attacks.
Knowledge-assisted ranking: A visual analytic application for sports event data

David HS Chung, Matthew L Parry, Iwan W Griffiths, Robert S Laramee, Rhodri Bown, Philip A Legg, and Min Chen

IEEE Computer Graphics and Applications, 2015

Organizing sports video data for performance analysis can be challenging, especially in cases involving multiple attributes and when the criteria for sorting frequently changes depending on the user’s task. The proposed visual analytic system enables users to specify a sort requirement in a flexible manner without depending on specific knowledge about individual sort keys. The authors use regression techniques to train different analytical models for different types of sorting requirements and use visualization to facilitate knowledge discovery at different stages of the process. They demonstrate the system with a rugby case study to find key instances for analyzing team and player performance. Organizing sports video data for performance analysis can be challenging in cases with multiple attributes, and when sorting frequently changes depending on the user’s task. As this video shows, the proposed visual analytic system allows interactive data sorting and exploration. https://youtu.be/Cs6SLtPVDQQ.
Using internet activity profiling for insider-threat detection

Bushra A Alahmadi, Philip A Legg, and Jason RC Nurse

In Special Session on Security in Information Systems, 2015

The insider-threat problem continues to be a major risk to both public and private sectors, where those people who have privileged knowledge and access choose to abuse this in some way to cause harm towards their organisation. To combat against this, organisations are beginning to invest heavily in deterrence monitoring tools to observe employees’ activity, such as computer access, Internet browsing, and email communications. Whilst such tools may provide some way towards detecting attacks afterwards, what may be more useful is preventative monitoring, where user characteristics and behaviours inform about the possibility of an attack before it happens. Psychological research advocates that the behaviour and preference of a person can be explained to a great extent by psychological constructs called personality traits, which could then possibly indicate the likelihood of an individual being a potential insider threat. By considering how browsing content relates to psychological constructs (such as OCEAN), and how an individual’s browsing behaviour deviates over time, potential insider-threats could be uncovered before significant damage is caused. The main contribution in this paper is to explore how Internet browsing activity could be used to predict the individual’s psychological characteristics in order to detect potential insider-threats. Our results demonstrate that predictive assessment can be made between the content available on a website, and the associated personality traits, which could greatly improve the prospects of preventing insider attacks.
Visualizing the insider threat: challenges and tools for identifying malicious user activity

Philip A Legg

In 2015 IEEE Symposium on Visualization for Cyber Security (VizSec), 2015

One of the greatest challenges for managing organisational cyber security is the threat that comes from those who operate within the organisation. With entitled access and knowledge of organisational processes, insiders who choose to attack have the potential to cause serious impact, such as financial loss, reputational damage, and in severe cases, could even threaten the existence of the organisation. Security analysts therefore require sophisticated tools that allow them to explore and identify user activity that could be indicative of an imminent threat to the organisation. In this work, we discuss the challenges associated with identifying insider threat activity, along with the tools that can help to combat this problem. We present a visual analytics approach that incorporates multiple views, including a user selection tool that indicates anomalous behaviour, an interactive Principal Component Analysis (iPCA) tool that aids the analyst to assess the reasoning behind the anomaly detection results, and an activity plot that visualizes user and role activity over time. We demonstrate our approach using the Carnegie Mellon University CERT Insider Threat Dataset to show how the visual analytics workflow supports the Information-Seeking mantra.
Quasi-Hamming distances: An overarching concept for measuring glyph similarity

Philip A Legg, Eamonn Maguire, Simon Walton, and Min Chen

Computer Graphics and Visual Computing: Extended Abstract, 2015

In many applications of spatial or temporal visualization, glyphs provide an effective means for encoding multivariate data objects. However, because glyphs are typically small, they are vulnerable to various perceptual errors. In data communication, the concept of Hamming distance underpins the study of codes that support error detection and correction by the receiver without the need for corroboration from the sender. In this extended abstract, we outline a novel concept of quasi-Hamming distance in the context of glyph design. We discuss the feasibility of estimating quasi-Hamming distance between a pair of glyphs, and the minimal Hamming distance for a glyph set. This measurement enables glyph designers to determine the differentiability between glyphs, facilitating design optimization by maximizing distances between glyphs under various constraints (e.g., the available number of visual channels and their encoding bandwidth).
Caught in the act of an insider attack: detection and assessment of insider threat

Philip A Legg, Oliver Buckley, Michael Goldsmith, and Sadie Creese

In 2015 IEEE International Symposium on Technologies for Homeland Security (HST), 2015

The greatest asset that any organisation has are its people, but they may also be the greatest threat. Those who are within the organisation may have authorised access to vast amounts of sensitive company records that are essential for maintaining competitiveness and market position, and knowledge of information services and procedures that are crucial for daily operations. In many cases, those who have such access do indeed require it in order to conduct their expected workload. However, should an individual choose to act against the organisation, then with their privileged access and their extensive knowledge, they are well positioned to cause serious damage. Insider threat is becoming a serious and increasing concern for many organisations, with those who have fallen victim to such attacks suffering significant damages including financial and reputational. It is clear then, that there is a desperate need for more effective tools for detecting the presence of insider threats and analyzing the potential of threats before they escalate. We propose Corporate Insider Threat Detection (CITD), an anomaly detection system that is the result of a multi-disciplinary research project that incorporates technical and behavioural activities to assess the threat posed by individuals. The system identifies user and role-based profiles, and measures how users deviate from their observed behaviours to assess the potential threat that a series of activities may pose. In this paper, we present an overview of the system and describe the concept of operations and practicalities of deploying the system. We show how the system can be utilised for unsupervised detection, and also how the human analyst can engage to provide an active learning feedback loop. By adopting an accept or reject scheme, the analyst is capable of refining the underlying detection model to better support their decision-making process and significantly reduce the false positive rate.
Feature neighbourhood mutual information for multi-modal image registration: an application to eye fundus imaging

Philip A Legg, Paul L Rosin, Dave Marshall, and James E Morgan

Pattern Recognition, 2015

Multi-modal image registration is becoming an increasingly powerful tool for medical diagnosis and treatment. The combination of different image modalities facilitates much greater understanding of the underlying condition, resulting in improved patient care. Mutual Information is a popular image similarity measure for performing multi-modal image registration. However, it is recognised that there are limitations with the technique that can compromise the accuracy of the registration, such as the lack of spatial information that is accounted for by the similarity measure. In this paper, we present a two-stage non-rigid registration process using a novel similarity measure, Feature Neighbourhood Mutual Information. The similarity measure efficiently incorporates both spatial and structural image properties that are not traditionally considered by MI. By incorporating such features, we find that this method is capable of achieving much greater registration accuracy when compared to existing methods, whilst also achieving efficient computational runtime. To demonstrate our method, we use a challenging medical image data set consisting of paired retinal fundus photographs and confocal scanning laser ophthalmoscope images. Accurate registration of these image pairs facilitates improved clinical diagnosis, and can be used for the early detection and prevention of glaucoma disease.
Automated insider threat detection system using user and role-based profile assessment

Philip A Legg, Oliver Buckley, Michael Goldsmith, and Sadie Creese

IEEE Systems Journal, 2015

Organizations are experiencing an ever-growing concern of how to identify and defend against insider threats. Those who have authorized access to sensitive organizational data are placed in a position of power that could well be abused and could cause significant damage to an organization. This could range from financial theft and intellectual property theft to the destruction of property and business reputation. Traditional intrusion detection systems are neither designed nor capable of identifying those who act maliciously within an organization. In this paper, we describe an automated system that is capable of detecting insider threats within an organization. We define a tree-structure profiling approach that incorporates the details of activities conducted by each user and each job role and then use this to obtain a consistent representation of features that provide a rich description of the user’s behavior. Deviation can be assessed based on the amount of variance that each user exhibits across multiple attributes, compared against their peers. We have performed experimentation using ten synthetic data-driven scenarios and found that the system can identify anomalous behavior that may be indicative of a potential threat. We also show how our detection system can be combined with visual analytics tools to support further investigation by an analyst.

2014

Systematic snooker skills test to analyze player performance

David HS Chung, Iwan W Griffiths, Phil A Legg, Matthew L Parry, A Morris, Min Chen, W Griffiths, and Alex Thomas

International Journal of Sports Science & Coaching, 2014

The process of rigorous training and coaching is one that is essential to any sports player aiming to develop their abilities further. From the novice player through to professional athletes, it is vital to maintain and assess their level of performance in order to progress to a higher standard. However, traditional practice routines can often be non-strategic and devised with an “ad-hoc” approach. In order for a training regime to be beneficial to a player, methods to examine a player’s performance are desirable and can offer quantifiable feedback that will help the player to understand their current weaknesses and provide a benchmark to improve upon. This article focuses on the introduction of a systematic skills test. We assess the fundamental physics of snooker and from this we identify a set of key skills that characterises the basis of all snooker shots. We present 5 snooker tests that can be used to quantify the performance of these key skills. This allows us to analyse snooker players in an objective manner based on their level of ability for each key skill. The article concludes with a user study that assesses the performance of novice, intermediate and professional players when performing our proposed snooker skills test, which demonstrates the ability to make accurate comparison between players of different ability.
Reflecting on the ability of enterprise security policy to address accidental insider threat

Oliver Buckley, Jason RC Nurse, Philip A Legg, Michael Goldsmith, and Sadie Creese

In 2014 Workshop on Socio-Technical Aspects in Security and Trust, 2014

An enterprise’s information security policy is an exceptionally important control as it provides the employees of an organisation with details of what is expected of them, and what they can expect from the organisation’s security teams, as well as informing the culture within that organisation. The threat from accidental insiders is a reality across all enterprises and can be extremely damaging to the systems, data and reputation of an organisation. Recent industry reports and academic literature underline the fact that the risk of accidental insider compromise is potentially more pressing than that posed by a malicious insider. In this paper we focus on the ability of enterprises’ information security policies to mitigate the accidental insider threat. Specifically we perform an analysis of real-world cases of accidental insider threat to define the key reasons, actions and impacts of these events – captured as a grounded insider threat classification scheme. This scheme is then used to perform a review of a set of organisational security policies to highlight their strengths and weaknesses when considering the prevention of incidents of accidental insider compromise. We present a set of questions that can be used to analyse an existing security policy to help control the risk of the accidental insider threat.
Guest Editorial: Emerging Trends in Research for Insider Threat Detection.

William R Claycomb, Philip A Legg, and Dieter Gollmann

J. Wirel. Mob. Networks Ubiquitous Comput. Dependable Appl., 2014

The insider threat is one of mankind’s most enduring security challenges. For as long as people have placed trust in one another, they have faced the risk of that trust being violated. Historically, consequences of insider attacks included compromised organizational security, financial loss, and risks to human health and safety. Prior to the information age, attacks mainly targeted tangible assets, such as people or money; now insider attacks target additional assets related to information technology (IT), such as data and systems. For instance, malicious insiders may steal intellectual property, sabotage corporate IT systems, or use IT systems to commit financial fraud. Insider attacks have plagued humanity for millennia, and researchers and security professionals continue to struggle to fully understand the breadth of the problem and to propose solutions proven to have measurable effects on reducing the occurrence and impact of attacks. Even defining “insider threat” can be problematic, depending on the problem space.
Understanding insider threat: A framework for characterising attacks

Jason RC Nurse, Oliver Buckley, Philip A Legg, Michael Goldsmith, Sadie Creese, Gordon RT Wright, and Monica Whitty

In 2014 IEEE Security and Privacy Workshops, 2014

The threat that insiders pose to businesses, institutions and governmental organisations continues to be of serious concern. Recent industry surveys and academic literature provide unequivocal evidence to support the significance of this threat and its prevalence. Despite this, however, there is still no unifying framework to fully characterise insider attacks and to facilitate an understanding of the problem, its many components and how they all fit together. In this paper, we focus on this challenge and put forward a grounded framework for understanding and reflecting on the threat that insiders pose. Specifically, we propose a novel conceptualisation that is heavily grounded in insider-threat case studies, existing literature and relevant psychological theory. The framework identifies several key elements within the problem space, concentrating not only on noteworthy events and indicators (technical and behavioural) of potential attacks, but also on attackers (e.g., the motivation behind malicious threats and the human factors related to unintentional ones), and on the range of attacks being witnessed. The real value of our framework is in its emphasis on bringing together and defining clearly the various aspects of insider threat, all based on real-world cases and pertinent literature. This can therefore act as a platform for general understanding of the threat, and also for reflection, modelling past attacks and looking for useful patterns.
Towards a User and Role-based Sequential Behavioural Analysis Tool for Insider Threat Detection.

Ioannis Agrafiotis, Philip A Legg, Michael Goldsmith, and Sadie Creese

J. Internet Serv. Inf. Secur., 2014

Insider threat is recognised to be a significant problem and of great concern to both corporations and governments alike. Traditional intrusion detection systems are known to be ineffective due to the extensive knowledge and capability that insiders typically have regarding the organisational setup. Instead, more sophisticated measures are required to analyse the actions performed by those within the organisation, to assess whether their actions suggest that they pose a threat. In this paper, we propose a proof-of-concept that focuses on the use of activity trees to establish sequential-based analysis of employee behaviour. This concept combines the notions of previously-proposed techniques such as attack trees and behaviour trees. For a given employee, we define a tree that can represent all sequences of their observed behaviours. Over time, branches are either appended or created to reflect the new observations that are made on how the employee acts. We also incorporate a similarity measure to establish how different branches compare against each other. Attacks can be defined as where the similarity measure between a newly-observed branch and all existing branches is below a given acceptance criteria. The approach would allow an analyst to observe chains of events that result in low probability activities that could be deemed as unusual and therefore may be malicious. We demonstrate our proof-of-concept using third-party synthetic employee activity logs, to illustrate the practicalities of delivering this form of protective monitoring.
A critical reflection on the threat from human insiders–its nature, industry perceptions, and detection approaches

Jason RC Nurse, Philip A Legg, Oliver Buckley, Ioannis Agrafiotis, Gordon Wright, Monica Whitty, David Upton, Michael Goldsmith, and Sadie Creese

In Human Aspects of Information Security, Privacy, and Trust: Second International Conference, HAS 2014, Held as Part of HCI International 2014, Heraklion, Crete, Greece, June 22-27, 2014. Proceedings 2, 2014

Organisations today operate in a world fraught with threats, including “script kiddies”, hackers, hacktivists and advanced persistent threats. Although these threats can be harmful to an enterprise, a potentially more devastating and anecdotally more likely threat is that of the malicious insider. These trusted individuals have access to valuable company systems and data, and are well placed to undermine security measures and to attack their employers. In this paper, we engage in a critical reflection on the insider threat in order to better understand the nature of attacks, associated human factors, perceptions of threats, and detection approaches. We differentiate our work from other contributions by moving away from a purely academic perspective, and instead focus on distilling industrial reports (i.e., those that capture practitioners’ experiences and feedback) and case studies in order to truly appreciate how insider attacks occur in practice and how viable preventative solutions may be developed.
Deep Focus: Increasing User “Depth of Field” to Improve Threat Detection (Oxford workshop poster)

William R Claycomb, Roy Maxion, Jason Clark, Bronwyn Woods, Brian Lindauer, David Jensen, Joshua Neil, Alex Kent, Sadie Creese, and Phil Legg

2014

We believe insider threat detection methods can be improved by monitoring and analyzing features of user behaviour not typically associated with indicators of malicious insider behaviour. Anomalous behaviours and statistical outliers observed in such data sets may identify new indicators or help reduce high false positive rates associated with existing indicators.
Visual analytics of e-mail sociolinguistics for user behavioural analysis

Philip A Legg, Oliver Buckley, Michael Goldsmith, and Sadie Creese

J. Internet Serv. Inf. Secur., 2014

The cyber-security threat that most organisations face is not one that only resides outside their perimeter attempting to get in, but emanates from the inside too. Insider threats encompass anyone or thing which exploits authorised access to company information and resources to steal, corrupt or disrupt assets. Threat actors could include not only employees, but also contractors, trusted partners and in some cases clients. The nature of their access is usually persistent, as it is valid and required to conduct their roles, and as such, abuse of their privileges can pose a serious and real threat to the successful operation of the business. Whilst measures have been proposed for detecting previous attacks or those currently in progress, what would be much more desirable is to detect employees who are possibly becoming vulnerable to coercion or persuasion into conducting an attack of some form – enabling supportive or preventative action by the organisation to avoid escalation of an attack. Research into psychology and behaviour is indicating that it may be possible to detect such human vulnerability through analysis of language used – linguistics. In this paper we present a visual analytics tool for the assessment of sociolinguistic behaviours exhibited via e-mail communications, aimed at helping to identify people who are potentially at risk. We discuss the visual design choices made to provide both detail and overview for the analyst for studying communications within a large group of users, and demonstrate this for a large real-world dataset of over 600 employees. We show how an analyst can use the tool to construct linguistic behavioural models to identify vulnerable employees. We propose that this approach could support wider insider threat prevention and detection systems.

2013

Force-directed parallel coordinates

Rick Walker, Philip A Legg, Serban Pop, Zhao Geng, Robert S Laramee, and Jonathan C Roberts

In 2013 17th International Conference on Information Visualisation, 2013

Parallel coordinates are a well-known and valuable technique for the analysis and visualization of high dimensional data sets. However, while Inselberg emphasizes that the strength of parallel coordinates as a methodology is rooted in exploration and interactivity, the set of interaction techniques is currently limited. Axes can be re-ordered and brushing (simple, angular or multi-dimensional) can be performed. In this paper, we propose a force-directed algorithm and related interaction techniques to support the exploration of parallel coordinate plots through a physical metaphor. Our parallel-coordinates visualization offers novel user interaction beyond the standard techniques by allowing the user to rotate the axis according to force-directed polylines. The new interaction provides the user with a more immersive experience for data exploration that results in greater intuition of the data, especially in cases where many polylines overlap. We demonstrate our approach, then present the results of a qualitative evaluation of the system.
Transformation of an uncertain video search pipeline to a sketch-based visual analytics loop

Philip A Legg, David HS Chung, Matthew L Parry, Rhodri Bown, Mark W Jones, Iwan W Griffiths, and Min Chen

IEEE Transactions on Visualization and Computer Graphics, 2013

Traditional sketch-based image or video search systems rely on machine learning concepts as their core technology. However, in many applications, machine learning alone is impractical since videos may not be semantically annotated sufficiently, there may be a lack of suitable training data, and the search requirements of the user may frequently change for different tasks. In this work, we develop a visual analytics system that overcomes the shortcomings of the traditional approach. We make use of a sketch-based interface to enable users to specify search requirement in a flexible manner without depending on semantic annotation. We employ active machine learning to train different analytical models for different types of search requirements. We use visualization to facilitate knowledge discovery at the different stages of visual analytics. This includes visualizing the parameter space of the trained model, visualizing the search space to support interactive browsing, visualizing candidature search results to support rapid interaction for active learning while minimizing watching videos, and visualizing aggregated information of the search results. We demonstrate the system for searching spatiotemporal attributes from sports video to identify key instances of the team and player performance.
Visual analytics for multivariate sorting of sport event data

D Chung, P Legg, M Parry, I Griffiths, R Brown, R Laramee, and M Chen

In Workshop on sports data visualization, 2013

Abs HTML PDF

A critical job coaches and sport analysts are tasked with is the planning of key match videos for analytical coaching sessions. Each session may focus on a diverse range of topics, such as the strengths and weaknesses of a game. This needs to be tailored further based on a player’s tactical position or skill. Hence, the criteria for sorting video are dynamic. This motivates sorting criteria beyond individual attributes of a multi-dimensional data set. We propose a knowledge-assisted, event ranking framework to interactively model implicit sorting as formal parameters that can be used to perform multivariate sorting. We incorporate knowledge in the form of a user’s event ranking which we formalize using regression analysis. Depending on the ranking criteria, the resulting function can be customized to many forms such as importance, or other performance metrics. We use visual analytic methods to depict the set of sortable attributes and weights determined by the model. Visual feedback helps the user comprehend the function, and aids in choosing the most appropriate model. We find that this approach significantly increases the usability of multivariate sorting and allows domain experts to incorporate their knowledge and expertise into the analysis. This work is undertaken in conjunction with a national rugby team. To demonstrate the effectiveness of our sorting system, we present a use case scenario in rugby event analysis, where coaches and analysts need to re-organize match videos in order to study and evaluate team and player performance.
Improving accuracy and efficiency of mutual information for multi-modal retinal image registration using adaptive probability density estimation

Philip A Legg, Paul L Rosin, David Marshall, and James E Morgan

Computerized Medical Imaging and Graphics, 2013

Abs HTML

Mutual information (MI) is a popular similarity measure for performing image registration between different modalities. MI makes a statistical comparison between two images by computing the entropy from the probability distribution of the data. Therefore, to obtain an accurate registration it is important to have an accurate estimation of the true underlying probability distribution. Within the statistics literature, many methods have been proposed for finding the ’optimal’ probability density, with the aim of improving the estimation by means of optimal histogram bin size selection. This provokes the common question of how many bins should actually be used when constructing a histogram. There is no definitive answer to this. This question itself has received little attention in the MI literature, and yet this issue is critical to the effectiveness of the algorithm. The purpose of this paper is to highlight this fundamental element of the MI algorithm. We present a comprehensive study that introduces methods from the statistics literature and incorporates these for image registration. We demonstrate this work for registration of multi-modal retinal images: colour fundus photographs and scanning laser ophthalmoscope images. The registration of these modalities offers significant enhancement to early glaucoma detection; however, traditional registration techniques fail to perform sufficiently well. We find that adaptive probability density estimation heavily impacts on registration accuracy and runtime, improving over traditional binning techniques.
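
To illustrate why the bin count matters, here is a minimal sketch of histogram-based mutual information in pure Python. It is not the paper's implementation: the function name `mutual_information`, the equal-width binning, and the use of base-2 logarithms are all assumptions made for the example.

```python
import math
from collections import Counter


def mutual_information(a, b, bins):
    """Histogram-based MI (in bits) between two equal-length intensity sequences.

    Hypothetical helper: equal-width bins over each sequence's own range.
    """
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    if bins < 1:
        raise ValueError("bins must be at least 1")

    def bin_index(v, lo, hi):
        if hi == lo:
            return 0  # constant signal: everything falls in one bin
        # Map v in [lo, hi] to an integer bin in [0, bins - 1].
        return min(int((v - lo) / (hi - lo) * bins), bins - 1)

    lo_a, hi_a = min(a), max(a)
    lo_b, hi_b = min(b), max(b)
    ia = [bin_index(v, lo_a, hi_a) for v in a]
    ib = [bin_index(v, lo_b, hi_b) for v in b]

    n = len(a)
    joint = Counter(zip(ia, ib))  # joint histogram counts
    marg_a = Counter(ia)          # marginal histogram of a
    marg_b = Counter(ib)          # marginal histogram of b

    mi = 0.0
    for (i, j), count in joint.items():
        p_ij = count / n
        p_i = marg_a[i] / n
        p_j = marg_b[j] / n
        mi += p_ij * math.log2(p_ij / (p_i * p_j))
    return mi
```

Calling `mutual_information(fixed_pixels, moving_pixels, bins)` with different values of `bins` on the same pair of flattened images shows how the estimate changes with the histogram resolution, which is the sensitivity the abstract highlights.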
Turn costs change the value of animal search paths

RP Wilson, IW Griffiths, PA Legg, MI Friswell, OR Bidder, LG Halsey, Sergio Agustin Lambertucci, and ELC Shepard

Ecology letters, 2013

Abs HTML

The tortuosity of the track taken by an animal searching for food profoundly affects search efficiency, which should be optimised to maximise net energy gain. Models examining this generally describe movement as a series of straight steps interspaced by turns, and implicitly assume no turn costs. We used both empirical- and modelling-based approaches to show that the energetic costs for turns in both terrestrial and aerial locomotion are substantial, which calls into question the value of conventional movement models such as correlated random walk or Lévy walk for assessing optimum path types. We show how, because straight-line travel is energetically most efficient, search strategies should favour constrained turn angles, with uninformed foragers continuing in straight lines unless the potential benefits of turning offset the cost.
Towards a conceptual model and reasoning structure for insider threat detection

Philip A Legg, Nick Moffat, Jason RC Nurse, Jassim Happa, Ioannis Agrafiotis, Michael Goldsmith, and Sadie Creese

Journal of Wireless Mobile Networks, Ubiquitous Computing, and Dependable Applications, 2013

Abs HTML PDF

The insider threat faced by corporations and governments today is a real and significant problem, and one that has become increasingly difficult to combat as the years have progressed. From a technology standpoint, traditional protective measures such as intrusion detection systems are largely inadequate given the nature of the ’insider’ and their legitimate access to prized organisational data and assets. As a result, it is necessary to research and develop more sophisticated approaches for the accurate recognition, detection and response to insider threats. One way in which this may be achieved is by understanding the complete picture of why an insider may initiate an attack, and the indicative elements along the attack chain. This includes the use of behavioural and psychological observations about a potential malicious insider in addition to technological monitoring and profiling techniques. In this paper, we propose a framework for modelling the insider-threat problem that goes beyond traditional technological observations and incorporates a more complete view of insider threats, common precursors, and human actions and behaviours. We present a conceptual model for insider threat and a reasoning structure that allows an analyst to make or draw hypotheses regarding a potential insider threat based on measurable states from real-world observations.
Video Visualization

Rita Borgo, Min Chen, Markus Höferlin, Kuno Kurzhals, Phil Legg, Simon Walton, and Daniel Weiskopf

2013

Abs DOI HTML

Video data, generated by the entertainment industry, security and traffic cameras, video conferencing systems, video emails, and so on, is particularly time-consuming to process by human beings. The field of visualization has provided this challenging problem with a collection of techniques that transform videos to different visual forms in order to reduce the time required to watch the video. In this tutorial, we will introduce the concept of video visualization, and several elementary techniques for processing and rendering a video into a compact visual representation. We will describe a family of visual representations, a set of insights obtained from empirical studies, and a collection of applications.

2012

Visualizing multiple error-sensitivity fields for single camera positioning

David HS Chung, Matthew L Parry, Philip A Legg, Iwan W Griffiths, Robert S Laramee, and Min Chen

Computing and Visualization in Science, 2012

Abs HTML

In many data acquisition tasks, the placement of a real camera can vary significantly in complexity from one scene to another. Optimal camera positioning should be governed not only by least error sensitivity, but also by real-world practicalities given by various physical, financial and other types of constraints. It would be a laborious and costly task to model all these constraints if one were to rely solely on fully automatic algorithms to make the decision. In this work, we present a study using 2D and 3D visualization methods to assist in single camera positioning based on error sensitivity of reconstruction and other physical and financial constraints. We develop a collection of visual mappings that depict the composition of multiple error sensitivity fields that occur for a given camera position. Each camera position is then mapped to a 3D visualization that enables visual assessment of the camera configuration. We find that the combined 2D and 3D visualization effectively aids the estimation of camera placement without the need for extensive manual configuration through trial and error. Importantly, it still provides the user with sufficient flexibility to make dynamic decisions based on physical and financial constraints that cannot be encoded easily in an algorithm. We demonstrate the utility of our system on two real-world applications, namely snooker analysis and camera surveillance.
MatchPad: interactive glyph-based visualization for real-time sports performance analysis

Philip A Legg, David HS Chung, Matthew L Parry, Mark W Jones, Rhys Long, Iwan W Griffiths, and Min Chen

In Computer graphics forum, 2012

Abs HTML

Today real-time sports performance analysis is a crucial aspect of matches in many major sports. For example, in soccer and rugby, team analysts may annotate videos during the matches by tagging specific actions and events, which typically result in some summary statistics and a large spreadsheet of recorded actions and events. To a coach, the summary statistics (e.g., the percentage of ball possession) lacks sufficient details, while reading the spreadsheet is time-consuming and making decisions based on the spreadsheet in real-time is thereby impossible. In this paper, we present a visualization solution to the current problem in real-time sports performance analysis. We adopt a glyph-based visual design to enable coaching staff and analysts to visualize actions and events “at a glance”. We discuss the relative merits of metaphoric glyphs in comparison with other types of glyph designs in this particular application. We describe an algorithm for managing the glyph layout at different spatial scales in interactive visualization. We demonstrate the use of this technical approach through its application in rugby, for which we delivered the visualization software, MatchPad, on a tablet computer. The MatchPad was used by the Welsh Rugby Union during the Rugby World Cup 2011. It successfully helped coaching staff and team analysts to examine actions and events in detail whilst maintaining a clear overview of the match, and assisted in their decision making during the matches. It also allows coaches to convey crucial information back to the players in a visually-engaging manner to help improve their performance.

2011

From Video to Animated 3D Reconstruction: A Computer Graphics Application for Snooker Skills Training

Philip A Legg, Matthew L Parry, David HS Chung, Richard M Jiang, Adrian Morris, Iwan W Griffiths, A David Marshall, and Min Chen

In Eurographics (Posters), 2011

Abs HTML

This poster will present a computer graphics application for improving snooker skills training. We developed an automated modelling and rendering pipeline that converts video input data to a time-varying 3D graphical model that can be animated from arbitrary viewing positions. In addition, we introduced illustrative rendering capability that provides coaches and players with various annotated graphics as training aids. The reconstruction of 3D models relies only on a single camera view.
Intelligent filtering by semantic importance for single-view 3D reconstruction from Snooker video

Philip A Legg, Matthew L Parry, David HS Chung, Richard M Jiang, Adrian Morris, Iwan W Griffiths, David Marshall, and Min Chen

In 2011 18th IEEE international conference on image processing, 2011

Abs HTML

In this paper we investigate the challenge of 3D reconstruction from Snooker video data. We propose a system pipeline for intelligent filtering based on semantic importance in Snooker. The system can be divided into table detection and correction, followed by ball detection, classification and tracking. It is apparent from previous work that there are several challenges presented here. Firstly, previous methods tend to use a fixed top-down camera mounted above the table. To capture a full table view from this is challenging due to space limitations above the table. Instead, we capture video data from a tripod and correct the viewpoint through processing. Secondly, previous methods tend to simply detect the balls without considering other interfering objects such as player and cue. This becomes even more apparent when the player strikes the cue ball. Our intelligent filtering avoids such issues to give accurate 3D table reconstruction.
Hierarchical event selection for video storyboards with a case study on snooker video visualization

Matthew L Parry, Philip A Legg, David HS Chung, Iwan W Griffiths, and Min Chen

IEEE Transactions on Visualization and Computer Graphics, 2011

Abs HTML

Video storyboard, which is a form of video visualization, summarizes the major events in a video using illustrative visualization. There are three main technical challenges in creating a video storyboard: (a) event classification, (b) event selection and (c) event illustration. Among these challenges, (a) is highly application-dependent and requires a significant amount of application-specific semantics to be encoded in a system or manually specified by users. This paper focuses on challenges (b) and (c). In particular, we present a framework for hierarchical event representation, and an importance-based selection algorithm for supporting the creation of a video storyboard from a video. We consider the storyboard to be an event summarization for the whole video, whilst each individual illustration on the board is also an event summarization but for a smaller time window. We utilized a 3D visualization template for depicting and annotating events in illustrations. To demonstrate the concepts and algorithms developed, we use Snooker video visualization as a case study, because it has a concrete and agreeable set of semantic definitions for events and can make use of existing techniques of event detection and 3D reconstruction in a reliable manner. Nevertheless, most of our concepts and algorithms developed for challenges (b) and (c) can be applied to other application areas.

2010

Multimodal retinal imaging: Improving accuracy and efficiency of image registration using Mutual Information

Philip A Legg

Cardiff University (United Kingdom), 2010

Abs HTML

This thesis addresses the challenging task of multi-modal image registration. Registration is often required in a number of applications, whereby two images are aligned to give matching correspondence between the features in each image. Such techniques have become popular in many different fields, especially in medical imaging. Multi-modal registration would allow for anatomical structure to be studied concurrently in both modalities, providing the clinician with a greater insight into the patient’s condition. Glaucoma is a serious condition that damages the optic nerve progressively, leading to irreversible blindness. The disease can be treated so as to prevent any further infection; however, it cannot be reversed. Therefore it is paramount that the disease is detected in the early stages so as to minimise the effect of the condition. The work in this thesis focuses on two particular imaging modalities: colour fundus photographs and scanning laser ophthalmoscope images. Both images are captured from the human eye and show the appearance and reflectivity of the retina respectively. Registration of these two modalities would significantly improve demarcation and monitoring of the optic nerve head, a crucial stage for glaucoma diagnosis. In recent years, Mutual Information has become a popular technique used to perform multi-modal registration. This thesis provides a comprehensive overview of the algorithm. Firstly, an investigation is performed that shows how probability estimation can improve the algorithm performance. Secondly, the weaknesses of the current technique are revealed and so a novel solution is proposed that overcomes these problems. Finally, the proposed solution is incorporated in a non-rigid registration scheme that provides excellent registration accuracy for our intended application.

2009

A robust solution to multi-modal image registration by combining mutual information with multi-scale derivatives

Philip A Legg, Paul L Rosin, David Marshall, and James E Morgan

In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2009: 12th International Conference, London, UK, September 20-24, 2009, Proceedings, Part I 12, 2009

Abs HTML

In this paper we present a novel method for performing image registration of different modalities. Mutual Information (MI) is an established method for performing such registration. However, it is recognised that standard MI is not without some problems, in particular it does not utilise spatial information within the images. Various modifications have been proposed to resolve this; however, these only offer slight improvement to the accuracy of registration. We present Feature Neighbourhood Mutual Information (FNMI) that combines both image structure and spatial neighbourhood information which is efficiently incorporated into Mutual Information by approximating the joint distribution with a covariance matrix (cf. Russakoff’s Regional Mutual Information). Results show that our approach offers a very high level of accuracy that improves greatly on previous methods. In comparison to Regional MI, our method also improves runtime for more demanding registration problems where a higher neighbourhood radius is required. We demonstrate our method using retinal fundus photographs and scanning laser ophthalmoscopy images, two modalities that have received little attention in registration literature. Registration of these images would improve accuracy when performing demarcation of the optic nerve head for detecting such diseases as glaucoma.
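
As a reading aid, the covariance-based approximation mentioned in this abstract is usually expressed through the entropy of a Gaussian. The block below gives that standard form; the notation is an assumption for illustration and is not transcribed from the paper. Here $\Sigma$ is the covariance matrix of the stacked neighbourhood vectors drawn from both images, and $\Sigma_A$, $\Sigma_B$ are its diagonal blocks for the individual images.

```latex
% Entropy of a k-dimensional Gaussian with k-by-k covariance matrix \Sigma
H_g(\Sigma) = \tfrac{1}{2} \ln\!\left( (2\pi e)^{k} \det \Sigma \right)

% Covariance-based mutual information between images A and B
\mathrm{MI}_g(A, B) = H_g(\Sigma_A) + H_g(\Sigma_B) - H_g(\Sigma)
```

Only determinants of covariance matrices are needed, so no high-dimensional joint histogram has to be built, which is what makes larger neighbourhoods tractable.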
Non-rigid elastic registration of retinal images using local window mutual information

Philip A Legg, Paul L Rosin, David Marshall, and James E Morgan

Proc. Med. Image Understand. Anal. (MIUA 2009), 2009

Abs HTML

In this paper we consider the problem of non-rigid retinal image registration between colour fundus photographs and Scanning Laser Ophthalmoscope (SLO) images. Registration would allow for cross-comparison between modalities, giving both appearance and reflectivity information which would provide clearer visualisation for demarcation of the optic nerve head as part of early glaucoma detection. Due to the differences in acquisition technique, along with alterations in the eye between acquisitions, there can be subtle non-rigid deformations present in the images that become apparent when performing rigid registration. Whilst this is negligible towards the centre of the SLO, the effect becomes much more noticeable towards the periphery of the image, where it can be seen that not all blood vessels are aligned correctly. We propose a two-stage registration consisting of finding an initial rigid registration using Feature Neighbourhood Mutual Information [1], and then to use Local Window Mutual Information to quickly determine deformation parameters for a non-rigid solution. We test our method on 135 image pairs, with results showing improved registration accuracy compared to rigid registration.

2008

Incorporating neighbourhood feature derivatives with mutual information to improve accuracy of multi-modal image registration

Philip A Legg, Paul L Rosin, David Marshall, and James E Morgan

Proc. Med. Image Understand. Anal. (MIUA 2008), 2008

Abs HTML PDF

In this paper we present an improved method for performing image registration of different modalities. Russakoff proposed the method of Regional Mutual Information (RMI) which allows neighbourhood information to be considered in the Mutual Information (MI) algorithm. We extend this method by taking local multi-scale feature derivatives in a gauge coordinate frame to represent the structural information of the images. By incorporating these images into RMI, we can combine aspects of both structural and neighbourhood information together, which provides a high level of registration accuracy that is essential in application to the medical domain. Our images to be registered are retinal fundus photographs and SLO (Scanning Laser Ophthalmoscopy) images. The combination of these two modalities has received little attention in image registration, yet could provide much useful information to an Ophthalmic clinician. One application is the detection of glaucoma in its early stages, where prevention of further infection is possible before irreversible damage occurs. Results indicate that our method offers a vast improvement to Regional MI, with 25 of our 26 test images being registered to a high standard.

2007

Improving accuracy and efficiency of registration by mutual information using Sturges’ histogram rule

Philip A Legg, Paul L Rosin, David Marshall, and James E Morgan

Proc. Med. Image Understand. Anal. (MIUA 2007), 2007

Abs HTML

Mutual Information is a common technique for image registration in the medical domain, in particular where images of different modalities are to be registered. In this paper, we wish to demonstrate the benefits of applying a common method known in statistics as Sturges’ Rule for selecting histogram bin size when computing Entropy as a part of the existing Mutual Information algorithm. Although Sturges’ Rule is well known in the field of statistics it has received little attention in the Computer Vision community. By augmenting Mutual Information with Sturges’ Rule, we show that this offers an improvement to both the runtime of the algorithm and also the accuracy of the registration. Our results are demonstrated on images of the eye, in particular, Fundus images and SLO (Scanning Laser Ophthalmoscopy) images.
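
For readers unfamiliar with it, Sturges' Rule in its commonly stated form sets the number of histogram bins to 1 + log2(n) for n samples. The sketch below shows that form in Python; the function name `sturges_bins` and the choice to round up are assumptions, and the paper may apply the rule differently.

```python
import math


def sturges_bins(n_samples):
    """Number of histogram bins suggested by Sturges' Rule: ceil(log2(n)) + 1.

    Hypothetical helper; rounding up is one common convention.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be a positive integer")
    return math.ceil(math.log2(n_samples)) + 1


# Example: a 512 x 512 image has 262144 pixels.
bins_for_image = sturges_bins(512 * 512)
```

Because the bin count grows only logarithmically with the number of pixels, the resulting histograms are far smaller than a one-bin-per-grey-level histogram, which is consistent with the runtime benefit the abstract reports.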
