Cyber Funfair: Creating Immersive & Educational Experiences for Teaching Cyber Physical Systems Security
Alan Mills, Jonathan White and Phil Legg
Technical Symposium on Computer Science Education (SIGCSE TS) 2024
Delivering meaningful & inspiring cyber security education for younger audiences can often be a challenge due to limited expertise & resources. Key to any outreach activity is that it both develops a learner's curiosity and meets educational objectives. To address this need, we developed a novel learning & awareness activity that addresses the Cyber Physical Systems (CPS) Security knowledge area as mapped by the Cyber Security Body of Knowledge (CyBOK). At the core of our activity is the integration of the Raspberry Pi device with LEGO SPIKE kits. LEGO SPIKE is part of the LEGO Education system that combines colourful LEGO building blocks with motors & sensors, creating an adaptable & engaging learning environment. This hands-on approach allows participants to witness the tangible consequences of cyber & network actions in a physical & engaging format. To evaluate the effectiveness of the activity, we used it as part of an outreach activity day attended by approximately 300 students aged 12-14 from schools across the West of England. Participants were surveyed & the results showed an increase in understanding of CPS-specific & wider cyber security for over 90% of respondents. Activity engagement was also well received with no negative feedback. We report on our survey findings & discuss best practices to support other practitioners in developing hands-on interactive experiences for engaging & educational cyber security activities.


Longitudinal risk-based security assessment of docker software container images
Alan Mills, Jonathan White and Phil Legg
Computers and Security
As the use of software containerisation has increased, so too has the need for security research on their usage, with various surveys and studies conducted to assess the overall security posture of software container images. To date, there has been very little work that has taken a longitudinal view of container security to observe whether vulnerabilities are being resolved over time, as well as understanding the real-world implications of reported vulnerabilities, to assess the evolving security posture. In this work, we study the evolution of 380 software container images across 3 analysis periods between July 2022 and January 2023 to analyse maintenance and vulnerability factors over time. We sample across the 3 DockerHub categories: Official, Verified and Sponsored OSS (Open Source Software). We found that the number of vulnerabilities present increased over time despite many containers receiving regular updates by providers. We also found that the choice of container OS can dramatically impact the number of reported vulnerabilities present over time, with Debian-based images typically having many more vulnerabilities than other Linux distributions, and with some containers still reporting vulnerabilities that date back as far as 1999. However, when taking into account additional reported attributes such as the attack vector required and the existence of a public exploit rated higher than negligible, we found that for each analysis period, fewer than 1% of all vulnerabilities present what we would consider as high risk real-world impact. Through our investigation, we aim to improve the understanding of the threat landscape posed by software containerisation that is further complicated by the discrepancies between different vulnerability reporting tools.

Federated Learning: Data Privacy and Cyber Security in Edge-Based Machine Learning
Jonathan White and Phil Legg
Data Protection in a Post-Pandemic Society
Machine learning is now a key component of many applications for understanding trends and characteristics within the wealth of data that may be processed, whether this be learning about customer preferences and travel preferences, forecasting future behaviour of stock markets, weather, or crime rates, classifying and recognising images and text content, or a whole host of other technologies that are becoming integrated as part of our daily lives. The raft of applications is broad and continues to grow daily. At the same time, there are growing concerns about the data protection, security and data privacy of such applications, as smart devices are embedded deeper in our daily activity. How can we ensure that the data that is gathered and utilised about our daily interactions can be best protected, in terms of ensuring systems are truly secure and that users' privacy is maintained and assured? In this chapter, we explore the recent developments of federated learning, introduced by Google in 2016. This approach mandates that data remains at the place where it was collected, and that it is only data models that pass over the network. In this way, there is no centralised data storage, and no personal data leaves the point where it was generated. We present the recent works of this growing area of research, and we posit the challenges posed from both the data privacy and cyber security standpoints. We show how federated learning can be applied to a cyber security case study of distributed monitoring for Intrusion Detection. We also consider the wider implications of data privacy in machine learning and federated learning systems.
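
As an illustration of the principle that only models, never raw data, pass over the network, the following is a minimal sketch of federated averaging (FedAvg), one common aggregation scheme. It is not taken from the chapter: the linear model, the synthetic client data and every function name here are assumptions made for the example.

```python
# Minimal federated averaging sketch (illustrative only; all names are hypothetical).
import random


def local_update(weights, data, lr=0.1, epochs=5):
    """Run a few gradient-descent steps for y ~ w*x + b on one client's private data."""
    w, b = weights
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(client_weights, client_sizes):
    """Size-weighted mean of client models; only these parameters reach the server."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return (w, b)


def make_client(n, seed):
    """Synthetic private dataset drawn from y = 3x + 1 plus a little noise (assumed)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 3.0 * x + 1.0 + rng.gauss(0, 0.05)) for x in xs]


# Three clients with different amounts of local data; the data never leaves them.
clients = [make_client(n, seed) for seed, n in enumerate([20, 40, 60])]

global_model = (0.0, 0.0)
for _ in range(50):  # communication rounds
    updates = [local_update(global_model, data) for data in clients]
    global_model = federated_average(updates, [len(d) for d in clients])

print(global_model)
```

The server only ever sees the `(w, b)` pairs returned by each client. Model updates can themselves leak information, which is one of the privacy challenges the chapter raises.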

Evaluating Data Distribution Strategies in Federated Learning: A Trade-off Analysis between Privacy and Performance for IoT Security
Jonathan White and Phil Legg
AI Applications in Cyber Security and Communication Networks: Proceedings of Ninth International Conference on Cyber Security, Privacy in Communication Networks (ICCS 2023)
Federated learning is an effective approach for training a global machine learning model. It uses locally acquired data without having to share local data with the centralised server. This method provides a machine learning model beneficial for all parties. It ensures that individual parties do not compromise their privacy or disclose sensitive or personal data. From a cyber security perspective, machine learning with federated learning can highlight intrusions or anomalous activity on a device, without the individual device owner having to reveal characteristics of their own personal usage that would then breach their own privacy. In this paper, we conduct an exploratory investigation into two public datasets, Edge-IIoTset and CICIoT2023, and we highlight the strengths and limitations of these datasets as currently presented. We then conduct further experimentation on the CICIoT2023 dataset, which has previously only been used for developing centralised learning models. We investigate machine learning performance under various distributions of the data across a set of federated clients, including stratified, leave-one-out, one-class, and half-benign strategies. Specifically, we examine whether a comparable model can be developed using federated learning, and how little data is required by each client to maintain privacy whilst also offering comparable performance against a centralised model.

Honggfuzz+: Fuzzing by Adaptation of Cryptographic Mutation
Sadegh Bamohabbat Chafjiri, Phil Legg, Michail-Antisthenis Tsompanas, Jun Hong
AI Applications in Cyber Security and Communication Networks: Proceedings of Ninth International Conference on Cyber Security, Privacy in Communication Networks (ICCS 2023)
Fuzzing is a widely used technique for software testing, which involves automated operations at different levels (bit, byte, and block) to generate test cases and identify vulnerabilities. This paper explores the adaptation of cryptographic structures, such as Substitution-Permutation Networks (SPN) and Feistel Networks (FN), within the context of memory shuffling. By integrating these structures with the memory swap function of HonggFuzz, we establish a complex relationship between non-overlapping memory regions, leading to the discovery of additional bugs. Real-world targets including Xpdf, libexif, TCPdump, libTIFF, and libxml2 were tested, and the results were compared against state-of-the-art mutators: the baseline of HonggFuzz, LibFuzzer, and AFL++. Our findings demonstrate that our solution, particularly the FN structure, enhances the overall performance of HonggFuzz and uncovers a broader range of unexpected software behaviours on most targets during the verification and validation stages. Additionally, we propose using logarithmic curve fitting to determine the optimal testing duration based on the cumulative number of identified bugs. This approach provides valuable insights into the stability of the fuzzer and offers a more reliable metric for testing time compared to existing techniques in fuzzing.
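
The logarithmic curve fitting mentioned above can be sketched as follows. This is an illustrative reading rather than the paper's procedure: it assumes the model bugs ≈ a + b·ln(t) fitted by ordinary least squares, and the stopping rule (stop once the fitted discovery rate b/t falls below a chosen threshold) and the observation values are hypothetical.

```python
# Sketch: fit cumulative bug counts to a + b*ln(t) and derive a stopping time.
# The model form, the stopping rule and the sample data are assumptions for illustration.
import math


def fit_log_curve(times, cumulative_bugs):
    """Ordinary least-squares fit of bugs ~ a + b * ln(t); all times must be > 0."""
    xs = [math.log(t) for t in times]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cumulative_bugs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cumulative_bugs))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def stopping_time(b, min_rate):
    """Time at which the fitted rate d/dt (a + b ln t) = b / t drops to min_rate."""
    return b / min_rate


# Made-up observations: hours of fuzzing and cumulative unique bugs found so far.
hours = [1, 2, 4, 8, 16, 24]
bugs = [3, 5, 7, 9, 11, 12]

a, b = fit_log_curve(hours, bugs)
print(f"fit: bugs ~ {a:.2f} + {b:.2f} * ln(t)")
print(f"rate falls below 0.1 bugs/hour after about {stopping_time(b, 0.1):.1f} hours")
```

A flattening fitted curve indicates diminishing returns from further fuzzing. How well the fit tracks the observations gives a rough indication of how stable the fuzzer's discovery behaviour is.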


Defending against Adversarial Machine Learning Attacks using Hierarchical Learning: A case study on Network Traffic Attack Classification
Andrew McCarthy, Essam Ghadafi, Panagiotis Andriotis, Phil Legg
Journal of Information Security and Applications, 2022 [ACCEPTED]
Machine learning is key for automated detection of malicious network activity to ensure that computer networks and organizations are protected against cyber security attacks. Recently, there has been growing interest in the domain of adversarial machine learning, which explores how a machine learning model can be compromised by an adversary, resulting in misclassified output. Whilst to date, most focus has been given to visual domains, the challenge is present in all applications of machine learning where a malicious attacker would want to cause unintended functionality, including cyber security and network traffic analysis. We first present a study on conducting adversarial attacks against a well-trained network traffic classification model. We show how well-crafted adversarial examples can be constructed so that known attack types are misclassified by the model as benign activity. To combat this, we present a novel defensive strategy based on hierarchical learning to help reduce the attack surface that an adversarial example can exploit within the constraints of the parameter space of the intended attack. Our results show that our defensive learning model can withstand crafted adversarial attacks and can achieve classification accuracy in line with our original model when not under attack.

Teaching Offensive and Defensive Cyber Security in Schools using a Raspberry Pi Cyber Range
Phil Legg, Alan Mills, Ian Johnson
Colloquium for Information Systems Security Education (CISSE), 2022 [ACCEPTED]
Computer Science as a subject is now appearing in more school curricula for GCSE and A level, with a growing demand for cyber security to be embedded within this teaching. Yet, teachers face challenges with limited time and resource for preparing practical materials to effectively convey the subject matter. We hosted a series of workshops designed to understand the challenges that teachers face in delivering cyber security education. We then worked with teachers to co-create practical learning resources that could be further developed as tailored lesson plans, as required for their students. In this paper, we report on the challenges highlighted by teachers, and we present a portable and isolated infrastructure for teaching the basics of offensive and defensive cyber security, as a co-created activity based on the teacher workshops. Whilst we present an example case study for red and blue team student engagement, we also reflect on the wide scope of topics and tools that students would be exposed to through this activity, and how this platform could then be generalised for further cyber security teaching.
CISSE Journal

Interactive Cyber-Physical System Hacking: Engaging Students Early Using Scalextric
Jonathan White, Phil Legg, Alan Mills
Colloquium for Information Systems Security Education (CISSE), 2022 [ACCEPTED]
Cyber Security as an education discipline covers a variety of topics that can be challenging and complex for students who are new to the subject domain. With this in mind, it is crucial that new students are motivated by understanding both the technical aspects of computing and networking, and the real-world implications of compromising these systems. In this paper we approach this task to create an engaging outreach experience, on the concept of cyber-physical systems, using a Scalextric slot-car racetrack. In the activity, students seek to compromise the underlying computer system that is linked to the track and updates the scoreboard system, in order to inflate their own score and to sabotage their opponent. Our investigation with this technique shows high levels of engagement whilst providing an excellent platform for teaching basic concepts of enumeration, brute forcing, and privilege escalation. It also provokes discussion on how this activity relates to real-world cases of cyber-physical systems security in the sports domain and beyond.
CISSE Journal

OGMA: Visualisation for Software Container Security Analysis and Automated Remediation
Alan Mills, Jon White, Phil Legg
IEEE Conference on Cyber Security and Resilience, 2022
The use of software containerisation has rapidly increased in academia and industry, which has led to the production of several container security scanning tools for assessing the security posture and threat of a container image. These tools often differ in their coverage of vulnerabilities, their assessed severity and their output formats. It is also common to find duplicate Common Vulnerabilities and Exposures (CVEs) in their reporting, which can often skew the risk assessment of a container. These issues, along with the lack of automated solutions for maintaining up-to-date patching of container images, are currently open issues identified by the research community that we address in this paper. We present OGMA, a visualisation tool for improved analysis and assessment of container security issues across multiple, often conflicting, scanning tools. In addition to severity, our approach helps to examine attack vector and exploit availability, while also removing duplicated CVEs, therefore providing a clearer picture for risk analysts to understand the threat posed by container deployment. Furthermore, we couple this with a novel remediation scheme for updating vulnerable containers whilst ensuring that functionality is preserved, and show how our visualisation system can highlight the improved security posture of the fixed container. Our results highlight the existing security issues in pre-built container images and the inconsistencies between scanning tools, whilst our proposed approach helps to identify and mitigate such threats to improve container security as part of the wider challenges of software supply chain security.
IEEE Xplore
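
To illustrate the de-duplication problem described in the abstract above, the sketch below merges findings from several scanners into one record per CVE identifier and flags severity disagreements. It is not OGMA's implementation: the input format, the scanner names, the placeholder CVE identifiers and the severity scale are all assumptions for the example.

```python
# Sketch: merge scanner findings into one record per CVE and flag disagreements.
# The report format, scanner names and CVE IDs below are hypothetical placeholders.
SEVERITY_ORDER = ["negligible", "low", "medium", "high", "critical"]


def merge_reports(reports):
    """reports maps scanner name -> list of {"cve": str, "severity": str} findings.

    Returns a dict keyed by CVE ID. A severity outside SEVERITY_ORDER raises
    ValueError, so unexpected scanner output is not silently ignored.
    """
    merged = {}
    for scanner, findings in reports.items():
        for finding in findings:
            entry = merged.setdefault(
                finding["cve"], {"severities": {}, "scanners": set()}
            )
            entry["scanners"].add(scanner)
            entry["severities"][scanner] = finding["severity"].lower()
    for entry in merged.values():
        ratings = entry["severities"].values()
        entry["max_severity"] = max(ratings, key=SEVERITY_ORDER.index)
        entry["disagreement"] = len(set(ratings)) > 1
    return merged


reports = {
    "scanner_a": [
        {"cve": "CVE-0000-0001", "severity": "High"},
        {"cve": "CVE-0000-0001", "severity": "High"},  # duplicate within one tool
        {"cve": "CVE-0000-0002", "severity": "Low"},
    ],
    "scanner_b": [
        {"cve": "CVE-0000-0001", "severity": "Medium"},
    ],
}

merged = merge_reports(reports)
for cve, entry in sorted(merged.items()):
    print(cve, entry["max_severity"], sorted(entry["scanners"]), entry["disagreement"])
```

Counting unique CVE identifiers rather than raw findings stops duplicates from inflating the totals, and the `disagreement` flag surfaces the cases where tools assign conflicting severities.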

Functionality-Preserving Adversarial Machine Learning for Robust Classification in Cybersecurity and Intrusion Detection Domains: A Survey
Andrew McCarthy, Essam Ghadafi, Panagiotis Andriotis and Phil Legg
Journal of Cybersecurity and Privacy, 2022
Machine learning has become widely adopted as a strategy for dealing with a variety of cybersecurity issues, ranging from insider threat detection to intrusion and malware detection. However, by their very nature, machine learning systems can introduce vulnerabilities to a security defence whereby a learnt model is unaware of so-called adversarial examples that may intentionally result in mis-classification and therefore bypass a system. Adversarial machine learning has been a research topic for over a decade and is now an accepted but open problem. Much of the early research on adversarial examples has addressed issues related to computer vision, yet as machine learning continues to be adopted in other domains, then likewise it is important to assess the potential vulnerabilities that may occur. A key part of transferring to new domains relates to functionality-preservation, such that any crafted attack can still execute the original intended functionality when inspected by a human and/or a machine. In this literature survey, our main objective is to address the domain of adversarial machine learning attacks and examine the robustness of machine learning models in the cybersecurity and intrusion detection domains. We identify the key trends in current work observed in the literature, and explore how these relate to the research challenges that remain open for future works. Inclusion criteria were: articles related to functionality-preservation in adversarial machine learning for cybersecurity or intrusion detection with insight into robust classification. Generally, we excluded works that are not yet peer-reviewed; however, we included some significant papers that make a clear contribution to the domain. There is a risk of subjective bias in the selection of non-peer reviewed articles; however, this was mitigated by co-author review. 
We selected the following databases with a sizeable computer science element to search and retrieve literature: IEEE Xplore, ACM Digital Library, ScienceDirect, Scopus, SpringerLink, and Google Scholar. The literature search was conducted up to January 2022. We have striven to ensure a comprehensive coverage of the domain to the best of our knowledge. We have performed systematic searches of the literature, noting our search terms and results, and following up on all materials that appear relevant and fit within the topic domains of this review. This research was funded by the Partnership PhD scheme at the University of the West of England in collaboration with Techmodal Ltd.

Investigating Malware Propagation and Behaviour Using System and Network Pixel-Based Visualisation
Jacob Williams and Phil Legg
SN Computer Science, 2022
The fight against malicious software, known as malware, is a perpetual game of cat and mouse between malicious software developers and security professionals. Recent years have seen many high profile cyber attacks, including the WannaCry and NotPetya ransomware attacks that resulted in major financial damages to many businesses and institutions. Understanding the characteristics of such malware, including how malware can propagate and interact between systems and networks, is key for mitigating these threats and containing the infection to avoid further damage. In this study, we present visualisation techniques for understanding the propagation characteristics in dynamic malware analysis. We propose the use of pixel-based visualisations to convey large-scale complex information about network hosts in a scalable and informative manner. We demonstrate our approach using a virtualised network environment, whereby we can deploy malware variants and observe their propagation behaviours. As a novel form of visualising system and network activity data across a complex environment, we can begin to understand visual signatures that can help analysts identify key characteristics of the malicious behaviours, and, therefore, provoke response and mitigation against such attacks.


Deep Learning-Based Security Behaviour Analysis in IoT Environments: A Survey
Yawei Yue, Shancang Li, Phil Legg and Fuzhong Li
Security and Communication Networks, 2021
Internet of Things (IoT) applications have been used in a wide variety of domains, ranging from smart home, healthcare and smart energy to Industry 4.0. While IoT brings a number of benefits including convenience and efficiency, it also introduces a number of emerging threats. The number of IoT devices that may be connected, along with the ad hoc nature of such systems, often exacerbates the situation. Security and privacy have emerged as significant challenges for managing IoT. Recent work has demonstrated that deep learning algorithms are very efficient for conducting security analysis of IoT systems and have many advantages compared with the other methods. This paper aims to provide a thorough survey related to deep learning applications in IoT for security and privacy concerns. Our primary focus is on deep learning enhanced IoT security. First, from the view of system architecture and the methodologies used, we investigate applications of deep learning in IoT security. Second, from the security perspective of IoT systems, we analyse the suitability of deep learning to improve security. Finally, we evaluate the performance of deep learning in IoT system security.
Hindawi Security and Communication Networks

Unsupervised One-Class Learning for Anomaly Detection on Home IoT Network Devices
Jonathan White and Phil Legg
International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA), 2021
In this paper we study anomaly detection methods for home IoT devices. Specifically, we address unsupervised one-class learning methods due to their ability to learn deviations from a single normal class. In a home IoT environment, this consideration is crucial as supervised methods would result in a burden on many non-technical consumers which could hinder their effectiveness. For our study, we develop a home IoT network monitoring tool, and we illustrate network attacks against a variety of typical home IoT devices. As a result, we propose measures that could aid home consumers in defending ever-increasing home IoT networks.
IEEE Xplore

Feature Vulnerability and Robustness Assessment against Adversarial Machine Learning Attacks
Andrew McCarthy, Panagiotis Andriotis, Essam Ghadafi and Phil Legg
International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA), 2021
Whilst machine learning has been widely adopted for various domains, it is important to consider how such techniques may be susceptible to malicious users through adversarial attacks. Given a trained classifier, a malicious attack may attempt to craft a data observation whereby the data features purposefully trigger the classifier to yield incorrect responses. This has been observed in various image classification tasks, including falsifying road sign detection and facial recognition, which could have severe consequences in real-world deployment. In this work, we investigate how these attacks could impact on network traffic analysis, and how a system could perform misclassification of common network attacks such as DDoS attacks. Using the CICIDS2017 data, we examine how vulnerable the data features used for intrusion detection are to perturbation attacks using FGSM adversarial examples. As a result, our method provides a defensive approach for assessing feature robustness that seeks to balance between classification accuracy whilst minimising the attack surface of the feature space.
IEEE Xplore
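
The Fast Gradient Sign Method (FGSM) named in the abstract above perturbs an input by a step of size epsilon in the direction of the sign of the loss gradient: x_adv = x + epsilon * sign(grad_x L(x, y)). The sketch below applies it to a toy logistic-regression classifier. The weights, the feature vector and epsilon are made-up values for illustration and are unrelated to the CICIDS2017 models studied in the paper.

```python
# Sketch: FGSM against a toy logistic-regression classifier (illustrative values only).
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Probability that x belongs to class 1 (here: 'attack')."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(weights, bias, x, y, epsilon):
    """Return x + epsilon * sign(dL/dx) for binary cross-entropy loss L.

    For logistic regression the input gradient is (p - y) * w.
    """
    p = predict(weights, bias, x)
    grad = [(p - y) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


# Hypothetical trained model and one sample that is correctly flagged as an attack.
weights, bias = [2.0, -1.0, 0.5], -0.5
x, y = [1.0, 0.2, 0.4], 1

x_adv = fgsm(weights, bias, x, y, epsilon=0.5)
print("original score:", predict(weights, bias, x))
print("adversarial score:", predict(weights, bias, x_adv))
```

With these values the perturbed sample scores below 0.5 and would be treated as benign. The sketch ignores functionality preservation: real network features are constrained, so an attacker cannot shift each one freely. Because the perturbation follows the weights, features with large weights contribute most to the attack surface.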

'Hacking an IoT Home': New opportunities for cyber security education combining remote learning with cyber-physical systems
Phil Legg, Thomas Higgs, Pennie Spruhan, Jonathan White and Ian Johnson
International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA), 2021
In March 2020, the COVID-19 pandemic led to a dramatic shift in educational practice, whereby home-schooling and remote working became the norm. Many typical schools outreach projects to encourage uptake of learning cyber security skills were therefore put on hold, due to the inability to physically attend and inspire. In this short paper, we describe a new approach to teaching cyber security with a view of inspiring a new generation of learners to the subject. Traditional Capture-The-Flag exercises are widely used in cyber security education, whereby a series of challenges are completed to gain access and obtain a passphrase from a computer system. We couple this approach with interactive sessions made possible via video conferencing platforms such as Microsoft Teams and Zoom, along with the very nature of being in the home environment, where home IoT devices are now commonplace. We develop an integrated CTF for the home IoT environment, where students can observe the impact of submitting flags via online video, to physically adjust the home environment - ranging from switching off lights, playing music, or controlling an IoT-enabled robot. The result is a highly interactive and engaging experience that benefits from the very nature of remote working, inspiring the notion of "hacking an IoT home".
IEEE Xplore


Investigating Anti-Evasion Malware Triggers Using Automated Sandbox Reconfiguration Techniques
Alan Mills and Phil Legg
Journal of Cybersecurity and Privacy, 2020
Malware analysis is fundamental for defending against prevalent cyber security threats and requires a means to deploy and study behavioural software traits as more sophisticated malware is developed. Traditionally, virtual machines are used to provide an environment that is isolated from production systems so as to not cause any adverse impact on existing infrastructure. Malware developers are fully aware of this and so will often develop evasion techniques to avoid detection within sandbox environments. In this paper, we conduct an investigation of anti-evasion malware triggers for uncovering malware that may attempt to conceal itself when deployed in a traditional sandbox environment. To facilitate our investigation, we developed a tool called MORRIGU that couples together both automated and human-driven analysis for systematic testing of anti-evasion methods using dynamic sandbox reconfiguration techniques. This is further supported by visualisation methods for performing comparative analysis of system activity when malware is deployed under different sandbox configurations. Our study reveals a variety of anti-evasion traits that are shared amongst different malware families, such as sandbox “wear-and-tear”, and Reverse Turing Tests (RTT), as well as more sophisticated malware samples that require multiple anti-evasion checks to be deployed. We also perform a comparative study using Cuckoo sandbox to demonstrate the limitations of adopting only automated analysis tools, to justify the exploratory analysis provided by MORRIGU. By adopting a clearer systematic process for uncovering anti-evasion malware triggers, as supported by tools like MORRIGU, this study helps to further the research of evasive malware analysis so that we can better defend against such future attacks.

The Visual Design of Network Data to Enhance Cyber Security Awareness of the Everyday Internet User
Fiona Carroll, Phil Legg and Bastian Bønkel
International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA), 2020
Technology and the use of online services are very prevalent across much of our everyday lives. As our digital interactions continue to grow, there is a need to improve public awareness of the risks to our personal online privacy and security. Designing for cyber security awareness has never been so important. In this work, we consider people's current impressions towards their privacy and security online. We also explore how abnormal network activity data can be visually conveyed to afford a heightened cyber security awareness. In detail, the paper documents the different effects of visual variables in an edge and node DoS visualisation to depict abnormally high volumes of traffic. The results from two studies show that people are generally becoming more concerned about their privacy and security online. Moreover, we have found that the more focus-based visual techniques (i.e. blur) and geometry-based techniques (i.e. jaggedness and sketchiness) afford stronger impressions of uncertainty from abnormally high volumes of network traffic. In terms of security, these impressions and feelings alert the end-user that something is not quite as it should be and hence develop a heightened cyber security awareness.
IEEE Xplore

Shouting Through Letterboxes: A study on attack susceptibility of voice assistants
Andrew McCarthy, Benedict R. Gaster and Phil Legg
International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA), 2020
Voice assistants such as Amazon Echo and Google Home have become increasingly popular for many home users, for home automation, entertainment, and convenience. These devices process speech commands from a user to execute some action, such as playing music, making online purchases, or triggering home automation such as lights or security locks. The process of mapping speech input to a text command is performed using a machine learning model. In this study, we explore the concept of how voice assistants could be exploited, where genuine audio commands are manipulated such that an attacker could trigger alternative responses from the voice assistant. We present a small-scale study to examine mis-interpretations made by voice assistants. We also study user perception of how secure their voice devices are, and their approach to security and privacy.
IEEE Xplore

“What did you say?”: Extracting unintentional secrets from predictive text learning systems
Gwyn Wilkinson and Phil Legg
International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA), 2020
As a primary form of communication, text is used widely for online communications, including e-mail conversations, mobile text messaging, chatroom and forum discussions. Modern systems include facilities such as predictive text, recently implemented using deep learning algorithms, to estimate the next word to be written based on previous historical entries. However, we often enter sensitive information such as passwords using the same input devices - namely, smartphone soft keyboards. In this paper, we explore the problem of deep learning models which memorise sensitive training data, and how secrets can be extracted from predictive text models. We propose a general black-box attack algorithm to accomplish this for all kinds of memorised sequences, discuss mitigations and countermeasures, and explore how this attack vector could be deployed on Android or iOS mobile device platforms as part of target reconnaissance.
IEEE Xplore


Visual analytics for collaborative human-machine confidence in human-centric active learning tasks
Phil Legg, Jim Smith and Alexander Downing
Human-centric Computing and Information Sciences, 2019
Active machine learning is a human-centric paradigm that leverages a small labelled dataset to build an initial weak classifier, that can then be improved over time through human-machine collaboration. As new unlabelled samples are observed, the machine can either provide a prediction, or query a human ‘oracle’ when the machine is not confident in its prediction. Of course, just as the machine may lack confidence, the same can also be true of a human ‘oracle’: humans are not all-knowing, untiring oracles. A human’s ability to provide an accurate and confident response will often vary between queries, according to the duration of the current interaction, their level of engagement with the system, and the difficulty of the labelling task. This poses an important question of how uncertainty can be expressed and accounted for in a human-machine collaboration. In short, how can we facilitate a mutually-transparent collaboration between two uncertain actors—a person and a machine—that leads to an improved outcome? In this work, we demonstrate the benefit of human-machine collaboration within the process of active learning, where limited data samples are available or where labelling costs are high. To achieve this, we developed a visual analytics tool for active learning that promotes transparency, inspection, understanding and trust, of the learning process through human-machine collaboration. Fundamental to the notion of confidence, both parties can report their level of confidence during active learning tasks using the tool, such that this can be used to inform learning. Human confidence of labels can be accounted for by the machine, the machine can query for samples based on confidence measures, and the machine can report confidence of current predictions to the human, to further the trust and transparency between the collaborative parties. 
In particular, we find that this can improve the robustness of the classifier when incorrect sample labels are provided, due to low confidence or fatigue. Reported confidences can also better inform human-machine sample selection in collaborative sampling. Our experimentation compares the impact of different selection strategies for acquiring samples: machine-driven, human-driven, and collaborative selection. We demonstrate how a collaborative approach can improve trust in the model robustness, achieving high accuracy and low user correction, with only limited data sample selections.
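
The predict-or-query behaviour described in this abstract can be illustrated with a minimal sketch. It assumes the machine exposes class probabilities per sample; the function names, the 0.8 threshold and the sample identifiers are hypothetical and are not taken from the paper or its tool.

```python
# Illustrative sketch only: confidence-driven querying in active learning.
# Names, threshold and data are assumptions, not the authors' implementation.

def predict_or_query(class_probs, threshold=0.8):
    """Return a prediction if the top class probability meets the threshold,
    otherwise signal that the human oracle should be queried."""
    label = max(range(len(class_probs)), key=class_probs.__getitem__)
    confidence = class_probs[label]
    if confidence >= threshold:
        return ("predict", label, confidence)
    return ("query", None, confidence)


def least_confident(pool):
    """Pick the unlabelled sample whose top class probability is lowest
    (least-confidence sampling). `pool` maps sample id -> class probabilities."""
    return min(pool, key=lambda sample_id: max(pool[sample_id]))


pool = {
    "sample_a": [0.95, 0.03, 0.02],
    "sample_b": [0.40, 0.35, 0.25],
    "sample_c": [0.70, 0.20, 0.10],
}
next_query = least_confident(pool)
decision_a = predict_or_query(pool["sample_a"])
decision_b = predict_or_query(pool["sample_b"])
```

Here `sample_b` is the machine-driven query choice because its strongest prediction is the weakest in the pool. A human-driven or collaborative strategy would replace or combine this selection rule with the person's own choice.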

Tools and techniques for improving cyber situational awareness of targeted phishing attacks
Phil Legg and Tim Blackman
International conference on Cyber Situational Awareness, Data Analytics and Assessment (Cyber SA), 2019
Phishing attacks continue to be one of the most common attack vectors used online today to deceive users, such that attackers can obtain unauthorised access or steal sensitive information. Phishing campaigns often vary in their level of sophistication, from mass distribution of generic content, such as delivery notifications, online purchase orders, and claims of winning the lottery, through to bespoke and highly-personalised messages that convincingly impersonate genuine communications (e.g., spearphishing attacks). There is a distinct trade-off here between the scale of an attack versus the effort required to curate content that is likely to convince an individual to carry out an action (typically, clicking a malicious hyperlink). In this short paper, we conduct a preliminary study on a recent real-world incident that strikes a balance between attacking at scale and personalised content. We adopt different visualisation tools and techniques for better assessing the scale and impact of the attack, which can be used both by security professionals to analyse the security incident and to inform employees as a form of security awareness and training. We pitched the approach to IT professionals working in information security, who believe this may provide improved awareness of how targeted phishing campaigns can impact an organisation, and could contribute towards a pro-active step of how analysts will examine and mitigate the impact of future attacks across the organisation.
IEEE Xplore

Efficient and interpretable real-time malware detection using random-forest
Alan Mills, Theodoros Spyridopoulos and Phil Legg
International conference on Cyber Situational Awareness, Data Analytics and Assessment (Cyber SA), 2019
Malicious software, often described as malware, is one of the greatest threats to modern computer systems, and attackers continue to develop more sophisticated methods to access and compromise data and resources. Machine learning methods have potential to improve malware detection both in terms of accuracy and detection runtime, and are an active area within academic research and commercial development. Whilst the majority of research has focused on improving accuracy and runtime of these systems, to date there has been little focus on the interpretability of detection results. In this paper, we propose a lightweight malware detection system called NODENS that can be deployed on affordable hardware such as a Raspberry Pi. Crucially, NODENS provides transparency of output results so that an end-user can begin to examine why the classifier believes a software sample to be either malicious or benign. Using an efficient Random-Forest approach, our system provides interpretability whilst not sacrificing accuracy or detection runtime, with an average detection speed of between 3 and 8 seconds, allowing for early remedial action to be taken before damage is caused.
IEEE Xplore

What Makes for Effective Visualisation in Cyber Situational Awareness for Non-Expert Users?
Fiona Carroll, Adam Chakof and Phil Legg
International conference on Cyber Situational Awareness, Data Analytics and Assessment (Cyber SA), 2019
As cyber threats continue to become more prevalent, there is a need to consider how best we can understand the cyber landscape when acting online, especially so for non-expert users. Satellite navigation systems provide the de facto standard for many modern-day navigation tasks in the physical domain, so we consider the question of how one could navigate the online domain using similar concepts. In this paper, we study the design of a cyber sat nav for improving situational awareness of non-expert users. We focus on three core tasks: understanding where we are in cyber space, understanding how we got there, and understanding future states that we may traverse to. To support understanding, we explore the use of visualisation techniques to portray complex online activities in clear and engaging formats for non-expert users.
IEEE Xplore

Venue2Vec: An Efficient Embedding Model for Fine-Grained User Location Prediction in Geo-Social Networks
Shuai Xu, Jiuxin Cao, Phil Legg, Bo Liu and Shancang Li
IEEE Systems Journal, 2019
Geo-Social Networks (GSN) significantly improve location-aware capability of services by offering geo-located content based on the huge volumes of data generated in the GSN. The problem of user location prediction based on user-generated data in GSN has been extensively studied. However, existing studies concern either predicting users’ next check-in location or predicting their future check-in location at a given time with coarse granularity. A unified model that can predict both scenarios with fine granularity is quite rare. Also, due to the heterogeneity of multiple factors associated with both locations and users, how to efficiently incorporate this information still remains challenging. Inspired by the recent success of word embedding in natural language processing, in this paper, we propose a novel embedding model called Venue2Vec which automatically incorporates temporal-spatial context, semantic information, and sequential relations for fine-grained user location prediction. Locations of the same type, and those that are geographically close or often visited successively by users will be situated closer within the embedding space. Based on our proposed Venue2Vec model, we design techniques that allow for predicting a user’s next check-in location, and also their future check-in location at a given time. We conduct experiments on three real-world GSN datasets to verify the performance of the proposed model. Experimental results on both tasks show that the Venue2Vec model outperforms several state-of-the-art models on various evaluation metrics. Furthermore, we show how the Venue2Vec model can be more time-efficient due to being parallelizable.
IEEE Xplore


Visualising state space representations of LSTM networks
Emmanuel M Smith, Jim Smith, Phil Legg, Simon Francis
Workshop on Visualization for AI Explainability (VISxAI), 2018
Long Short-Term Memory (LSTM) networks have proven to be one of the most effective models for making predictions on sequence-based tasks. These models work by capturing, remembering, and forgetting information relevant to their future predictions. The non-linear complexity of the mechanisms involved in this process means we currently lack tools for achieving interpretability. Ideally, we want these models to provide an explanation of why they make a particular prediction, given a specific input. Researchers have explored the idea of interpreting LSTMs in specific contexts such as natural language processing or classification, but they put minimal focus on approaches which are generalisable across different applications. To alleviate this, in this work, we demonstrate a method which enables the interpretation and comparison of LSTM states during time series predictions. We show that by reducing the dimensionality of network states one can scalably visualise patterns and explain model behaviours.

Predicting User Confidence During Visual Decision Making
Jim Smith, Phil Legg, Milos Matovic and Kristofer Kinsey
ACM Transactions on Interactive Intelligent Systems, 2018
People are not infallible consistent “oracles”: their confidence in decision-making may vary significantly between tasks and over time. We have previously reported the benefits of using an interface and algorithms that explicitly captured and exploited users’ confidence: error rates were reduced by up to 50% for an industrial multi-class learning problem; and the number of interactions required in a design-optimisation context was reduced by 33%. Having access to users’ confidence judgements could significantly benefit intelligent interactive systems in industry, in areas such as intelligent tutoring systems and in health care. There are many reasons for wanting to capture information about confidence implicitly. Some are ergonomic, but others are more “social”—such as wishing to understand (and possibly take account of) users’ cognitive state without interrupting them. We investigate the hypothesis that users’ confidence can be accurately predicted from measurements of their behaviour. Eye-tracking systems were used to capture users’ gaze patterns as they undertook a series of visual decision tasks, after each of which they reported their confidence on a 5-point Likert scale. Subsequently, predictive models were built using “conventional” machine learning approaches for numerical summary features derived from users’ behaviour. We also investigate the extent to which the deep learning paradigm can reduce the need to design features specific to each application by creating “gaze maps”—visual representations of the trajectories and durations of users’ gaze fixations—and then training deep convolutional networks on these images. Treating the prediction of user confidence as a two-class problem (confident/not confident), we attained classification accuracy of 88% for the scenario of new users on known tasks, and 87% for known users on new tasks. 
Considering the confidence as an ordinal variable, we produced regression models with a mean absolute error of ≈0.7 in both cases. Capturing just a simple subset of non-task-specific numerical features gave slightly worse, but still quite high accuracy (e.g., MAE ≈ 1.0). Results obtained with gaze maps and convolutional networks are competitive, despite not having access to longer-term information about users and tasks, which was vital for the “summary” feature sets. This suggests that the gaze-map-based approach forms a viable, transferable alternative to handcrafting features for each different application. These results provide significant evidence to confirm our hypothesis, and offer a way of substantially improving many interactive artificial intelligence applications via the addition of cheap non-intrusive hardware and computationally cheap prediction algorithms.
ACM Digital Library


Predicting the occurrence of world news events using recurrent neural networks and auto-regressive moving average models
Emmanuel M Smith, Jim Smith, Phil Legg, Simon Francis
Advances in Computational Intelligence Systems (17th UK Workshop on Computational Intelligence), 2017
The ability to predict future states is fundamental for a wide variety of applications, from weather forecasting to stock market analysis. Understanding the related data attributes that can influence changes in time series is a challenging task that is critical for making accurate predictions. One particular application of key interest is understanding the factors that relate to the occurrence of global activities from online world news reports. Being able to understand why particular types of events may occur, such as violence and peace, could play a vital role in better protecting and understanding our global society. In this work, we explore the concept of predicting the occurrence of world news events, making use of the Global Database of Events, Language and Tone online news aggregation source. We compare traditional Auto-Regressive Moving Average models with more recent deep learning strategies using Long Short-Term Memory Recurrent Neural Networks. Our results show that the latter are capable of achieving lower error rates. We also discuss how deep learning methods such as Recurrent Neural Networks have the potential for greater capability to incorporate complex associations of data attributes that may impact the occurrence of future events.

RicherPicture: Semi-automated cyber defence using context-aware data analytics
Arnau Erola, Ioannis Agrafiotis, Jassim Happa, Michael Goldsmith, Sadie Creese and Philip Legg
International Conference On Cyber Situational Awareness, Data Analytics And Assessment (Cyber SA), 2017
In a continually evolving cyber-threat landscape, the detection and prevention of cyber attacks has become a complex task. Technological developments have led organisations to digitise the majority of their operations. This practice, however, has its perils, since cyberspace offers a new attack surface. Institutions which are tasked to protect organisations from these threats utilise mainly network data and their incident response strategy remains oblivious to the needs of the organisation when it comes to protecting operational aspects. This paper presents a system able to combine threat intelligence data, attack-trend data and organisational data (along with other data sources available) in order to achieve automated network-defence actions. Our approach combines machine learning, visual analytics and information from business processes to guide a decision-making process for a Security Operation Centre environment. We test our system on two synthetic scenarios and show that correlating network data with non-network data for automated network defences is possible and worth investigating further.
IEEE Xplore

Human-Machine Decision Support Systems for Insider Threat Detection
Philip A. Legg
Data Analytics and Decision Support for Cybersecurity, 2017
Insider threats are recognised to be quite possibly the most damaging attacks that an organisation could experience. Those on the inside, who have privileged access and knowledge, are already in a position of great responsibility for contributing towards the security and operations of the organisation. Should an individual choose to exploit this privilege, perhaps due to disgruntlement or external coercion from a competitor, then the potential impact to the organisation can be extremely damaging. There are many proposals of using machine learning and anomaly detection techniques as a means of automated decision-making about which insiders are acting in a suspicious or malicious manner, as a form of large scale data analytics. However, it is well recognised that this poses many challenges, for example, how do we capture an accurate representation of normality to assess insiders against, within a dynamic and ever-changing organisation? More recently, there has been interest in how visual analytics can be incorporated with machine-based approaches, to alleviate the data analytics challenges of anomaly detection and to support human reasoning through visual interactive interfaces. Furthermore, by combining visual analytics and active machine learning, there is potential capability for the analysts to impart their domain expert knowledge back to the system, so as to iteratively improve the machine-based decisions based on the human analyst preferences. With this combined human-machine approach to decision-making about potential threats, the system can begin to more accurately capture human rationale for the decision process, and reduce the false positives that are flagged by the system. In this work, I reflect on the challenges of insider threat detection, and look to how human-machine decision support systems can offer solutions towards this.


Visual Analytics for Non-Expert Users in Cyber Situation Awareness
Philip A. Legg
International Journal On Cyber Situational Awareness (IJCSA), 2016
Situation awareness is often described as the perception and comprehension of the current situation, and the projection of future status. Whilst this may be well understood in an organisational cybersecurity context, there is a strong case to be made for effective cybersecurity situation awareness that is tailored to the needs of the Non-Expert User (NEU). Our online usage habits are rapidly evolving with smartphones and tablets being widely used to access resources online. In order for NEUs to remain safe online, there is a need to enhance awareness and understanding of cybersecurity concerns, such as how devices may be acting online, and what data is being shared between devices. In this paper, we extend our proposal of the Enhanced Personal Situation Awareness (ePSA) framework to consider the key details of cyber situation awareness that would be of concern to NEUs, and we consider how such information can be effectively conveyed using a visual analytic approach. We present the design of our visual analytics approach to show how this can represent the key details of cyber situation awareness whilst maintaining a simple and clean design scheme so as to not result in information-overload for the user. The guidance developed through the course of this work can help practitioners develop tools that could help NEUs better understand their online actions, with the aim of giving users greater control and safer experiences when their personal devices are acting online.

Glyph Visualization - A fail-safe design scheme based on quasi-Hamming distances
Philip A. Legg, Eamonn Maguire, Simon Walton and Min Chen
IEEE Computer Graphics and Applications, 2016
In many spatial and temporal visualization applications, glyphs provide an effective means for encoding multivariate data. However, because glyphs are typically small, they are vulnerable to various perceptual errors. This article introduces the concept of a quasi-Hamming distance in the context of glyph design and examines the feasibility of estimating the quasi-Hamming distance between a pair of glyphs and the minimal Hamming distance for a glyph set. The authors demonstrate the design concept by developing a file-system event visualization that can depict the activities of multiple users.
IEEE Xplore

Automated Registration of Multimodal Optic Disc Images: Clinical Assessment of Alignment Accuracy
Wai Siene Ng, Phil Legg, Venkat Avadhanam, Kyaw Aye, Steffan H. P. Evans, Rachel V. North, Andrew D. Marshall, Paul Rosin and James E. Morgan
Journal of Glaucoma, 2016
Purpose: To determine the accuracy of automated alignment algorithms for the registration of optic disc images obtained by 2 different modalities: fundus photography and scanning laser tomography. Materials and Methods: Images obtained with the Heidelberg Retina Tomograph II and paired photographic optic disc images of 135 eyes were analyzed. Three state-of-the-art automated registration techniques, Regional Mutual Information, rigid Feature Neighbourhood Mutual Information (FNMI), and nonrigid FNMI (NRFNMI), were used to align these image pairs. Alignment of each composite picture was assessed on a 5-point grading scale: “Fail” (no alignment of vessels with no vessel contact), “Weak” (vessels have slight contact), “Good” (vessels with <50% contact), “Very Good” (vessels with >50% contact), and “Excellent” (complete alignment). Custom software generated an image mosaic in which the modalities were interleaved as a series of alternate 5×5-pixel blocks. These were graded independently by 3 clinically experienced observers. Results: A total of 810 image pairs were assessed. All 3 registration techniques achieved a score of “Good” or better in >95% of the image sets. NRFNMI had the highest percentage of “Excellent” (mean: 99.6%; range, 95.2% to 99.6%), followed by Regional Mutual Information (mean: 81.6%; range, 86.3% to 78.5%) and FNMI (mean: 73.1%; range, 85.2% to 54.4%). Conclusions: Automated registration of optic disc images by different modalities is a feasible option for clinical application. All 3 methods provided useful levels of alignment, but the NRFNMI technique consistently outperformed the others and is recommended as a practical approach to the automated registration of multimodal disc images.
Journal of Glaucoma

Enhancing cyber situation awareness for Non-Expert Users using visual analytics
Philip A. Legg
International Conference On Cyber Situational Awareness, Data Analytics And Assessment (CyberSA), 2016
Situation awareness is often described as the perception and comprehension of the current situation, and the projection of future status. Whilst this may be understood in an organisational cybersecurity context, there is a strong case to be made for effective cybersecurity situation awareness that is tailored to the needs of the Non-Expert User (NEU). Our online usage habits are rapidly evolving with smartphones and tablets being widely used to access resources online. In order for NEUs to remain safe online, there is a need to enhance awareness and understanding of cybersecurity concerns, such as how devices may be acting online, and what data is being shared between devices. In this paper, we explore the notion of personal situation awareness for NEUs. We conduct a small-scale study to understand how NEUs perceive cybersecurity. We also propose how visual analytics could be used to help encourage NEUs to actively monitor and observe their activity for greater online awareness. The guidance developed through the course of this work can help practitioners develop tools that could help NEUs better understand their online actions, with the aim to result in safer experiences when acting online.
IEEE Xplore


Visualizing the insider threat: challenges and tools for identifying malicious user activity
Philip A. Legg
IEEE Symposium on Visualization for Cyber Security (VizSec), 2015
One of the greatest challenges for managing organisational cyber security is the threat that comes from those who operate within the organisation. With entitled access and knowledge of organisational processes, insiders who choose to attack have the potential to cause serious impact, such as financial loss, reputational damage, and in severe cases, could even threaten the existence of the organisation. Security analysts therefore require sophisticated tools that allow them to explore and identify user activity that could be indicative of an imminent threat to the organisation. In this work, we discuss the challenges associated with identifying insider threat activity, along with the tools that can help to combat this problem. We present a visual analytics approach that incorporates multiple views, including a user selection tool that indicates anomalous behaviour, an interactive Principal Component Analysis (iPCA) tool that aids the analyst to assess the reasoning behind the anomaly detection results, and an activity plot that visualizes user and role activity over time. We demonstrate our approach using the Carnegie Mellon University CERT Insider Threat Dataset to show how the visual analytics workflow supports the Information-Seeking mantra.
IEEE Xplore

Identifying attack patterns for insider threat detection
Ioannis Agrafiotis, Jason R.C. Nurse, Oliver Buckley, Phil Legg, Sadie Creese and Michael Goldsmith
Computer Fraud & Security, 2015
The threat that insiders pose to businesses, institutions and governmental organisations continues to be of serious concern. Recent industry surveys provide unequivocal evidence to support the significance of this threat and its prevalence in enterprises today. In an attempt to address this challenge, several approaches and systems have been proposed by practitioners and researchers. These focus on defining the insider threat and exploring the human and psychological factors involved, through to the detection and deterrence of these threats via technological and behavioural theories. Insider threats pose major concerns to businesses, institutions and governmental organisations. Few solutions to this problem consider all the technical, organisational and behavioural aspects. In new research, Ioannis Agrafiotis, Jason RC Nurse, Oliver Buckley, Phil Legg, Sadie Creese and Michael Goldsmith define attack patterns that could be key in assisting insider-threat detection, based on 120 real-world case studies. They present their findings, representing each case study as a series of attack steps and identify common trends between different attacks.

Feature Neighbourhood Mutual Information for multi-modal image registration: An application to eye fundus imaging
Philip A. Legg, Paul L. Rosin, David Marshall and James E. Morgan
Pattern Recognition, 2015
Multi-modal image registration is becoming an increasingly powerful tool for medical diagnosis and treatment. The combination of different image modalities facilitates much greater understanding of the underlying condition, resulting in improved patient care. Mutual Information is a popular image similarity measure for performing multi-modal image registration. However, it is recognised that there are limitations with the technique that can compromise the accuracy of the registration, such as the lack of spatial information that is accounted for by the similarity measure. In this paper, we present a two-stage non-rigid registration process using a novel similarity measure, Feature Neighbourhood Mutual Information. The similarity measure efficiently incorporates both spatial and structural image properties that are not traditionally considered by MI. By incorporating such features, we find that this method is capable of achieving much greater registration accuracy when compared to existing methods, whilst also achieving efficient computational runtime. To demonstrate our method, we use a challenging medical image data set consisting of paired retinal fundus photographs and confocal scanning laser ophthalmoscope images. Accurate registration of these image pairs facilitates improved clinical diagnosis, and can be used for the early detection and prevention of glaucoma disease.
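
For context, the standard Mutual Information measure that this abstract builds on can be sketched as below. This is the plain histogram-based measure only, not the Feature Neighbourhood Mutual Information proposed in the paper; the function name and the toy intensity lists are assumptions made for illustration.

```python
# Illustrative sketch only: standard mutual information between two images,
# each flattened to an equal-length list of discrete intensity values.
# This is NOT the paper's FNMI measure; names and data are hypothetical.
import math
from collections import Counter


def mutual_information(a, b):
    """I(A;B) = sum over (x, y) of p(x, y) * log2(p(x, y) / (p(x) * p(y)))."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    joint = Counter(zip(a, b))   # joint intensity histogram
    marginal_a = Counter(a)      # marginal histogram of image A
    marginal_b = Counter(b)      # marginal histogram of image B
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((marginal_a[x] / n) * (marginal_b[y] / n)))
    return mi


fixed = [0, 0, 1, 1, 2, 2, 3, 3]
# Same structure as `fixed` but with a different intensity mapping,
# as happens when two modalities image the same anatomy.
aligned = [9, 9, 7, 7, 5, 5, 3, 3]
# The same intensities shifted out of correspondence.
misaligned = [9, 7, 5, 3, 9, 7, 5, 3]

mi_aligned = mutual_information(fixed, aligned)
mi_misaligned = mutual_information(fixed, misaligned)
```

The aligned pair scores higher than the misaligned pair even though the raw intensities never match, which is why the measure suits multi-modal registration. Because it uses only intensity co-occurrence counts, it ignores where pixels sit in the image, which is the limitation the abstract refers to.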

Automated Insider Threat Detection System Using User and Role-Based Profile Assessment
Philip A. Legg, Oliver Buckley, Michael Goldsmith and Sadie Creese
IEEE Systems Journal, 2015
Organizations are experiencing an ever-growing concern of how to identify and defend against insider threats. Those who have authorized access to sensitive organizational data are placed in a position of power that could well be abused and could cause significant damage to an organization. This could range from financial theft and intellectual property theft to the destruction of property and business reputation. Traditional intrusion detection systems are neither designed nor capable of identifying those who act maliciously within an organization. In this paper, we describe an automated system that is capable of detecting insider threats within an organization. We define a tree-structure profiling approach that incorporates the details of activities conducted by each user and each job role and then use this to obtain a consistent representation of features that provide a rich description of the user's behavior. Deviation can be assessed based on the amount of variance that each user exhibits across multiple attributes, compared against their peers. We have performed experimentation using ten synthetic data-driven scenarios and found that the system can identify anomalous behavior that may be indicative of a potential threat. We also show how our detection system can be combined with visual analytics tools to support further investigation by an analyst.
IEEE Xplore
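
The idea of comparing a user against their peers can be illustrated with a simple z-score on a single attribute. This is a hedged sketch rather than the tree-structure profiling system described in the abstract; the function name, the attribute (daily file accesses) and the values are hypothetical.

```python
# Illustrative sketch only: how far one user's activity sits from the
# distribution of peers in the same job role. Not the paper's method.
from statistics import mean, pstdev


def peer_deviation(user_value, peer_values):
    """Return the number of peer standard deviations between the user's
    value and the peer mean (a z-score)."""
    mu = mean(peer_values)
    sigma = pstdev(peer_values)
    if sigma == 0:
        # All peers behave identically: any difference is maximally unusual.
        return 0.0 if user_value == mu else float("inf")
    return (user_value - mu) / sigma


# Hypothetical daily file-access counts for peers in one role.
peer_file_accesses = [2, 4, 4, 4, 5, 5, 7, 9]
score = peer_deviation(11, peer_file_accesses)
```

A system of the kind described would combine many such attributes per user and per role; a single score like this would only be a prompt for further investigation by an analyst.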

Knowledge-Assisted Ranking: A Visual Analytic Application for Sports Event Data
David Chung, Philip Legg, Matthew Parry, Rhodri Bown, Iwan Griffiths, Robert Laramee, Min Chen
IEEE Computer Graphics and Applications, 2015
Organizing sports video data for performance analysis can be challenging, especially in cases involving multiple attributes and when the criteria for sorting frequently change depending on the user's task. The proposed visual analytic system enables users to specify a sort requirement in a flexible manner without depending on specific knowledge about individual sort keys. The authors use regression techniques to train different analytical models for different types of sorting requirements and use visualization to facilitate knowledge discovery at different stages of the process. They demonstrate the system with a rugby case study to find key instances for analyzing team and player performance.
IEEE Xplore

Quasi-Hamming distances: An overarching concept for measuring glyph similarity
Philip A Legg, Eamonn Maguire, Simon Walton, Min Chen
Eurographics UK: Computer Graphics and Visual Computing, 2015
In many applications of spatial or temporal visualization, glyphs provide an effective means for encoding multivariate data objects. However, because glyphs are typically small, they are vulnerable to various perceptual errors. In data communication, the concept of Hamming distance underpins the study of codes that support error detection and correction by the receiver without the need for corroboration from the sender. In this extended abstract, we outline a novel concept of quasi-Hamming distance in the context of glyph design. We discuss the feasibility of estimating quasi-Hamming distance between a pair of glyphs, and the minimal Hamming distance for a glyph set. This measurement enables glyph designers to determine the differentiability between glyphs, facilitating design optimization by maximizing distances between glyphs under various constraints (e.g., the available number of visual channels and their encoding bandwidth).
UWE Library
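
The classical Hamming distance that this abstract borrows from data communication can be sketched as follows. The example works on bit strings, not glyphs; the quasi-Hamming distance in the paper is an estimated perceptual measure and is not reproduced here. The function names and codewords are illustrative assumptions.

```python
# Illustrative sketch only: classical Hamming distance on bit strings and
# the minimum distance of a code set. Not the paper's quasi-Hamming measure.
from itertools import combinations


def hamming_distance(a, b):
    """Count the positions at which two equal-length codewords differ."""
    if len(a) != len(b):
        raise ValueError("codewords must have equal length")
    return sum(x != y for x, y in zip(a, b))


def minimum_distance(codewords):
    """Smallest pairwise Hamming distance within a set of codewords."""
    return min(hamming_distance(a, b) for a, b in combinations(codewords, 2))


codewords = ["00000", "01011", "10101", "11110"]
d_min = minimum_distance(codewords)

# Standard coding-theory bounds for a code with minimum distance d_min.
detectable_errors = d_min - 1
correctable_errors = (d_min - 1) // 2
```

For this code set the minimum distance is 3, so up to two flipped bits can be detected and one can be corrected. By analogy, a glyph set with a larger minimum distance leaves more room for a viewer to misread a visual channel and still tell the glyphs apart.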

Caught in the Act of an Insider Attack: Detection and Assessment of Insider Threat
Philip A. Legg, Oliver Buckley, Michael Goldsmith, Sadie Creese
IEEE Symposium on Technologies for Homeland Security, 2015
The greatest asset that any organisation has is its people, but they may also be the greatest threat. Those who are within the organisation may have authorised access to vast amounts of sensitive company records that are essential for maintaining competitiveness and market position, and knowledge of information services and procedures that are crucial for daily operations. In many cases, those who have such access do indeed require it in order to conduct their expected workload. However, should an individual choose to act against the organisation, then with their privileged access and their extensive knowledge, they are well positioned to cause serious damage. Insider threat is becoming a serious and increasing concern for many organisations, with those who have fallen victim to such attacks suffering significant damages including financial and reputational. It is clear then, that there is a desperate need for more effective tools for detecting the presence of insider threats and analyzing the potential of threats before they escalate. We propose Corporate Insider Threat Detection (CITD), an anomaly detection system that is the result of a multi-disciplinary research project that incorporates technical and behavioural activities to assess the threat posed by individuals. The system identifies user and role-based profiles, and measures how users deviate from their observed behaviours to assess the potential threat that a series of activities may pose. In this paper, we present an overview of the system and describe the concept of operations and practicalities of deploying the system. We show how the system can be utilised for unsupervised detection, and also how the human analyst can engage to provide an active learning feedback loop. By adopting an accept or reject scheme, the analyst is capable of refining the underlying detection model to better support their decision-making process and significantly reduce the false positive rate.
IEEE Xplore
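
As a rough illustration of measuring deviation from user and role-based profiles, the sketch below scores one new observation against a history of daily counts. It is not the CITD implementation: the z-score measure, the function names `deviation` and `score_activity`, and the sample counts are all assumptions made for this example.

```python
from statistics import mean, pstdev


def deviation(history, observed):
    """Absolute z-score of `observed` against past daily counts in `history`."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # A constant history: any change at all is maximally surprising.
        return 0.0 if observed == mu else float("inf")
    return abs(observed - mu) / sigma


def score_activity(user_history, role_history, observed):
    """Deviation of one observation from the user's own and the role's profile."""
    return {
        "user": deviation(user_history, observed),
        "role": deviation(role_history, observed),
    }


# Hypothetical daily counts of files accessed (illustrative data only).
user_history = [10, 12, 11, 9, 13]
role_history = [10, 12, 11, 9, 13, 30, 28, 8, 14, 25]
scores = score_activity(user_history, role_history, observed=26)
```

With these sample numbers the observation is far outside the user's own history but within the normal range for the role, which is the kind of distinction that separate user and role profiles make visible. An analyst's accept or reject decisions could then be used to adjust an alert threshold, which this sketch does not attempt.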

Using Internet Activity Profiling for Insider-Threat Detection
Bushra A. Alahmadi, Philip A. Legg, Jason R.C. Nurse
Workshop on Security in Information Systems (WOSIS), 2015
The insider-threat problem continues to be a major risk to both public and private sectors, where those people who have privileged knowledge and access choose to abuse this in some way to cause harm towards their organisation. To combat this, organisations are beginning to invest heavily in deterrence monitoring tools to observe employees’ activity, such as computer access, Internet browsing, and email communications. Whilst such tools may go some way towards detecting attacks afterwards, what may be more useful is preventative monitoring, where user characteristics and behaviours inform about the possibility of an attack before it happens. Psychological research advocates that the behaviour and preference of a person can be explained to a great extent by psychological constructs called personality traits, which could then possibly indicate the likelihood of an individual being a potential insider threat. By considering how browsing content relates to psychological constructs (such as OCEAN), and how an individual’s browsing behaviour deviates over time, potential insider-threats could be uncovered before significant damage is caused. The main contribution in this paper is to explore how Internet browsing activity could be used to predict the individual’s psychological characteristics in order to detect potential insider-threats. Our results demonstrate that predictive assessment can be made between the content available on a website, and the associated personality traits, which could greatly improve the prospects of preventing insider attacks.
Oxford Research

Glyph sorting: Interactive visualization for multi-dimensional data
David HS Chung, Philip A Legg, Matthew L Parry, Rhodri Bown, Iwan W Griffiths, Robert S Laramee, Min Chen
Information Visualization, 2015
Glyph-based visualization is an effective tool for depicting multivariate information. Since sorting is one of the most common analytical tasks performed on individual attributes of a multi-dimensional dataset, this motivates the hypothesis that introducing glyph sorting would significantly enhance the usability of glyph-based visualization. In this article, we present a glyph-based conceptual framework as part of a visualization process for interactive sorting of multivariate data. We examine several technical aspects of glyph sorting and provide design principles for developing effective, visually sortable glyphs. Glyphs that are visually sortable provide two key benefits: (1) performing comparative analysis of multiple attributes between glyphs and (2) supporting multi-dimensional visual search. We describe a system that incorporates focus and context glyphs to control sorting in a visually intuitive manner and for viewing sorted results in an interactive, multi-dimensional glyph plot that enables users to perform high-dimensional sorting and to analyse and examine data trends in detail. To demonstrate the usability of glyph sorting, we present a case study in rugby event analysis for comparing and analysing trends within matches. This work is undertaken in conjunction with a national rugby team. Using glyph sorting, analysts have reported the discovery of new insight beyond traditional match analysis.


Towards a User and Role-based Sequential Behavioural Analysis Tool for Insider Threat Detection
Ioannis Agrafiotis, Philip A Legg, Michael Goldsmith, Sadie Creese
Journal of Internet Services and Information Security, 2014
Insider threat is recognised to be a significant problem and of great concern to both corporations and governments alike. Traditional intrusion detection systems are known to be ineffective due to the extensive knowledge and capability that insiders typically have regarding the organisational setup. Instead, more sophisticated measures are required to analyse the actions performed by those within the organisation, to assess whether their actions suggest that they pose a threat. In this paper, we propose a proof-of-concept that focuses on the use of activity trees to establish sequential-based analysis of employee behaviour. This concept combines the notions of previously-proposed techniques such as attack trees and behaviour trees. For a given employee, we define a tree that can represent all sequences of their observed behaviours. Over time, branches are either appended or created to reflect the new observations that are made on how the employee acts. We also incorporate a similarity measure to establish how different branches compare against each other. An attack can be defined as a case where the similarity measure between a newly-observed branch and all existing branches is below a given acceptance criterion. The approach would allow an analyst to observe chains of events that result in low probability activities that could be deemed as unusual and therefore may be malicious. We demonstrate our proof-of-concept using third-party synthetic employee activity logs, to illustrate the practicalities of delivering this form of protective monitoring.
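
The activity-tree idea described above can be sketched as a prefix tree of observed action sequences plus a similarity check against every stored branch. This is only a minimal illustration: the class `ActivityTree`, the shared-prefix similarity measure, the action names and the 0.5 threshold are assumptions for this example, not the measure or data used in the paper.

```python
class ActivityTree:
    """Prefix tree of one employee's observed activity sequences (hypothetical design)."""

    def __init__(self):
        self.root = {}      # nested dicts: action -> subtree
        self.branches = []  # every distinct sequence observed so far

    def add(self, sequence):
        """Extend an existing branch, or create a new one, for `sequence`."""
        node = self.root
        for action in sequence:
            node = node.setdefault(action, {})
        seq = tuple(sequence)
        if seq not in self.branches:
            self.branches.append(seq)

    def max_similarity(self, sequence):
        """Highest similarity between `sequence` and any stored branch."""
        return max((similarity(sequence, b) for b in self.branches), default=0.0)


def similarity(a, b):
    """Shared-prefix length divided by the longer length (an assumed measure)."""
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    longest = max(len(a), len(b))
    return shared / longest if longest else 1.0


def is_unusual(tree, sequence, threshold=0.5):
    """Flag a sequence whose similarity to all stored branches is below the threshold."""
    return tree.max_similarity(sequence) < threshold


tree = ActivityTree()
tree.add(["login", "email", "logoff"])
tree.add(["login", "file_open", "logoff"])
```

Checking the maximum similarity against the threshold is equivalent to requiring that the similarity to every existing branch falls below it. A real deployment would likely use a richer measure than a shared prefix, for example one that tolerates insertions in the middle of a sequence.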

Visual analytics of e-mail sociolinguistics for user behavioural analysis
Philip A Legg, Oliver Buckley, Michael Goldsmith, Sadie Creese
Journal of Internet Services and Information Security, 2014
The cyber-security threat that most organisations face is not one that only resides outside their perimeter attempting to get in, but emanates from the inside too. Insider threats encompass anyone or anything that exploits authorised access to company information and resources to steal, corrupt or disrupt assets. Threat actors could include not only employees, but also contractors, trusted partners and in some cases clients. The nature of their access is usually persistent, as it is valid and required to conduct their roles, and as such, abuse of their privileges can pose a serious and real threat to the successful operation of the business. Whilst measures have been proposed for detecting previous attacks or those currently in progress, what would be much more desirable is to detect employees who are possibly becoming vulnerable to coercion or persuasion into conducting an attack of some form, enabling supportive or preventative action by the organisation to avoid escalation of an attack. Research into psychology and behaviour is indicating that it may be possible to detect such human vulnerability through analysis of the language used (linguistics). In this paper we present a visual analytics tool for the assessment of sociolinguistic behaviours exhibited via e-mail communications, aimed at helping to identify people who are potentially at risk. We discuss the visual design choices made to provide both detail and overview for the analyst for studying communications within a large group of users, and demonstrate this for a large real-world dataset of over 600 employees. We show how an analyst can use the tool to construct linguistic behavioural models to identify vulnerable employees. We propose that this approach could support wider insider threat prevention and detection systems.

A Critical Reflection on the Threat from Human Insiders – Its Nature, Industry Perceptions, and Detection Approaches
Jason R. C. Nurse, Philip A. Legg, Oliver Buckley, Ioannis Agrafiotis, Gordon Wright, Monica Whitty, David Upton, Michael Goldsmith and Sadie Creese
International Conference on Human Aspects of Information Security, Privacy, and Trust, 2014
Organisations today operate in a world fraught with threats, including “script kiddies”, hackers, hacktivists and advanced persistent threats. Although these threats can be harmful to an enterprise, a potentially more devastating and anecdotally more likely threat is that of the malicious insider. These trusted individuals have access to valuable company systems and data, and are well placed to undermine security measures and to attack their employers. In this paper, we engage in a critical reflection on the insider threat in order to better understand the nature of attacks, associated human factors, perceptions of threats, and detection approaches. We differentiate our work from other contributions by moving away from a purely academic perspective, and instead focus on distilling industrial reports (i.e., those that capture practitioners’ experiences and feedback) and case studies in order to truly appreciate how insider attacks occur in practice and how viable preventative solutions may be developed.

Reflecting on the Ability of Enterprise Security Policy to Address Accidental Insider Threat
Oliver Buckley, Jason RC Nurse, Philip A Legg, Michael Goldsmith, Sadie Creese
Workshop on Socio-Technical Aspects in Security and Trust (STAST) associated with 27th IEEE Computer Security Foundations Symposium (CSF), 2014
An enterprise's information security policy is an exceptionally important control as it provides the employees of an organisation with details of what is expected of them, and what they can expect from the organisation's security teams, as well as informing the culture within that organisation. The threat from accidental insiders is a reality across all enterprises and can be extremely damaging to the systems, data and reputation of an organisation. Recent industry reports and academic literature underline the fact that the risk of accidental insider compromise is potentially more pressing than that posed by a malicious insider. In this paper we focus on the ability of enterprises' information security policies to mitigate the accidental insider threat. Specifically, we perform an analysis of real-world cases of accidental insider threat to define the key reasons, actions and impacts of these events -- captured as a grounded insider threat classification scheme. This scheme is then used to perform a review of a set of organisational security policies to highlight their strengths and weaknesses when considering the prevention of incidents of accidental insider compromise. We present a set of questions that can be used to analyse an existing security policy to help control the risk of the accidental insider threat.
IEEE Xplore


Towards a Conceptual Model and Reasoning Structure for Insider Threat Detection
Philip A Legg, Nick Moffat, Jason RC Nurse, Jassim Happa, Ioannis Agrafiotis, Michael Goldsmith, Sadie Creese
Journal of Wireless Mobile Networks, Ubiquitous Computing, and Dependable Applications, 2013
The insider threat faced by corporations and governments today is a real and significant problem, and one that has become increasingly difficult to combat as the years have progressed. From a technology standpoint, traditional protective measures such as intrusion detection systems are largely inadequate given the nature of the ‘insider’ and their legitimate access to prized organisational data and assets. As a result, it is necessary to research and develop more sophisticated approaches for the accurate recognition, detection and response to insider threats. One way in which this may be achieved is by understanding the complete picture of why an insider may initiate an attack, and the indicative elements along the attack chain. This includes the use of behavioural and psychological observations about a potential malicious insider in addition to technological monitoring and profiling techniques. In this paper, we propose a framework for modelling the insider-threat problem that goes beyond traditional technological observations and incorporates a more complete view of insider threats, common precursors, and human actions and behaviours. We present a conceptual model for insider threat and a reasoning structure that allows an analyst to make or draw hypotheses regarding a potential insider threat based on measurable states from real-world observations.

Transformation of an uncertain video search pipeline to a sketch-based visual analytics loop
Philip A Legg, David HS Chung, Matthew L Parry, Rhodri Bown, Mark W Jones, Iwan W Griffiths, Min Chen
IEEE transactions on Visualization and Computer Graphics, 2013
Traditional sketch-based image or video search systems rely on machine learning concepts as their core technology. However, in many applications, machine learning alone is impractical since videos may not be semantically annotated sufficiently, there may be a lack of suitable training data, and the search requirements of the user may frequently change for different tasks. In this work, we develop a visual analytics system that overcomes the shortcomings of the traditional approach. We make use of a sketch-based interface to enable users to specify search requirements in a flexible manner without depending on semantic annotation. We employ active machine learning to train different analytical models for different types of search requirements. We use visualization to facilitate knowledge discovery at the different stages of visual analytics. This includes visualizing the parameter space of the trained model, visualizing the search space to support interactive browsing, visualizing candidate search results to support rapid interaction for active learning while minimizing watching videos, and visualizing aggregated information of the search results. We demonstrate the system for searching spatiotemporal attributes from sports video to identify key instances of the team and player performance.
IEEE Xplore

Improving accuracy and efficiency of mutual information for multi-modal retinal image registration using adaptive probability density estimation
Philip A Legg, Paul L Rosin, David Marshall, James E Morgan
Computerized Medical Imaging and Graphics, 2013
Mutual information (MI) is a popular similarity measure for performing image registration between different modalities. MI makes a statistical comparison between two images by computing the entropy from the probability distribution of the data. Therefore, to obtain an accurate registration it is important to have an accurate estimation of the true underlying probability distribution. Within the statistics literature, many methods have been proposed for finding the ‘optimal’ probability density, with the aim of improving the estimation by means of optimal histogram bin size selection. This provokes the common question of how many bins should actually be used when constructing a histogram. There is no definitive answer to this. This question itself has received little attention in the MI literature, and yet this issue is critical to the effectiveness of the algorithm. The purpose of this paper is to highlight this fundamental element of the MI algorithm. We present a comprehensive study that introduces methods from statistics literature and incorporates these for image registration. We demonstrate this work for registration of multi-modal retinal images: colour fundus photographs and scanning laser ophthalmoscope images. The registration of these modalities offers significant enhancement to early glaucoma detection; however, traditional registration techniques fail to perform sufficiently well. We find that adaptive probability density estimation heavily impacts on registration accuracy and runtime, improving over traditional binning techniques.
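
To make the role of the histogram concrete, the sketch below computes MI from a joint histogram of two intensity lists, with the bin count exposed as a parameter. It uses plain equal-width binning over an assumed 0 to 255 intensity range; the function name `mutual_information` is hypothetical and the adaptive density estimation studied in the paper is not reproduced here.

```python
from collections import Counter
from math import log2


def mutual_information(a, b, bins):
    """MI in bits between two equal-length intensity lists with values in [0, 256)."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    width = 256 / bins
    # Quantise each intensity into one of `bins` equal-width bins.
    qa = [min(int(v / width), bins - 1) for v in a]
    qb = [min(int(v / width), bins - 1) for v in b]
    n = len(a)
    joint = Counter(zip(qa, qb))  # joint histogram
    pa = Counter(qa)              # marginal histogram of a
    pb = Counter(qb)              # marginal histogram of b
    mi = 0.0
    for (i, j), count in joint.items():
        # p(i,j) * log2( p(i,j) / (p(i) * p(j)) ), written with raw counts.
        mi += (count / n) * log2(count * n / (pa[i] * pb[j]))
    return mi


image_a = [0, 64, 128, 192] * 4
image_b = [255 - v for v in image_a]  # inverted intensities, as a stand-in for another modality
```

Here `image_b` is an intensity-inverted copy of `image_a`, so the two are perfectly dependent even though their values differ, and MI with four bins reaches its maximum for this data. Collapsing everything into a single bin discards that dependence entirely, which illustrates why the choice of bin count matters.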

Force-Directed Parallel Coordinates
Rick Walker, Philip A Legg, Serban Pop, Zhao Geng, Robert S Laramee, Jonathan C Roberts
17th International Conference on Information Visualisation, 2013
Parallel coordinates are a well-known and valuable technique for the analysis and visualization of high dimensional data sets. However, while Inselberg emphasizes that the strength of parallel coordinates as a methodology is rooted in exploration and interactivity, the set of interaction techniques is currently limited. Axes can be re-ordered and brushing (simple, angular or multi-dimensional) can be performed. In this paper, we propose a force-directed algorithm and related interaction techniques to support the exploration of parallel coordinate plots through a physical metaphor. Our parallel-coordinates visualization offers novel user interaction beyond the standard techniques by allowing the user to rotate the axis according to force-directed polylines. The new interaction provides the user with a more immersive experience for data exploration that results in greater intuition of the data, especially in cases where many polylines overlap. We demonstrate our approach, then present the results of a qualitative evaluation of the system.
IEEE Xplore


MatchPad: Interactive Glyph‐Based Visualization for Real‐Time Sports Performance Analysis
Philip A Legg, David HS Chung, Matthew L Parry, Mark W Jones, Rhys Long, Iwan W Griffiths, Min Chen
Computer Graphics Forum, 2012
Today real-time sports performance analysis is a crucial aspect of matches in many major sports. For example, in soccer and rugby, team analysts may annotate videos during the matches by tagging specific actions and events, which typically result in some summary statistics and a large spreadsheet of recorded actions and events. To a coach, the summary statistics (e.g., the percentage of ball possession) lack sufficient detail, while reading the spreadsheet is time-consuming and making decisions based on the spreadsheet in real-time is thereby impossible. In this paper, we present a visualization solution to the current problem in real-time sports performance analysis. We adopt a glyph-based visual design to enable coaching staff and analysts to visualize actions and events “at a glance”. We discuss the relative merits of metaphoric glyphs in comparison with other types of glyph designs in this particular application. We describe an algorithm for managing the glyph layout at different spatial scales in interactive visualization. We demonstrate the use of this technical approach through its application in rugby, for which we delivered the visualization software, MatchPad, on a tablet computer. The MatchPad was used by the Welsh Rugby Union during the Rugby World Cup 2011. It successfully helped coaching staff and team analysts to examine actions and events in detail whilst maintaining a clear overview of the match, and assisted in their decision making during the matches. It also allows coaches to convey crucial information back to the players in a visually-engaging manner to help improve their performance.


Hierarchical event selection for video storyboards with a case study on snooker video visualization
Matthew L Parry, Philip A Legg, David HS Chung, Iwan W Griffiths, Min Chen
IEEE Transactions on Visualization and Computer Graphics, 2011
Video storyboard, which is a form of video visualization, summarizes the major events in a video using illustrative visualization. There are three main technical challenges in creating a video storyboard: (a) event classification, (b) event selection and (c) event illustration. Among these challenges, (a) is highly application-dependent and requires a significant amount of application-specific semantics to be encoded in a system or manually specified by users. This paper focuses on challenges (b) and (c). In particular, we present a framework for hierarchical event representation, and an importance-based selection algorithm for supporting the creation of a video storyboard from a video. We consider the storyboard to be an event summarization for the whole video, whilst each individual illustration on the board is also an event summarization but for a smaller time window. We utilized a 3D visualization template for depicting and annotating events in illustrations. To demonstrate the concepts and algorithms developed, we use Snooker video visualization as a case study, because it has a concrete and agreeable set of semantic definitions for events and can make use of existing techniques of event detection and 3D reconstruction in a reliable manner. Nevertheless, most of our concepts and algorithms developed for challenges (b) and (c) can be applied to other application areas.
IEEE Xplore

Intelligent filtering by semantic importance for single-view 3D reconstruction from Snooker video
Philip A Legg, Matthew L Parry, David HS Chung, Richard M Jiang, Adrian Morris, Iwan W Griffiths, David Marshall, Min Chen
18th IEEE International Conference on Image Processing, 2011
In this paper we investigate the challenge of 3D reconstruction from Snooker video data. We propose a system pipeline for intelligent filtering based on semantic importance in Snooker. The system can be divided into table detection and correction, followed by ball detection, classification and tracking. It is apparent from previous work that there are several challenges presented here. Firstly, previous methods tend to use a fixed top-down camera mounted above the table. To capture a full table view from this is challenging due to space limitations above the table. Instead, we capture video data from a tripod and correct the viewpoint through processing. Secondly, previous methods tend to simply detect the balls without considering other interfering objects such as player and cue. This becomes even more apparent when the player strikes the cue ball. Our intelligent filtering avoids such issues to give accurate 3D table reconstruction.
IEEE Xplore


Multimodal retinal imaging: Improving accuracy and efficiency of image registration using Mutual Information
Philip A Legg
PhD Thesis, Cardiff University, 2010
This thesis addresses the challenging task of multi-modal image registration. Registration is often required in a number of applications, whereby two images are aligned to give matching correspondence between the features in each image. Such techniques have become popular in many different fields, especially in medical imaging. Multi-modal registration would allow for anatomical structure to be studied concurrently in both modalities, providing the clinician with a greater insight into the patient's condition. Glaucoma is a serious condition that damages the optic nerve progressively, leading to irreversible blindness. The disease can be treated so as to prevent any further infection; however, it cannot be reversed. Therefore, it is paramount that the disease is detected in the early stages so as to minimise the effect of the condition.


A robust solution to multi-modal image registration by combining mutual information with multi-scale derivatives
Philip A Legg, Paul L Rosin, David Marshall, James E Morgan
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2009
In this paper we present a novel method for performing image registration of different modalities. Mutual Information (MI) is an established method for performing such registration. However, it is recognised that standard MI is not without some problems; in particular, it does not utilise spatial information within the images. Various modifications have been proposed to resolve this; however, these only offer slight improvement to the accuracy of registration. We present Feature Neighbourhood Mutual Information (FNMI) that combines both image structure and spatial neighbourhood information which is efficiently incorporated into Mutual Information by approximating the joint distribution with a covariance matrix (cf. Russakoff’s Regional Mutual Information). Results show that our approach offers a very high level of accuracy that improves greatly on previous methods. In comparison to Regional MI, our method also improves runtime for more demanding registration problems where a higher neighbourhood radius is required. We demonstrate our method using retinal fundus photographs and scanning laser ophthalmoscopy images, two modalities that have received little attention in registration literature. Registration of these images would improve accuracy when performing demarcation of the optic nerve head for detecting such diseases as glaucoma.

Non-rigid elastic registration of retinal images using local window mutual information
Philip A Legg, Paul L Rosin, David Marshall, James E Morgan
Medical Imaging Understanding and Analysis (MIUA), 2009
In this paper we consider the problem of non-rigid retinal image registration between colour fundus photographs and Scanning Laser Ophthalmoscope (SLO) images. Registration would allow for cross-comparison between modalities, giving both appearance and reflectivity information which would provide clearer visualisation for demarcation of the optic nerve head as part of early glaucoma detection. Due to the differences in acquisition technique, along with alterations in the eye between acquisitions, there can be subtle non-rigid deformations present in the images that become apparent when performing rigid registration. Whilst this is negligible towards the centre of the SLO, the effect becomes much more noticeable towards the periphery of the image, where it can be seen that not all blood vessels are aligned correctly. We propose a two-stage registration consisting of finding an initial rigid registration using Feature Neighbourhood Mutual Information [1], and then using Local Window Mutual Information to quickly determine deformation parameters for a non-rigid solution. We test our method on 135 image pairs, with results showing improved registration accuracy compared to rigid registration.


Incorporating neighbourhood feature derivatives with mutual information to improve accuracy of multi-modal image registration
Philip A Legg, Paul L Rosin, David Marshall, James E Morgan
Medical Imaging Understanding and Analysis (MIUA), 2008
In this paper we present an improved method for performing image registration of different modalities. Russakoff [1] proposed the method of Regional Mutual Information (RMI) which allows neighbourhood information to be considered in the Mutual Information (MI) algorithm. We extend this method by taking local multi-scale feature derivatives in a gauge coordinate frame to represent the structural information of the images [2]. By incorporating these images into RMI, we can combine aspects of both structural and neighbourhood information together, which provides a high level of registration accuracy that is essential in application to the medical domain. Our images to be registered are retinal fundus photographs and SLO (Scanning Laser Ophthalmoscopy) images. The combination of these two modalities has received little attention in image registration, yet could provide much useful information to an Ophthalmic clinician. One application is the detection of glaucoma in its early stages, where prevention of further infection is possible before irreversible damage occurs. Results indicate that our method offers a vast improvement to Regional MI, with 25 of our 26 test images being registered to a high standard.
UWE Library


Improving accuracy and efficiency of registration by mutual information using Sturges' histogram rule
Philip A Legg, Paul L Rosin, David Marshall, James E Morgan
Medical Imaging Understanding and Analysis (MIUA), 2007
Mutual Information is a common technique for image registration in the medical domain, in particular where images of different modalities are to be registered. In this paper, we wish to demonstrate the benefits of applying a common method known in statistics as Sturges’ Rule for selecting histogram bin size when computing Entropy as a part of the existing Mutual Information algorithm. Although Sturges’ Rule is well known in the field of statistics it has received little attention in the Computer Vision community. By augmenting Mutual Information with Sturges’ Rule, we show that this offers an improvement to both the runtime of the algorithm and also the accuracy of the registration. Our results are demonstrated on images of the eye, in particular, Fundus images and SLO (Scanning Laser Ophthalmoscopy) images.
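
For reference, Sturges' Rule in its commonly stated form picks the number of histogram bins as ceil(log2(n)) + 1 for n samples. The sketch below applies that formula; the function name `sturges_bins` is hypothetical, and treating the number of pixels as the sample count is an assumption for illustration rather than a detail taken from the paper.

```python
from math import ceil, log2


def sturges_bins(n_samples):
    """Number of histogram bins suggested by Sturges' Rule: ceil(log2(n)) + 1."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    return ceil(log2(n_samples)) + 1


# Assumed example: a 512 x 512 image treated as 262,144 samples.
bins_for_image = sturges_bins(512 * 512)  # 19 bins
```

Compared with a fixed 256-bin histogram for 8-bit images, a count of this size gives a much smaller joint histogram, which is consistent with the runtime benefit the abstract reports.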