UFCFFY-15-M¶

Cyber Security Analytics¶

09: Case Study - Malware Analysis¶

Prof. Phil Legg¶

09: Malware Analysis¶

In this session we will cover:

  • History and current trends in Malware
  • Forms of malware analysis
  • Challenges and future research

Empirical Research¶

Empirical: "based on, concerned with, or verifiable by observation or experience rather than theory or pure logic"

  • https://scholar.google.com/
  • https://ieeexplore.ieee.org/Xplore/home.jsp
  • https://dl.acm.org/
  • https://www.connectedpapers.com/
  • https://vizsec.dbvis.de/

Important to be thinking about your Masters Projects by now...

History of Malware¶

Creeper Virus (Bob Thomas, 1971)

Morris Worm 1988

Q Walker

SubSeven 1999

ILOVEYOU 2000

Today...¶

today

...and also...¶

today

Malware Research¶

What does a search for "malware" give us on Google Scholar?

Detection - Classification - Analysis tools - Mobile malware - Static vs dynamic - Environment-sensitive - Obfuscation methods - Machine learning


Forms of malware analysis¶

  • Static analysis (pre-execution data) - what can we identify prior to execution of the file? Code analysis?
    • Malware devs will intentionally obfuscate behaviour so that malicious intent is not apparent.
  • Dynamic analysis (post-execution data) - what can we identify during or after execution? System behaviour when sample runs?
    • Requires a sandbox environment. However, will the malware execute as intended in a sandbox?

Static analysis¶

pefile is a powerful Python library for examining Portable Executable (PE) files. Disassemblers such as IDA Pro and radare2 are also worth a look.

For more details on using pefile, see this week's lab.
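As a dependency-free illustration of the kind of pre-execution data a PE parser reads, the sketch below pulls a few fields out of the DOS and COFF headers using only the standard library. It is not pefile's API and is not a substitute for it. The function name `read_pe_headers` and the returned dictionary keys are assumptions made for this example. The header offsets follow the published PE layout.

```python
import struct
from datetime import datetime, timezone

# Small lookup for two common Machine values; anything else is shown as hex.
MACHINE_NAMES = {0x014C: "i386", 0x8664: "AMD64"}
IMAGE_FILE_DLL = 0x2000  # Characteristics flag marking a DLL


def read_pe_headers(data: bytes) -> dict:
    """Parse the DOS and COFF headers from the raw bytes of a PE file.

    Hypothetical helper for illustration; pefile exposes far more detail.
    """
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ signature")
    # e_lfanew at offset 0x3C holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if len(data) < pe_offset + 24 or data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file: missing PE signature")
    # The 20-byte COFF header follows the 4-byte signature.
    (machine, n_sections, timestamp, _symtab, _nsyms,
     opt_size, characteristics) = struct.unpack_from("<HHIIIHH", data, pe_offset + 4)
    return {
        "machine": MACHINE_NAMES.get(machine, hex(machine)),
        "sections": n_sections,
        "compiled": datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat(),
        "optional_header_size": opt_size,
        "is_dll": bool(characteristics & IMAGE_FILE_DLL),
    }
```

To try it on a sample, read the file in binary mode, for example `read_pe_headers(open(path, "rb").read())`, where `path` is whichever file you are examining. Bear in mind that the compile timestamp is trivially forged, so treat it as a hint rather than evidence.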

PEfile

Dynamic analysis¶

Cuckoo Sandbox is one of the most popular off-the-shelf tools for dynamic analysis. It requires a local virtual machine to act as the target VM, with Python installed (the monitoring agent is based on Python 2.7).


Cuckoo Example¶

Cuckoo

WannaCry¶

  • Port scanning (2 events)
  • The binary likely contains encrypted or compressed data indicative of a packer (2 events)
  • File hash: 24d004a104d4d54034dbcffc2a4b19a11f39008a575aa614ea04703480b1022c
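Two of the items above can be reproduced by hand. A file hash like the one listed is a SHA-256 digest of the file's bytes, and "encrypted or compressed data" is commonly inferred from high byte entropy. The sketch below shows both, assuming a simple whole-file Shannon entropy check. The threshold of 7.0 bits per byte is an assumption chosen for illustration, not Cuckoo's actual signature logic, and `looks_packed` is a hypothetical helper name.

```python
import hashlib
import math
from collections import Counter

# Assumed cut-off for illustration only; real tools tune this per section.
ENTROPY_THRESHOLD = 7.0


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 (uniform) up to 8.0 (random-looking)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def looks_packed(data: bytes, threshold: float = ENTROPY_THRESHOLD) -> bool:
    """Heuristic: very high entropy suggests compressed or encrypted content."""
    return shannon_entropy(data) >= threshold
```

High entropy is only a hint. Legitimate installers and compressed resources also score highly, which is one reason a benign file can trigger the same indicator.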

Cuckoo

NotPetya¶

  • Locates and dumps memory from the lsass.exe process indicative of credential dumping (50 out of 350 events)
  • Installs itself for autorun at Windows startup (2 events)
  • Likely installs a bootkit via raw harddisk modifications (44 events)
  • File hash: cf01329c0463865422caa595de325e5fe3f7fba44aabebaae11a6adfeb78b91c

Cuckoo

Issues?¶

  • What if we try a benign file like PuTTY?
  • “This file shows numerous signs of malicious behavior.” Scored 3.9 and 4.3 on two separate runs.
  • The binary likely contains encrypted or compressed data indicative of a packer (2 events)
  • File has been identified by 2 AntiVirus engines on VirusTotal as malicious (2 events)
  • Is the scoring reliable?

Cuckoo

Issues?¶

  • Environment-aware malware can easily recognise a Cuckoo instance, and sandboxes in general.
    • agent.py is the obvious indicator (required to link the VM with Cuckoo)
    • Mentions of VMware/VirtualBox in the registry / installation of guest additions
    • System uptime, number of file accesses, general system "wear and tear" may be lacking (despite installing common software)
    • (Online Cuckoo means we cannot fully control and tailor the testing environment)
  • Can we build a custom sandbox with greater degree of control?
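One way to reason about these indicators is to audit your own sandbox for them before submitting a sample. The sketch below is a minimal checker, assuming the relevant facts have already been collected from the guest into a simple record. The `GuestSnapshot` class, its field names, and both thresholds are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class GuestSnapshot:
    """Hypothetical summary of facts gathered from inside the guest VM."""
    uptime_seconds: float
    process_command_lines: list
    registry_strings: list
    recent_file_count: int


VM_MARKERS = ("vmware", "virtualbox", "vbox")
MIN_UPTIME_SECONDS = 15 * 60   # assumed: a freshly booted VM looks suspicious
MIN_RECENT_FILES = 25          # assumed: a lived-in system has file history


def sandbox_tells(snapshot: GuestSnapshot) -> list:
    """Return the indicators an environment-aware sample could pick up on."""
    tells = []
    # The Cuckoo agent shows up in the process list as a Python script.
    if any("agent.py" in cmd.lower() for cmd in snapshot.process_command_lines):
        tells.append("agent.py is running")
    # Hypervisor and guest-additions names left in the registry.
    hits = sorted({m for s in snapshot.registry_strings
                   for m in VM_MARKERS if m in s.lower()})
    if hits:
        tells.append("VM artefacts in registry: " + ", ".join(hits))
    # Missing "wear and tear": short uptime and little file activity.
    if snapshot.uptime_seconds < MIN_UPTIME_SECONDS:
        tells.append("system uptime is very short")
    if snapshot.recent_file_count < MIN_RECENT_FILES:
        tells.append("little evidence of file activity")
    return tells
```

An empty result does not mean the sandbox is undetectable. It only means these particular checks found nothing, which is part of the motivation for a custom sandbox with a greater degree of control.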

One more example...¶

  • Search http://app.any.run for "24d004a104d4d54034dbcffc2a4b19a11f39008a575aa614ea04703480b1022c".
  • What malware variant is this?
  • What URL is discovered when the sample is run on ANYRUN?
  • What is significant about this URL?

Conclusions¶

  • Most of our focus here is on data collection - either in static or dynamic form.
    • Most crucial stage - once we have data we can think about best means to analyse and process this to solve our problem.
  • How to then use Machine Learning to detect, classify, or identify similarity? (see our earlier lab examples)
  • Malware devs want to evade detection and obfuscate - how can human analysts and ML models keep up in a timely manner?
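As a small illustration of the "identify similarity" question, once each sample has been reduced to a set of observed behaviours, a set-overlap measure gives a first baseline before any machine learning is involved. The sketch below uses Jaccard similarity. The behaviour labels and family names are hypothetical placeholders loosely based on the indicators discussed earlier, not real signature identifiers.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    if not union:
        return 0.0  # two empty reports tell us nothing, so treat as dissimilar
    return len(a & b) / len(union)


def most_similar(sample: set, known: dict) -> tuple:
    """Return (name, score) for the known behaviour set closest to the sample."""
    if not known:
        raise ValueError("no known behaviour sets to compare against")
    scored = ((name, jaccard(sample, feats)) for name, feats in known.items())
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical behaviour labels, for illustration only.
    known = {
        "family_a": {"port_scanning", "packed"},
        "family_b": {"lsass_dump", "autorun", "bootkit"},
    }
    print(most_similar({"lsass_dump", "bootkit", "packed"}, known))
```

A baseline like this is easy to interpret, but it also shows why data collection matters most: if the sandbox run misses behaviours, the similarity score is computed on the wrong sets.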

Useful links¶

  • Cuckoo VM (nested VM): https://github.com/ashemery/CuckooVM
  • Cuckoo Online: https://cuckoo.cert.ee/
  • ANYRUN: https://app.any.run/
  • theZoo: https://www.github.com/ytisf/theZoo
  • vx-underground: https://www.vx-underground.org/
  • Python for Malware Analysis: https://malwology.com/2018/08/24/python-for-malware-analysis-getting-started/

Further reading¶

  • N. Miramirkhani, M. P. Appini, N. Nikiforakis and M. Polychronakis, "Spotless Sandboxes: Evading Malware Analysis Systems Using Wear-and-Tear Artifacts," 2017 IEEE Symposium on Security and Privacy (SP), 2017, pp. 1009-1024, doi: 10.1109/SP.2017.42.
  • Michael R. Smith, Nicholas T. Johnson, Joe B. Ingram, Armida J. Carbajal, Bridget I. Haus, Eva Domschot, Ramyaa Ramyaa, Christopher C. Lamb, Stephen J. Verzi, and W. Philip Kegelmeyer. 2020. Mind the Gap: On Bridging the Semantic Gap between Machine Learning and Malware Analysis. Proceedings of the 13th ACM Workshop on Artificial Intelligence and Security. Association for Computing Machinery, New York, NY, USA, 49–60. DOI:https://doi.org/10.1145/3411508.3421373
  • Mills, A.; Legg, P. Investigating Anti-Evasion Malware Triggers Using Automated Sandbox Reconfiguration Techniques. J. Cybersecur. Priv. 2021, 1, 19-39. https://doi.org/10.3390/jcp1010003
  • A. Mills, T. Spyridopoulos and P. Legg, "Efficient and Interpretable Real-Time Malware Detection Using Random-Forest," 2019 International Conference on Cyber Situational Awareness, Data Analytics And Assessment (Cyber SA), 2019, pp. 1-8, doi: 10.1109/CyberSA.2019.8899533.