2018-06-20

Office 365 Activities API - Example Output

Yesterday CrowdStrike published details of the 'Office 365 Activities API' which is an extremely useful source of evidence for investigations, especially in cases of Business Email Compromise. The detailed post can be read here. Included was a Python module which allows one to pull the discussed data from O365.

I also put out a recent blogpost commenting on how much research goes unshared and how I feel it is important that companies and individuals share resources, tools and research so as to further the whole field. In the meantime I had been working with a test O365 account to generate some Activities API output for colleagues and industry friends who may not have seen it before and who don't necessarily have access to an O365 account for testing. I would be a little hypocritical if I didn't share this with the wider community, so here goes.

Incidentally if you want to jump to the output data without reading all my waffle, it can be found here.

The Setup

My Office365 test environment is a single-user tenant at the time of testing, so my only user is also an admin user. This poses some limitations, but I will be expanding testing with the addition of a further account shortly. In the meantime I have used two Office365 accounts which I have access to, one where I am an admin and one where I am a user. In both cases a user is able to access the Activities API data as it relates to their own account.

The Procedure

My testing procedure consisted of logging into an O365 mailbox and performing actions which I thought would generate some interesting log data for a sample output. I actually performed these actions across two accounts and then merged the logs so there was only a single file to review.

The accounts were both configured to be accessible from a mobile device (Android mail configured with Exchange settings). No effort was made to generate traffic from a mobile device and all activity was performed via the browser; however, it is possible that the mobile configuration may have impacted the resultant logs.

After downloading the Python module from the CrowdStrike GitHub, the first task was to generate some known account activity. The following steps were performed:

--Account 1--

2018-06-20T07:36:00 - Open Outlook.com (autologin)
2018-06-20T07:37:00 - Sign out
2018-06-20T07:37:30 - Sign in
2018-06-20T07:38:00 - Approve sign in request via android app
2018-06-20T07:38:30 - Opt to "stay signed in"
2018-06-20T07:39:10 - Open 'Outlook' web app
2018-06-20T07:39:40 - open email in viewing pane
2018-06-20T07:40:00 - open email in jump out window
2018-06-20T07:40:25 - select [new (+)] to compose new email
2018-06-20T07:41:07 - send email
2018-06-20T07:41:28 - open sent items
2018-06-20T07:42:00 - open sent email message in viewing pane
2018-06-20T07:42:30 - press delete on sent email
2018-06-20T07:42:55 - expand folders view
2018-06-20T07:43:25 - right click and select delete all from deleted items
2018-06-20T07:44:00 - open Junk email
2018-06-20T07:44:44 - press settings
2018-06-20T07:45:45 - close settings
2018-06-20T07:46:46 - right click message and 'create rule'
2018-06-20T07:47:37 - save rule
2018-06-20T07:48:00 - select 'OK'
2018-06-20T07:49:15 - search for and open 'inbox rules'
2018-06-20T07:49:48 - press new
2018-06-20T07:50:15 - receive alert and open it
2018-06-20T07:50:42 - select investigate and Security & Compliance center opens
2018-06-20T07:51:17 - return and cancel rule creation
2018-06-20T07:51:55 - Sign Out
2018-06-20T08:44:15 - Sign In

--Account 2--

2018-06-20T08:44:25 - Search for 'wire transfer'
2018-06-20T08:44:25 - Search for 'banking'
2018-06-20T08:44:25 - Search for 'bacs'

The next step was to acquire an OAuth token. For the purposes of testing I employed the Outlook Dev Center - OAuth Sandbox to generate the appropriate OAuth token. For those who are not familiar, the procedure is as follows:

1. Select 'Authorise using your own account':



2. Authenticate with the relevant account credentials:


3. Copy out the access token that appears in the 'Access Token' box:


Note that the Access Token is a long string including characters which may preclude the 'double click to select all', so 'Ctrl-A'ing while within the text box is your best bet.

The copied string can then be passed to the --token argument when using CrowdStrike's retriever.py, as in the below command, to dump the full history of activities for a user:


python retriever.py --user test@address.com --output activities.csv --token [OAuth Access Token String]

The above command worked for me, but I have limited activity on my test account. More targeted extracts may be needed for genuine accounts.
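If you would rather not paste the token straight into your shell, the same invocation can be assembled from Python. This is only a sketch: it uses just the three arguments shown above, and the O365_TOKEN environment variable is a name I have made up for illustration rather than anything retriever.py itself reads.

```python
import os
import sys


def build_retriever_command(user, output_path, token):
    """Return the argv list for the retriever.py invocation shown above."""
    if not token:
        raise ValueError("an OAuth access token is required")
    return [
        sys.executable, "retriever.py",
        "--user", user,
        "--output", output_path,
        "--token", token,
    ]


# O365_TOKEN is a hypothetical environment variable name, not part of the tool.
command = build_retriever_command(
    "test@address.com",
    "activities.csv",
    os.environ.get("O365_TOKEN", "PLACEHOLDER"),
)

# Print everything except the interpreter path and the token itself.
print(command[1:6])
# To run it for real: subprocess.run(command, check=True)
```

The list form avoids any shell quoting problems with the odd characters in the token.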

My test output should be reviewed in conjunction with the list of activities performed, as above. Note that the times are approximate on account of me being human. The logs have been modified to replace my Login IP with '175.45.176.123' and the account addresses have been modified to 'test@address.com'.
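For anyone wanting to line the two up programmatically, a rough sketch follows. It parses lines in the format of my action list and pulls out any events within a tolerance of each action. The events and their labels here are made-up placeholders, not rows from my actual output, and the 60 second tolerance is an arbitrary choice to allow for my approximate timekeeping.

```python
from datetime import datetime, timedelta


def parse_action(line):
    """Split '2018-06-20T07:41:07 - send email' into (datetime, description)."""
    stamp, _, description = line.partition(" - ")
    return datetime.strptime(stamp.strip(), "%Y-%m-%dT%H:%M:%S"), description.strip()


def events_near(action_time, events, tolerance=timedelta(seconds=60)):
    """Return the (timestamp, label) events within tolerance of a recorded action."""
    return [event for event in events if abs(event[0] - action_time) <= tolerance]


actions = [parse_action(line) for line in [
    "2018-06-20T07:41:07 - send email",
    "2018-06-20T07:42:30 - press delete on sent email",
]]

# Hypothetical log events for illustration; these are not rows from the real output.
events = [
    (datetime(2018, 6, 20, 7, 41, 10), "example-event-a"),
    (datetime(2018, 6, 20, 7, 45, 0), "example-event-b"),
]

for when, what in actions:
    print(what, "->", [label for _, label in events_near(when, events)])
```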

Without further ado, the resulting csv output from my testing is available here.

Sharing is Caring - The Secret O365 API

The whole field of DFIR thrives and survives on shared research. Professionals who identify novel techniques or develop tools which they share outside their organisation help to drive progress, achieve better understanding and ultimately help the community in the arms race against bad actors.

A recent example which brought this to mind (and prompted this post) is the release of details regarding the Office 365 Activities API as detailed in this post by CrowdStrike.

What will likely follow is a rambling post about information sharing in DFIR so if you are here for technical details you are in the wrong place, check out the CrowdStrike writeup which is detailed and informative.

Business Email Compromise (BEC)

With the introduction of Outlook Web Access, and even more so since the increased adoption of Office 365 (and GSuite) for cloud email within businesses, Business Email Compromise (BEC) has been a growing issue. Over the last 6 years I have been involved in the investigation of many dozens of such incidents. Unsurprisingly, these cases have a tendency to merge into one in the memory, but a few have stuck out over the years.

The most significant was the first case I investigated. The customer suffered a compromise of a number of GSuite ('Google Apps for Business' at the time) email accounts and executive impersonation was used to defraud them of c. 1.25 million Australian Dollars, which was wired out by a duped finance employee in a series of transactions. This case was notable due to the certainty with which the customer insisted that it must be an inside job by disgruntled IT staff, which, coupled with the terrible logging available in Google Apps for Business, made for a difficult investigation.

Another especially notable case was in mid 2016, when a large organisation had suffered a multi-account compromise. During the incident the attacker(s) used compromised accounts to phish other internal accounts (a common MO in BEC), eventually resulting in the attacker gaining access to the accounts of senior staff. We were employed to conduct the investigation and the client had already engaged with Microsoft who, as it turns out, were in a particularly helpful mood that day. There was nothing to set this case apart from the multitude of O365 breaches I had investigated during that time, other than the additional visibility Microsoft were able to share into mailbox account activity.

The client had not enabled any of the available (but off by default) auditing within O365 and AzureAD; however, as their MS account rep put it, Microsoft may be able to assist but they would need to run queries in their big data system to get some logs. Don't judge too harshly, it was 2016, 'big data' was all the rage. Lo and behold, a couple of days later they produced logs per impacted mailbox which contained unprecedented detail on what messages were accessed and searches run, as well as login event information. This information is invaluable when investigating cases of BEC and in this case the motivation of the attacker was clear very quickly. Each mailbox was searched for a series of keywords (e.g. bacs, wire transfer, international payment etc) and no other information had been accessed. Where no hits were returned, the account access ceased or the account was used to target other users.

Since then I have encouraged clients to make similar requests to their Microsoft account rep in an effort to get these same logs, especially in cases where no other logging was enabled. These requests have received inconsistent and commonly unsatisfactory responses, especially where the customers were "small fry" tenants with users in the 1000s rather than 10000s. With that said, from time to time customers' requests were met with the delivery of logs, and where provided the logs were generally of a consistent format. What I didn't know at the time was that the output we were being provided was associated with the 'Office 365 Activities API' and that it is available to all O365 users irrespective of enabled logging (at least for now).

The 'Secret API'

During DFIR conferences, at talks and on Twitter, the topic of the 'Secret API' has arisen from time to time. It was evident that access to this information was possible and that a number of individuals and organisations had decided to keep this to themselves in an effort to maintain some sort of competitive advantage. The existence and functionality of the API had been kept quiet by those in the know and I have heard that some SIEM vendors and IR consulting firms have boasted to prospective customers that they have abilities that their competitors do not as it relates to investigating or integrating with O365 logging.

In this case, and in other similar examples, I feel this poses an interesting dilemma to anyone who identifies similar novel analysis methodologies. Ultimately I feel those who declined to share the information did so to the detriment of other victims. In coming to this conclusion I have considered the following:
  • DFIR survives on shared research. The reason I started this blog was in recognition of the fact that I have benefited from the research and efforts of others who have investigated artifacts and publicly disclosed their findings. We need to see beyond short term commercial gain in the shared battle against bad actors.
  • I can't imagine that the 'competitive advantage' these organisations benefited from could be all that significant. Clearly it was being touted enough that it was being used to try to win business but do prospective customers really buy into these claims when so often they are only marketing hype?
  • It is possible that this information becoming public may cause Microsoft to close down the API prematurely. This is the only defensible justification I can see; I appreciate that widespread disclosure may have caused (and still may cause) this to prematurely vanish. It isn't lost on me that the CrowdStrike post comes at a time when a number of firms had intimated that they may publish something and indeed a time when the findings have a limited shelf life, as it would appear the API is set to be EOL in 2019.
  • There is also the risk of an "If other firms aren't sharing, why should I" mentality and I can appreciate this. I've worked in firms where research is seldom published and as such I was part of that problem. In 'listen only' mode some organisations will leech up the work of others while holding tight to their own original research. The answer however is not to close off but rather to lead by example and lay rightful praise at the feet of our competitors when they do good research, share findings and further the DFIR/Infosec community as a whole.
To that end I think credit should be given to CrowdStrike for making the decision to share this information with the wider community while other organisations who were evidently 'in the know' did not. They are by no means unique and fantastic work is constantly being published by loads of firms in the field.

I would be interested to hear other people's take on the topic of sharing research and findings which might otherwise offer a competitive advantage. Clearly one size doesn't fit all, but are there other considerations which I haven't detailed above?

2018-06-11

Using Extended MAPI Properties to determine email sent time

Recently, David Cowen at 'Hacking Exposed - Computer Forensics Blog' has resumed the 'Zeltser Challenge' of daily blogging. In doing so, he has highlighted (at least to me) that after a short initial flourish this blog has been somewhat neglected, taking a back seat while regular work and other commitments have ramped up again.

Inspired by David's effort and with an eye on his 'Sunday Funday' challenge series I thought I would respond to the most recent challenge via blogpost. The challenge in question is detailed at this post and in essence asks the question:
'What within an email message sent from Outlook and connected to an Exchange server would allow an examiner to determine when an email was sent from the system they are examining presuming they found the message in the sent folder within the mailbox.'
In my day-to-day work I have limited need to analyse emails from the sender's machine, so this posed an interesting opportunity to look into Extended MAPI Properties for the first time in a long time.

The Setup

When performing this type of research I would normally aim to set up a number of test environments combining a mix of OS, client and configuration versions however for the purposes of this exercise I have limited my research to a single setup, as follows:


The test setup comprised:

  • Microsoft Windows 10 (10.0.17134.48)
  • Outlook 2016 (16.0.9330.2073)
  • An Office365 email account configured as an exchange mailbox using default settings within Outlook.

Examining MAPI Properties

There are a number of ways to view MAPI properties within various forensic suites and using plugins for Outlook. For the purposes of this analysis I used Kernel OST Viewer. I will look to verify findings using a secondary tool but have not done so at this stage.

Kernel OST Viewer 15


Viewing MAPI Properties in Kernel OST Viewer could not be simpler. After launching the application, press 'Select File'. 


Kernel OST Viewer 15

This will present you with the 'Source File Selection' dialog box.

Kernel OST Viewer 15

'Browse' to the location of your OST for analysis. Within this configuration the OST is located in the default location of 'C:\Users\[UserName]\AppData\Local\Microsoft\Outlook'. Once selected, you will be presented with a further box; select 'Finish' and the OST will be parsed.

Select any message within the messages table and then select the 'Advanced Properties View' tab in the Viewing Pane. In the image below we have navigated to the Sent Items folder as this will be of note during testing.

Kernel OST Viewer 15

Now that we have a way to examine the MAPI Properties we can commence testing.

Date Fields in MAPI Properties

The first thing I examined was sending a test message and examining what MAPI Properties were associated with it and particularly those which contained date values. Immediately notable are 5 values which contain Windows Filetime data. In all test messages the following Properties were populated:

Principal MAPI Timestamps


As detailed in the image, the identified Property Names and Property Tags associated with these properties were:
PR_CREATION_TIME | 0x30070040
PR_CLIENT_SUBMIT_TIME | 0x00390040
PidLidValidFlagStringProof | 0x85BF0040
0x00000F0A | 0x0F0A0040
0x00000F02 | 0x0F020040
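
As an aside on reading these tags: a MAPI property tag is a 32-bit value where the upper 16 bits are the property ID and the lower 16 bits are the property type. Type 0x0040 is PT_SYSTIME, which is why all five tags above end in 0040. A minimal sketch of splitting them (the function name is my own):

```python
PT_SYSTIME = 0x0040  # MAPI property type for a FILETIME-style timestamp


def split_property_tag(tag):
    """Return (property_id, property_type) for a 32-bit MAPI property tag."""
    return (tag >> 16) & 0xFFFF, tag & 0xFFFF


for name, tag in [
    ("PR_CREATION_TIME", 0x30070040),
    ("PR_CLIENT_SUBMIT_TIME", 0x00390040),
    ("0x00000F0A", 0x0F0A0040),
]:
    prop_id, prop_type = split_property_tag(tag)
    is_time = prop_type == PT_SYSTIME
    print(f"{name}: id=0x{prop_id:04X} type=0x{prop_type:04X} systime={is_time}")
```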

In addition to these properties, certain other properties also contained dates within strings such as Property 0x0000844D and various others.
0x0000844D MAPI Property

While these other properties are of various forensic value and indeed may assist in corroborating dates and times observed elsewhere, they have been excluded from this testing for the moment.
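
If your tool of choice shows the raw value rather than a decoded date, a Windows FILETIME is a 64-bit count of 100-nanosecond intervals since 1601-01-01 UTC. The below is a minimal sketch of the conversion; the function names are mine, and the little-endian helper assumes you have carved out the 8 raw bytes yourself.

```python
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime):
    """Convert a 64-bit FILETIME (100 ns ticks since 1601-01-01 UTC) to a datetime."""
    # Ten ticks make one microsecond, the finest unit timedelta supports.
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


def filetime_from_bytes(raw):
    """Decode an 8-byte little-endian FILETIME value."""
    if len(raw) != 8:
        raise ValueError("a FILETIME is exactly 8 bytes")
    return filetime_to_datetime(int.from_bytes(raw, "little"))


# 116444736000000000 ticks is the well-known FILETIME of the Unix epoch.
print(filetime_to_datetime(116444736000000000))
```

Note that the result is in UTC, so any comparison against times noted from a local clock needs the timezone offset taken into account.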

Testing

In testing I chose three scenarios to examine the impact on the MAPI Properties of items as they existed within Sent Items.

Scenario 1: A user presses compose, generates their message and sends it.
Scenario 2: A user presses compose, generates a message, closes it (saving a draft), reopens the message making modifications and then sends the message.
Scenario 3: A user presses compose, generates their message and sends it while the computer is disconnected from the network, causing the message to be held in the user's Outbox. The message is then later transmitted when the network reconnects.

Each test was repeated three times with no observed differences between the examined MAPI Property behavior within each repetition of a scenario. One example for each is detailed below in the Results section.

Results

--Test 1--
Tested Scenario: 
Message composed and sent

Actions Performed:
20180611T22:30:22 – Pressed 'Compose' within Outlook and typed email message
20180611T22:31:03 – Pressed 'Send'.

MAPI Property values:
0x00000F0A = 20180611T22:31:03
0x00000F02 = 20180611T22:31:03
PR_CREATION_TIME = 20180611T22:31:03
PR_CLIENT_SUBMIT_TIME = 20180611T22:31:03
PidLidValidFlagStringProof = 20180611T22:31:03

Notably the property values all show a consistent date/time; in particular, PR_CREATION_TIME and PR_CLIENT_SUBMIT_TIME show the same timestamp.

--Test 2--
Tested Scenario: 
Message Composed, Saved as Draft, Opened and Sent

Actions Performed:
20180611T22:40:50 – Pressed 'Compose' within Outlook and typed email message
20180611T22:41:25 – Closed email message
20180611T22:41:30 – Select 'Save' when prompted to save as draft
20180611T22:42:16 – Opened Draft and updated
20180611T22:43:09 – Pressed 'Send'.

MAPI Property values:
0x00000F0A = 20180611T22:43:09
0x00000F02 = 20180611T22:43:10
PR_CREATION_TIME = 20180611T22:43:09
PR_CLIENT_SUBMIT_TIME = 20180611T22:43:09
PidLidValidFlagStringProof = 20180611T22:43:10

Once again, despite the saving of a draft all timestamps associated with the examined MAPI Properties are within 1 second of the 'Send' button being pressed.

--Test 3--
Tested Scenario: 
Message Composed, Sent while system is offline and held in Outbox, Transmitted when system reconnected to network.

Actions Performed:
20180611T22:48:02 – Pressed 'Compose' within Outlook and typed email message
20180611T22:49:50 – Pressed 'Send'
20180611T22:50:30 – Reconnect Network
20180611T22:51:02 – Pressed 'Send/Receive'

MAPI Property values:
0x00000F0A = 20180611T22:51:04
0x00000F02 = 20180611T22:49:50
PR_CREATION_TIME = 20180611T22:51:04
PR_CLIENT_SUBMIT_TIME = 20180611T22:51:04
PidLidValidFlagStringProof = 20180611T22:49:50

Notably within this test we see groupings of timestamps. PidLidValidFlagStringProof and 0x00000F02 appear to reflect the time that the user pressed 'Send' while PR_CREATION_TIME, PR_CLIENT_SUBMIT_TIME and 0x00000F0A appear to reflect the time the message was actually sent from the computer once it had network connectivity restored.

Conclusion

In answer to the challenge, the PR_CLIENT_SUBMIT_TIME and PR_CREATION_TIME most accurately reflect when a message was sent from a system. This is consistent with Microsoft documentation which indicates that "The store provider sets PR_CLIENT_SUBMIT_TIME to the time that the client application called IMessage::SubmitMessage." and that PR_CREATION_TIME is "set by message store providers on outgoing messages." as detailed here.

It should be noted however that where there is a discrepancy between these timestamps and 0x00000F02 or PidLidValidFlagStringProof, this may serve as an indication that while the message was transmitted at a particular time, the user may have attempted to send (i.e. pressed 'Send') at a time prior to its transmission.
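
The comparison described above can be sketched in a few lines. This assumes the timestamps are in the format used in my results, and the 5 second threshold is an arbitrary figure I have picked to absorb the 1 second differences seen in Test 2; it is not derived from any documentation.

```python
from datetime import datetime, timedelta

FMT = "%Y%m%dT%H:%M:%S"


def send_delay(submit_time, attempt_time):
    """Return how long after the apparent 'Send' press the message was submitted."""
    return datetime.strptime(submit_time, FMT) - datetime.strptime(attempt_time, FMT)


def looks_queued(submit_time, attempt_time, threshold=timedelta(seconds=5)):
    """Flag a message whose submit time trails the attempt time by more than threshold."""
    return send_delay(submit_time, attempt_time) > threshold


# PR_CLIENT_SUBMIT_TIME vs 0x00000F02, using the values from Test 2 and Test 3.
print(looks_queued("20180611T22:43:09", "20180611T22:43:10"))
print(looks_queued("20180611T22:51:04", "20180611T22:49:50"))
```

With the Test 3 values the gap is 74 seconds, which is flagged; the Test 2 values are not.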

Further testing

This analysis has been quick and dirty and leaves lots of unanswered questions. Some areas for possible further research may include:
  • Reviewing of other timestamps as they appear in other MAPI Properties.
  • Review of timestamp behavior when system time is tampered with
  • Review of MAPI Property fields when comparing messages which are sent using 'Send/Receive' vs automatic message synchronization following a disconnected network connection

2018-02-02

Rebuilding Hardware Raid in EnCase 7/8

Recently I needed to rebuild a hardware RAID within EnCase from physical images of the component disks. Some years ago this was a common task which I did on a regular basis, and could achieve with my eyes closed.

Back then my principal analysis tool was EnCase 6 and the method of rebuilding a RAID was relatively straightforward; the required menus and options were in a logical enough location. Of course, with the advent of EnCase 7, Guidance made every effort to hide functionality and generally make our lives more difficult.

More recently, I have probably only had to rebuild a hardware RAID four or five times in the last 3 years, and each time I have spent significantly longer trying to remember where Guidance saw fit to hide the menu item than I did in assessing the RAID and rebuilding it. This time I have James Habben (@JamesHabben) to thank for reminding me where I needed to look within EnCase.

In any event, as one George W. Bush once said, "fool me once, shame on — shame on you. Fool me — you can't get fooled again”, so I have committed to documenting the required process for future googlers, and indeed myself in probably 12 months time.

A few points before I get into the process:
  • There are a number of ways to skin this cat, my intention in this post is just to cover the mechanics of rebuilding the RAID in EnCase 7/8. I will likely follow up with a post which covers one method to identify the RAID configuration if this is unknown, but it is out of scope for today.
  • Rebuilding a software RAID is much simpler, and much better documented. If you are dealing with a Windows software RAID then the following will get you on the right path to rebuilding it in EnCase:
  • The images I have to hand, and therefore the process as demonstrated in the examples in this post, are the simplest situation: a two disk RAID-0 with known stripe size. The process doesn't change dramatically for more complex RAID setups.
  • And finally, X-Ways Forensics is significantly better and easier for rebuilding RAIDs. It was 8 years ago and this hasn't changed. In fact, X-Ways hasn't improved in this area to my knowledge; EnCase has just somehow become worse at it.

Recreating the RAID in EnCase 7/8

Launch EnCase (7 or 8), create a new case, and add your physical images as evidence items via either 'Add Evidence File' for E01, Ex01, vmdk, or vhd or via 'Add Raw Image' for RAW/DD images, per the below screenshots:

EnCase 7 and EnCase 8 Adding Evidence Items

Technically you can perform this same set of actions on two or more physical devices connected to your analysis system with the 'Add Local Device' functionality too.

Once the images are added, you should be in the 'Evidence Tab' with the individual items visible, per the below screenshot. 

Component Disks added as Evidence items in EnCase

Within EnCase 7, the Super Top Secret menu item you require is located via pressing the down arrow in the far right hand corner of the Evidence Tab toolbar. This is the center of the three down arrows on the right hand side.

You know, the down arrow.

I for one don't understand why people find it so hard to find...

Selecting the down arrow presents you with the following menu, from which you need to select 'Create Disk Configuration...'

Create Disk Configuration... menu option

Unfortunately, word got back to Guidance that a small group of 5-10 users had actually managed to locate and use the 'Create Disk Configuration...' functionality and as such they made changes to hide it again come the release of EnCase 8. The same menu item is now contained within the dropdown menu denoted with a cog.

EnCase 8

The remainder of the steps are consistent between 7 and 8, so screenshots will be limited to those of 7 as it is the less offensive of the two interfaces. We need to configure the RAID within the newly opened 'Disk Configuration' window:
Disk Configuration Window

Name the RAID
Enter a name for your RAID in the top left text entry box.

Select the RAID Type if known
Select the type of RAID you are rebuilding from the Disk Configuration list on the left of the window, these translate as follows:

Stripe = RAID0
Mirror = RAID1
RAID-5  - See below
Span = JBOD
Simple = JBOD, mab
RAID-5 Symmetric - See below
RAID-5 Solaris = Pass, one assumes Solaris employed a funky RAID-5 implementation
RAID-5 Asymmetric - See below
RAID-1E (https://en.wikipedia.org/wiki/Non-standard_RAID_levels#RAID_1E)

The various RAID-5 options relate to different implementations of RAID-5; the key difference is where the parity stripe is located in each pass. A helpful reference can be found here. Once upon a time a DR Engineer who specialised in damaged RAID recovery taught me how to use the 'RCDC' signature within an NTFS journal, an Excel spreadsheet and some basic deduction to determine exactly what RAID configuration and stripe size you are dealing with. To be honest, once I have sussed out the stripe size I normally try RAID-5 first, then RAID-5 Symmetric and RAID-5 Asymmetric until it works.

In the event that I was unsure of the RAID configuration or disk order my process used to be to determine the stripe size using this method then use X-Ways to allow for expedient trial and error. These days RAID Reconstructor can do a lot of the hard work for you. I will likely cover RAID Reconstructor in a follow up post.

Add Component Devices

Order matters here, so if you happen to know the order the drives came out of the RAID device start there, otherwise RAID Reconstructor can help you figure it out.

Right click inside the 'Component Devices' area and select the first disk. If you have a known offset on the drive before the RAID starts you will need to change the Start Sector and Total Sectors to reflect this.

Adding Component Devices

The Total Sectors will automagically be populated with the total number of sectors in your image; however, if you amend the Start Sector you will need to reduce the Total Sectors value by the same amount or you will receive an error. Again, if you do not know whether there is an offset, RAID Reconstructor will be able to assist. Press OK to add the disk.
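
The arithmetic is trivial but easy to fumble at 2am, so here it is as a sketch. The function name and the example figures are mine and purely illustrative.

```python
def component_extent(image_sectors, start_sector=0):
    """Return (start_sector, total_sectors) for a component disk with a leading offset."""
    if not 0 <= start_sector < image_sectors:
        raise ValueError("start sector must fall inside the image")
    # Total Sectors shrinks by exactly the number of sectors skipped at the start.
    return start_sector, image_sectors - start_sector


# Hypothetical image of 1,000,000 sectors with the RAID data starting at sector 2048.
print(component_extent(1_000_000, 2048))
```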

Repeat this process for each of the disks (in order). You cannot reorder the disks once they have been added; you will need to delete them and re-add.
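
To illustrate why the order matters, the below is a sketch of how a plain RAID-0 maps an offset in the rebuilt volume onto the component disks. It assumes the textbook layout with no leading offset, treats 'stripe size' as the per-disk chunk, and uses a 64 KB value I have picked for the example. It is not a description of EnCase's internals.

```python
def raid0_locate(logical_offset, stripe_kb, disk_count):
    """Map a byte offset in the rebuilt volume to (disk index, byte offset on that disk)."""
    stripe_bytes = stripe_kb * 1024
    # Which stripe-sized chunk the offset falls in, and where inside that chunk.
    stripe_number, within_stripe = divmod(logical_offset, stripe_bytes)
    # Chunks are dealt out to the disks in turn, one row at a time.
    row, disk_index = divmod(stripe_number, disk_count)
    return disk_index, row * stripe_bytes + within_stripe


# Two-disk RAID-0 with a hypothetical 64 KB stripe.
for offset in (0, 65536, 131072, 200000):
    print(offset, raid0_locate(offset, 64, 2))
```

Swap the two disks and every other chunk comes from the wrong device, which is why a wrongly ordered rebuild typically opens with no recognisable file system.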

Note that if you have a RAID-5 (or another RAID config with redundancy) and you are missing a disk it is possible to add a Null Device. Simply Right Click, select New then check the 'Create Null Device' option. This will cause the image selection to grey out and pressing OK will add a Null Device:

Adding a Null Device
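
The reason a missing disk can be tolerated is that RAID-5 parity is a simple XOR across the blocks in a stripe, so any one missing block can be recomputed from the rest. The following is a sketch of that general principle using made-up four byte blocks; it says nothing about how EnCase implements the Null Device.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data_a = b"\x01\x02\x03\x04"
data_b = b"\x10\x20\x30\x40"
parity = xor_blocks([data_a, data_b])

# Pretend the disk holding data_b is missing and rebuild its block.
rebuilt = xor_blocks([data_a, parity])
print(rebuilt == data_b)
```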

Stripe Size
Once you are happy with your added disks and order you can set the stripe size. Make sure to note that the size requested here is in KB, not sectors or bytes. If your config information (as provided by the system owner, found in RAID BIOS or via RAID Reconstructor) is not in KB then you will need to do a calculation to determine the appropriate value.
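
As a sketch of that calculation, assuming 512 byte sectors unless you know otherwise and taking 1 KB as 1024 bytes (function names are my own):

```python
def stripe_kb_from_bytes(size_bytes):
    """Convert a stripe size in bytes to whole KB, rejecting sizes that do not divide evenly."""
    kb, remainder = divmod(size_bytes, 1024)
    if remainder:
        raise ValueError("stripe size is not a whole number of KB")
    return kb


def stripe_kb_from_sectors(sectors, sector_size=512):
    """Convert a stripe size in sectors to KB (sector_size defaults to 512 bytes)."""
    return stripe_kb_from_bytes(sectors * sector_size)


print(stripe_kb_from_sectors(128))   # 128 sectors of 512 bytes
print(stripe_kb_from_bytes(131072))  # a stripe size reported in bytes
```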

Once you are happy with your configuration, select 'OK'. A new evidence item, named per your chosen name will be added. In this case we have named our RAID 'Demo'.


Selecting that evidence item will cause EnCase to open it, and with any luck interpret the filesystem, parse the MFT etc.

In the event that you have made an error, you will likely find the device opens with no file system. It is time to go back to the evidence pane, select the checkbox for the RAID, use the same drop-down menu and select 'Edit Disk Configuration...':

Edit Disk Configuration Menu
You may need to repeat this process a few times if you are trying to guess a config. As previously mentioned, the process of brute forcing config in this way is somewhat easier in X-Ways so if you have a licence available maybe use that for your testing. Furthermore, much of the guesswork can be removed with the use of RAID Reconstructor.

Hopefully this post helps a few people find the right menu item when attempting to rebuild / de-RAID a hardware RAID within EnCase, or at the very least here's hoping I remember this post when I next forget how to do it!

2018-01-23

Installing SIFT Workstation under Windows Subsystem for Linux

SIFT

In a recent post I alluded to the fact that I had successfully installed SIFT Workstation under Windows Subsystem for Linux (WSL). A number of people have zeroed in on that and had queries about this setup (and its limitations) so I thought I would follow up with a brief how-to.

For the uninitiated, the SIFT Workstation is a fantastic tool for forensic investigators and incident responders, put together and maintained by a team at SANS and specifically Rob Lee (@RobLee). It is a collection of open source tools for forensic analysis and is available bundled as a virtual machine. In a lot of cases the most appropriate way to use it is exactly like that, as a VM.

There are three common ways in which SIFT is used, and under various circumstances I have had reason to employ all three:
  1. On a Type 1 hypervisor. I have an instance running within ESXi which I SSH into for analysis.
  2. Installed as the base OS on physical hardware. On more than one occasion I have installed Ubuntu and then the SIFT Workstation onto an old laptop to use for analysis.
  3. Via a Type 2 hypervisor such as VMware Workstation or VirtualBox. I assume this is the most common method by which people use SIFT, and indeed SANS provide a preinstalled OVA which can be downloaded here.

All of the above solutions have their merits, but with the advent of WSL we have a new option for running the various Linux utilities bundled within SIFT. While researching this post I stumbled across the fact that the SIFT Manual Installation instructions in fact reference the use of SIFT under WSL but I hope to provide a little bit of additional detail and highlight a couple of gotchas.

If you haven't already installed WSL and Bash you will need to start there; however, if you have already installed these you can jump to Installing SIFT.

Installing WSL

The SIFT installation process detailed later requires internet access and as such I will focus on the online method of installing WSL. With that said, an offline method is detailed in my previous post 'Windows Subsystem for Linux and Forensic Analysis'.

First ensure you are running Windows 10 Anniversary Update or later (version 1607+) on a 64-bit system. If not, you will need to upgrade to this version to have WSL available.
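
If you want to script that check, a minimal sketch is below. The Anniversary Update (version 1607) corresponds to OS build 14393; the function name is my own, and the inputs are passed in so the logic can be tried on any machine.

```python
ANNIVERSARY_UPDATE_BUILD = 14393  # OS build number of Windows 10 version 1607


def wsl_available(build_number, is_64bit):
    """Return True if the OS build and architecture meet the requirement described above."""
    return is_64bit and build_number >= ANNIVERSARY_UPDATE_BUILD


# On Windows itself the build number is available as sys.getwindowsversion().build.
print(wsl_available(17134, True))
print(wsl_available(10586, True))
```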

The quickest and easiest way to enable WSL is to use PowerShell. Open PowerShell as Administrator and run the command:

Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Windows-Subsystem-Linux

Alternatively it can be enabled via the 'Windows Features' dialog. This can be accessed via Control Panel -> Programs -> Programs and Features -> Turn Windows features on or off. Locate the check box for Windows Subsystem for Linux, per the below screenshot, and select it:


Next we need to install the distribution of choice, which for SIFT will want to be Ubuntu. This is available for download via the Microsoft Store. Once installed, select launch and you will be prompted to create a UNIX user account. Once the account is created you are good to go.

Installing SIFT

The first point to note is that SIFT cannot be installed from the root account. Depending on how you have configured WSL this may be the default and only user account on your install. If that is the case then you will need to create a new user account, as below:

Create new user account
Launch Bash, either via the 'Ubuntu' app or from the Windows Command Line using the 'bash' command.

Create a new user account with:

useradd -m sansforensics

Create a password for the account:

passwd sansforensics

When prompted, enter and re-enter a new password for the account.

Add the user account to the sudo group:

sudo usermod -aG sudo sansforensics

Switch user to the new account:

su - sansforensics

The following set of commands can then be executed to download, verify and install the sift-cli-linux installer:

Using sift-cli-linux to install SIFT

wget https://github.com/sans-dfir/sift-cli/releases/download/v1.5.1/sift-cli-linux

wget https://github.com/sans-dfir/sift-cli/releases/download/v1.5.1/sift-cli-linux.sha256.asc

gpg --keyserver pgp.mit.edu --recv-keys 22598A94

gpg --verify sift-cli-linux.sha256.asc

sha256sum -c sift-cli-linux.sha256.asc

Verify that the output contains 'sift-cli-linux: OK'. You will also receive a warning regarding improperly formatted lines, which can be ignored.
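
A minimal sketch of how that check could be scripted, so that the warning about improperly formatted lines (presumably caused by the PGP armour wrapped around the checksum) doesn't bury the line you care about. The function name check_sift_sum is my own invention and not part of sift-cli, and it assumes the GNU sha256sum shipped with Ubuntu:

```shell
# check_sift_sum FILE: succeed only if sha256sum reports 'sift-cli-linux: OK'.
# Warnings about the non-checksum lines go to stderr and are discarded.
check_sift_sum() {
    sha256sum -c "$1" 2>/dev/null | grep -qx 'sift-cli-linux: OK'
}
```

Run it from the directory holding both downloads, e.g. check_sift_sum sift-cli-linux.sha256.asc && echo verified. It only checks the hash; the gpg --verify step is still needed to trust the checksum file itself.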

sudo mv sift-cli-linux /usr/local/bin/sift

chmod 755 /usr/local/bin/sift

Finally the sift installer can be executed to install the SIFT packages only, with the following command:

sudo sift install --mode=packages-only

This process will take a short while to complete but at the end it should indicate that it has completed with no errors.

Limitations

Image Mounting
Image mounting can be problematic. Due to fuse driver issues, using ewfmount, mountwin or imageMounter.py will result in the following error: 
fuse: device not found, try 'modprobe fuse' first
Unable to create fuse channel.
An alternative solution is to mount the image in Windows using a tool such as FTK Imager, then to mount the corresponding volume using drvfs within WSL. In the below example FTK Imager has been used to mount an E01 image as both Physical and Logical:


The notable volume has been mounted as H, and this can be presented to WSL with the following commands:

sudo mkdir /mnt/h

sudo mount -t drvfs H: /mnt/h

I have not performed extensive testing to understand the full implications of the different mount methods; however, I have found that using the 'File System/ Read Only' option, per the below, can be more reliable, albeit slower:


The above method will not be suitable to work with all tools or use cases. 

No GUI Support
The lack of an X Server prevents you from running graphical applications. This isn't a huge issue with SIFT as the overwhelming majority of the tools you will have installed SIFT for are command line. By default, attempting to run a GUI application such as Firefox will result in the following error:



But fortunately for us, installation of an X Server for Windows will allow you to run GUI applications from WSL. I have tested Xming and found it to be reasonably reliable. Once you have downloaded, installed and run Xming within Windows, configuring WSL to export the display to it is very easy. Simply execute the following command:

export DISPLAY=:0

Now running Firefox will result in a new window being created within Windows. This functionality also has interesting implications as to evidence storage. Notably this allows for the installation of a browser where history and internet browsing artifacts will be within the WSL filesystem.

2017-10-17

Further Forensicating of Windows Subsystem for Linux

This is a short follow up to my two recent posts, 'Windows Subsystem for Linux and Forensic Analysis' and 'Forensic Analysis of Systems that have Windows Subsystem for Linux Installed'. No sooner had I pressed publish on the latter, than a new Windows Insider Program update was pushed to my PC. Prior to this update my attempts to install openSUSE and SLES were failing repeatedly so I was unable to test whether multiple userlands could be installed side by side. The update appeared to have resolved the issue, so I was keen to dive in and confirm the answer to that niggling question. Unfortunately for me, it opened a can of worms which necessitated this follow up post to expand upon, and in some instances correct, its predecessors.  

The immediate question was quickly answered. Can an individual user install multiple userlands/ distributions side by side?


Yes. They. Can.

Per the screenshot, I successfully installed four userlands side by side: Ubuntu (via Beta install method), Ubuntu (via Windows Store), openSUSELeap (via Windows Store) and SUSELinuxEnterpriseServer (via Windows Store). This begs the question: where are the corresponding files for these distinct Linux userland installs? The eagle-eyed reader may have cottoned on to the fact that two instances of Ubuntu could be installed, one using each of the two installation methods. In my prior testing I was limited to Ubuntu installed via the Beta installation method; the other three installations were completed using the Windows Store.


Detecting Windows Subsystem for Linux (installed via Windows Store)

As per my previous posts, if you install WSL using the beta method the Bash executable will be found at:

%systemroot%\System32\bash.exe
i.e. 'C:\Windows\System32\bash.exe'

However, installation of any of the three currently available userlands via the store causes both the application files and the associated filesystem to be installed in different locations. The installation is still on a per user basis, so the points raised regarding activity attribution still stand. The core executable associated with each of the currently available userlands can be found at:

C:\Program Files\WindowsApps\CanonicalGroupLimited.UbuntuonWindows_1604.2017.922.0_x64__79rhkp1fndgsc\ubuntu.exe
C:\Program Files\WindowsApps\46932SUSE.openSUSELeap42.2_1.1.0.0_x64__022rs5jcyhyac\openSUSE-42.exe
C:\Program Files\WindowsApps\46932SUSE.SUSELinuxEnterpriseServer12SP2_1.1.0.0_x64__022rs5jcyhyac\SLES-12.exe

This location is liable to change with future application and Windows updates. Similarly, the location of the root file system for each is quite different from the location where the Beta version installs.

To summarise my previous posts, installs of 'Bash for Ubuntu for Windows' using the beta installation method cause notable files to be created within C:\Users\[Username]\AppData\Local\lxss, with specific subfolders for the home, root and rootfs which are then mounted when Bash is executed. Installation of any of the currently available userlands via the Windows Store now creates the associated file system within the packages directory for that application. The current paths where the rootfs is located for each install are as follows:

C:\Users\[Username]\AppData\Local\Packages\46932SUSE.openSUSELeap42.2_022rs5jcyhyac\LocalState\rootfs
C:\Users\[Username]\AppData\Local\Packages\46932SUSE.SUSELinuxEnterpriseServer12SP2_022rs5jcyhyac\LocalState\rootfs
C:\Users\[Username]\AppData\Local\Packages\CanonicalGroupLimited.UbuntuonWindows_79rhkp1fndgsc\LocalState\rootfs
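
If the evidence volume is mounted somewhere Linux tools can reach it (for instance via drvfs, or an image mounted under SIFT), the pattern in the paths above can be turned into a quick sweep across every user profile. This is only a sketch: list_wsl_rootfs is a hypothetical helper name, and it assumes the Packages\[package]\LocalState\rootfs layout observed above still holds on the system you are examining.

```shell
# list_wsl_rootfs VOLUME: print every Store-installed rootfs directory found
# under the user profiles of a mounted Windows volume (e.g. /mnt/h).
list_wsl_rootfs() {
    # -prune stops find descending into each rootfs once it has been reported.
    find "$1/Users" -type d \
        -path '*/AppData/Local/Packages/*/LocalState/rootfs' -prune 2>/dev/null | sort
}
```

For example, list_wsl_rootfs /mnt/h would print one line per userland per user, which also makes it obvious when more than one user on the box has been using WSL.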

This location is also liable to change with future application and Windows updates. What is notable is that while the beta install separates /, /home and /root into distinct locations which are individually mounted, for Store installs the 'rootfs' directory is mounted as / and /home and /root exist within that structure. You may recall that one benefit of the beta method was that when a user uninstalled WSL the rootfs directory was deleted but /home was left intact, preserving data which may be pertinent to a case. Unfortunately for us this is no longer the case: if a user chooses the uninstall option, either via the Store or by right clicking the Start menu shortcut, all user data is also removed.

The Beta install also created a notable Registry key at:

NTUSER.DAT\SOFTWARE\Microsoft\Windows\CurrentVersion\Lxss

However, after a Windows update and installing subsequent additional userlands, this key and its content have been moved one layer deeper, and they can now be found at:

HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Lxss\{12345678-1234-5678-0123-456789abcdef}

Additional values have also been added, per the below screenshot. You will note that there is now a DistributionName value which, for Beta 'Bash on Ubuntu on Windows' installs, is set to... 'Legacy', evidencing that the timing of my previous posts was impeccable as ever:


{12345678-1234-5678-0123-456789abcdef} Registry Key

The '{12345678-1234-5678-0123-456789abcdef}' is the distro_guid associated with the particular distribution, and as such, analysis of the contents of the HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Lxss key will allow you to identify which userland environments are currently installed for any particular user. At the time of writing there are four distro_guid keys you may observe:


  1. {12345678-1234-5678-0123-456789abcdef} - "Legacy" (Bash on Ubuntu on Windows)
  2. {b651c2ea-ab01-46ae-8c95-09209e4272fd} - SLES-12
  3. {d4085d24-9def-43b3-9a17-de87d9bba371} - openSUSE-42
  4. {ff9afada-c0e4-4c9c-ac50-e5fb13b4b142} - Ubuntu
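
When working through an exported list of Lxss subkeys, a small lookup saves squinting at GUIDs. The sketch below hard-codes only the four values listed above; distro_for_guid is a name I have made up for illustration, and any GUID outside that list is reported as unknown rather than guessed at.

```shell
# distro_for_guid GUID: translate an Lxss subkey name into the userland it
# represents, using only the four GUIDs observed at the time of writing.
distro_for_guid() {
    # Registry key names are case-insensitive, so normalise before matching.
    guid=$(printf '%s' "$1" | tr 'ABCDEF' 'abcdef')
    case "$guid" in
        '{12345678-1234-5678-0123-456789abcdef}') echo 'Legacy (Bash on Ubuntu on Windows)' ;;
        '{b651c2ea-ab01-46ae-8c95-09209e4272fd}') echo 'SLES-12' ;;
        '{d4085d24-9def-43b3-9a17-de87d9bba371}') echo 'openSUSE-42' ;;
        '{ff9afada-c0e4-4c9c-ac50-e5fb13b4b142}') echo 'Ubuntu' ;;
        *) echo 'unknown'; return 1 ;;
    esac
}
```

The non-zero return for an unrecognised GUID is deliberate: a new distribution appearing in the Store is exactly the sort of thing you want a script to flag rather than silently skip.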


The SLES, openSUSE and Ubuntu keys contain three additional values which are not found for the legacy install. Specifically, 'DefaultEnvironment' (REG_MULTI_SZ), 'KernelCommandLine' (REG_SZ) and 'PackageFamilyName' (REG_SZ). A screenshot of one example for {ff9afada-c0e4-4c9c-ac50-e5fb13b4b142} (Ubuntu), is provided below:


{ff9afada-c0e4-4c9c-ac50-e5fb13b4b142} Registry Key

By default, DefaultEnvironment and KernelCommandLine were found to be the same for all three of the tested userlands but may be modified by a user or in later updates. 'DefaultEnvironment' contains environment variables and 'KernelCommandLine' contains, you guessed it, the kernel command line statements. Their values are detailed below for reference.

DefaultEnvironment
HOSTTYPE=x86_64
LANG=en_US.UTF-8
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
TERM=xterm-256color
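
Since these values may be modified by a user, it can be worth checking whether the DefaultEnvironment on a system under examination deviates from the defaults above. A rough sketch, assuming you have already exported the REG_MULTI_SZ data to text with one string per line (how you do that depends on your registry tool); nondefault_env is a hypothetical name:

```shell
# nondefault_env: read DefaultEnvironment strings on stdin, one per line, and
# print only those that differ from the defaults observed in my testing.
nondefault_env() {
    grep -vxF \
        -e 'HOSTTYPE=x86_64' \
        -e 'LANG=en_US.UTF-8' \
        -e 'PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games' \
        -e 'TERM=xterm-256color'
}
```

No output means the value matches what I observed; anything printed (an altered PATH, say) is a lead worth following up. Bear in mind the defaults themselves may change in later updates.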

KernelCommandLine
BOOT_IMAGE=/kernel init=/init ro

PackageFamilyName
The PackageFamilyName contains a single string which relates to the Windows Store package name, and as such can help identify the correct file system for active installs. For our three userlands the values were:

46932SUSE.SUSELinuxEnterpriseServer12SP2_022rs5jcyhyac
46932SUSE.openSUSELeap42.2_022rs5jcyhyac
CanonicalGroupLimited.UbuntuonWindows_79rhkp1fndgsc

As detailed earlier in the post, the application path and file system locations contain the same strings.
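
To illustrate that relationship, the sketch below builds the expected rootfs location from a profile directory and a PackageFamilyName value. It assumes the Packages\[PackageFamilyName]\LocalState\rootfs layout shown earlier and uses forward slashes, as you would see when the volume is mounted within WSL or SIFT. rootfs_for_package is a hypothetical helper name.

```shell
# rootfs_for_package PROFILE_DIR PACKAGE_FAMILY_NAME: print the path at which
# the rootfs for that Store package would be expected within the profile.
rootfs_for_package() {
    printf '%s/AppData/Local/Packages/%s/LocalState/rootfs\n' "$1" "$2"
}
```

For example, rootfs_for_package /mnt/h/Users/UserA CanonicalGroupLimited.UbuntuonWindows_79rhkp1fndgsc gives the directory to check for that user's Ubuntu install. If the registry value exists but the directory does not, that may itself be of interest.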

Analysis of Windows Store installed applications has a number of other implications which I won't explore at this time. These include the myriad additional data sources which relate to user activity with Windows Store installed applications, and the fact that prefetch is not created or updated (unlike when launching Bash installed via the Beta method).

2017-10-11

Forensic Analysis of Systems that have Windows Subsystem for Linux Installed

*** EDIT 2017-10-17 ***


Windows Subsystem for Linux was in Beta at the time of writing and all artifacts/ paths mentioned relate to installation of WSL/ 'Bash on Ubuntu on Windows', using the Beta methodology detailed in my last post. Installation of a Linux userland via the Windows Store causes notable files to be located elsewhere within Windows. This is addressed in my addendum post 'Further Forensicating of Windows Subsystem for Linux'.


*** /EDIT 2017-10-17 ***

In my last post I provided a little bit of background on Windows Subsystem for Linux ('WSL') and provided details on some of the artifacts you can look out for to identify if it has been installed and enabled. In this post we will dive into a few of the notable artifacts which speak to user activity within WSL and also some considerations when reviewing this information. The more you look the more you find, and there are a myriad of interesting edge cases you may come across. The relevance of any particular artifact will depend on your investigation and the way the suspect/user was using WSL.


In this post I will be focusing on a few specific artifacts, particularly those which relate to user activity and which have unique aspects as they relate to WSL when compared to their counterparts in a full Linux host. This is by no means a comprehensive list as the topic could and does fill multiple books, written by far smarter people than me.


Forensically Interesting Artifacts

One point to note is that where I specify an artifact location I will be referencing its location on disk as it appears when viewed in Windows; during dead box analysis this will be the path displayed in your tools. Paths as they appear to the user within WSL will be different due to the various mount points that are employed. This is addressed later in the post.

It is key to note that the installation of WSL is user specific, so there may be multiple instances of WSL installed on one system. How many you will find will depend on the number of user accounts and the number of those users who have installed a userland. My reading indicates that multiple userlands can be installed side by side, but I have yet to experiment with this. In any event, it's something to look out for and if it is indeed possible, I would assume that it will result in multiple locations to analyse per user.


Default Username

As previously mentioned, the userland installation is per user. Additionally, authentication is entirely independent of Windows authentication/ login credentials. A user is requested to define a UNIX username and password when they first run Bash. All data associated with the filesystem for the Linux userland is contained within the associated Windows user's profile at %localappdata%\Lxss\. In this MSDN blog post Microsoft elaborate on this, saying "Each Windows user has their own WSL environment, and can therefore have Linux root privileges and install applications without affecting other Windows users".

It should therefore be noted that, irrespective of the chosen Linux username, any activity identified within a particular WSL install can and should be attributed to the corresponding Windows User Account. To labour the point and make this abundantly clear, some example scenarios are outlined below. The examples are based on a Windows system with multiple users, 'UserA', 'UserB', etc:

Scenario 1: User 'UserA' installs WSL and 'Bash on Ubuntu on Windows' and sets their Linux username as 'UserA'. Based upon human nature it is likely that a significant proportion of users will use the same username and password for WSL as they do for Windows. This is probably the most common scenario you will encounter.

Scenario 2: User 'UserA' installs WSL and 'Bash on Ubuntu on Windows' but sets their UNIX username to be 'UserB'. At a glance, artifacts associated with the WSL user 'UserB' may seem to relate to the Windows User 'UserB'; however, they should in fact be attributed to 'UserA'. For example, the .bash_history file for this user will be located at:

C:\Users\UserA\AppData\Local\Lxss\home\UserB\.bash_history 

i.e. within the Windows profile associated with UserA. The same can be said for almost all notable artifacts as they relate to WSL and user activity.

Scenario 3: Users 'UserA' and 'UserB' both install WSL and 'Bash on Ubuntu on Windows' and set their UNIX usernames as 'UserC'. These two installations are completely distinct and there is no connection between the two 'UserC' accounts despite them sharing a name.

The key point to note is that installations are user specific, independent from one another and the majority of relevant artifacts will exist within a Windows User directory or NTUSER.DAT for a specific associated Windows User Account.
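
To make the attribution rule concrete, here is a small sketch that takes the on-disk path of a Beta-install .bash_history file and reports which Windows profile it sits in alongside the WSL username. The function name attribute_history is hypothetical, and it only understands the Lxss\home\[WSLusername] layout described in this post.

```shell
# attribute_history PATH: print "<Windows user> <WSL user>" for the on-disk
# path of a Beta-install .bash_history file; prints nothing if no match.
attribute_history() {
    # Convert backslashes to forward slashes, then pull out the two names.
    printf '%s\n' "$1" | tr '\\' '/' |
        sed -n 's|.*/Users/\([^/]*\)/AppData/Local/[Ll]xss/home/\([^/]*\)/\.bash_history$|\1 \2|p'
}
```

Fed the Scenario 2 path, it prints the Windows user first, which is the account the activity should be attributed to. Usernames containing spaces would make the output ambiguous, so treat it as a triage aid only.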

The username configured at the time of installation is recorded within the NTUSER.DAT for the associated Windows user at the location below:

Key: SOFTWARE\Microsoft\Windows\CurrentVersion\Lxss
Value: DefaultUsername (REG_SZ)
Data: [username]

Various other registry keys were identified which relate to lxss and Windows Subsystem for Linux however the majority of notable artifacts, particularly as they relate to user activity are to be found on disk.


Prefetch Files

As mentioned in my previous post, Bash itself, as it relates to 'Bash on Ubuntu on Windows', is an executable located at %systemroot%\System32\bash.exe. When a user selects the 'Bash on Ubuntu on Windows' shortcut, either from the desktop, start menu or taskbar, bash.exe is executed with an argument of '~' and the user is presented with a new window placing them within Bash at the home directory of the default user. This causes the bash.exe prefetch file to be updated, assuming prefetch is enabled. This may provide some useful information as to the last and recent run times associated with the executable, which will evidence the use of Bash.

With that said, Bash can be accessed in a number of ways, including by executing the 'bash' command from within the cmd prompt or from a PowerShell prompt. Neither of these execution methods causes the bash.exe prefetch file to be updated. Running the bash command from the run dialog does update the associated prefetch file.

Another detail which is notable regarding executing the ‘bash’ command from a cmd or PowerShell prompt is that this will not place the user at their WSL home directory but rather at the location which the cmd/ PowerShell prompt was previously pointing.


WSL User Accounts/ Credentials

As you might expect the passwd and shadow files will be your go to location for details of the users who are set up within a specific WSL install and their associated credentials. These are located at:

C:\Users\[Username]\AppData\Local\lxss\rootfs\etc\shadow

C:\Users\[Username]\AppData\Local\lxss\rootfs\etc\passwd

These files and their respective backups 'shadow-' and 'passwd-' can be reviewed for useful information regarding the users who have been configured within WSL. The password hashes for these accounts can be extracted from the shadow file for cracking, should this be useful in your case.
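
As a quick way of seeing which accounts actually have a password hash worth extracting, the sketch below reads a shadow file and prints each account whose password field holds a crypt-style hash, together with the hash identifier (1 for MD5, 5 for SHA-256, 6 for SHA-512). accounts_with_hashes is a name of my own choosing, and the file is assumed to follow the standard colon-separated shadow format.

```shell
# accounts_with_hashes SHADOW_FILE: print "user hash-id" for every account
# whose password field starts with '$' (locked or empty entries are skipped).
accounts_with_hashes() {
    awk -F: 'substr($2, 1, 1) == "$" {
        split($2, part, "[$]")   # part[2] is the hash identifier, e.g. 6
        print $1, part[2]
    }' "$1"
}
```

Pointing it at the rootfs\etc\shadow copy from the image (never the one on your analysis machine) gives a short list to feed to your cracking tool of choice.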


Bash History

When determining user activity on a Linux system, the bash history is often the first port of call. The .bash_history file maintains a record of a user's command history within Bash. WSL systems with Bash for Windows installed are no exception and the .bash_history file for each user can be located at:

C:\Users\[Username]\AppData\Local\Lxss\home\[WSLusername]\.bash_history

and for root:

C:\Users\[username]\AppData\Local\lxss\root\.bash_history
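
With the volume mounted, both locations can be swept for every Windows profile in one go. A sketch under the assumption that the Beta layout (lxss\home and lxss\root) applies; list_bash_histories is a hypothetical helper and it does not cover Store installs, where home directories live inside rootfs.

```shell
# list_bash_histories VOLUME: print every Beta-install .bash_history file
# (regular users and root) under the profiles of a mounted Windows volume.
list_bash_histories() {
    find "$1/Users" -type f -name '.bash_history' \
        \( -path '*/AppData/Local/[Ll]xss/home/*' \
           -o -path '*/AppData/Local/[Ll]xss/root/*' \) 2>/dev/null | sort
}
```

Running list_bash_histories /mnt/h lists the files grouped by Windows profile, which keeps the attribution point above front and centre.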

Anyone familiar with investigating breaches of Linux systems will likely be aware that the .bash_history file is a fantastic asset during analysis. Unfortunately the bad guys are aware of this too, and it’s common for attempts to be made to clear it. Additionally, the Bash history is often a far from comprehensive log: it can fail to be populated under various circumstances, can behave strangely when multiple instances of Bash are used simultaneously and, by default, lacks timestamps. The behaviour of bash history under various edge cases is an interesting topic in itself, and indeed it is the topic of a great presentation by Hal Pomeranz titled 'You Don't Know Jack About bash_history', a recording of which is available here.

Key points to note when reviewing the bash history under WSL:


  • Bash history file can be deleted from within Bash
  • Bash history file can be deleted from Windows
  • By default, the bash history file is only populated when a shell exits
  • Bash history file may not be populated depending on how Bash exits


Unfortunately, by default .bash_history is only populated when Bash for Windows closes cleanly. If the task is killed from Task Manager, or if the user simply closes the window using the close button rather than typing 'exit', then the .bash_history file is not updated with the commands from that session. Windows users are somewhat accustomed to closing windows down with the close button and so I expect that bash history will be less complete for WSL when compared to a regular Linux system.

If a suspect deletes the Bash history outright, either using Bash or via Windows, there is the potential to recover the deleted file. You are likely to be at the added advantage that the file will be deleted from an NTFS filesystem as opposed to those you commonly encounter when analysing Linux systems, often making recovery easier. Further benefits of this Windows/Linux hybrid environment you are analysing include the fact that a Volume Shadow Copy may contain historical copies of the .bash_history file.

An additional area for future research will be the recovery of bash history for live or ungracefully closed sessions from RAM captured from the Windows host. It’s a fairly niche case but conceivably, if you are able to capture a RAM image of a live system where an open Bash session exists, the bash history (yet to be written to disk) may be recoverable from RAM. A debatable alternative is of course to gracefully close Bash prior to performing disk imaging, causing the .bash_history to be populated with the latest session's data, but you know your case and jurisdictional restrictions better than I do so make sure you are on the right side of them.


Other Artifacts

It would be futile to attempt to detail and discuss all relevant artifacts within this post. Artifacts such as the files contained within '.ssh' or '.gnupg' or values in '.viminfo' (which may be relevant depending on your case) can be found where you would expect them. Various log files, with some notable exceptions (such as syslog), are to be found in the same location as you might expect to find them in an Ubuntu install.

One key area which has impact on analysis is the way WSL interacts with the host Windows file system and vice versa.


Filesystem Interaction (Accessing the Windows filesystem from WSL)

The first point to note about filesystem interaction is that the root filesystem for WSL exists within a directory on the local OS volume which will commonly be formatted with NTFS. This has significant implications in forensic analysis, including what metadata is stored for files and the likelihood of recovering deleted data. As alluded to earlier, the full directory structure associated with WSL is subject to Volume Shadow Copy and as such historical copies of files may be recoverable from that source, I have successfully used this to recover bash history that was otherwise unavailable.

Fixed storage devices associated with the host filesystem are presented to WSL as mounted devices at /mnt/[driveletter], and by default the 'noatime' attribute is set, e.g.:

Example content of /proc/mounts:
rootfs / lxfs rw,noatime 0 0
data /data lxfs rw,noatime 0 0
cache /cache lxfs rw,noatime 0 0
mnt /mnt lxfs rw,noatime 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,noatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,noatime 0 0
none /dev tmpfs rw,noatime,mode=755 0 0
devpts /dev/pts devpts rw,nosuid,noexec,noatime 0 0
none /run tmpfs rw,nosuid,noexec,noatime,mode=755 0 0
none /run/lock tmpfs rw,nosuid,nodev,noexec,noatime 0 0
none /run/shm tmpfs rw,nosuid,nodev,noatime 0 0
none /run/user tmpfs rw,nosuid,nodev,noexec,noatime,mode=755 0 0
C: /mnt/c drvfs rw,noatime 0 0
D: /mnt/d drvfs rw,noatime 0 0
root /root lxfs rw,noatime 0 0
home /home lxfs rw,noatime 0 0
binfmt_misc /proc/sys/fs/binfmt_misc binfmt_misc rw,noatime 0 0
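
To pick out just the Windows volumes from a listing like the one above, it is enough to filter on the third field. A short sketch (drvfs_mounts is a hypothetical name; with no argument it reads the live /proc/mounts, otherwise a saved copy):

```shell
# drvfs_mounts [FILE]: print "drive mountpoint" for each DrvFs entry in a
# /proc/mounts style listing (defaults to the live /proc/mounts).
drvfs_mounts() {
    awk '$3 == "drvfs" { print $1, $2 }' "${1:-/proc/mounts}"
}
```

Against the example content above this would list C: and D: with their /mnt mount points, i.e. the host volumes reachable from within that WSL session.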

Prior to writing this post I was under the impression that to access a volume from within WSL it had to be formatted as NTFS or ReFS. However, per this MSDN blog from April, it appears support for additional filesystems has been added, although the requisite update is only available via the Windows 10 Insider Preview at the moment. In any event, I will focus on NTFS as in the majority of cases this is likely the file system onto which WSL will have been installed.

The way Microsoft have implemented the filesystem, so as to emulate Linux behaviour while the data actually resides upon an NTFS or ReFS volume is an interesting topic, and one which is covered well in an MSDN blog post here. To quote and paraphrase: "WSL provides access to Windows files by emulating full Linux behavior for the internal Linux file system with VolFs, and by providing full access to Windows drives and files through DrvFs. As of this writing, DrvFs enables some of the functionality of Linux file systems, such as case sensitivity and symbolic links, while still supporting interoperability with Windows". 

"When opening a file in DrvFs, Windows permissions are used based on the token of the user that executed bash.exe. So in order to access files under C:\Windows, it’s not enough to use “sudo” in your bash environment, which gives you root privileges in WSL but does not alter your Windows user token. Instead, you would have to launch bash.exe elevated to gain the appropriate permissions.


"VolFs is used to mount the VFS root directory, using %LocalAppData%\lxss\rootfs as the backing storage. In addition, a few additional VolFs mount points exist, most notably /root and /home which are mounted using %LocalAppData%\lxss\root and %LocalAppData%\lxss\home respectively. The reason for these separate mounts is that when you uninstall WSL, the home directories are not removed by default, so any personal files stored there will be preserved."


The architecture of VFS within WSL as used to facilitate interoperability is conveyed in the below diagram which is taken from the same blog post:
https://blogs.msdn.microsoft.com/wsl/2016/06/15/wsl-file-system-support/

The really useful MSDN blog posts by Jack Hammons as they relate to WSL should be considered compulsory reading if you find yourself performing a complex analysis of WSL.

This approach means that the full host file system (assuming fixed NTFS/ ReFS volumes) is accessible from within WSL. Linux tools can be used to view, modify and delete data/files on the fixed disks without leaving the traces we might expect to find were a user to perform similar actions via Windows Explorer. The facility to view, and modify data without leaving traditional traces may be attractive to malicious users. Additionally, commands such as 'touch' and 'shred', which are available by default, may be tempting to the savvy WSL user who is looking to perform antiforensics.

The fact that the WSL files exist within an NTFS file system not only impacts the likelihood of recovery but also creates some interesting scenarios. One such example is that it is possible to use Linux's case sensitivity to create files whose filenames are identical other than their case. Per the below screenshot, it is possible to use WSL to create files which Explorer would not normally permit:

Three files created using Windows Subsystem for Linux

While this is possible via WSL, various Windows applications aren't going to like it. Windows can in fact support case sensitivity; it is supported in NTFS and can be switched on by modifying the ObCaseInsensitive DWORD value under HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\kernel. By default this support isn't enabled, but irrespective of this WSL will allow case sensitive filenames to be created as per the screenshot. Attempting to copy the files to another location causes errors.
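
Files like these are a reasonable indicator that something other than Explorer created them, so it can be handy to flag them. The sketch below reads filenames on stdin and prints any name (lower-cased) that occurs more than once when case is ignored. case_collisions is a hypothetical name, and it only compares the names it is given, so feed it one directory's listing at a time.

```shell
# case_collisions: read filenames on stdin, one per line, and print each
# lower-cased name that appears more than once when case is ignored.
case_collisions() {
    tr '[:upper:]' '[:lower:]' | sort | uniq -d
}
```

For example, ls -A /mnt/h/Users/UserA/Documents | case_collisions would print a name for each set of colliding files in that directory, and nothing at all for a directory Windows would consider normal.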

This hybrid filesystem approach also has an interesting (and forensically useful) implication for file metadata. Recording file creation (or birth) is not supported by many common Linux file systems and indeed many Linux applications lack support for reading this attribute despite support being introduced in some modern file systems such as ext4. However, the WSL file system exists within an NTFS volume and despite the fact that it cannot be queried or modified from within WSL, a file created timestamp is recorded within the MFT for files created using WSL. This could prove forensically useful in all manner of cases.

Filesystem Interaction (Accessing the WSL filesystem from Windows)

As detailed above, VolFs and DrvFs are employed to mount different parts of the WSL file system and this has implications for file behaviour between these two locations. Specifically, VolFs is used to mount the VFS root directory, /root and /home, while DrvFs is used for any local fixed drives, as well as for removable media and network shares where support for these has been added.

Focusing on the core WSL file structure as located within %LocalAppData%\lxss, direct manipulation (from Windows) of data within that directory structure will not necessarily be reflected within WSL. Windows does not have a concept of inodes and as such VolFs (in an effort to provide support for most VFS features) is required to maintain a separate record of inodes and their associated Windows file objects. If a Windows application (e.g. Explorer or Notepad) is used to create a file in a WSL home directory then that file will not be visible to the WSL user: because VolFs was not employed in the creation of the file, certain attributes are missing, causing VolFs to ignore it when access is attempted from within WSL.

The same occurs with file modification. If a file within a VolFs mounted file system is modified from Windows, the MFT metadata will be updated to reflect the new last modified time; however, the stat command within WSL will show the original values. Viewing the contents of the file MAY show the correct updated content as the file is accessed on disk (or you may experience corruption/ the files may disappear from view). The reason for this is detailed by Microsoft as such:
"While VolFs files are stored in regular files on Windows in the directories mentioned above, interoperability with Windows is not supported. If a new file is added to one of these directories from Windows, it lacks the EAs needed by VolFs, so VolFs doesn’t know what to do with the file and simply ignores it. Many editors will also strip the EAs when saving an existing file, again making the file unusable in WSL. 
Additionally, since VFS caches directory entries, any modifications to those directories that are made from Windows while WSL is running may not be accurately reflected."
The various attributes which are not natively supported by NTFS but are required by VolFs to best mimic Linux file system behavior are stored in NTFS Extended Attributes (EAs) associated with the relevant file. I'm pleased to report that the previously mentioned Microsoft MSDN blogs go into reasonable detail on this, going so far as to detail which information is stored in EAs, reproduced below:
  • "Mode: this includes the file type (regular, symlink, FIFO, etc.) and the permission bits for the file.
  • Owner: the user ID and group ID of the Linux user and group that own the file.
  • Device ID: for device files, the device major and minor number of the device. Note that WSL currently does not allow users to create device files on VolFs.
  • File times: the file accessed, modified and changed times on Linux use a different format and granularity than on Windows, so these are also stored in the EAs
  • In addition, if a file has any file capabilities, these are stored in an alternate data stream for the file. Note that WSL currently does not allow users to modify file capabilities for a file.
The remaining inode attributes, such as inode number and file size, are derived from information kept by NTFS."
There is therefore the potential to examine the Extended Attributes to determine the Linux metadata associated with individual files. I say the potential, because fully documenting the way Extended Attributes are used by the Windows Subsystem for Linux is beyond the scope of this post. A screenshot of an example file is provided below:


Historical Windows Subsystem for Linux data 

One final takeaway from my various reading was the obviously conscious decision by the WSL development team to separate the rootfs, root and home filesystem and mount them independently. This is evidently done to facilitate the preservation of user data if/when WSL is uninstalled. This preservation is the default behaviour and as such when analysing a system which has had WSL disabled/ uninstalled there are likely to be potential sources of evidence in the home directories which have not been removed (assuming the user hasn't gone out of their way to remove them).

There is a shedload of useful information on the MSDN blog regarding WSL, much of which I only discovered after hours of tinkering. This link, blogs.msdn.microsoft.com/wsl, will be invaluable if you find yourself having to analyse a system where WSL was used by a suspect.