2017-08-29

Forensic Analysis of volumes with Data Deduplication Enabled

In this post I hope to outline a number of options available to analysts when performing a forensic analysis of Microsoft Windows Servers that have the "Data Deduplication" feature enabled. If you are unfamiliar with the Data Deduplication feature and want additional detail on it and its impact on forensic investigations then see my last post.

The first half of this post addresses Acquiring Volumes which you know to contain Data Deduplication. However, if you already have a disk image and you are looking to extract the data after the fact, you can jump to the 'Handling images containing deduplicated data' section below.


Acquiring Volumes where you know Data Deduplication is enabled

If you happen to know in advance that the volume you wish to acquire has Data Deduplication enabled then you may be in a position to modify your acquisition approach.

As with any acquisition the most appropriate methodology is going to be dependent on the reason for the acquisition as well as any local policies or legal constraints. I would always suggest best practice dictates that a full physical disk image (or volume image) be your starting point, and the presence of Data Deduplication doesn't impact that statement.

Coming into this exercise I had a vague hope that performing a live acquisition of the logical volume may circumvent the issue of Data Deduplication, much like it circumvents the issues with many varieties of Full Disk Encryption (FDE). However, as soon as I started to look under the hood I immediately saw why that wasn't the case, not least of all because it is possible to store more deduplicated data within a volume than would fit were the data duplicated.

If you are presented with Data Deduplication then the most forensically sound method to acquire the data will be to perform a full disk image in the usual manner, or capture a copy of VMDKs/VHDs for virtual servers per your usual methodology, and then to extract/rehydrate the data after the fact. The key caveat associated with this proposal is that it requires that you are in a position to spin up a Windows Server 2012/2016 box (with the associated licensing constraints which come with this). Alternatively, it may be possible to access the data after the fact if you are able to virtualise a copy of the server you have acquired. This may be possible via the use of tools such as Live View or Forensic Explorer.

If you aren't able to perform any of the above then I would still recommend capturing a full image as a starting point. While you might not be able to take the best practice approach at this stage, you never know when this might change, who you may be required to provide evidence to, or what their capabilities may be. Once you have secured your best evidence copy, it may be worth capturing key data via an active file collection method such as FTK Imager's "Contents of a Folder" option to AD1, or using EnCase to create an L01, while you have access to the original live system.


Unoptimization

Unoptimization is the process of reversing Data Deduplication for any particular volume and returning it to a duplicated state, rehydrating the individual files and replacing the reparse points with these files. This may seem like an appealing option when acquisition of a volume with Data Deduplication enabled is required, i.e. Unoptimize the volume prior to commencing the acquisition, but there are significant implications associated with this approach.

Firstly, it may not be possible to unoptimize a volume. Noting that the feature exists specifically to facilitate the storage of more data on a volume than could be stored in its original duplicated form, if the volume contains a significant quantity of duplicate data, or if it is nearing capacity, it will not be possible to reverse the deduplication due to a lack of space in the volume.

The implication of greatest concern to forensic practitioners is the fact that unoptimizing a volume will overwrite space that was otherwise unallocated, and may have contained useful deleted data. Nevertheless, it may seem appealing to capture a forensic image of the volume to preserve that recoverable data, then to unoptimize the volume and capture a second image. Certainly, assuming the volume has adequate space to be unoptimized, that is an option which would satisfy the goal of getting access to all available data in a relatively convenient manner.

Even if you intend to take this approach (capturing an image of the optimized volume, unoptimizing the drive and then capturing a second image), there are still further implications to be considered. These will be of varying significance depending on the system in question and the circumstances surrounding the requirement for the acquisition, but they include:

  • A significant length of time will be required to perform the acquisition. Capturing an image, unoptimizing a potentially large volume and capturing a second image is going to take a long time. That’s more time at the controls, more time inconveniencing the system owner and more time for something to go wrong with a live, potentially business critical system.
  • The unoptimization process has a system impact. Usually optimisation happens on a few files at a time, on a scheduled basis and balanced against system load. If you are manually initiating a full unoptimization it will have a significant impact on IO and RAM usage. Depending on the function and load on the server this may not be acceptable.
  • Risk. Significant operations like disk imaging always carry some risk that an error occurs and system failure results. The same goes for unoptimization; as far as I am concerned it introduces unnecessary risk which is avoidable via the method outlined in this post.

In case the above diatribe didn’t make it clear, I really do not recommend performing unoptimization on a source device to facilitate an acquisition, but I appreciate it may be the only option in some circumstances. Maybe.

With that said, reviewing the output from the PowerShell command Get-DedupStatus will provide an indication as to whether unoptimization is possible.
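
On a live host, and purely as a rough sketch of that check, the FreeSpace and SavedSpace properties reported by Get-DedupStatus can be compared directly:

# Sketch only: flag volumes where a full unoptimization would run out of space
Get-DedupStatus | ForEach-Object {
    if ($_.SavedSpace -gt $_.FreeSpace) {
        Write-Output "$($_.Volume): SavedSpace exceeds FreeSpace, cannot be fully unoptimized"
    } else {
        Write-Output "$($_.Volume): unoptimization appears possible"
    }
}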




If the SavedSpace exceeds the Free Space then the drive cannot be fully Unoptimized. Assuming the volume can be Unoptimized, the following PowerShell command will initiate the process (replacing [volume] with the volume letter you wish to unoptimize):

Start-DedupJob [volume] -Type Unoptimization
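
Unoptimization of a large volume can take a considerable amount of time. While the job runs, one way to keep an eye on it (a rough sketch rather than a required step) is the Get-DedupJob cmdlet, which lists active deduplication jobs along with their progress:

Get-DedupJob -Volume [volume]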

Once started, the job will unoptimize the data on the volume and then disable Data Deduplication for that volume. The status of the drive can then be checked with:

Get-DedupStatus

If running Get-DedupStatus returns to the command prompt without any other output, per the screenshot below, then there are no longer any optimized volumes.



Once the job has completed and the volume is no longer optimized, you can acquire it as you would any other disk/volume image. Thereafter, the appropriate command or steps to reoptimize the drive back to its original state will depend on how it was initially configured. You can defer to the system owner/administrator for this information (and really I hope you engaged with them before you even considered unoptimizing volumes on their server).
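
Assuming the administrator confirms the volume was simply using the default settings, re-enabling it would look broadly like the commands below; treat these as a sketch, with the E: volume and the Default usage type as placeholders to be replaced with the owner's real configuration:

Enable-DedupVolume -Volume E: -UsageType Default
Start-DedupJob E: -Type Optimization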


Handling images containing deduplicated data

If your only option is to capture a dead disk image, or you have captured an image unaware that the volume had Data Deduplication enabled then you will have to extract the data after the fact. In most instances this would be my recommended approach anyway, i.e. capture a regular disk image and perform post processing later. As alluded to above, I have not identified a method to reverse data deduplication, or to extract the data, which does not require access to a system running Server 2012 or Server 2016.


Mounting within virtual or physical server

The first point to note is that Data Deduplication in Server 2012 and Server 2016 is not fully interchangeable. A deduplicated volume created in Server 2016 cannot be accessed by Server 2012. However, it does appear to be backwards compatible: drives created in Server 2012 can be read from Server 2016.

Whether you have a logical volume image, a physical disk image or a flat VMDK/VHD from ESXi or Hyper-V, my preferred method to access them is via the "Image Mounting" feature of FTK Imager. In the below instructions I will detail the process to access the drive from a Windows Server 2016 Standard install however the process for Server 2012 is essentially identical (assuming the drive you are attempting to access was created using Server 2012).


1. Install Server 2016
If you have a valid Server 2016 license or access to an MSDN subscription then you can acquire the ISO from those sources. I'll also point out that Microsoft make evaluation copies of both Server 2016 and 2012 available for 180-day evaluation periods. These can be accessed here; however, whether this is a legitimate use per the evaluation license agreement is for you to decide or seek advice on, IANAL.

Whether you are using physical hardware or a Type 1/Type 2 Hypervisor the install process is essentially the same. Install the OS and when prompted select Windows Server 2016 Standard (Desktop Experience); the 'Desktop Experience' is required to use FTK Imager to mount drives (unless I am mistaken, I don't believe this can be achieved on the command line) and makes testing files easier. If you find you have accidentally installed Server 2016 Core (without the GUI), as *may* have happened to me more than once during testing, then the below instructions were helpful:

Installing the GUI for Server 2016
Installing the GUI for Server 2012

2. Mount the volume containing deduplicated data using FTK Imager
Once you have logged into your newly created Server 2016 analysis machine you can then mount the drive in the manner of your choosing. I will detail the method using FTK Imager Lite, however Mount Image Pro or OSFMount will work equally well.

FTK Imager Lite is sufficient for our purposes, and can be downloaded from the Access Data website here.

Once launched, the Mount Image to Drive window can be accessed by selecting File > Image Mounting or by clicking the Image Mounting icon in the Toolbar.


FTK Imager Image Mounting Icon

Populate the "Image File:" field with the path to your Image/VMDK/VHD file, or browse to where it is located by clicking the button labelled '...'. Whether you are mounting a full physical image, a logical volume image or a VHD etc., the remaining fields will be automatically populated. In this instance the default settings will suffice so you can go ahead and select Mount.


Mount Image to Drive window

The Mapped Image list will now populate with the drive letters assigned to the partition(s) within the image file. Take a note of the drive letter which has been assigned to the volume you are interested in and select Close.

Image Mounted with FTK Imager Lite

Within Windows Explorer you will now be able to browse the mounted image, however deduplicated data will still be inaccessible. This is because Data Deduplication is not installed by default on our fresh install of Server 2012 or 2016.


3. Install Data Deduplication
The Data Deduplication Feature can be installed via PowerShell with ease, however GUI instructions follow if preferred.

To install the feature, launch a PowerShell console and type:

Install-WindowsFeature -Name FS-Data-Deduplication

Hit Enter to execute the command and the Data Deduplication feature will install; no restart will be required. Thereafter you can revisit the mounted volume and you will note that the previously inaccessible files can now be accessed.
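
If you want to quickly confirm the feature landed before returning to the mounted volume, the following optional check (just a sketch, not a required step) should report an Install State of 'Installed':

Get-WindowsFeature -Name FS-Data-Deduplication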

If you prefer to do this without having to deal with PowerShell this can be achieved using the Add Roles and Features Wizard.

Launch Server Manager, select Local Server and scroll down until you see Roles and Features:


Roles and Features within Server Manager

Clicking the Tasks button will bring up a dropdown menu where you can select Add Roles and Features:


Add Roles and Features in Tasks drop down

Select Next to skip past 'Before you Begin', then again to retain the default installation type as "Role-Based or feature-based installation", and a third time to keep the default server selection as your new local server.

Under Server Roles > File and Storage Services > File and iSCSI Services you will find the checkbox for Data Deduplication.

Select the Data Deduplication checkbox to add the feature and you will receive a notification, per the below screenshot, that the 'File Server' feature requires installing as it is a dependency:


Add Roles and Features Wizard warning

Select Add Features

Select Next

Select Install

The Data Deduplication feature will now install and no restart will be required. Thereafter you can revisit the mounted volume and you will note that the previously inaccessible files can now be accessed.

4. Acquire data
If you do not need to acquire the data but rather perform processing and parsing activities you can stop here. The mounted drive can be used for AV scans, Bulk Extractor runs etc, as long as the process you wish to undertake performs Physical Name Querying to access the active files.

If however you need to acquire copies of the files for analysis in forensic tools, or to process by some other means, then you may wish to capture a logical acquisition of the files themselves into an AD1 or similar. Once again FTK Imager Lite is sufficient for our purposes, and the steps are outlined below:

If FTK Imager was used to mount the drive then it should still be running. If not then launch FTK Imager.

Once launched, the process to acquire the active files to a logical container is to select File > Create Disk Image or to click the 'Create Disk Image' icon in the Toolbar.


Create Disk Image Icon


Select the Contents of a Folder evidence type. You will likely get a Warning message, which you can ignore and select Yes to continue.

Select Browse and navigate to the root of the mounted drive, remembering the drive letter assigned when we mounted the volume. Per the below screenshot this is 'H:\'. Note: If you only wish to perform a targeted collection of a limited number of files, a more direct path can be entered or browsed to.



Select Finish

Within the 'Create Image' window, select Add...

Populate the case information as you require and select Next >

This will bring you to the 'Select Image Destination' Dialog box where you can Enter or Browse to a destination path for your AD1 (USB media or network location recommended).

The other settings can be left at default, select Finish and you will return to the Create Image Dialog window.

Select Start to initiate the logical acquisition. Once complete you will have an AD1 container which contains logical copies of the active files with their metadata preserved. Make sure to check the log for errors. If you attempt to perform a logical acquisition without enabling Data Deduplication, or from a system which does not support dedupe, the job will fail and the error log will look as below, showing that the files could not be accessed:


FTK Image Summary with errors

Other Options

The above methodology is my preferred approach when accessing volumes with Data Deduplication enabled, however minor modifications to the methodology may suit others better. Some modifications to consider include:

Presenting the VMDK or VHD to a Virtual server as a read only device.
If your source evidence is provided as a virtual disk then an alternative to mounting the image with software is to attach the disk to your virtual Server 2016/2012 instance via the Hypervisor. This may provide a performance improvement but is slightly more involved than using FTK Imager (or similar). There are also risks of accidentally adding the device as Read/Write so this should never be performed on original evidence.

Virtualise the source server
Personally, I have limited experience in virtualising images for review; it used to be an occasional requirement when attempting to access proprietary finance systems for insolvency matters. The last time I did this was some years ago when Live View was king and no doubt things have moved on. Some forensics shops appear to use image virtualisation heavily and this technique may be useful in handling Data Deduplication where access to a legitimate Windows Server 2012/2016 license is not feasible. Booting a copy of the original imaged server and then using that to review or acquire a logical copy of the data contained in a deduplicated volume should work in theory.

Hopefully the above is useful to anyone who finds themselves up against the same issue. If you get this far and it's been helpful please let me know by leaving a comment or fire me a tweet. Comments, criticism and corrections all welcome too.

@harrisonamj

2017-08-25

Windows Server Data Deduplication and Forensic Analysis

In starting to write on this topic I had intended to share a concise guide on my recommended methodology for handling disk images which contain Windows Server Data Deduplication, however as I delved a little deeper into research my inaugural post started to get a little unwieldy. I have split the original post into two parts; this first post serves as an introduction to Data Deduplication and speaks to how to identify whether a system or disk image has Data Deduplication enabled. If you already have a disk image and want to jump straight into how to extract the deduplicated data please head over to the second post in this series, Forensic Analysis of Volumes with Data Deduplication.

What is Data Deduplication

The Data Deduplication ('Dedupe') I am referring to in this post is specifically the Microsoft Server feature, while other similar technologies exist in other operating systems and for different filesystems these are not addressed here. Since Windows Server 2012 Microsoft has provided system administrators the option to enable Data Deduplication, which to paraphrase their description, is designed to assist in the reduction of costs associated with duplicated data. When enabled, "duplicate portions of the volume's dataset are stored once and are (optionally) compressed for additional savings". 

The Dedupe feature shouldn't be confused with the Single-Instance Storage ('SIS') feature which was made available in Windows Storage Server 2008 R2 and could be found in Exchange Server previous to then; I may address SIS in a subsequent post.

There are a multitude of Microsoft and Third-Party resources which describe the functionality of Dedupe, a few good primers are available here (Server 2012), here (Server 2016) and here (Server 2012, 2016 and other deduplication technologies).

The key takeaways as they impact forensic analysis are as follows:
  • Data Deduplication cannot be enabled on the system or boot volumes and as such, in many investigations - particularly IR - the presence of Dedupe may not impact much of the analysis.
  • By default Data Deduplication will not impact certain filetypes, and additional custom exclusions can be added/defined when the administrator enables the feature.
  • It works below file level, deduplicating chunks of files rather than whole files, and unique chunks are stored in the Chunk Store.
  • Chunks stored in the Chunk Store can be compressed; again there is a default and customisable exclusion list. This will impact keyword searching and file examination in tools which are not Dedupe-aware (SPOILER ALERT: None of the major forensic tools are currently capable of processing Data Deduplication natively).
  • Original file streams are replaced with reparse points to the chunk store. Notably, these files are initially written to disk as per usual and then deduplicated on a scheduled basis with the original files being deleted; this introduces the possibility that the original files may be recoverable via traditional carving.

The Problem with Data Deduplication

While Dedupe has been available since Windows Server 2012, I was essentially unaware of it until mid-2017 when a colleague encountered it during a case which revolved around multiple Windows Server 2016 systems, many of which had the feature enabled. When conducting AV scans of the imaged physical disks a huge number of errors occurred relating to files being inaccessible. When these errors were explored it was identified that the files at each location were not in fact files, but reparse points. 

The same issue arises when attempting to analyse a disk image which contains deduplicated data in forensic tools, carve data or keyword search. At the time of writing, and to the best of my knowledge, none of the major forensic suites recognise or handle Dedupe; likewise, if one were to simply connect a drive to a computer system running a different operating system, or even the same operating system without the feature enabled, the files would be unreadable.

My default position when capturing images of physical systems is to take a full physical disk image, and possibly a secondary logical image of the key volumes if the system is live and known to have Full Disk Encryption. Likewise, when acquiring virtual servers, it is often easier and more forensically sound to be provided with copies of full flat VMDKs or VHDs than to perform live imaging within the virtual environment. As such, it is easy to come away with images which contain deduplicated data without realising it, especially if you don't explicitly check or ask the system owner at the time of acquisition.

How to check for Data Deduplication

If you approach a live system and you need to determine whether data deduplication is enabled on any of its volumes you can achieve this in a number of ways. The simplest way is to review the configuration within Server Manager.

In Server Manager, under File and Storage Services, and Volumes, you can review the available volumes. Any volume with active Data Deduplication will display a non-zero 'Deduplication Rate' and 'Deduplication Saving', as per the below screenshot:


Server Manager showing E: volume with a Deduplication Rate of 37% and Saving of 104MB

If Data Deduplication is either not enabled or not installed then the same columns appear, however they will contain no values:

Server Manager showing no volumes with a listed Deduplication Rate or Savings

We can also check for the presence of Dedupe via PowerShell. If we attempt to execute the PowerShell command 'Get-DedupStatus' the output will tell us whether Dedupe is enabled on any volumes.

In the event that the Dedupe feature is not installed you will receive an error indicating that "the term 'Get-DedupStatus' is not recognised as the name of a cmdlet, function, script file, or operable program". This is because PowerShell cmdlets associated with optional features are only installed when those optional features are installed (in most cases). An example of the error you can expect is provided in the screenshot below:

PowerShell showing error following execution of Get-DedupStatus command

If the data deduplication feature is installed, but not currently enabled on any volumes, you will return to the prompt without any further output (as below):
Output from PowerShell command Get-DedupStatus where Data Deduplication is not currently enabled on any volumes
If it is installed and enabled you will receive output as below; this screenshot indicates that it is enabled on the E: volume:

Output from PowerShell command Get-DedupStatus where Data Deduplication is enabled.
Granted, relying upon the presence or absence of errors to confirm the system configuration is a little inelegant. If you wish to double-check whether the Data Deduplication feature is currently installed, this can be confirmed with:

Get-WindowsFeature -Name *Dedup* 

This command will return all Windows Features for that OS containing "dedup" in their Name, such as ‘FS-Data-Deduplication’, and indicate their status. In the below screenshots the absence of an 'X' inside the square brackets and the 'Install State' of 'Available' confirm the feature is not currently installed and the presence of an 'X' and an 'Install State' of 'Installed'... should be self-explanatory.
Get-WindowsFeature -Name *Dedup* output where Data Deduplication is not installed
Get-WindowsFeature -Name *Dedup* output where Data Deduplication is installed
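
If you would rather avoid relying on a thrown error at all, a rough sketch combining the two checks might look like the following; it assumes the Installed property on the Get-WindowsFeature output behaves as expected:

# Sketch only: confirm the feature is installed before calling the dedup cmdlets
if ((Get-WindowsFeature -Name FS-Data-Deduplication).Installed) {
    Get-DedupStatus
} else {
    Write-Output "The Data Deduplication feature is not installed on this host"
}
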
The same can be confirmed within the warmth and comfort of a GUI by reviewing the configuration in the Add Roles and Features Wizard. From the Add Roles and Features Wizard, under Server Roles, File and Storage Services, File and iSCSI Services, you will find the checkbox for Data Deduplication. If it's checked, it's installed.

Checking a Disk Image for Data Deduplication

But what if you didn't check in advance for data deduplication and you already have an image which you believe may contain deduplicated data? Fear not, a brief examination of the image will allow you to determine whether dedupe was enabled.

Most obviously, and the way many will stumble across the issue is through examining the data files contained on the drive. Unfortunately, it is easy to miss unless you actually review individual files.

Autopsy, EnCase, FTK, X-Ways and indeed Windows Explorer do little to warn you in advance that the data presented is not as it seems. Within Autopsy the default table view does little to indicate that the files are anything other than regular files. The filename, extension and metadata associated with the file are displayed as consistent with the true underlying file, and indeed the values as they appear in the MFT, but when a file is selected the Hex view would seem to indicate that the file at that location is zero-filled:

Viewing a Reparse Point associated with a Deduplicated file in Autopsy 4.4.0

A review of the 'File Metadata' tab does highlight that the deduplicated files have the SI Attribute 'Reparse Point' flag set, but it isn't immediately obvious when taking a cursory review.
Viewing the File Metadata tab for a  Reparse Point in Autopsy 4.4.0
Things can become a little clearer depending on your processing options. If you enable the Extension Mismatch Detection processing option the files which have been deduplicated will be flagged as their content does not match their extension. Likewise, errors in processing compound files may serve as an additional warning.

EnCase is similar in that the files present as Zero-filled when viewed in hex view. Within Table Pane the tell-tale signs are the fact that EnCase does not specify a starting extent for the deduplicated files and notes them to be Symbolic Links (Sym Link) in the Description column. Additionally, depending on the processing options you enable, the reparse points will not have the Signature Analysis, File Type, MD5, SHA1 or Entropy columns populated while regular files will.

Viewing a Reparse Point associated with a Deduplicated file in EnCase 7.10.05
Similar behaviour is observed within FTK, FTK Imager and X-Ways. Notably, FTK Imager, due to the limited data displayed in its default view, actually makes the files stand out somewhat more as it assigns the value of 'Reparse Point' within the Type column as opposed to 'Regular File'.

Viewing a Reparse Point associated with a Deduplicated file in FTK Imager
X-Ways Forensics will highlight the reparse points in blue when the ‘Verify file types with signatures and algorithms’ processing option is enabled as it highlights the fact that the file signature does not match the extension. 

Viewing a Reparse Point associated with a Deduplicated file in X-Ways Forensics 19.1
It is also worth noting that when an image or a drive containing deduplicated data is presented to a Windows OS which is not "Dedupe aware", and browsed in Windows Explorer, there is nothing to indicate that anything is amiss until one attempts to access the data. The below screenshot is of a VMDK containing deduplicated data presented to Windows 10 using FTK Imager's 'Mount Image' function.

Browsing a volume with Data Deduplication mounted within an OS which is not 'Dedupe aware'.
Only once you try to access the files do you start to receive errors. The specific error will depend on the application associated with each filetype; some examples are provided below:

Various application errors associated with attempting to open a deduplicated file from an OS which is not 'Dedupe aware'.
When performing a check of files in this manner in an attempt to identify whether data deduplication is in use the following things should be noted:
  • Even when Data Deduplication is enabled, files under a certain size will not be deduplicated, specifically files under 32KB.
  • Certain filetypes can be excluded from deduplication, and file exclusions are configurable by the system administrator.
  • It is possible for a volume to have a mix of deduplicated and duplicated data of the same filetype and size. The process of deduplicating files is referred to as 'optimizing' in Microsoft vernacular and by default files are only 'optimized' after they are a certain number of days old on disk. By default this is 3 days, but it is configurable (a sketch for reviewing these per-volume settings follows this list).
  • It is further possible for a volume to have a mix of deduplicated and duplicated data as a result of failed or incomplete unoptimization. Unoptimization is covered in more detail in the Unoptimization section below.
  • Deduplication can be enabled on some volumes and not others; just because it is not employed on one volume doesn't mean it isn't used on another.
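
On a live system the per-volume policy behind several of these points can be reviewed directly. The following is a rough sketch only, assuming the E: volume and that the listed property names (MinimumFileAgeDays, MinimumFileSize, ExcludeFileType, NoCompressionFileType) are exposed by Get-DedupVolume:

# Sketch only: review the deduplication policy for the E: volume
Get-DedupVolume -Volume E: | Select-Object Volume, MinimumFileAgeDays, MinimumFileSize, ExcludeFileType, NoCompressionFileType
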
Evidently, while performing a spot check review of the data is certainly the quickest and dirtiest approach to identifying whether data deduplication is at play, it is hardly foolproof. In any event, we are forensicators and if we want to understand the configuration and settings of a Windows host there is nowhere we would rather be than knee-deep in the registry.

A simple regshot before and after configuration changes is commonly the quickest way to test for and identify useful forensic artifacts. This was employed to compare the registry between a fresh OS install, just after installation of the data deduplication feature, and subsequently after enabling data deduplication for one volume, in both Server 2012 and 2016, in an effort to identify the keys which would speak to whether deduplication was installed and enabled.

The first point to note is that installation of the deduplication feature in Server 2012 resulted in the creation of 85525 registry keys, the addition of 322778 new values and the deletion of 2 values. Full disclosure: I have not reviewed all of these. With that said a cursory review identified a number of notable additions, specifically:

HKLM\SYSTEM\CurrentControlSet\Services\ddpsvc
HKLM\SYSTEM\CurrentControlSet\Services\ddpvssvc
HKLM\SYSTEM\CurrentControlSet\Services\Dedup

The presence of these Registry keys indicates that the 'Data Deduplication' feature has been installed. If the feature is subsequently uninstalled these keys are deleted. The fact that this feature has been installed doesn't of course mean that it has been enabled on any volume however it is a key indication that caution is advisable.

When Data Deduplication has been enabled for a volume a registry key is created at: 

HKLM\SYSTEM\CurrentControlSet\Services\ddpsvc\PerfTrace

This key is populated with subkeys related to the volumes for which Dedupe has been enabled, e.g. when the E volume had it enabled a subkey is created at:

HKLM\SYSTEM\CurrentControlSet\Services\ddpsvc\PerfTrace\E__volume
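
On a live host these keys can be checked for quickly; the snippet below is a sketch only (for an offline image you would instead load the SYSTEM hive from the mounted volume into your registry viewer of choice):

# Sketch only: check for the dedup service key and list any per-volume PerfTrace subkeys
$ddpsvc = 'HKLM:\SYSTEM\CurrentControlSet\Services\ddpsvc'
if (Test-Path $ddpsvc) {
    Write-Output "Data Deduplication feature appears to be installed"
    Get-ChildItem "$ddpsvc\PerfTrace" -ErrorAction SilentlyContinue |
        ForEach-Object { Write-Output "Per-volume subkey: $($_.PSChildName)" }
}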

While there are technical differences between the implementation of Data Deduplication between Server 2012 and Server 2016, the first obvious difference during analysis comes when examining the registry. When data deduplication is enabled in Server 2012 a single subkey is created for each volume, as per the screenshot below:

Windows Server 2012 registry after Data Deduplication is enabled on the E Volume
When the same change is enabled within Server 2016 we see two subkeys created for each volume which has Data Deduplication enabled, the same '[VOLUME_LETTER]__volume' key and a further ____[volume_letter]___ key:

Windows Server 2016 registry after Data Deduplication is enabled on the E Volume
The presence of these Registry keys indicates that the Data Deduplication feature has been enabled for the volumes which are referenced in the key name. It should be noted however, that if Data Deduplication is then disabled and the volumes 'Unoptimized', the keys persist. Further analysis is required to identify a reliable artifact which can be examined to determine whether and where Dedupe is enabled.

Credit must go to the venerable Charlotte Hammond (@gh0stp0p) who took one look at this question and suggested "it might be worth looking at the file paths and location of the ChunkStore and other related directories", and as per usual she was right.

When Data Deduplication is enabled for a volume, a 'Dedup' directory is created under the System Volume Information. This directory contains lots of useful information including the Chunk Store (evidence of which is enough to indicate that Data Deduplication is enabled for the volume). When Data Deduplication is disabled for a volume the entire 'Dedup' directory and contents are deleted, per the below screenshot:


As such, if you find an active 'Dedup' directory within the SVI on any volume this is an indication that the Volume likely has Data Deduplication enabled. What's more, the 'Dedup' folder contains a wealth of interesting information. It was beyond the scope of this research to delve into the full contents in depth, however a few notable observations and artefacts are detailed below. 
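
If you have the volume mounted (H: is used below purely as an example drive letter), a quick and admittedly rough check for the directory can be performed from an elevated PowerShell prompt; note that System Volume Information is protected and normally requires administrative rights:

# Sketch only: check the mounted volume for an active Dedup directory under System Volume Information
Test-Path 'H:\System Volume Information\Dedup'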

The Dedup directory contains the chunk store itself, log data as well as notable configuration and state files such as [root]\System Volume Information\Dedup\Settings\dedupConfig.01.xml, a screenshot of which is provided below:
dedupConfig.01.xml
dedupConfig.01.xml appears to contain Data Deduplication configuration information specific to that volume. The properties appear to be fairly self-explanatory however I did not perform testing to confirm how the property values tie to configuration changes.
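
Since it is plain XML, the file can be pulled straight into PowerShell for a quick review. The following is a sketch only; it assumes the volume is mounted as H: and makes no claims about what the individual properties mean:

# Sketch only: load the per-volume dedup configuration and list its top-level properties
[xml]$config = Get-Content 'H:\System Volume Information\Dedup\Settings\dedupConfig.01.xml'
$config.DocumentElement.ChildNodes | Select-Object Name, InnerText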

Further notable files were identified within ‘[root]\System Volume Information\Dedup\State’ specifically:
chunkStoreStatistics.xml
dedupStatistics.xml
optimizationState.xml
unoptimizationState.xml

chunkStoreStatistics.xml
chunkStoreStatistics.xml
chunkStoreStatistics.xml appears to contain information regarding, amongst other things, the chunk store size.

dedupStatistics.xml
dedupStatistics.xml
dedupStatistics.xml in a dramatic turn of events appears to contain statistics relating to Dedup.

optimizationState.xml
optimizationState.xml
unoptimizationState.xml (Note: the volume examined had been optimised and unoptimized multiple times)
unoptimizationState.xml

Unoptimization

Once a volume has been optimized (deduplicated) it can of course be unoptimized (why Microsoft didn’t settle on dededuplicated as a term of reference I will never understand). In simple terms, assuming there is adequate space on a volume the process of deduplication can be reversed and the reparse points will be replaced with full rehydrated files.

In a limited set of circumstances, it may be conceivable that unoptimizing a volume prior to acquisition may be a solution to afford access to the data, however in a live environment or business critical system this process will create a performance hit. 

There are a myriad of reasons I would not advise doing this in the context of any investigation including but not limited to the loss of recoverable deleted data as additional freespace is overwritten and the fact that the methodology is not defensible when alternatives exist. This is covered in more detail within my follow up blog post here.

With that said, a volume can be Unoptimized with the following PowerShell command:

Start-DedupJob [volume] -Type Unoptimization

This command will unoptimize the data on the volume and then disable Data Deduplication for that volume.

One issue with unoptimization is that it is possible that a volume may contain more deduplicated data than could be stored in a duplicated state. This will be especially likely in volumes with limited free space or a high level of duplication. During testing, I filled a volume with duplicated data, enabled Data Deduplication and then added more of the same data. I anticipated that when unoptimization was attempted it would error out either before starting or after it ran out of space.

Much to my surprise, the Unoptimization job completed seemingly without error. When I then reviewed the drive in Windows Explorer I found it to be completely full:

No more room for datas
Furthermore, the result of the Get-DedupStatus command indicated that a reduced number of files were now considered “InPolicy” and that they were still optimized:

Output from PowerShell command Get-DedupStatus following a partial/failed unoptimization

When I then viewed the volume in EnCase I was able to confirm that some of the reparse points had now been replaced with rehydrated files and some still existed. All data was still accessible via Windows Explorer in the Host OS.

The key takeaway from this is that an unoptimize job can and will fail to complete if it runs out of space within the volume, however it does not necessarily alert the user to this fact. A review of the Get-DedupStatus output following an Unoptimize job will confirm whether Data Deduplication is still enabled.

Additionally, reviewing the Get-DedupStatus output for any volume you plan to Unoptimize, prior to attempting it, is advisable. If the SavedSpace exceeds the Free Space then the drive cannot be fully Unoptimized.

Other supported Operating Systems

While not a supported feature, it should also be noted that users have identified methods to enable the Data Deduplication feature within Desktop OSs, including Windows 8, 8.1 and 10. It's probably unlikely you will stumble across any such system during your day to day investigations, however it is worth being aware of.

Details of the methodology employed are available at Teh Wei King's blog for both Windows 8 and Windows 8.1. Some details regarding enabling this feature within Windows 10 are available here.

Forensic Acquisition of volumes with Data Deduplication

Hopefully this post has provided some useful information regarding the Data Deduplication feature and provided resources for further reading where required. The second post in this series will deal with methods to extract deduplicated data; that post can be found here.

Commands used in testing

If you wish to validate any of the performed steps and statements, the following PowerShell commands were key in setting up my test environment:

Install-WindowsFeature -Name FS-Data-Deduplication
Installs the Data Deduplication Feature in Windows Server 2012/2016

Enable-DedupVolume -Volume e: -UsageType Default
Enables Data Deduplication for volume e: with default settings

Set-DedupVolume E: -MinimumFileAgeDays 0
Reduces the Minimum File Age requirement to 0 days from default value of 3 (useful for testing)

Start-DedupJob E: -Type Optimization
Initiates a manual Optimization job rather than waiting for a scheduled job

Start-DedupJob E: -Type Unoptimization
Command to Unoptimize a volume

Hopefully the above is useful to anyone who finds themselves up against the same issue. If you get this far and it's been helpful please let me know by leaving a comment or fire me a tweet. Comments, criticism and corrections all welcome too.

@harrisonamj