Thursday, April 7, 2016

Daily Blog #367: Automating DFIR with dfVFS part 1

Hello Reader,
         Today we begin again with a new Automating DFIR series.

If you want to show your support for my efforts, there is an easy way to do that. 

Vote for me for Digital Forensic Investigator of the Year here:

The last time we started this series (you can read that here) we were using the pytsk library primarily to access images and live systems. This time around we are going to restart the series from the first steps to show how to do this with the dfVFS library, which makes use of pytsk and many, many other libraries.

         In comparison to dfVFS, pytsk is a pretty simple and straightforward library, but it does have its limitations. dfVFS (Digital Forensics Virtual Filesystem) is not just one library; it's a collection of DFIR libraries with all the glue in between, so things work together without reinventing the wheel/processor/image format again. This post will start with opening a forensic image and printing the partition table, much like we did in part 1 of the original Automating DFIR with pytsk series. What is different is that this time our code will work with E01s, S01s, AFF and other image formats without us having to write additional code for it. This is because dfVFS has all sorts of helper functions built in to determine the image format and load the right library for you to access the underlying data.

Before you get started on this series, make sure you have Python 2.7 x86 installed and have followed the steps in the following updated blog post about how to get dfVFS set up:

You'll also want to download the first forensic image we are working with, located here: !ShhFSLjY!RawTMjJoR6mJgn4P0sQAdzU5XOedR6ianFRcY_xxvwY

When I got my new machine set up, I realized that a couple of new libraries were not included in the original post, so I updated it. If you followed the post to get your environment set up before yesterday, you should check the list of modules to make sure you have them all installed. Second, on my system I had an interesting issue where the libcrypto library was being installed as crypto but dfVFS was calling it as Crypto (case matters). I had to rename the directory \python27\lib\site-packages\crypto to Crypto and then everything worked.

If you want to make sure everything works, download the full dfVFS package from GitHub (linked in the installing dfVFS post) and run the tests before proceeding any further.
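If you just want a quick check before running the full test suite, you can try importing each module by name; a mis-cased directory like the crypto/Crypto problem above shows up as a failed import. This is a minimal sketch of my own, not part of dfVFS, and the module names in the example ('Crypto', 'dfvfs', 'pytsk3') are only examples; the complete list lives in the installation post.

```python
from __future__ import print_function

import importlib


def check_modules(module_names):
  """Returns a dict mapping each module name to True if it imports."""
  results = {}
  for name in module_names:
    try:
      importlib.import_module(name)
      results[name] = True
    except ImportError:
      results[name] = False
  return results


if __name__ == '__main__':
  # Import names are case sensitive: 'Crypto' is not the same as 'crypto'.
  status = check_modules(['Crypto', 'dfvfs', 'pytsk3'])
  for name in sorted(status):
    print(name, 'OK' if status[name] else 'MISSING')
```

Anything reported as MISSING is worth fixing before you move on to the dfVFS tests.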

Let's start with what the code looks like:
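import sys
import logging

from dfvfs.analyzer import analyzer
from dfvfs.lib import definitions
from dfvfs.path import factory as path_spec_factory
from dfvfs.volume import tsk_volume_system

# Name of the forensic image to open; change this to the image you downloaded.
source_path = u'Stage2.vhd'

# Path spec for the image file as it sits on the local (OS) file system.
path_spec = path_spec_factory.Factory.NewPathSpec(
    definitions.TYPE_INDICATOR_OS, location=source_path)

# Let dfVFS detect the storage media image format from its signature.
type_indicators = analyzer.Analyzer.GetStorageMediaImageTypeIndicators(
    path_spec)

# Path spec for the image itself, layered on top of the OS path spec.
source_path_spec = path_spec_factory.Factory.NewPathSpec(
    type_indicators[0], parent=path_spec)

# Path spec for the partition (volume) system inside the image.
volume_system_path_spec = path_spec_factory.Factory.NewPathSpec(
    definitions.TYPE_INDICATOR_TSK_PARTITION, location=u'/',
    parent=source_path_spec)

volume_system = tsk_volume_system.TSKVolumeSystem()
volume_system.Open(volume_system_path_spec)

# Collect the identifier of every volume that has one.
volume_identifiers = []
for volume in volume_system.volumes:
  volume_identifier = getattr(volume, 'identifier', None)
  if volume_identifier:
    volume_identifiers.append(volume_identifier)

print(u'The following partitions were found:')
print(u'Identifier\tOffset\t\t\tSize')

for volume_identifier in sorted(volume_identifiers):
  volume = volume_system.GetVolumeByIdentifier(volume_identifier)
  if not volume:
    raise RuntimeError(
        u'Volume missing for identifier: {0:s}.'.format(volume_identifier))

  volume_extent = volume.extents[0]
  print(
      u'{0:s}\t\t{1:d} (0x{1:08x})\t{2:d}'.format(
          volume.identifier, volume_extent.offset, volume_extent.size))

print(u'')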

As you can tell, this is much larger than our first code example from the pytsk series, which was this:
import sys
import pytsk3
imagefile = "Stage2.vhd"
imagehandle = pytsk3.Img_Info(imagefile)
partitionTable = pytsk3.Volume_Info(imagehandle)
for partition in partitionTable:
  print partition.addr, partition.desc, "%ss(%s)" % (partition.start, partition.start * 512), partition.len

But the smaller, easier-to-read pytsk example is much more limited in functionality than the dfVFS version. On its own our pytsk example could only work with raw images; our dfVFS example can work with multiple image types and already has built-in support for multipart images and shadow copies!

Let's break down the code:
import sys
import logging

Here we are just importing two standard Python libraries: sys, the default Python system library, and logging, which gives us a standardized mechanism for logging errors and informational messages. We can tweak logging to give different levels of information based on what level of logging is being requested.
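To make the "different levels" idea concrete, here is a small sketch using only the standard logging module. The configure_logging helper and the verbose switch are hypothetical (this post's script imports logging but doesn't configure it yet), and the messages are made up for illustration.

```python
import logging


def configure_logging(verbose=False):
  """Sets the root logger to DEBUG when verbose, otherwise INFO."""
  level = logging.DEBUG if verbose else logging.INFO
  logging.basicConfig(format='%(levelname)s: %(message)s')
  # basicConfig only configures once, so set the level explicitly each call.
  logging.getLogger().setLevel(level)
  return level


if __name__ == '__main__':
  configure_logging(verbose=False)
  logging.debug('Opening the image')         # hidden at INFO level
  logging.info('Found the partition table')  # shown
  logging.error('Unsupported image type')    # shown
```

Flipping verbose to True would also show the debug message without touching any of the logging calls themselves.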

Next we are bringing in multiple dfVFS modules:

from dfvfs.analyzer import analyzer

We are bringing in four helper modules here from dfVFS. First we are bringing in the analyzer, which can determine for us the type of image, archive or partition we are attempting to access. In its current version it can auto-detect the following:

  • bde - bitlockered volumes
  • bzip2 - bzip2 archives
  • cpio - cpio archives
  • ewf - expert witness format aka e01 images
  • gzip - gzip archives
  • lvm - logical volume management, the linux partitioning system
  • ntfs - ntfs partitions
  • qcow - qcow images
  • tar - tar archives
  • tsk - tsk supported image types
  • tsk partition - tsk identified partitions
  • vhd - vhd virtual drives
  • vmdk - vmware virtual drives
  • shadow volume - windows shadow volumes
  • zip - zip archives
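Most of these formats are recognized by a "magic" signature at a known offset in the file. The toy sketch below shows the idea for four image formats only; it is my own illustration and not how the dfVFS analyzer is actually implemented, which is far more thorough. The signatures are the EWF header, the QCOW magic, the sparse VMDK extent magic and the VHD "conectix" cookie (at the end of the file for a fixed VHD, with a copy at the start for dynamic ones).

```python
# (format name, offset from start of file, magic bytes)
HEADER_SIGNATURES = [
    ('ewf', 0, b'EVF\x09\x0d\x0a\xff\x00'),
    ('qcow', 0, b'QFI\xfb'),
    ('vmdk', 0, b'KDMV'),     # sparse VMDK extent header
    ('vhd', 0, b'conectix'),  # copy of the footer in dynamic VHDs
]


def detect_image_format(header, footer=b''):
  """Returns the format names whose signatures match.

  header: the first bytes of the file.
  footer: the last 512 bytes of the file (where a fixed VHD keeps its footer).
  """
  matches = []
  for name, offset, magic in HEADER_SIGNATURES:
    if header[offset:offset + len(magic)] == magic:
      matches.append(name)
  if 'vhd' not in matches and footer[:8] == b'conectix':
    matches.append('vhd')
  return matches


def detect_file(path):
  """Reads the header and the 512-byte footer of a file and checks them."""
  with open(path, 'rb') as image_file:
    header = image_file.read(512)
    image_file.seek(0, 2)  # seek to the end to learn the file size
    size = image_file.tell()
    image_file.seek(max(0, size - 512))
    footer = image_file.read(512)
  return detect_image_format(header, footer)
```

Note that a raw (dd) image has no signature of its own, so a check like this returns an empty list for it.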

from dfvfs.lib import definitions

The next helper is the definitions module, which maps our named types and identifiers to the values that the underlying libraries are expecting or returning.

from dfvfs.path import factory as path_spec_factory

The path_spec_factory helper library is one of the cornerstones of understanding the dfVFS framework. Path specs are what you pass into most of the dfVFS functions. Each one contains the type of object you are passing in (from the definitions helper library), the location where this thing is (either on the file system you are running the script from or the location within the image you are pointing to) and the parent path spec, if there is one. As we go through this code you'll notice we make multiple path specs as we work to build the object we need to pass to the right helper function to access the forensic image.
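To picture how path specs stack on top of each other through their parents, here is a hypothetical, stripped-down stand-in. ToyPathSpec is not dfVFS's path spec class, and the type names 'OS', 'VHDI' and 'TSK_PARTITION' are modeled on what I understand the string values of the matching definitions constants to be.

```python
class ToyPathSpec(object):
  """Hypothetical, stripped-down stand-in for a dfVFS path spec."""

  def __init__(self, type_indicator, location=None, parent=None):
    self.type_indicator = type_indicator
    self.location = location
    self.parent = parent

  def layers(self):
    """Returns the chain of specs from the outermost (OS) layer to this one."""
    chain = []
    spec = self
    while spec is not None:
      chain.append(spec)
      spec = spec.parent
    chain.reverse()
    return chain

  def describe(self):
    """Renders the chain as 'TYPE(location) -> TYPE(location) ...'."""
    return ' -> '.join(
        '{0}({1})'.format(spec.type_indicator, spec.location or '')
        for spec in self.layers())


if __name__ == '__main__':
  # The same three layers this post builds: OS file -> VHD image -> partitions.
  os_spec = ToyPathSpec('OS', location='Stage2.vhd')
  image_spec = ToyPathSpec('VHDI', parent=os_spec)
  partition_spec = ToyPathSpec('TSK_PARTITION', location='/', parent=image_spec)
  print(partition_spec.describe())
  # prints: OS(Stage2.vhd) -> VHDI() -> TSK_PARTITION(/)
```

Each layer only knows about its own type, its location and its parent; walking the parents back to the OS layer is what lets the innermost spec be resolved all the way down to a file on disk.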

from dfvfs.volume import tsk_volume_system

This helper library creates a pytsk volume object for us so we can use pytsk to enumerate and access volumes/partitions.


source_path = u'Stage2.vhd'

Here we are creating a variable called source_path and storing within it the name of the forensic image we would like to work with. In future examples we will work with other image types and sizes, but this is a small and simple image. I've tested this script with VHDs and E01s, and both opened without issue and without changing any code other than the name of the file.
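Since the file name is the only thing that changes between images, one optional variation is to take it from the command line using the sys module we already import. This is a hypothetical sketch, not what this post's script does (it hard-codes the name); DEFAULT_SOURCE_PATH and get_source_path are names I made up.

```python
from __future__ import print_function

import sys

# Hypothetical default; the post hard-codes the image name instead.
DEFAULT_SOURCE_PATH = 'Stage2.vhd'


def get_source_path(argv):
  """Returns the image path given on the command line, else the default."""
  if len(argv) > 1:
    return argv[1]
  return DEFAULT_SOURCE_PATH


if __name__ == '__main__':
  source_path = get_source_path(sys.argv)
  print('Opening', source_path)
```

With this in place, something like `python list_partitions.py image.E01` (the script name is hypothetical) would point the same code at a different image.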

path_spec = path_spec_factory.Factory.NewPathSpec(
    definitions.TYPE_INDICATOR_OS, location=source_path)

Our first path_spec object. Here we are calling the path_spec_factory helper library's NewPathSpec function to return a path spec object. We pass in the type of file we are working with, TYPE_INDICATOR_OS, which the dfVFS wiki defines as a file contained within the operating system, and where the file is located, by passing the source_path variable we made in the line above as the location.

type_indicators = analyzer.Analyzer.GetStorageMediaImageTypeIndicators(
    path_spec)

Next we let the Analyzer's GetStorageMediaImageTypeIndicators function figure out, by its signature, what kind of storage media image (E01, VHD, VMDK and so on) we are dealing with. It returns a list of the matching types, which we store in a variable called type_indicators.

source_path_spec = path_spec_factory.Factory.NewPathSpec(
    type_indicators[0], parent=path_spec)

Once we know the type of thing we are working with, we want to generate another path_spec object that has that information within it, so our next helper library knows what it is dealing with. We do that by calling the same NewPathSpec function, but this time we pass in the first result stored in type_indicators as the type and our first path_spec as its parent. Now I am cheating a bit here to keep things simple: I should be checking how many types are being returned and whether we know what we are dealing with. However, that won't make this program any easier to read, and I'm giving you an image that will work correctly with this code. In future blog posts we will put in the logic to detect and report such errors.
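Until then, here is a rough sketch of what that check could look like. choose_type_indicator is a hypothetical helper, and falling back to RAW when nothing matches is my assumption, based on raw images having no signature to detect; in a real script you would use definitions.TYPE_INDICATOR_RAW rather than the plain string below.

```python
# Assumed to match dfVFS's definitions.TYPE_INDICATOR_RAW; in a real script
# use that constant instead of this string.
TYPE_INDICATOR_RAW = 'RAW'


def choose_type_indicator(type_indicators):
  """Picks the single storage media type to use, or raises if ambiguous.

  An empty list means no image signature matched, which is what a raw (dd)
  image looks like, so we fall back to RAW.
  """
  if len(type_indicators) > 1:
    raise RuntimeError(
        'Unsupported: more than one storage media image type found: '
        '{0}'.format(', '.join(type_indicators)))
  if not type_indicators:
    return TYPE_INDICATOR_RAW
  return type_indicators[0]
```

In the script you would then pass choose_type_indicator(type_indicators) to NewPathSpec in place of type_indicators[0].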

volume_system_path_spec = path_spec_factory.Factory.NewPathSpec(
    definitions.TYPE_INDICATOR_TSK_PARTITION, location=u'/',
    parent=source_path_spec)

Another path_spec object! Now that we have a path_spec object that identifies its type as a forensic image, we can create a path_spec object for the partitions contained within it. You can see we are passing in the type TSK_PARTITION, meaning this is a tsk object that will work with the tsk volume functions, along with source_path_spec as its parent. Again, in future posts we will write code to determine if there are in fact partitions here to deal with, but for now just know that this image has them.

volume_system = tsk_volume_system.TSKVolumeSystem()
volume_system.Open(volume_system_path_spec)

Now we are going back to our old buddy libtsk, aka pytsk. We are creating a TSK volume system object and storing it in volume_system. Then we call its Open function, passing in the path_spec object we just made that describes a valid tsk partition object.

volume_identifiers = []
for volume in volume_system.volumes:
  volume_identifier = getattr(volume, 'identifier', None)
  if volume_identifier:
    volume_identifiers.append(volume_identifier)

Here we are initializing a list object called volume_identifiers and then using the volumes attribute of the tsk volume_system object to get the volumes, aka partitions, stored within the tsk volume object we just opened. Our for loop then iterates through each volume returned, grabs the identifier attribute from each volume object (or None if it doesn't have one) and stores the result in the volume_identifier variable.

Our last lines of code check whether a volume_identifier was returned; if it was, we append it to the list of volume_identifiers we initialized before our for loop.
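The getattr pattern can be tried without an image using fake volume objects. FakeVolume is hypothetical and stands in for a dfVFS volume. One thing worth knowing: as I understand it the TSK identifiers look like 'p1', 'p2' and so on, and plain sorted() compares strings character by character, so on a disk with ten or more partitions 'p10' would sort before 'p2'. The identifier_sort_key helper is my own suggestion for sorting them numerically.

```python
import re


class FakeVolume(object):
  """Hypothetical stand-in for a dfVFS volume; identifier may be missing."""

  def __init__(self, identifier=None):
    if identifier is not None:
      self.identifier = identifier


def collect_identifiers(volumes):
  """Same loop as the post: keep only volumes that have an identifier."""
  identifiers = []
  for volume in volumes:
    identifier = getattr(volume, 'identifier', None)
    if identifier:
      identifiers.append(identifier)
  return identifiers


def identifier_sort_key(identifier):
  """Compares the trailing number numerically so 'p2' sorts before 'p10'."""
  match = re.match(r'^(.*?)(\d+)$', identifier)
  if not match:
    return (identifier, -1)
  return (match.group(1), int(match.group(2)))
```

Using sorted(volume_identifiers, key=identifier_sort_key) in the printing loop would keep the partitions in disk order.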

print(u'The following partitions were found:')
print(u'Identifier\tOffset\t\t\tSize')

for volume_identifier in sorted(volume_identifiers):
  volume = volume_system.GetVolumeByIdentifier(volume_identifier)
  if not volume:
    raise RuntimeError(
        u'Volume missing for identifier: {0:s}.'.format(volume_identifier))

  volume_extent = volume.extents[0]
  print(
      u'{0:s}\t\t{1:d} (0x{1:08x})\t{2:d}'.format(
          volume.identifier, volume_extent.offset, volume_extent.size))

print(u'')

In this last bit of code we are printing out the information we know so far about these partitions/volumes. After printing a header line, we loop over our volume_identifiers list in sorted order. For each identifier stored within it we call the GetVolumeByIdentifier function and store the returned object in the volume variable, raising an error if no volume comes back.

We then print three properties for each volume: the identifier (the partition or volume number), the offset to where the volume begins (in decimal and hex) and lastly how large the volume is. The offset and size come from the volume's first extent and are both in bytes.
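The format string reuses argument 1 twice: {1:d} prints the offset in decimal and {1:08x} prints the same value as zero-padded, 8-digit hex. The sketch below rebuilds one output row and converts the byte offset back to sectors, like the pytsk example reported. The helper names and the sample values are hypothetical, and the 512-byte sector size is an assumption carried over from the pytsk example.

```python
SECTOR_SIZE = 512  # assumption: 512-byte sectors, as in the pytsk example


def format_volume_row(identifier, offset, size):
  """Builds one output row the same way the post's print statement does."""
  return u'{0:s}\t\t{1:d} (0x{1:08x})\t{2:d}'.format(identifier, offset, size)


def offset_to_sectors(offset, sector_size=SECTOR_SIZE):
  """Converts a byte offset back into a sector number."""
  return offset // sector_size
```

For example, a volume starting at byte 65536 prints as 65536 (0x00010000), which is sector 128.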

Woo, that's it! I know that is a lot to go through for an introduction post, but it all builds on this, and within a few posts you will really begin to understand the power of dfVFS.
You can download this Python script from GitHub here: