Daily Blog #374: Automating DFIR with dfVFS part 4

Automating DFIR with dfVFS part 4 by David Cowen - Hacking Exposed Computer Forensics Blog

Hello Reader,
            In our last entry in this series we took our partition listing script and added support for raw images. Now our simple script should be able to work with forensic images, virtual disks, raw images and live disks.

If you want to show your support for my efforts, there is an easy way to do that. 

Vote for me for Digital Forensic Investigator of the Year here: https://forensic4cast.com/forensic-4cast-awards/


Now that we have that working let's actually get it to do something useful, like extract a file.

First let's look at the code now:

import sys
import logging

from dfvfs.analyzer import analyzer
from dfvfs.lib import definitions
from dfvfs.path import factory as path_spec_factory
from dfvfs.volume import tsk_volume_system
from dfvfs.resolver import resolver
from dfvfs.lib import raw

source_path="stage2.vhd"

path_spec = path_spec_factory.Factory.NewPathSpec(
          definitions.TYPE_INDICATOR_OS, location=source_path)

type_indicators = analyzer.Analyzer.GetStorageMediaImageTypeIndicators(
          path_spec)

if len(type_indicators) > 1:
  raise RuntimeError((
      u'Unsupported source: {0:s} found more than one storage media '
      u'image types.').format(source_path))

if len(type_indicators) == 1:
  path_spec = path_spec_factory.Factory.NewPathSpec(
      type_indicators[0], parent=path_spec)

if not type_indicators:
  # The RAW storage media image type cannot be detected based on
  # a signature so we try to detect it based on common file naming
  # schemas.
  file_system = resolver.Resolver.OpenFileSystem(path_spec)
  raw_path_spec = path_spec_factory.Factory.NewPathSpec(
      definitions.TYPE_INDICATOR_RAW, parent=path_spec)

  glob_results = raw.RawGlobPathSpec(file_system, raw_path_spec)
  if glob_results:
    path_spec = raw_path_spec

volume_path_spec = path_spec_factory.Factory.NewPathSpec(
        definitions.TYPE_INDICATOR_TSK_PARTITION, location=u'/',
        parent=path_spec)

volume_system = tsk_volume_system.TSKVolumeSystem()
volume_system.Open(volume_path_spec)

volume_identifiers = []
for volume in volume_system.volumes:
  volume_identifier = getattr(volume, 'identifier', None)
  if volume_identifier:
    volume_identifiers.append(volume_identifier)
 
print(u'The following partitions were found:')
print(u'Identifier\tOffset\t\t\tSize')

for volume_identifier in sorted(volume_identifiers):
  volume = volume_system.GetVolumeByIdentifier(volume_identifier)
  if not volume:
    raise RuntimeError(
        u'Volume missing for identifier: {0:s}.'.format(volume_identifier))

  volume_extent = volume.extents[0]
  print(
      u'{0:s}\t\t{1:d} (0x{1:08x})\t{2:d}'.format(
          volume.identifier, volume_extent.offset, volume_extent.size))

print(u'')

path_spec = path_spec_factory.Factory.NewPathSpec(
        definitions.TYPE_INDICATOR_TSK_PARTITION, location=u'/p1',
        parent=path_spec)

mft_path_spec = path_spec_factory.Factory.NewPathSpec(
        definitions.TYPE_INDICATOR_TSK, location=u'/$MFT',
        parent=path_spec)

file_entry = resolver.Resolver.OpenFileEntry(mft_path_spec)


stat_object = file_entry.GetStat()

print(u'Inode: {0:d}'.format(stat_object.ino))
print(u'Inode: {0:s}'.format(file_entry.name))
extractFile = open(file_entry.name,'wb')
file_object = file_entry.GetFileObject()

data = file_object.read(4096)
while data:
          extractFile.write(data)
          data = file_object.read(4096)

extractFile.close
file_object.close()

The first thing I changed was what image I'm working from back to stage2.vhd.

source_path="stage2.vhd"

 At this point though you should be able to pass it any type of supported image.

Next after the code we first wrote to list out the partitions within an image we added a new path specification layer to make an object that points to the first partition within the image.

path_spec = path_spec_factory.Factory.NewPathSpec(
        definitions.TYPE_INDICATOR_TSK_PARTITION, location=u'/p1',
        parent=path_spec)
You can see we are using the type of TSK_PARTITION again because we know this is a partition but the location has changed from the prior type we made a parition path spec object. This is because our prior object pointed to the root of the image so we could iterate through the partitions and the new object is referencing just the 1st partition.

Next we make another path specification object that build on the partition type object.

mft_path_spec = path_spec_factory.Factory.NewPathSpec(
        definitions.TYPE_INDICATOR_TSK, location=u'/$MFT',
        parent=path_spec)

Here we are creating a TSK object and telling it that we want it to point to the file $MFT at the root of the file system. Notice we didn't have to tell it the kind of file system, offsets to where it begins or any other data. The resolver and analyzer helper classes within dfVFS will figure all out of that out for us, if it can. In tomorrows post we will put in some more conditional code to detect when it can in fact not do that for us.

So now that we have a path spec object was a reference to a file we want to work with let's get an object for that file.

file_entry = resolver.Resolver.OpenFileEntry(mft_path_spec)

The resolver helper class OpenFileEntry function takes the path spec object we made that points to the $MFT and if it can access it will return an object that references it.

Next we are going to gather some data about the file we are accessing.

stat_object = file_entry.GetStat()

First we used the GetStat function available from the file entry object to return information about the file into a new object called stat object. This is similar to running the stat command on a file.

Next we are going to print what I'm refering to below as the Inode number:
print(u'Inode: {0:d}'.format(stat_object.ino))

MFT's don't have Inodes this is actually the MFT record number but the concept is the same. We are calling the stat_object property ino to access the mft record number. You could also access the size of the file, dates associated and other data but this is a good starting place.

Next we want to print the name of the file we are accessing.
print(u'Inode: {0:s}'.format(file_entry.name))


The file_entry object property contains the name. This is much easier than with pyTsk where we had to walk a meta sub object property structure to get the file name out.

Now we need to open a file handle to where we want to put the MFT data out to

extractFile = open(file_entry.name,'wb')

Notice two things. One we are using the file_entry.name property directly in the open file handle call, this means our extracted file will have the same name as the file in the image. Two we are passing in the options wb which means that the file handle can be written to, and when it is written to should be treated as a binary file. This is important in Windows systems as when you write out binary data any new lines could be interpreted unless you pass in the binary mode flag.

Now we need to interact with not just the properties of the file in the image, but what data its actually storing

file_object = file_entry.GetFileObject()

We do that by calling the GetFileObject function from the file_entry object. This is giving us a file object just like extractFile that normal python functions can read from. The file handle is being stored in the variable file_object.

Now we need to read the data from the file in the image and then write it out to a file on the disk.

data = file_object.read(4096)
while data:
          extractFile.write(data)
          data = file_object.read(4096)

First we need to read from the file handle we opened to the image. We are going to do that for 4k of data and then enter a while loop. The while loop is saying as long as there is data being read from the read call to file_object to keep reading 4k chunks. When we reach the end of the file our data variable will contain a null return and the while loop will stop iterating.

While there is data the write function on the extractFile handle will write the data we read and then we will read the next 4k chunk and iterate through the loop again.

Lastly for good measure we are going to close the handle to both file within the image and the file we are writing to on our local disk.

extractFile.close
file_object.close()

And that's it!

In future posts we are going to access volume shadow copies, take command line options, iterate through multiple partitions and directories and add a GUI. Lot's to do but we will do it one piece at a time.

You can download this posts code here on GitHub: 




Post a Comment