Automating DFIR - How to series on programming libtsk with python Part 11

Hello Reader,
      I had a bit of a break thanks to a long overdue vacation, but I'm back, and the code I'll be talking about today has been up on the GitHub repository for almost 3 weeks. If you ever want to get ahead, go there, as I write the code before I try to explain it! The GitHub repository is here: https://github.com/dlcowen/dfirwizard

Now before we continue a reminder, don't start on this post! We've come a long way to get to this point and you should start at part 1 if you haven't already!

Part 1 - Accessing an image and printing the partition table
Part 2 - Extracting a file from an image
Part 3  - Extracting a file from a live system
Part 4 - Turning a python script into a windows executable
Part 5 - Auto escalating your python script to administrator
Part 6 - Accessing an E01 image and extracting files
Part 7 - Taking in command line options with argparse to specify an image
Part 8 - Hashing a file stored in a forensic image
Part 9 - Recursively hashing all the files in an image
Part 10 - Recursively searching for files and extracting them from an image

Following this post the series continues:

Part 12 - Accessing different file systems
Part 13 - Accessing Volume Shadow Copies  

In this post we are going to augment the script from Part 10, which searched for and extracted files from all the NTFS partitions in an image, and do the same against all the NTFS partitions on a live system. You can obviously tweak this for any other file system, but we will get to that in later posts in this series.

The first thing we need is a way to figure out what partitions exist on a live system in a cross-platform way, so our future code can be tweaked to run anywhere. For this I chose the Python library psutil, which can provide a wealth of information about the system it's running on, including information about available disks and partitions. You can read all about it here: https://pypi.python.org/pypi/psutil
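If you want to see what psutil gives you before wiring it into the script, here is a minimal stand-alone sketch; the devices and file systems printed will of course differ from machine to machine:

import psutil

# Print each partition psutil can see along with the file system it reports
for part in psutil.disk_partitions():
  print part.device, part.mountpoint, part.fstype

On a Windows system this prints lines like C:\ C:\ NTFS, one per mounted volume, and it's the fstype field that we will test against in a moment.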

To bring it into our program we need to call the import function again:

import psutil

and then because we are going to work against a live running system again we need our old buddy admin

import admin

which if you remember from part 5 will auto escalate our script to administrator just in case we forgot to run it as such.

We are going to strip out the functions we used to find all the partitions of a forensic image and replace them with our code to test for administrative access:

if not admin.isUserAdmin():
  admin.runAsAdmin()
  sys.exit()
Next we replace the functions we called to get a partition table from a forensic image with a call to psutil to return a listing of partitions, and then we iterate through them. The code looks like the following, which I will explain:

partitionList = psutil.disk_partitions()
for partition in partitionList:
  imagehandle = pytsk3.Img_Info('\\\\.\\'+partition.device.strip("\\"))
  if 'NTFS' in partition.fstype:

So here, instead of calling pytsk for a partition table, we are calling psutil.disk_partitions, which will return a list of partitions available to the local system. I much prefer this method to trying to iterate through all volume letters, as we get back just those partitions that are available as well as the file system each is running. Our list of active partitions is stored in the variable partitionList. Next we iterate through the partitions using the for operator, storing each partition returned into the partition variable. Then we create a pytsk3 Img_Info object for each partition returned, but only continue if psutil recognized the partition as NTFS.
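To show how these pieces fit together, here is my rough sketch of the whole live-system loop with the file system opened and our recursive function called. Treat it as shorthand rather than the finished program; the exact working code is in the GitHub repository, and dirPath and directoryRecurse are the variable and function we built in the earlier parts of this series:

import pytsk3
import psutil

dirPath = '/'   # starting directory, normally taken from args.path

partitionList = psutil.disk_partitions()
for partition in partitionList:
  # Open the raw volume, e.g. \\.\C: (strip the trailing backslash from C:\)
  imagehandle = pytsk3.Img_Info('\\\\.\\'+partition.device.strip("\\"))
  if 'NTFS' in partition.fstype:
    # A live volume has no partition table in front of it, so the
    # file system starts at offset 0
    filesystemObject = pytsk3.FS_Info(imagehandle, offset=0)
    directoryObject = filesystemObject.open_dir(path=dirPath)
    print "Directory:",dirPath
    directoryRecurse(directoryObject,[])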

The next thing we are changing is our try/except block in our recursive directory function. Why? I found in my testing that live systems behave much differently than forensic images when it comes to setting certain values in libtsk. So rather than using entryObject.info.meta.type to determine if I'm dealing with a regular file, I am using entryObject.info.name.type, which seems to always be set regardless of whether it's a live system or a forensic image. I'm testing to see if I can capture the type of the file and its size here, as there are a lot of interesting special files that only appear at run time and will throw an error if you try to get their size.

try:
  f_type = entryObject.info.name.type
  size = entryObject.info.meta.size
except Exception as error:
  print "Cannot retrieve type or size of",entryObject.info.name.name
  print error.message
  continue

So in the above code I'm getting the type of the file (lnk, regular, etc.) and its size, and if I can't, I'm handling the error and printing it out before continuing on. You will see errors; live systems are an interesting place to do forensics.

I am now going to make a change I alluded to earlier in the series. We are going to buffer our reads and writes so we don't crash our program by trying to read a massive file into memory. This wasn't a problem in our first examples, as we were working from small test images I made, but now that we are dealing with real systems and real data we need to handle our data with care.

Our code looks as follows:

BUFF_SIZE = 1024 * 1024
offset=0
md5hash = hashlib.md5()
sha1hash = hashlib.sha1()
if args.extract == True:
  if not os.path.exists(outputPath):
    os.makedirs(outputPath)
  extractFile = open(outputPath+entryObject.info.name.name,'w')
while offset < entryObject.info.meta.size:
  available_to_read = min(BUFF_SIZE, entryObject.info.meta.size - offset)
  filedata = entryObject.read_random(offset,available_to_read)
  md5hash.update(filedata)
  sha1hash.update(filedata)
  offset += len(filedata)
  if args.extract == True:
    extractFile.write(filedata)

if args.extract == True:
  extractFile.close()

First we need to determine how much data we want to read or write from a file at one time. Following several other examples I've found, I'm setting that amount to one megabyte at a time by setting the variable BUFF_SIZE equal to 1024*1024. Next we need to keep track of where we are in the file we are dealing with; we do that by creating a new variable called offset and setting it to 0 to start with.

You'll notice that we are creating our hash objects, directories, and file handles before we read in any data. That is because we want to do all of these things once, prior to iterating through the contents of a file. If a file is a gigabyte in size then our read loop will execute 1,024 times, and we just want one hash and one output file to be created.

Next we start a while loop which will continue to execute until our offset is greater than or equal to the size of our file, meaning we've read all the data within it. Now files are not guaranteed to be allocated in one megabyte chunks, so to deal with that we are going to take advantage of a Python built-in function called min. Min returns the smaller of the two values presented, which in our code is the size of the buffer compared to the remaining data left to read (the size of the file minus our current offset). Whichever value is smaller will be stored in the variable available_to_read.
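As a quick worked example with made-up numbers, imagine a 2.5 megabyte file:

BUFF_SIZE = 1024 * 1024      # 1 megabyte buffer
size = 2621440               # a 2.5 megabyte file

print min(BUFF_SIZE, size - 0)          # prints 1048576, a full 1 MB read
print min(BUFF_SIZE, size - 2097152)    # prints 524288, the last half megabyte

The first two passes read a full megabyte each, and the third pass reads only the 524,288 bytes that remain.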

Once we know how much data we want to read in this pass of our while loop, we read it as before from our entryObject, passing in the offset to start from and how much data to read, and storing the data read into the variable filedata. We then call the update function provided by our hashing objects. One of the nice things about the hashlib objects provided by Python is that if you provide additional data to an already instantiated object it will just continue to build the hash, rather than you having to read everything in at once.
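If you want to convince yourself that feeding the hash object one chunk at a time produces the same digest as hashing everything in one shot, here is a small stand-alone check using a plain string instead of file data:

import hashlib

data = "A" * (3 * 1024 * 1024)          # 3 megabytes of dummy data

whole = hashlib.md5(data).hexdigest()   # hash it all at once

chunked = hashlib.md5()                 # hash it a megabyte at a time,
for start in range(0, len(data), 1024 * 1024):   # just like our while loop
  chunked.update(data[start:start + 1024 * 1024])

print whole == chunked.hexdigest()      # True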

Next we increment our offset by the length of the data we just read, so we will skip past it on the next pass of the while loop. Finally, we write the data out to our output file if we elected to extract the files we are searching for.

I've added one last bit of code to help me catch any other weirdness that may seep through.

        else:
          print "This went wrong",entryObject.info.name.name,f_type

An else to catch any condition that does not match one of our existing if statements.

That's it! You now have a super DFIR Wizard program that will go through all the active NTFS partitions on a running system and pull out and hash whatever files you want!

You can find the complete code here: https://github.com/dlcowen/dfirwizard/blob/master/dfirwizard-v10.py

In the next post we will talk about parsing partitions types other than NTFS and then go into volume shadow copy access!

Forensic Lunch 3/20/15 - James Carder and Eric Zimmerman

Hello Reader!
           We had another great Forensic Lunch! This broadcast we had:

James Carder of the Mayo Clinic, @carderjames, talking all about automating your response process to separate the random attacks from the sophisticated attacks. You can hear James talk about this and much more at the SANS DFIR Summit, where he'll be a panelist! If you want to work with James, the Mayo Clinic is hiring.

Mayo Clinic Infosec and IR Jobs: http://www.mayo-clinic-jobs.com/go/information-technology-engineering-and-architecture-jobs/255296/?facility=MN
Contact James Carder: carder.james@mayo.edu

Special Agent Eric Zimmerman of the FBI, @EricRZimmerman, talking about his upcoming in-depth Shellbags talk at the SANS DFIR Summit as well as his new tool called Registry Explorer. Registry Explorer and Eric's research into Windows registries will be continued in the next broadcast. Whether you are interested in registries from a research, academic, or investigative perspective, this is a must-see, and FREE, tool!

Eric's Blog: http://binaryforay.blogspot.com/
Eric's Github: https://github.com/EricZimmerman
Registry Explorer: http://binaryforay.blogspot.com/p/software.html


You can watch the broadcast here on Youtube: https://www.youtube.com/watch?v=lj7cMHySGSE


Automating DFIR - How to series on programming libtsk with python Part 10

Hello Reader,
If you just found this series I have good news! There is way more of it to read and you should start at Part 1. See the links to all the parts below:

Part 1 - Accessing an image and printing the partition table
Part 2 - Extracting a file from an image
Part 3  - Extracting a file from a live system
Part 4 - Turning a python script into a windows executable
Part 5 - Auto escalating your python script to administrator
Part 6 - Accessing an E01 image and extracting files
Part 7 - Taking in command line options with argparse to specify an image
Part 8 - Hashing a file stored in a forensic image
Part 9 - Recursively hashing all the files in an image

Following this post the series continues:

Part 11 - Recursively searching for files and extracting them from a live system 
Part 12 - Accessing different file systems  
Part 13 - Accessing Volume Shadow Copies 

For those of you who are up to date, let's get going! In this part of the series we are going to take our recursive hashing script and make it even more useful. We are going to allow the user to search for the kinds of files they want to hash with a regular-expression-enabled search, and give them the option to extract those files as they are found, preserving their original path under an output directory. Ready? Let's go!

First we will need to import two more libraries (so many libraries!). The good news is these are still standard Python libraries, so there is nothing new to install. The first library we will bring in is called 'os', which gives us operating-system-related functions and maps them to the proper implementation for the OS you are running on. The second library, called 're', will provide us with regular expression support when evaluating our search criteria. We bring in those libraries as before with an import command:

import os
import re

Now we need to add two more command line arguments to let our user take advantage of the new code we are about to write:

argparser.add_argument(
'-s', '--search',
dest='search',
action="store",
type=str,
default='.*',
required=False,
help='Specify search parameter e.g. *.lnk'
)
argparser.add_argument(
'-e', '--extract',
dest='extract',
action="store_true",
default=False,
required=False,
help='Pass this option to extract files found'
    )

Our first new argument is letting the user provide a search parameter. We are storing the search parameter in the variable args.search and if the user does not provide one we default to .* which will match anything.

The second argument uses a different kind of value than the rest of our options. We are setting the variable args.extract to a True or False value with the store_true option under action. If the user provides this argument then the variable will be True and the files matched will be extracted; if the user does not, then the value will be False and the program will only hash the files it finds.
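If store_true is new to you, this tiny throwaway parser (not part of our script) shows how the flag behaves:

import argparse

p = argparse.ArgumentParser()
p.add_argument('-e', '--extract', dest='extract', action="store_true", default=False)

print p.parse_args([]).extract      # False - the flag was not passed
print p.parse_args(['-e']).extract  # True - the flag was passed, no value needed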

It's always good to show the user that the argument we received from them is doing something so let's add two lines to see if we have a search term and print it:

if not args.search == '.*':
  print "Search Term Provided",args.search

 Remember that .* is our default search term, so if the value stored in args.search is anything other than .* our program will print out the search term provided, otherwise it will just move on.

Our next changes all happen within our directoryRecurse function. First we need to capture the full path where the file we are looking at exists. We will do this by combining the partition number and the full path that led to this file, in order to make sure it's unique between partitions.

outputPath ='./%s/%s/' % (str(partition.addr),'/'.join(parentPath))



Next we will go into the elif statement we wrote in the prior post to handle regular files with non-zero lengths. We will add a new line of code to the beginning of the code block that gets executed, to do our regular expression search, as follows:


elif f_type == pytsk3.TSK_FS_META_TYPE_REG and entryObject.info.meta.size != 0:
  searchResult = re.match(args.search,entryObject.info.name.name)

You can see we are using the re library here and calling the match function it provides. We pass two arguments to the match function: the first is the search term the user provided, and the second is the file name of the regular, non-zero-sized file we are inspecting. If the regular expression provided by the user matches, then a match object will be returned and stored in the searchResult variable; if there is no match, the variable will contain None. We write a conditional to test this result next:


if not searchResult:
  continue


This allows us to skip any file that did not match the search term provided. If the user did not specify a search term, our default value of .* will kick in and everything will be a match.
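Here is a quick illustration of how re.match behaves with our default pattern and a more specific one, using a couple of made-up file names:

import re

print re.match('.*', 'confession.txt')       # always returns a match object
print re.match('.*jpg', 'ILoveBeards.jpg')   # match object - the name ends in jpg
print re.match('.*jpg', 'confession.txt')    # None - so 'if not searchResult' skips it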

Our last modification revolves around extracting the files if our user selected that option. The code looks like this:

if args.extract == True:
  if not os.path.exists(outputPath):
    os.makedirs(outputPath)
  extractFile = open(outputPath+entryObject.info.name.name,'w')
  extractFile.write(filedata)
  extractFile.close()

The first thing we are doing is checking to see if the user passed the extract flag, causing args.extract to be set to True. If they did then we will extract the file; if not, we skip it. If it is True, the first thing we do is make use of the os library's path.exists function. This allows us to check whether the output directory we want to create (stored in the outputPath variable above) already exists. If it does, we can move on; if it does not, then we call another os-library-provided function named makedirs. makedirs is nice because it will recursively create the path you specify, so you don't have to loop through and create all the directories in between if they don't exist.
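As a small stand-alone illustration of that pair of calls (the path here is just an example):

import os

outputPath = './2/Users/dave/Pictures/'

# makedirs builds every missing directory in the chain in one call, but it
# raises an error if the path already exists - hence the exists() check first
if not os.path.exists(outputPath):
  os.makedirs(outputPath)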

Now that our output path exists, it's time to extract the file. We are modifying our old extractFile variable, and now we are prepending our outputPath onto the filename we want to create. This will place the file we are extracting into the directory we have created. Next we write the data out to it as before and then close the handle, since we will be reusing the variable for the next match.

If I were to run this program against the Level5 image we've been working with, specifying both the extraction flag and a search term of .*jpg, it would look like this:

C:\Users\dave\Desktop>python dfirwizard-v9.py -i SSFCC-Level5.E01 -e -s .*jpg
Search Term Provided .*jpg
0 Primary Table (#0) 0s(0) 1
1 Unallocated 0s(0) 8064
2 NTFS (0x07) 8064s(4128768) 61759616
Directory: /
Directory: /$Extend/$RmMetadata/$Txf
Directory: /$Extend/$RmMetadata/$TxfLog
Directory: /$Extend/$RmMetadata
Directory: //$Extend
match  BeardsBeardsBeards.jpg
match  ILoveBeards.jpg
match  ItsRob.jpg
match  NiceShirtRob.jpg
match  OhRob.jpg
match  OnlyBeardsForMe.jpg
match  RobGoneWild.jpg
match  RobInRed.jpg
match  RobRepresenting.jpg
match  RobToGo.jpg
match  WhatchaWantRob.jpg
Directory: //$OrphanFiles

On my desktop there would now be a folder named 2, and underneath it the full path to each file that matched the search term.

That's it! Part 9 had a lot going on but now that we've built the base for our recursion it gets easier from here. The code for this part is located on the series Github here: https://github.com/dlcowen/dfirwizard/blob/master/dfirwizard-v9.py

Next in part 11 we will do the same thing from live systems but allow our code to enumerate all the physical disks present rather than hardcoding an option.


Automating DFIR - How to series on programming libtsk with python Part 9

Hello Reader,
         Welcome to Part 9. I really hope you've read Parts 1-8 before getting here; if not, you will likely get pretty lost at this point. Here are the links to the prior parts in this series so you can get up to speed before going forward, as each post builds on the last.

Part 1 - Accessing an image and printing the partition table
Part 2 - Extracting a file from an image
Part 3  - Extracting a file from a live system
Part 4 - Turning a python script into a windows executable
Part 5 - Auto escalating your python script to administrator
Part 6 - Accessing an E01 image and extracting files
Part 7 - Taking in command line options with argparse to specify an image
Part 8 - Hashing a file stored in a forensic image

Following this post the series continues:

Part 10 - Recursively searching for files and extracting them from an image
Part 11 - Recursively searching for files and extracting them from a live system 
Part 12 - Accessing different file systems
Part 13 - Accessing Volume Shadow Copies  

Now that we have that taken care of it's time to start working on DFIR Wizard Version 8! The next step in our roadmap is to recurse through directories to hash files rather than just hashing specific files. To do this we won't need any additional libraries! However, we will need to write some new code and be introduced to some new coding concepts that may take a bit for you to really understand unless you already have a computer science background.

So let's start with a simplified program. We'll strip out all the code after we open up our NTFS partition and store it in the variable filesystemObject and our code now looks like this:

#!/usr/bin/python
# Sample program for step 8 in becoming a DFIR Wizard!
# No license as this code is simple and free!
import sys
import pytsk3
import datetime
import pyewf
import argparse
import hashlib
     
class ewf_Img_Info(pytsk3.Img_Info):
  def __init__(self, ewf_handle):
    self._ewf_handle = ewf_handle
    super(ewf_Img_Info, self).__init__(
        url="", type=pytsk3.TSK_IMG_TYPE_EXTERNAL)

  def close(self):
    self._ewf_handle.close()

  def read(self, offset, size):
    self._ewf_handle.seek(offset)
    return self._ewf_handle.read(size)

  def get_size(self):
    return self._ewf_handle.get_media_size()
argparser = argparse.ArgumentParser(description='Extract the $MFT from all of the NTFS partitions of an E01')
argparser.add_argument(
        '-i', '--image',
        dest='imagefile',
        action="store",
        type=str,
        default=None,
        required=True,
        help='E01 to extract from'
    )
args = argparser.parse_args()
filenames = pyewf.glob(args.imagefile)
ewf_handle = pyewf.handle()
ewf_handle.open(filenames)
imagehandle = ewf_Img_Info(ewf_handle)

partitionTable = pytsk3.Volume_Info(imagehandle)
for partition in partitionTable:
  print partition.addr, partition.desc, "%ss(%s)" % (partition.start, partition.start * 512), partition.len
  if 'NTFS' in partition.desc:
    filesystemObject = pytsk3.FS_Info(imagehandle, offset=(partition.start*512))

Previously we were creating file objects by opening a file stored at a known path. If we either a) don't know the path to the file or b) want to look at all the files in the file system, we need something other than a known path stored in a file object to get us there. What we use instead is a directory object. We create a directory object by calling the open_dir function that filesystemObject inherited from FS_Info, giving it the path to the directory we want to open and then storing the result in a variable. Our code would look like this if we were opening the root directory (aka '/') and storing it in a variable named directoryObject:

directoryObject = filesystemObject.open_dir(path="/")

Now we have a directoryObject, which makes available to us a list of all the files and directories stored within that directory. To access each entry in this directory listing we need to use a loop, like we did when we went through all of the partitions available in the image. We are going to use the for loop again, except instead of looping through available partitions we are going to loop through the directory entries in a directory. The for loop looks like this:

for entryObject in directoryObject:

This for loop will assign each entry in the root directory to our variable entryObject, one at a time, until it has gone through all of the directory entries in the root directory.

Now we get to do something with the entries returned. To begin with let's print out their names using the info.name.name attribute of entryObject, just like we did earlier in the series.

print entryObject.info.name.name

The completed code looks like this:

#!/usr/bin/python
# Sample program for step 8 in becoming a DFIR Wizard!
# No license as this code is simple and free!
import sys
import pytsk3
import datetime
import pyewf
import argparse
import hashlib
     
class ewf_Img_Info(pytsk3.Img_Info):
  def __init__(self, ewf_handle):
    self._ewf_handle = ewf_handle
    super(ewf_Img_Info, self).__init__(
        url="", type=pytsk3.TSK_IMG_TYPE_EXTERNAL)

  def close(self):
    self._ewf_handle.close()

  def read(self, offset, size):
    self._ewf_handle.seek(offset)
    return self._ewf_handle.read(size)

  def get_size(self):
    return self._ewf_handle.get_media_size()
argparser = argparse.ArgumentParser(description='List files in a directory')
argparser.add_argument(
        '-i', '--image',
        dest='imagefile',
        action="store",
        type=str,
        default=None,
        required=True,
        help='E01 to extract from'
    )
args = argparser.parse_args()
filenames = pyewf.glob(args.imagefile)
ewf_handle = pyewf.handle()
ewf_handle.open(filenames)
imagehandle = ewf_Img_Info(ewf_handle)

partitionTable = pytsk3.Volume_Info(imagehandle)
for partition in partitionTable:
  print partition.addr, partition.desc, "%ss(%s)" % (partition.start, partition.start * 512), partition.len
  if 'NTFS' in partition.desc:
    filesystemObject = pytsk3.FS_Info(imagehandle, offset=(partition.start*512))
    directoryObject = filesystemObject.open_dir(path="/")
    for entryObject in directoryObject:
      print entryObject.info.name.name
 
If you were to run this against the SSFCC-Level5.E01 image you would get the following output:
C:\Users\dave\Desktop>python dfirwizard-v8.py -i SSFCC-Level5.E01
0 Primary Table (#0) 0s(0) 1
1 Unallocated 0s(0) 8064
2 NTFS (0x07) 8064s(4128768) 61759616
$AttrDef
$BadClus
$Bitmap
$Boot
$Extend
$LogFile
$MFT
$MFTMirr
$Secure
$UpCase
$Volume
.
BeardsBeardsBeards.jpg
confession.txt
HappyRob.png
ILoveBeards.jpg
ItsRob.jpg
NiceShirtRob.jpg
OhRob.jpg
OnlyBeardsForMe.jpg
RobGoneWild.jpg
RobInRed.jpg
RobRepresenting.jpg
RobToGo.jpg
WhatchaWantRob.jpg
$OrphanFiles

This code works and prints for us all of the files and directories located within the root directory we specified. However, most forensic images contain file systems that do have sub directories, sometimes thousands of them. To be able to go through all the files and directories stored within a partition we need to change our code to determine whether a directory entry is a file or a directory. If the entry is a directory then we need to check the contents of that sub directory as well. We will do this check over and over again until we have checked all the directories within the file system.

To write this kind of code we need to make use of a common computer science concept called recursion. In short, instead of writing code that tries to guess how many levels of sub directories exist in an image, we will write a function that calls itself every time it finds a sub directory, over and over, until it reaches the end of that particular directory path and then unwinds back.
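If recursion is new to you, here is the same idea applied to an ordinary folder on your local disk using nothing but the os library; the pytsk version we build below follows exactly this shape:

import os

def listEverything(path):
  # Look at every entry in this directory...
  for name in os.listdir(path):
    full = os.path.join(path, name)
    if os.path.isdir(full):
      # ...and when we hit a sub directory, call ourselves on it
      listEverything(full)
    else:
      print full

listEverything('.')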

Before we get into the new recursive function let's expand on part 7 by adding some more command line options. The first command line option we will add will allow the user to specify the directory we will start from when we do our hashing. I am not going to repeat what the code means as we covered it in part 7 so here is the new command line argument:
argparser.add_argument(
    '-p', '--path',
    dest='path',
    action="store",
    type=str,
    default='/',
    required=False,
    help='Path to recurse from, defaults to /'
)

What is new here that I need to explain is passing in a default. If the user does not provide a path to search, then the default value we specify in the default setting will be stored in args.path. Otherwise, if the user does specify a path, then that path will be stored in args.path instead.

The next command line option we will give the user is to specify the name of the file we will write our hashes to. It wouldn't be very useful if all of our output went to the command line and vanished once we exited. Most forensic examiners I know eventually take their forensic tool output into programs like Microsoft Word or Excel to get it into something that whoever asked us to do our investigation can read. The code looks like this:
argparser.add_argument(
    '-o', '--output',
    dest='output',
    action="store",
    type=str,
    default='inventory.csv',
    required=False,
    help='File to write the hashes to'
)

You'll see here again that I provided a default value in case the user didn't supply an output file name. So if the user provides a value for output it will become the file name; otherwise our program will create a file named inventory.csv to store the results.

Now that we've added additional command line arguments, we need to use them. First we need to assign the directory we are going to start our hashing from to a variable like so:

dirPath = args.path

Next we need to open up our output file name for writing.

outfile = open(args.output,'w')

With this in place we can then change our directory object to use our dirPath variable rather than a hard coded parameter:

directoryObject = filesystemObject.open_dir(path=dirPath)

We will then print out to the command window the directory we are starting with, so that the user knows what is happening:

print "Directory:",dirPath

Last we will call our new function:

directoryRecurse(directoryObject,[])

You can see that we have named our new function directoryRecurse and that it's taking two parameters. The first is the directoryObject we made, so that it can start at whatever directory the user specified, or the root directory if none was specified. The second parameter may look odd, but it's Python's way of specifying an empty list. We pass in an empty list because when the function calls itself for the directories it finds, it will need a list to keep track of the full path that led to them.

Now let's take a step back to explain the need for this list. Files and directories have one thing in common as they are stored on the disk: each only knows the parent directory it's in, not any above it. What do I mean by that? Let's say we are looking at a directory we commonly deal with, like \windows\system32\config. This is where Windows stores the non-user registries, and we work from it often. To us as users it seems logical that that is the full path to the directory, but as the data is stored on the disk it is a different story. Config knows its parent directory is system32, but only by inode; system32 knows its parent directory is Windows, but only by inode; and Windows knows its parent directory is the root of the drive. What I'm trying to get across here is that we can't look at a directory we run across within a file system and just pull back the full path to it from the file system itself (in NTFS that means the Master File Table, or MFT); instead we have to track that ourselves. So the empty list we are passing in at this point will keep track of the directory names that make up the full path to the files and directories we will be hashing.

With that explained let's move on to the real work involved, the directoryRecurse function. It starts with a function definition just like the other functions we have made so far:

def directoryRecurse(directoryObject, parentPath):

You can see that we have named the directory object we are passing in directoryObject, and the empty list we are assigning to a variable named parentPath. Next we need to do something with what we passed in by iterating through all the entries in the directoryObject.

for entryObject in directoryObject:
  if entryObject.info.name.name in [".", ".."]:
    continue

You'll notice though that we have a new test contained in an if statement. We are making sure that the directory entry we are looking at for this iteration of the for loop is not '.' or '..'. These are special directory entries that exist to allow us to always be able to refer to the directory itself (.) and the parent directory (..). If we let our code keep recursing into these entries one of two things could happen: 1. we could enter an infinite loop of going back into the same directory, or 2. we reach a finite amount of recursion before we exit with an error. In the case of Python 2.7 the second answer is the correct one. So if our directory entry has . or .. as its file name we will skip it by using the continue operator. Continue tells Python to skip any code left in this iteration of the loop and move on to the next directory entry to process.

The next thing we are going to do is make use of the Python try function.

try:
  f_type = entryObject.info.meta.type
except:
  print "Cannot retrieve type of",entryObject.info.name.name
  continue

Try lets us attempt a line of code that we know may raise an error that would normally cause our program to terminate. Instead of terminating, our code will call the except routine and perform whatever tasks we specify. In the case of our code above, we are attempting to assign the type of the directory entry we are dealing with to the variable f_type. If the variable is assigned then we move on through our code; if it is not, then we print an error letting the user know that we have a directory entry we can't handle and skip to the next directory entry to process. To read more about try/except go here: https://docs.python.org/2/tutorial/errors.html#handling-exceptions

To see the full list of types that pytsk will return go here: http://www.sleuthkit.org/sleuthkit/docs/api-docs/tsk__fs_8h.html#a354d4c5b122694ff173cb71b98bf637b but what we care about for this program is whether the directory entry is a directory or not. To test this we will use the if conditional to see if the contents of f_type are equal to the value we know means a directory. The value that represents a directory is 0x02, but we can use the constant provided to us by libtsk as TSK_FS_META_TYPE_DIR. The if statement then looks like this:

if f_type == pytsk3.TSK_FS_META_TYPE_DIR:

Before we go farther there is one thing we need to do. We need to create a variable to store the current full path we are working with in a printable form for output. To do that we will make use of some string formatting you first saw in Part 1.

filepath = '/%s/%s' % ('/'.join(parentPath),entryObject.info.name.name)

Here we are assigning to the variable filepath a string that is made up of two values. The first is our parentPath list, which in the first iteration of this function is empty, and the second is the name of the file or directory we are currently processing. We use the join operator to place a / between the names of any directories stored in parentPath, and the %s string formatting will convert whatever is returned into a string and store the whole result in filepath. The end result is the full path to the file as the user expects to see it.

With that out of the way let's look at what happens if the directory entry we are processing is in fact a directory.

sub_directory = entryObject.as_directory()
parentPath.append(entryObject.info.name.name)
directoryRecurse(sub_directory,parentPath)
parentPath.pop(-1)
print "Directory: %s" % filepath

The first thing we are doing is getting a directory object to work with out of our file object. We do this using the as_directory function, which will return a directory object if the file object it is called on is in fact a directory. If it's not, the function will throw an error and exit the program (unless you do this in a try/except block). We are storing our newly created directory object in a variable named sub_directory.

The next thing we are doing is appending a value to our empty list parentPath. The append operator will take a value, in this case the name of the directory we are processing, and place it at the end of the list. This makes sure the directories listed in our list will always be in the order we explored them.

The next thing we are doing is calling the same function we started with, but now with our sub_directory object as the first parameter and our populated parentPath variable as the second parameter instead of an empty list. From this point forward, every sub directory we encounter will be spun off into a new call to this same function, until we've reached the last directory in the chain, and then it will unwind back to the beginning.

When our function returns from checking the directory we need to remove it from the full path list and we can achieve that with the pop function. In this case we are telling the pop function to remove the index value -1 which translates to the last element added on to the list.
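To make the append/pop bookkeeping concrete, here is how parentPath would evolve as we walk into windows\system32\config and back out (values shown as comments):

parentPath = []

parentPath.append('windows')       # ['windows']
parentPath.append('system32')      # ['windows', 'system32']
parentPath.append('config')        # ['windows', 'system32', 'config']

print '/' + '/'.join(parentPath)   # /windows/system32/config

parentPath.pop(-1)                 # back out of config
print parentPath                   # ['windows', 'system32']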

Last, we print the full path to the directory we have finished with so the user knows we are still running.

Next we need to write the code for what to do with the files within these directories we are recursing through. This wouldn't be a very useful program if all it did was find directories! To do this we will check f_type again and also make sure that the directory entry we are checking actually contains some data to hash:

elif f_type == pytsk3.TSK_FS_META_TYPE_REG and entryObject.info.meta.size != 0:
  filedata = entryObject.read_random(0,entryObject.info.meta.size)
  md5hash = hashlib.md5()
  md5hash.update(filedata)
  sha1hash = hashlib.sha1()
  sha1hash.update(filedata)

Here you can see we are comparing against the constant provided by libtsk, TSK_FS_META_TYPE_REG, which means this is a regular file. Next we are making sure that our file contains some data, meaning it is not a 0 byte file. If we were to pass a 0 byte file to read_random, as we do on the next line, it would throw an error and exit the program.

After we have verified that we do in fact have a regular file to parse and that it has some data we will open it up and hash it as we saw in part 8. Now, let's do something with this data other than print it to the command line. To do this we are going to bring in a new python library called 'csv' with an import command at the beginning of our program:

import csv

The csv Python library is very useful! It takes care of all of the common tasks of reading and writing a comma separated value file for us, including handling the majority of special exceptions when dealing with strange values. We are going to create a csv object that will write out our data in csv form for us using the following statement:

wr = csv.writer(outfile, quoting=csv.QUOTE_ALL)

So wr now contains a csv writer object created with two parameters. The first tells it to write out to the file object outfile, which we set above, and the second tells it to place all the values we write out into quotes, so that we don't have to worry as much about something contained within a value breaking the csv format.
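A tiny stand-alone sketch of what QUOTE_ALL buys us, writing one made-up row to an example file:

import csv

outfile = open('example.csv','w')
wr = csv.writer(outfile, quoting=csv.QUOTE_ALL)
wr.writerow([42, '/files/report, final.txt', 'd41d8cd98f00b204e9800998ecf8427e'])
outfile.close()

example.csv now contains one fully quoted row, so the comma inside the file name does not break the column layout:

"42","/files/report, final.txt","d41d8cd98f00b204e9800998ecf8427e"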

Next we need to write out a header for our output file so that the user will understand what columns they are looking at.

outfile.write('"Inode","Full Path","Creation Time","Size","MD5 Hash","SHA1 Hash"\n')

If you modify this program to include additional data in your inventory, make sure to update this header as well!

Next, returning to our function, we need to write out the hashes we just generated, along with other helpful information, to our csv output file:

wr.writerow([int(entryObject.info.meta.addr),
  '/'.join(parentPath)+entryObject.info.name.name,
  datetime.datetime.fromtimestamp(entryObject.info.meta.crtime).strftime('%Y-%m-%d %H:%M:%S'),
  int(entryObject.info.meta.size),
  md5hash.hexdigest(),
  sha1hash.hexdigest()])

You can see here that we've taken all the values we were previously printing out to the command window and placed them in a list, the [ and ] that wrap around all the parameters, so they are passed to writerow as a single row. What comes out of the writerow function is a single, fully quoted csv entry for the file we just hashed. We can modify this line to include any other data we want to record about the file we are processing.

Next we need to handle a special condition: 0 byte files. We already know that read_random will fail if we try to read a 0 byte file; luckily, we don't have to read a 0 byte file to know its hash value. The hash of a 0 byte file is well known for both md5 and sha1, so we can just create a condition to test for them and fill them in!

elif f_type == pytsk3.TSK_FS_META_TYPE_REG and entryObject.info.meta.size == 0:
  wr.writerow([int(entryObject.info.meta.addr),
    '/'.join(parentPath)+entryObject.info.name.name,
    datetime.datetime.fromtimestamp(entryObject.info.meta.crtime).strftime('%Y-%m-%d %H:%M:%S'),
    int(entryObject.info.meta.size),
    "d41d8cd98f00b204e9800998ecf8427e",
    "da39a3ee5e6b4b0d3255bfef95601890afd80709"])

So above we are making sure we are dealing with a regular file that has 0 bytes of data. Once we know that is the case we are calling the same writerow function but now instead of generating hashes we are filling in the already known values for 0 byte files. The MD5 sum of a 0 byte file is d41d8cd98f00b204e9800998ecf8427e and the SHA1 sum of a 0 byte file is da39a3ee5e6b4b0d3255bfef95601890afd80709.
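You can verify those well known values yourself in a couple of lines:

import hashlib

# Hashing zero bytes of data produces the same digests every time
print hashlib.md5("").hexdigest()    # d41d8cd98f00b204e9800998ecf8427e
print hashlib.sha1("").hexdigest()   # da39a3ee5e6b4b0d3255bfef95601890afd80709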

The only thing left to do now is to put in the except clause for this entire section:

except IOError as e:
  print e
  continue

This tells our program that if we get some kind of error while doing any of this, it should print the error to the command window and move on to the next directory entry.

That's it! We now have a version of DFIR Wizard that will recurse through every directory we specify in an NTFS partition, hash all the files contained within it, and then write that out, along with other useful data, to a csv file we specify.

You can grab the full code here from our series Github: https://github.com/dlcowen/dfirwizard/blob/master/dfirwizard-v8.py

Pasting in all the code would break the blog formatting, so I can't do that this time.

In part 10 of this series we will extend this further to search for files within an image and extract them!

Special thanks to David Nides and Kyle Maxwell who helped me through some late night code testing!
