Automating DFIR - How to series on programming libtsk with python Part 9

Hello Reader,
         Welcome to part 9! I really hope you've read parts 1-8 before getting here; if not you will likely get pretty lost at this point. Here are the links to the prior parts in this series so you can get up to speed before going forward, as each post builds on the last.

Part 1 - Accessing an image and printing the partition table
Part 2 - Extracting a file from an image
Part 3  - Extracting a file from a live system
Part 4 - Turning a python script into a windows executable
Part 5 - Auto escalating your python script to administrator
Part 6 - Accessing an E01 image and extracting files
Part 7 - Taking in command line options with argparse to specify an image
Part 8 - Hashing a file stored in a forensic image

Following this post the series continues:

Part 10 - Recursively searching for files and extracting them from an image
Part 11 - Recursively searching for files and extracting them from a live system 
Part 12 - Accessing different file systems
Part 13 - Accessing Volume Shadow Copies  

Now that we have that taken care of it's time to start working on DFIR Wizard Version 8! The next step in our roadmap is to recurse through directories to hash files rather than just hashing specific files. To do this we won't need any additional libraries! However, we will need to write some new code and be introduced to some new coding concepts that may take a bit for you to really understand unless you already have a computer science background.

So let's start with a simplified program. We'll strip out all the code after we open up our NTFS partition and store it in the variable filesystemObject and our code now looks like this:

#!/usr/bin/python
# Sample program for step 8 in becoming a DFIR Wizard!
# No license as this code is simple and free!
import sys
import pytsk3
import datetime
import pyewf
import argparse
import hashlib
     
class ewf_Img_Info(pytsk3.Img_Info):
  def __init__(self, ewf_handle):
    self._ewf_handle = ewf_handle
    super(ewf_Img_Info, self).__init__(
        url="", type=pytsk3.TSK_IMG_TYPE_EXTERNAL)

  def close(self):
    self._ewf_handle.close()

  def read(self, offset, size):
    self._ewf_handle.seek(offset)
    return self._ewf_handle.read(size)

  def get_size(self):
    return self._ewf_handle.get_media_size()
argparser = argparse.ArgumentParser(description='Extract the $MFT from all of the NTFS partitions of an E01')
argparser.add_argument(
        '-i', '--image',
        dest='imagefile',
        action="store",
        type=str,
        default=None,
        required=True,
        help='E01 to extract from'
    )
args = argparser.parse_args()
filenames = pyewf.glob(args.imagefile)
ewf_handle = pyewf.handle()
ewf_handle.open(filenames)
imagehandle = ewf_Img_Info(ewf_handle)

partitionTable = pytsk3.Volume_Info(imagehandle)
for partition in partitionTable:
  print partition.addr, partition.desc, "%ss(%s)" % (partition.start, partition.start * 512), partition.len
  if 'NTFS' in partition.desc:
    filesystemObject = pytsk3.FS_Info(imagehandle, offset=(partition.start*512))

Now previously we were creating file objects by opening a file stored in a known path. If we either a) don't know the path to the file or b) want to look at all the files in the file system, we need something other than a known path stored in a file object to get us there. What we have instead is a directory object. We create a directory object by calling the open_dir function, which filesystemObject inherited from FS_Info, giving it the path to the directory we want to open and then storing the result in a variable. Our code would look like this if we were opening the root directory (aka '/') and storing it in a variable named directoryObject.

directoryObject = filesystemObject.open_dir(path="/")

Now we have a directoryObject which makes available to us a list of all the files and directories stored within that directory. To access each entry in this directory listing we need to use a loop, like we did when we went through all of the partitions available in the image. We are going to use the for loop again, except instead of looping through available partitions we are going to loop through directory entries in a directory. The for loop looks like this:

for entryObject in directoryObject:

This for loop will assign each entry in the root directory to our variable entryObject one at a time until it has gone through all of the directory entries in the root directory.

Now we get to do something with the entries returned. To begin with let's print out their names using the info.name.name attribute of entryObject, just like we did earlier in the series.

print entryObject.info.name.name

The completed code looks like this:

#!/usr/bin/python
# Sample program for step 8 in becoming a DFIR Wizard!
# No license as this code is simple and free!
import sys
import pytsk3
import datetime
import pyewf
import argparse
import hashlib
     
class ewf_Img_Info(pytsk3.Img_Info):
  def __init__(self, ewf_handle):
    self._ewf_handle = ewf_handle
    super(ewf_Img_Info, self).__init__(
        url="", type=pytsk3.TSK_IMG_TYPE_EXTERNAL)

  def close(self):
    self._ewf_handle.close()

  def read(self, offset, size):
    self._ewf_handle.seek(offset)
    return self._ewf_handle.read(size)

  def get_size(self):
    return self._ewf_handle.get_media_size()
argparser = argparse.ArgumentParser(description='List files in a directory')
argparser.add_argument(
        '-i', '--image',
        dest='imagefile',
        action="store",
        type=str,
        default=None,
        required=True,
        help='E01 to extract from'
    )
args = argparser.parse_args()
filenames = pyewf.glob(args.imagefile)
ewf_handle = pyewf.handle()
ewf_handle.open(filenames)
imagehandle = ewf_Img_Info(ewf_handle)

partitionTable = pytsk3.Volume_Info(imagehandle)
for partition in partitionTable:
  print partition.addr, partition.desc, "%ss(%s)" % (partition.start, partition.start * 512), partition.len
  if 'NTFS' in partition.desc:
    filesystemObject = pytsk3.FS_Info(imagehandle, offset=(partition.start*512))
    directoryObject = filesystemObject.open_dir(path="/")
    for entryObject in directoryObject:
      print entryObject.info.name.name
 
If you were to run this against the SSFCC-Level5.E01 image you would get the following output:
C:\Users\dave\Desktop>python dfirwizard-v8.py -i SSFCC-Level5.E01
0 Primary Table (#0) 0s(0) 1
1 Unallocated 0s(0) 8064
2 NTFS (0x07) 8064s(4128768) 61759616
$AttrDef
$BadClus
$Bitmap
$Boot
$Extend
$LogFile
$MFT
$MFTMirr
$Secure
$UpCase
$Volume
.
BeardsBeardsBeards.jpg
confession.txt
HappyRob.png
ILoveBeards.jpg
ItsRob.jpg
NiceShirtRob.jpg
OhRob.jpg
OnlyBeardsForMe.jpg
RobGoneWild.jpg
RobInRed.jpg
RobRepresenting.jpg
RobToGo.jpg
WhatchaWantRob.jpg
$OrphanFiles

This code works, printing for us all of the files and directories located within the root directory we specified. However, most forensic images contain file systems that have subdirectories, sometimes thousands of them. To be able to go through all the files and directories stored within a partition we need to change our code to determine whether a directory entry is a file or a directory. If the entry is a directory then we need to check the contents of that subdirectory as well. We will repeat this check over and over until we have checked all the directories within the file system.

To write this kind of code we need to make use of a common computer science concept called recursion. In short, instead of writing code that tries to guess how many subdirectories exist in an image, we will write a function that calls itself every time it finds a subdirectory. That function will then call itself again for every subdirectory it finds, over and over, until it reaches the end of that particular directory path and then unwinds back.
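Recursion is easier to see in a tiny, self-contained sketch before we apply it to pytsk. The example below has nothing to do with forensics: it uses a made-up directory tree built from nested Python dictionaries (None marks a file) and counts the files by having the function call itself for every subdirectory it finds.

```python
# A made-up directory tree: dictionaries are directories, None marks a file.
tree = {
    "Windows": {
        "System32": {
            "config": {"SAM": None, "SYSTEM": None},
        },
    },
    "readme.txt": None,
}

def count_files(directory):
    # Count files, calling ourselves whenever we find a subdirectory.
    total = 0
    for name, contents in directory.items():
        if contents is None:          # a file
            total += 1
        else:                         # a subdirectory: recurse into it
            total += count_files(contents)
    return total

print(count_files(tree))  # 3
```

Notice we never had to guess how deep the tree goes; each call only worries about one directory and lets the recursive calls handle the rest.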

Before we get into the new recursive function let's expand on part 7 by adding some more command line options. The first command line option we will add will allow the user to specify the directory we will start from when we do our hashing. I am not going to repeat what the code means as we covered it in part 7 so here is the new command line argument:
argparser.add_argument(
        '-p', '--path',
        dest='path',
        action="store",
        type=str,
        default='/',
        required=False,
        help='Path to recurse from, defaults to /'
    )

What is new here, and needs explaining, is passing in a default. If the user does not provide a path to search then the default value we specify in the default setting will be stored in args.path. Otherwise, if the user does specify a path, that path will be stored there instead.
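Here is a quick standalone way to see the default in action; the throwaway parser below is just for illustration and is not part of DFIR Wizard:

```python
import argparse

# A throwaway parser demonstrating the default= setting.
p = argparse.ArgumentParser()
p.add_argument('-p', '--path', dest='path', action="store",
               type=str, default='/', required=False)

print(p.parse_args([]).path)                # no -p given, so we get '/'
print(p.parse_args(['-p', '/Users']).path)  # -p given, so we get '/Users'
```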

The next command line option we will give the user will be to specify the name of the file we will write our hashes to. It wouldn't be very useful if all of our output went to the command line and vanished once we exited. Most forensic examiners I know eventually take their forensic tool output to programs like Microsoft Word or Excel to get it into something that whoever asked us to do our investigation can read. The code looks like this:
argparser.add_argument(
        '-o', '--output',
        dest='output',
        action="store",
        type=str,
        default='inventory.csv',
        required=False,
        help='File to write the hashes to'
    )

You'll see here again I made a default value in case the user didn't provide an output file name. So if the user provides a value to output it will become the file name; otherwise our program will create a file named inventory.csv to store the results.

Now that we've added additional command line arguments, we need to use them. First we need to assign the directory we are going to start our hashing from to a variable like so:

dirPath = args.path

Next we need to open up our output file name for writing.

outfile = open(args.output,'w')

With this in place we can then change our directory object to use our dirPath variable rather than a hard coded parameter:

directoryObject = filesystemObject.open_dir(path=dirPath)

We will then print out to the command window the directory we are starting with, so that the user knows what is happening:

print "Directory:",dirPath

Last we will call our new function:

directoryRecurse(directoryObject,[])

You can see that we have named our new function directoryRecurse and that it takes two parameters. The first is the directoryObject we made, so that it can start at whatever directory the user specified, or the root directory if none was specified. The second parameter may look odd, but it is Python's way of specifying an empty list. We pass in an empty list because when the function calls itself for the directories it finds, it will need a list to keep track of the full path that led to it.

Now let's take a step back to explain the need for this list. Files and directories have one thing in common as they are stored on the disk: each knows only the parent directory it's in, not any above it. What do I mean by that? Let's say we are looking at a directory we commonly deal with like \windows\system32\config. This is where Windows stores non-user registries and we work from it often. To us as the user it seems logical that that is the full path, but as the data is stored on the disk it is a different story. config knows its parent directory is system32, but only by inode; system32 knows its parent directory is Windows, but only by inode; and Windows knows its parent directory is the root of the drive. What I'm trying to get across here is that we can't look at a directory we run across within a file system and just pull back the full path to it from the file system itself (in NTFS that means the Master File Table or MFT); instead we have to track that ourselves. So the empty list we are passing in at this point will keep track of the directory paths that make up the full path of the files and directories we will be hashing.
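You can see that bookkeeping in miniature below. This is just a sketch with made-up directory names; the visit function stands in for directoryRecurse, appending its own name on the way down and removing it on the way back up.

```python
# parentPath grows on the way down and shrinks on the way back up,
# so at any moment it holds the chain of parents above us.
parentPath = []

def visit(name, children):
    parentPath.append(name)
    print('/' + '/'.join(parentPath))
    for child_name, child_children in children:
        visit(child_name, child_children)
    parentPath.pop(-1)  # drop this directory before returning to the caller

visit("Windows", [("System32", [("config", [])])])
# prints:
# /Windows
# /Windows/System32
# /Windows/System32/config
```

When visit finally returns, parentPath is empty again, exactly as it started.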

With that explained let's move on to the real work involved, the directoryRecurse function. It starts with a function definition just like the other functions we have made so far:

def directoryRecurse(directoryObject, parentPath):

You can see that we have called the directoryObject we are passing in directoryObject, and the empty list we are passing in is assigned to a variable named parentPath. Next we need to do something with what we passed in by iterating through all the entries in the directoryObject.

for entryObject in directoryObject:
  if entryObject.info.name.name in [".", ".."]:
    continue

You'll notice though that we have a new test contained in an if statement. We are making sure that the directory entry we are looking at in this iteration of the for loop is not '.' or '..'. These are special directory entries that exist to allow us to always refer to the directory itself (.) and the parent directory (..). If we were to let our code keep calling the parent of itself, one of two things could happen: 1. we could enter an infinite loop of going back into the same directory, or 2. we would reach a finite amount of recursion before exiting with an error. In the case of Python 2.7 the second answer is the correct one. So if our directory entry has . or .. as its file name we will skip it using the continue statement. Continue tells Python to skip any code left in this iteration of the loop and move on to the next directory entry to process.
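If you're curious, you can see Python's recursion guard for yourself with this tiny, deliberately broken function; the finite depth limit it prints is why option 2 is what actually happens:

```python
import sys

# Python caps recursion depth; blowing past the cap raises RuntimeError
# (called RecursionError in Python 3, which is a subclass of RuntimeError).
print(sys.getrecursionlimit())  # typically 1000

def call_myself_forever():
    return call_myself_forever()

try:
    call_myself_forever()
except RuntimeError:
    print("hit the recursion limit")
```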

The next thing we are going to do is make use of the Python try statement.

try:
  f_type = entryObject.info.meta.type
except:
  print "Cannot retrieve type of",entryObject.info.name.name
  continue

Try lets us attempt a line of code that we know may raise an error which would normally cause our program to terminate. Instead of terminating, our code will fall into the except block and perform whatever tasks we specify. In the case of our code above we are attempting to assign the type of the directory entry we are dealing with to the variable f_type. If the variable is assigned then we move on through our code; if it is not, we print an error letting the user know we have a directory entry we can't handle and then skip to the next directory entry to process. To read more about try/except go here: https://docs.python.org/2/tutorial/errors.html#handling-exceptions
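Here is the same pattern in isolation. The FakeEntry class below is a made-up stand-in for a pytsk directory entry, not a real pytsk3 class; its type property raises an error when the metadata is missing, just like an orphaned entry can:

```python
class FakeEntry(object):
    # A hypothetical stand-in for a pytsk directory entry.
    def __init__(self, name, f_type=None):
        self.name = name
        self._f_type = f_type

    @property
    def type(self):
        if self._f_type is None:
            raise AttributeError("no metadata for this entry")
        return self._f_type

for entry in [FakeEntry("good.txt", "regular"), FakeEntry("orphan")]:
    try:
        f_type = entry.type
    except AttributeError:
        print("Cannot retrieve type of", entry.name)
        continue
    print(entry.name, "is a", f_type, "file")
```

The loop keeps running even though the second entry raised an error; that is exactly the behavior we want when walking thousands of directory entries.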

To see the full list of types that pytsk will return go here: http://www.sleuthkit.org/sleuthkit/docs/api-docs/tsk__fs_8h.html#a354d4c5b122694ff173cb71b98bf637b but what we care about for this program is whether the directory entry is a directory or not. To test this we will use the if conditional to see if the contents of f_type are equal to the value we know means a directory. The value that represents a directory is 0x02, but we can use the constant provided to us by libtsk as TSK_FS_META_TYPE_DIR. The if statement then looks like this:

if f_type == pytsk3.TSK_FS_META_TYPE_DIR:

Before we go further there is one thing we need to do. We need to create a variable to store the current full path we are working with, in a printable form for output. To do that we will make use of some string formatting you first saw in part 1.

filepath = '/%s/%s' % ('/'.join(parentPath),entryObject.info.name.name)

Here we are assigning to the variable filepath a string that is made up of two variables. The first is our parentPath list, which in the first iteration of this function is empty, and the second is the name of the file or directory we are currently processing. We use the join method to place a / between the names of any directories stored in parentPath, and the %s string formatting will convert whatever is returned into a string and store the whole result in filepath. The end result is the full path to the file as the user expects it.
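For example, with a couple of made-up directory names already sitting in parentPath:

```python
parentPath = ["Windows", "System32"]
name = "config"

# join puts a / between each list element, and each %s drops its
# argument into the template string.
filepath = '/%s/%s' % ('/'.join(parentPath), name)
print(filepath)  # /Windows/System32/config
```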

With that out of the way let's look at what happens if the directory entry we are processing is in fact a directory.

sub_directory = entryObject.as_directory()
parentPath.append(entryObject.info.name.name)
directoryRecurse(sub_directory,parentPath)
parentPath.pop(-1)
print "Directory: %s" % filepath

The first thing we do is get a directory object to work with out of our file object. We do this using the as_directory function, which will return a directory object if the file object it is called on is in fact a directory. If it's not, the function will throw an error and exit the program (unless you do this in a try/except block). We store our newly created directory object in a variable named sub_directory.

The next thing we do is append a value to our parentPath list. The append method takes a value, in this case the name of the directory we are processing, and places it at the end of the list. This makes sure the directories listed in our list will always be in the order we explored them.

The next thing we do is call the same function we started with, but now with our sub_directory object as the first parameter and our populated parentPath variable as the second parameter instead of an empty list. From this point forward every directory we encounter will spin off a new call to directoryRecurse until we've reached the last directory in the chain, and then it will unwind back to the beginning.

When our function returns from checking the directory we need to remove it from the full path list and we can achieve that with the pop function. In this case we are telling the pop function to remove the index value -1 which translates to the last element added on to the list.

Last for the user we are printing the full path to the directory we have finished with so they know we are still running.

Next we need to write the code for what to do with the files within these directories we are recursing through. This wouldn't be a very useful program if all it did was find directories! To do this we will check f_type again and make sure that the directory entry we are checking actually contains some data to hash:

elif f_type == pytsk3.TSK_FS_META_TYPE_REG and entryObject.info.meta.size != 0:
  filedata = entryObject.read_random(0,entryObject.info.meta.size)
  md5hash = hashlib.md5()
  md5hash.update(filedata)
  sha1hash = hashlib.sha1()
  sha1hash.update(filedata)

Here you can see we are using the constant provided by libtsk, TSK_FS_META_TYPE_REG, which means this is a regular file. Next we make sure that our file contains some data, i.e. it is not a 0 byte file. If we were to pass a 0 byte file to read_random, as we do in the next line, it would throw an error and exit the program.

After we have verified that we do in fact have a regular file to parse and that it has some data we will open it up and hash it as we saw in part 8. Now, let's do something with this data other than print it to the command line. To do this we are going to bring in a new python library called 'csv' with an import command at the beginning of our program:

import csv

The CSV python library is very useful! It takes care of all of the common tasks of reading and writing a comma separated value file for us, including handling the majority of special exceptions when dealing with strange values. We are going to create a csv object that will write out our data in csv form for us using the following statement:

wr = csv.writer(outfile, quoting=csv.QUOTE_ALL)

So wr now contains a csv writer object created with two parameters. The first tells it to write out to the file object outfile which we set above, and the second tells it to place all the values we write out into quotes so that we don't have to worry as much about what is contained within a value breaking the csv format.
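To see what QUOTE_ALL does without touching a real file, here is a small sketch writing one row to an in-memory buffer (shown in Python 3 syntax with io.StringIO):

```python
import csv
import io

# Write one row to an in-memory buffer so we can inspect the result.
buf = io.StringIO()
wr = csv.writer(buf, quoting=csv.QUOTE_ALL)
wr.writerow([5, '/Windows/System32/config', 'a value, with a comma'])

print(buf.getvalue())  # "5","/Windows/System32/config","a value, with a comma"
```

Notice that even the embedded comma is safe because every field is quoted.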

Next we need to write out a header for our output file so that the user will understand what columns they are looking at.

outfile.write('"Inode","Full Path","Creation Time","Size","MD5 Hash","SHA1 Hash"\n')

If you modify this program to include additional data in your inventory, make sure to update this header as well!

Next, returning to our function, we need to write out the hash we just generated, along with other helpful information, to our csv output file:

wr.writerow([int(entryObject.info.meta.addr),
  '/'.join(parentPath)+'/'+entryObject.info.name.name,
  datetime.datetime.fromtimestamp(entryObject.info.meta.crtime).strftime('%Y-%m-%d %H:%M:%S'),
  int(entryObject.info.meta.size),
  md5hash.hexdigest(),
  sha1hash.hexdigest()])

You can see here that we've taken all the variables we were previously printing to the command window and placed them in a list (the [ and ] that wrap around all the parameters). What comes out of the writerow function is a single, fully quoted csv entry for the file we just hashed. We can modify this line to include any other data we want to know about the file we are processing.

Next we need to handle a special condition: 0 byte files. We already know that read_random will fail if we try to read a 0 byte file. Luckily, we don't have to read a 0 byte file to know its hash value. The hash of a 0 byte file is well known for both md5 and sha1, so we can just create a condition to test for them and fill them in!

elif f_type == pytsk3.TSK_FS_META_TYPE_REG and entryObject.info.meta.size == 0:
  wr.writerow([int(entryObject.info.meta.addr),
    '/'.join(parentPath)+'/'+entryObject.info.name.name,
    datetime.datetime.fromtimestamp(entryObject.info.meta.crtime).strftime('%Y-%m-%d %H:%M:%S'),
    int(entryObject.info.meta.size),
    "d41d8cd98f00b204e9800998ecf8427e",
    "da39a3ee5e6b4b0d3255bfef95601890afd80709"])

So above we are making sure we are dealing with a regular file that has 0 bytes of data. Once we know that is the case we are calling the same writerow function but now instead of generating hashes we are filling in the already known values for 0 byte files. The MD5 sum of a 0 byte file is d41d8cd98f00b204e9800998ecf8427e and the SHA1 sum of a 0 byte file is da39a3ee5e6b4b0d3255bfef95601890afd80709.
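You don't have to take those constants on faith; hashlib will confirm them for an empty input:

```python
import hashlib

# Hashing zero bytes always yields the same well-known digests.
print(hashlib.md5(b"").hexdigest())   # d41d8cd98f00b204e9800998ecf8427e
print(hashlib.sha1(b"").hexdigest())  # da39a3ee5e6b4b0d3255bfef95601890afd80709
```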

The only thing left to do now is to put in the except conditional for this entire section:
except IOError as e:
  print e
  continue

This tells our program that if we get some kind of error while doing any of this, it should print the error to the command window and move on to the next directory entry.

That's it! We now have a version of DFIR Wizard that will recurse through every directory we specify in an NTFS partition, hash all the files contained within it, and then write that out along with other useful data to a csv file we specify.

You can grab the full code here from our series Github: https://github.com/dlcowen/dfirwizard/blob/master/dfirwizard-v8.py

Pasting in all the code would break the blog formatting so I can't do that this time.

In part 10 of this series we will extend this further to search for files within an image and extract them!

Special thanks to David Nides and Kyle Maxwell who helped me through some late night code testing!

