Tuesday, March 3, 2015

Automating DFIR - How to series on programming libtsk with python Part 10

Hello Reader,
If you just found this series I have good news! There is way more of it to read and you should start at Part 1. See the links to all the parts below:

Part 1 - Accessing an image and printing the partition table
Part 2 - Extracting a file from an image
Part 3  - Extracting a file from a live system
Part 4 - Turning a python script into a windows executable
Part 5 - Auto escalating your python script to administrator
Part 6 - Accessing an E01 image and extracting files
Part 7 - Taking in command line options with argparse to specify an image
Part 8 - Hashing a file stored in a forensic image
Part 9 - Recursively hashing all the files in an image

Following this post the series continues:

Part 11 - Recursively searching for files and extracting them from a live system 
Part 12 - Accessing different file systems  
Part 13 - Accessing Volume Shadow Copies 

For those of you who are up to date let's get going! In this part of the series we are going to take our recursive hashing script and make it even more useful. We are going to allow the user to search for the kind of files they want to hash with a regular expression enabled search and give them the option to extract those files as they are found back to their original path under your output directory. Ready? Let's go!

First we will need to import two more libraries, so many libraries!. The good news is these are still standard python system libraries, so there is nothing new to install. The first library we will bring is called 'os' which gives us os related functions and will map the proper function based on the os you are running on. The second library called 're' will provide us with regular expression support when evaluating our search criteria. We bring in those libraries as before with an import command:

import os
import re

Now we need to add two more command line arguments to let our user take advantage of the new code we are about to write:

'-s', '--search',
help='Specify search parameter e.g. *.lnk'
'-e', '--extract',
help='Pass this option to extract files found'

Our first new argument is letting the user provide a search parameter. We are storing the search parameter in the variable args.search and if the user does not provide one we default to .* which will match anything.

The second argument is using a different variable then the rest of our options. We are setting the variable args.extract as a True or False value with the store_true option under action. If the user provides this argument then the variable will be true and the files matched will be extracted, if the user does not then the value will be false and the program will only hash the files it finds.

It's always good to show the user that the argument we received from them is doing something so let's add two lines to see if we have a search term and print it:

if not args.search == '.*':
  print "Search Term Provided",args.search

 Remember that .* is our default search term, so if the value stored in args.search is anything other than .* our program will print out the search term provided, otherwise it will just move on.

Our next changes all happen within our directoryRecurse function. First we need to capture the full path where the file we are looking at exists. We will do this by combining the partition number and the full path that lead to this file in order to make sure its unique between partitions.

outputPath ='./%s/%s/' % (str(partition.addr),'/'.join(parentPath))

Next we will go into the else if statement we wrote in the prior post to handle regular files with non zero lengths. We will add a new line of code to the beginning of the code block that gets executed to do our regular expression search as follows:

elif f_type == pytsk3.TSK_FS_META_TYPE_REG and entryObject.info.meta.size != 0: searchResult = re.match(args.search,entryObject.info.name.name)

You can see we are use the re library here and calling the match function it provides. We are providing two arguments to the match function. The first is the search term the user provided us and the second is the file name of the regular, non zero sized file we are inspecting. If the regular expression provided by the user is a match then a match object will be returned and stored in the searchResult variable, if there is not a match then the variable will contain no data. We write a conditional to test this result next:

if not searchResult:

This allows us to skip any file that did not match the search result provided. If the user did not specify a search term our default value of .* will kick in and everything will be a match.

Our last modification revolves around extracting out the files if our user selected the option to. The code looks like this

if args.extract == True:
if not os.path.exists(outputPath):
extractFile = open(outputPath+entryObject.info.name.name,'w')

The first thing we are doing is checking to see if the user has set the extract variable and caused it to be set to True. If they do then we will extract the file, if not we skip it. If it is true the first thing we do is make use of the os library's path.exists function. This will allow us to look at the output directory we want to create (and set in the outputPath variable above) already exists. If it does we can move on, if it does not than we call another os library provided function named makedirs. makedirs is nice because it will recursively create the path you specify so you don't have to loop through all the directories in between if they don't exist.

Now that our output path exists its time to extract it. We are modifying our old extractFile variable and now we are appending on our outputPath to the filename we want to create. This will place the file we are extracting in to the directory we have created. Next we write the data out to it as before and then close the handle since we will be reusing it.

If I was to run this program against the Level5 image we've been working with and specify both the extraction flag and provide a search term of .*jpg it would look like this

C:\Users\dave\Desktop>python dfirwizard-v9.py -i SSFCC-Level5.E01 -e -s .*jpg
Search Term Provided .*jpg
0 Primary Table (#0) 0s(0) 1
1 Unallocated 0s(0) 8064
2 NTFS (0x07) 8064s(4128768) 61759616
Directory: /
Directory: /$Extend/$RmMetadata/$Txf
Directory: /$Extend/$RmMetadata/$TxfLog
Directory: /$Extend/$RmMetadata
Directory: //$Extend
match  BeardsBeardsBeards.jpg
match  ILoveBeards.jpg
match  ItsRob.jpg
match  NiceShirtRob.jpg
match  OhRob.jpg
match  OnlyBeardsForMe.jpg
match  RobGoneWild.jpg
match  RobInRed.jpg
match  RobRepresenting.jpg
match  RobToGo.jpg
match  WhatchaWantRob.jpg
Directory: //$OrphanFiles
On my desktop there would be a folder named 2 and underneath that the full path to the file that matched the search term would exist.

That's it! Part 9 had a lot going on but now that we've built the base for our recursion it gets easier from here. The code for this part is located on the series Github here: https://github.com/dlcowen/dfirwizard/blob/master/dfirwizard-v9.py

Next in part 11 we will do the same thing from live systems but allow our code to enumerate all the physical disks present rather than hardcoding an option.