Automating DFIR - How to series on programming libtsk with python Part 8

Hello Reader,
            Welcome to part 8 of the Automating DFIR series. If this is the post you're starting with... Stop! You need to read all the prior parts or you will be really, really lost. There is a lot going on here, and the better you understand the earlier parts, the more sense this one will make, allowing you to don your wizard robe and hat of DFIR Wizardry! Catch up on the part you left off on below:

Part 1 - Accessing an image and printing the partition table
Part 2 - Extracting a file from an image
Part 3 - Extracting a file from a live system
Part 4 - Turning a python script into a windows executable
Part 5 - Auto escalating your python script to administrator
Part 6 - Accessing an E01 image and extracting files
Part 7 - Taking in command line options with argparse to specify an image

Following this post the series continues:

Part 9 - Recursively hashing all the files in an image
Part 10 - Recursively searching for files and extracting them from an image
Part 11 - Recursively searching for files and extracting them from a live system 
Part 12 - Accessing different file systems
Part 13 - Accessing Volume Shadow Copies  

Now that we have that out of the way for the new people, let's get back into the good stuff. One of the most common things we need to do, other than pulling a file out of an image, is to hash a file or files contained within an image. Let's start simply by just adding the hashing library that Python provides and generating some hashes for our $MFTs. Then we'll work on making our script a bit more useful with some more command line options and much better intelligence.

So to start with, as you may be expecting by now, we need to import a library. In Python 2.7 the hashing library that comes with Python is called hashlib, so the first thing we need to do is import it as follows:

import hashlib

You can read the documentation on hashlib here:

Next we need to create a hashlib object to generate, store and print our hashes. hashlib supports many different hash algorithms; as of this post the ones it always provides are md5, sha1, sha224, sha256, sha384, and sha512. When we create our hashlib object we also choose which hashing algorithm that object will use, so for each hash algorithm you want to use you'll need a separate object. Let's start with MD5 by making the object and storing it in a variable called md5hash:

md5hash = hashlib.md5()

If we wanted to make a sha1 hashlib object, we would just call sha1() instead of md5() and store the result in a new variable, as follows:

sha1hash = hashlib.sha1()
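Since each algorithm needs its own object, one way to set up several at once is hashlib.new(), which takes the algorithm name as a string. This is a small sketch, not part of DFIR Wizard: the ALGORITHMS list and hashers dictionary are illustrative names, and print() is called with a single argument so it runs under Python 2.7 or Python 3.

```python
import hashlib

# The six algorithms hashlib always provides.
ALGORITHMS = ["md5", "sha1", "sha224", "sha256", "sha384", "sha512"]

# One independent hash object per algorithm, keyed by name.
# hashlib.new("md5") gives the same kind of object as hashlib.md5().
hashers = {}
for name in ALGORITHMS:
    hashers[name] = hashlib.new(name)

for name in ALGORITHMS:
    # digest_size is the length of the finished hash in bytes
    print("%s produces a %d-byte hash" % (name, hashers[name].digest_size))
```

MD5 reports 16 bytes and SHA1 reports 20, which is why their printed hex strings are 32 and 40 characters long.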

To actually generate a hash we call the update method built into our hashlib object. When you call update you need to give it something to hash: either a string or a variable that contains the data to be hashed. In our program I am giving the object the contents of the $MFT we read from the image, which is stored in the variable filedata. So the whole call looks like this:

md5hash.update(filedata)

Similarly, the call to generate the sha1 hash looks like this:

sha1hash.update(filedata)

We've now generated md5 and sha1 hashes for the $MFT we've extracted from the image; now we need to print the hashes so our user can see them. To do this we use the hexdigest() method built into our hashlib object. The hexdigest method takes no arguments; it returns the hash of all the data passed to update so far as a string of hexadecimal digits, which we can then print. In this version of DFIR Wizard! we are just going to print the hash values to our command prompt window. It looks like this:

print "MD5 Hash",md5hash.hexdigest()
print "SHA1 Hash",sha1hash.hexdigest()
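To see update() and hexdigest() working together without an image, the sketch below hashes the three bytes "abc" in place of the $MFT data. "abc" is a standard published test input for both MD5 and SHA1, so the printed values can be checked against the specifications. It also shows that digest() returns the raw bytes while hexdigest() returns the same value as hex text, and that reading the hash does not stop you from calling update() again. It is written to run under Python 2.7 or Python 3.

```python
import binascii
import hashlib

# Stand-in for the $MFT contents read from the image.
filedata = b"abc"

md5hash = hashlib.md5()
md5hash.update(filedata)
sha1hash = hashlib.sha1()
sha1hash.update(filedata)

print("MD5 Hash " + md5hash.hexdigest())    # MD5 Hash 900150983cd24fb0d6963f7d28e17f72
print("SHA1 Hash " + sha1hash.hexdigest())  # SHA1 Hash a9993e364706816aba3e25717850c26c9cd0d89d

# digest() gives the raw 16 bytes; hexdigest() is the same value as 32 hex characters.
raw = md5hash.digest()
print(binascii.hexlify(raw).decode("ascii") == md5hash.hexdigest())  # True

# hexdigest() does not finalize the object: a later update() extends the hash.
md5hash.update(b"def")
print(md5hash.hexdigest() == hashlib.md5(b"abcdef").hexdigest())  # True
```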

Taken all together, the program looks like this:
# Sample program or step 6 in becoming a DFIR Wizard!
# No license as this code is simple and free!
import sys
import pytsk3
import datetime
import pyewf
import argparse
import hashlib

# Wraps a pyewf handle so pytsk3 can read an E01 as if it were a raw image
class ewf_Img_Info(pytsk3.Img_Info):
  def __init__(self, ewf_handle):
    self._ewf_handle = ewf_handle
    super(ewf_Img_Info, self).__init__(
        url="", type=pytsk3.TSK_IMG_TYPE_EXTERNAL)

  def close(self):
    self._ewf_handle.close()

  def read(self, offset, size):
    self._ewf_handle.seek(offset)
    return self._ewf_handle.read(size)

  def get_size(self):
    return self._ewf_handle.get_media_size()

argparser = argparse.ArgumentParser(description='Extract the $MFT from all of the NTFS partitions of an E01')
argparser.add_argument(
        '-i', '--image',
        dest='imagefile',
        action='store',
        type=str,
        required=True,
        help='E01 to extract from'
    )
args = argparser.parse_args()
filenames = pyewf.glob(args.imagefile)
ewf_handle = pyewf.handle()
ewf_handle.open(filenames)
imagehandle = ewf_Img_Info(ewf_handle)
partitionTable = pytsk3.Volume_Info(imagehandle)
for partition in partitionTable:
  print partition.addr, partition.desc, "%ss(%s)" % (partition.start, partition.start * 512), partition.len
  if 'NTFS' in partition.desc:
    filesystemObject = pytsk3.FS_Info(imagehandle, offset=(partition.start*512))
    fileobject = filesystemObject.open("/$MFT")
    print "File Inode:",fileobject.info.meta.addr
    print "File Name:",fileobject.info.name.name
    print "File Creation Time:",datetime.datetime.fromtimestamp(fileobject.info.meta.crtime).strftime('%Y-%m-%d %H:%M:%S')
    outFileName = str(partition.addr)
    print outFileName
    # 'wb' so Windows does not alter the binary $MFT data on write
    outfile = open(outFileName, 'wb')
    filedata = fileobject.read_random(0, fileobject.info.meta.size)
    # One hashlib object per algorithm, both fed the same $MFT data
    md5hash = hashlib.md5()
    md5hash.update(filedata)
    print "MD5 Hash",md5hash.hexdigest()
    sha1hash = hashlib.sha1()
    sha1hash.update(filedata)
    print "SHA1 Hash",sha1hash.hexdigest()
    outfile.write(filedata)
    outfile.close()

Pretty easy, right? When I run it I get the following:
C:\Users\dave\Desktop>python -i SSFCC-Level5.E01
0 Primary Table (#0) 0s(0) 1
1 Unallocated 0s(0) 8064
2 NTFS (0x07) 8064s(4128768) 61759616
File Inode: 0
File Name: $MFT
File Creation Time: 2014-09-12 12:20:52
MD5 Hash d91df0fb48c36f77a7a9c65870761beb
SHA1 Hash 26213c341902111a68464020965e5fb50108730c

Now the neat thing about the update method built into hashlib is that each call just adds the data you pass in to the running hash. So if you want to buffer your reads to limit memory usage, which we will do in a future part of the series, you can pass each chunk to update as you read it and still get the hash of the complete file.
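Here is one way that buffered approach could look, as a sketch rather than the code from the future part. hash_in_chunks is a hypothetical helper: it takes a read_at(offset, length) callable, which has the same shape as the read_random method we call on the pytsk3 file object, so in the script it could be called as hash_in_chunks(fileobject.read_random, fileobject.info.meta.size). The 1 MB default chunk size is an arbitrary choice, and the in-memory fake_read stands in for a real image so the example runs on its own under Python 2.7 or Python 3.

```python
import hashlib


def hash_in_chunks(read_at, total_size, chunk_size=1024 * 1024):
    """Return (md5 hex, sha1 hex) for total_size bytes read chunk by chunk.

    read_at(offset, length) must return bytes. Only one chunk is held in
    memory at a time, so memory use stays near chunk_size.
    """
    md5hash = hashlib.md5()
    sha1hash = hashlib.sha1()
    offset = 0
    while offset < total_size:
        length = min(chunk_size, total_size - offset)
        chunk = read_at(offset, length)
        if not chunk:
            break  # reader returned nothing; stop instead of looping forever
        # Each update() adds this chunk to the running hashes.
        md5hash.update(chunk)
        sha1hash.update(chunk)
        offset += len(chunk)
    return md5hash.hexdigest(), sha1hash.hexdigest()


# Usage with an in-memory stand-in for a file inside an image.
data = b"DFIR Wizard! " * 100000


def fake_read(offset, length):
    return data[offset:offset + length]


chunked = hash_in_chunks(fake_read, len(data), chunk_size=4096)
whole = (hashlib.md5(data).hexdigest(), hashlib.sha1(data).hexdigest())
print(chunked == whole)  # True
```

Hashing in 4 KB pieces gives exactly the same MD5 and SHA1 as hashing the whole buffer at once, which is what makes buffered reads safe for evidence hashing.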

You can get the source of this version of DFIR Wizard on the series github here:

In the next part we are going to cover how to recurse through an entire image and hash all the files within it, as our DFIR Wizard program keeps growing bigger and more capable.
