Thursday, February 19, 2015

Automating DFIR - How to series on programming libtsk with python Part 2

Hello Reader,
    In our last post, Part 1 - Accessing an image and printing the partition table, we printed out a partition table from a forensic image. You might have noticed that the image we are using for these first posts is a VHD and not an E01. VHD and other raw image formats are supported directly by pytsk, so for our first couple of posts it will be easier to work with them. Once we get beyond the basics I'll bring pyewf and its corresponding libewf libraries into our code examples, and we will begin using all different types of image formats in our DFIR Wizardry, even getting into shadow copy access and more! So stick with me if you want to go step by step. If you don't want to go step by step and you have the programming experience to leap ahead, then I would suggest jumping straight to the sample code that comes with the following projects:

pytsk sample code: https://github.com/py4n6/pytsk/tree/master/samples
libewf sample code: https://github.com/libyal/libewf/wiki/Development
dfvfs sample code:  https://github.com/log2timeline/dfvfs/tree/master/examples

You can also jump ahead or catch up with the rest of this series:

Part 3  - Extracting a file from a live system
Part 4 - Turning a python script into a windows executable
Part 5 - Auto escalating your python script to administrator
Part 6 - Accessing an E01 image and extracting files
Part 7 - Taking in command line options with argparse to specify an image
Part 8 - Hashing a file stored in a forensic image
Part 9 - Recursively hashing all the files in an image
Part 10 - Recursively searching for files and extracting them from an image
Part 11 - Recursively searching for files and extracting them from a live system 
Part 12 - Accessing different file systems
Part 13 - Accessing Volume Shadow Copies 





Now with that out of the way, let's move on to the next step of our DFIR Wizard program. In the first post, which we can call DFIRW v1, we accessed an image and printed out the partition table. Now let's extend our example and extract a file from the image.

First let's remember where we left off. Here is the code we last worked with:
#!/usr/bin/python
# Sample program or step 1 in becoming a DFIR Wizard!
# No license as this code is simple and free!
import sys
import pytsk3
imagefile = "Stage2.vhd"
imagehandle = pytsk3.Img_Info(imagefile)
partitionTable = pytsk3.Volume_Info(imagehandle)
for partition in partitionTable:
  print partition.addr, partition.desc, "%ss(%s)" % (partition.start, partition.start * 512), partition.len

Now to access a file stored in a forensic image we already have the main thing we need, a python libtsk object that provides functions to access the volume stored within the forensic image. Next we need an object that will give us access to the file system on a volume we choose. In the case of this example image there is only one valid file system and that is the second partition on the disk which is NTFS.

If you remember from the previous post our NTFS partition info looked like this:

2 NTFS (0x07) 128s(65536) 1042432
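
To make the numbers in that line concrete, here is a small sketch of how the byte offset in parentheses comes from the starting sector. It assumes 512 byte sectors, the same assumption the print statement in our program makes, and the helper name sector_to_offset is just something I made up for illustration.

```python
# Sketch only: converts a starting sector into an absolute byte offset.
# SECTOR_SIZE = 512 is an assumption; it matches the "* 512" in our program.
SECTOR_SIZE = 512

def sector_to_offset(start_sector, sector_size=SECTOR_SIZE):
    """Return the byte offset of a partition that begins at start_sector."""
    return start_sector * sector_size

# The NTFS partition in our example image starts at sector 128.
ntfs_offset = sector_to_offset(128)
print(ntfs_offset)  # 65536
```

If an image uses a different sector size you would pass it as the second argument instead of relying on the default.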

This becomes important in the next step because we need to tell libtsk where the file system we want to open is located. To do that we call something new from pytsk, FS_Info, which takes two important pieces of information: the variable storing the image object we already made, and the offset in bytes from the start of the image to where our file system begins. If you remember from the last post, we said that the value 65536 was the absolute offset to the beginning of the NTFS partition, so we already have the information we need! Let's add that on to our program.

#!/usr/bin/python
# Sample program or step 2 in becoming a DFIR Wizard!
# No license as this code is simple and free!
import sys
import pytsk3
imagefile = "Stage2.vhd"
imagehandle = pytsk3.Img_Info(imagefile)
partitionTable = pytsk3.Volume_Info(imagehandle)
for partition in partitionTable:
  print partition.addr, partition.desc, "%ss(%s)" % (partition.start, partition.start * 512), partition.len
filesystemObject = pytsk3.FS_Info(imagehandle, offset=65536)

You can see we've added one new line to the bottom of our program. We've made a new variable called filesystemObject which, because we passed in the offset to our NTFS partition, now gives us access to the underlying file system contained within it! That's great you say, but how do we actually access a file? Well, in later examples I'll show how to recurse through a file system to search, find, hash and do all sorts of other good things, but to begin with let's just grab something. One of the files you can always expect on an NTFS drive is the master file table, which goes by the name of $MFT. So let's grab that!

#!/usr/bin/python
# Sample program or step 2 in becoming a DFIR Wizard!
# No license as this code is simple and free!
import sys
import pytsk3
imagefile = "Stage2.vhd"
imagehandle = pytsk3.Img_Info(imagefile)
partitionTable = pytsk3.Volume_Info(imagehandle)
for partition in partitionTable:
  print partition.addr, partition.desc, "%ss(%s)" % (partition.start, partition.start * 512), partition.len
filesystemObject = pytsk3.FS_Info(imagehandle, offset=65536)
fileobject = filesystemObject.open("/$MFT")

Great, now we have a new variable called fileobject which gives us access to all the things libtsk can tell us about the file $MFT located at the root of the file system. If you want to play with this program later you can change /$MFT to the full path of any other file you want. For now though let's focus on the $MFT, which is a useful file in its own right; many times we want to extract it and parse it with external parsers to get at some of the more obscure metadata.

Let's start by gathering some information about the $MFT file like:
  • What is its inode number
    • libtsk has a metadata structure we can use to provide this. It's stored in the info.meta.addr value, which in our code we would fully reference as fileobject.info.meta.addr
  • What is the file name, in case in the future we are accessing files by inode
    • libtsk keeps file names in a separate structure from the metadata. While NTFS stores both in one location (the MFT), many other file systems don't; instead they store the file name in the directory that links to the file. The file name is stored in info.name.name, which we would fully reference as fileobject.info.name.name
  • What is the creation time
    • Creation time and all the other timestamps of a file are always important to us. The value that libtsk returns to us is the time in epoch form (stored as UTC). The creation timestamp is stored in info.meta.crtime, which we would fully reference as fileobject.info.meta.crtime
For a full list of what metadata properties you can access go here: http://www.sleuthkit.org/sleuthkit/docs/api-docs/structTSK__FS__META.html
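
Pulling those three values together might look something like the sketch below. So that it runs without an image, it uses stand-in objects built with namedtuple that only mimic the info.meta and info.name layout described above. The describe_file helper and the sample values are hypothetical; with pytsk you would pass in the object returned by open instead.

```python
from collections import namedtuple

# Stand-ins that mimic the attribute layout described above (not pytsk types).
Meta = namedtuple("Meta", ["addr", "crtime"])
Name = namedtuple("Name", ["name"])
Info = namedtuple("Info", ["meta", "name"])
FakeFile = namedtuple("FakeFile", ["info"])

def describe_file(fileobject):
    """Collect inode, name and creation time (epoch seconds) in a dict."""
    return {
        "inode": fileobject.info.meta.addr,
        "name": fileobject.info.name.name,
        "crtime": fileobject.info.meta.crtime,
    }

# Sample values are made up for illustration.
sample = FakeFile(Info(Meta(addr=0, crtime=1424304000), Name(name="$MFT")))
details = describe_file(sample)
print("%s %s %s" % (details["inode"], details["name"], details["crtime"]))
```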

So let's add in some code to print out all this useful information about $MFT in our image:
#!/usr/bin/python
# Sample program or step 2 in becoming a DFIR Wizard!
# No license as this code is simple and free!
import sys
import pytsk3
imagefile = "Stage2.vhd"
imagehandle = pytsk3.Img_Info(imagefile)
partitionTable = pytsk3.Volume_Info(imagehandle)
for partition in partitionTable:
  print partition.addr, partition.desc, "%ss(%s)" % (partition.start, partition.start * 512), partition.len
filesystemObject = pytsk3.FS_Info(imagehandle, offset=65536)
fileobject = filesystemObject.open("/$MFT")
print "File Inode:",fileobject.info.meta.addr
print "File Name:",fileobject.info.name.name
print "File Creation Time:", fileobject.info.meta.crtime

Awesome! Now we can access a file stored in an image and print out information about it! However, you'll notice if you run this example that the creation time printed isn't what you expect. The timestamp value being returned here is in epoch form, meaning the number of seconds that have passed since midnight 1/1/1970 UTC. So we need some help from the standard python libraries to turn this epoch timestamp into a human readable one. The python standard library datetime will do just that for us! Inside the datetime library is a function called 'fromtimestamp' which, when combined with 'strftime', will allow us to convert our epoch value into a human readable timestamp of our liking! That's right, you can make the timestamp show up in any format you want, to match American, European and other database specific timestamp formats.

To add in the datetime library we need a new import statement near the beginning of our program, import datetime, which appears right after import pytsk3 in the program below. Then we need to use the two functions we talked about above, which we reference with the library name first (datetime), then the class inside the library (datetime.datetime), followed by the function we want to call (fromtimestamp) and then how we want the timestamp printed (strftime). All combined you get the program as you see below:

#!/usr/bin/python
# Sample program or step 2 in becoming a DFIR Wizard!
# No license as this code is simple and free!
import sys
import pytsk3
import datetime
imagefile = "Stage2.vhd"
imagehandle = pytsk3.Img_Info(imagefile)
partitionTable = pytsk3.Volume_Info(imagehandle)
for partition in partitionTable:
  print partition.addr, partition.desc, "%ss(%s)" % (partition.start, partition.start * 512), partition.len
filesystemObject = pytsk3.FS_Info(imagehandle, offset=65536)
fileobject = filesystemObject.open("/$MFT")
print "File Inode:",fileobject.info.meta.addr
print "File Name:",fileobject.info.name.name
print "File Creation Time:",datetime.datetime.fromtimestamp(fileobject.info.meta.crtime).strftime('%Y-%m-%d %H:%M:%S')

So to break this down further, instead of just printing the epoch value we are now printing the human readable value of the creation timestamp. We are doing this conversion with the datetime library by calling datetime.datetime.fromtimestamp. We are passing fromtimestamp the full reference to our selected file's creation timestamp (fileobject.info.meta.crtime) and then we are appending onto this a string formatting command. strftime, or string format time, allows us to control how the timestamp will be printed. Here we are passing %Y for the full four digit year, %m for the two digit month, %d for the two digit day and then the 24 hour version of the time with %H for hour, %M for minute and %S for seconds. You can change the ordering any way you want to make the timestamp format fit your needs.
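
One thing to watch for: fromtimestamp converts the epoch value into the local time zone of the machine running the script, so the same image can print different times on different computers. If you would rather always print UTC, a sketch like the following should work. It swaps fromtimestamp for simple arithmetic, adding the seconds to the 1970 starting point; format_epoch_utc is a hypothetical helper name and the sample epoch value is just a number I picked.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)  # midnight 1/1/1970, treated as UTC

def format_epoch_utc(epoch_seconds, fmt="%Y-%m-%d %H:%M:%S"):
    """Format epoch seconds as a UTC timestamp string."""
    return (EPOCH + datetime.timedelta(seconds=epoch_seconds)).strftime(fmt)

print(format_epoch_utc(0))           # 1970-01-01 00:00:00
print(format_epoch_utc(1424304000))  # 2015-02-19 00:00:00
# Same value in an American style layout.
print(format_epoch_utc(1424304000, "%m/%d/%Y %H:%M"))  # 02/19/2015 00:00
```

In our program you would pass fileobject.info.meta.crtime as the first argument.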

For the full list of timestamp formatting codes go here: http://strftime.org/

Ok, now for the next part, which many of you have been waiting for: how do I get a file out of this image? Would you believe me if I said all we need is three more lines of code? Well you should!

#!/usr/bin/python
# Sample program or step 2 in becoming a DFIR Wizard!
# No license as this code is simple and free!
import sys
import pytsk3
import datetime
imagefile = "Stage2.vhd"
imagehandle = pytsk3.Img_Info(imagefile)
partitionTable = pytsk3.Volume_Info(imagehandle)
for partition in partitionTable:
  print partition.addr, partition.desc, "%ss(%s)" % (partition.start, partition.start * 512), partition.len
filesystemObject = pytsk3.FS_Info(imagehandle, offset=65536)
fileobject = filesystemObject.open("/$MFT")
print "File Inode:",fileobject.info.meta.addr
print "File Name:",fileobject.info.name.name
print "File Creation Time:",datetime.datetime.fromtimestamp(fileobject.info.meta.crtime).strftime('%Y-%m-%d %H:%M:%S')
outfile = open('DFIRWizard-output', 'wb')
filedata = fileobject.read_random(0,fileobject.info.meta.size)
outfile.write(filedata)

We are opening a file for writing, which I called DFIRWizard-output, using the open function and letting python know we want to write binary data to this file with the 'wb' flag (a plain 'w' can mangle binary data on Windows by translating line endings). We are storing the file handle for writing to this file in the variable outfile. The final line then looks like outfile = open('DFIRWizard-output', 'wb')

To read the contents of the file we use the function read_random, which takes two parameters: the offset from the start of the file where we want to start reading and how many bytes of data we want to read. We are reading the contents of the $MFT from the beginning (0) to the end (fileobject.info.meta.size is the size of the file in bytes) and storing the data read in a variable called filedata. Now when you are working with large files or lots of files this isn't the best way to read the data, because the whole file is held in memory at once. You'll likely want to buffer it and do reads and writes in a loop, but to keep it simple we are making it one line. The final line then looks like filedata = fileobject.read_random(0,fileobject.info.meta.size)
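
The buffered approach mentioned above could be sketched roughly like this. The function takes any callable with the same (offset, length) arguments as read_random, so it runs here against an in-memory stand-in rather than a real image. The copy_in_chunks name, the 1 MB chunk size and the stand-in data are my own assumptions, not part of libtsk.

```python
import io

CHUNK_SIZE = 1024 * 1024  # 1 MB per read; an arbitrary choice

def copy_in_chunks(read_at, total_size, outfile, chunk_size=CHUNK_SIZE):
    """Copy total_size bytes to outfile using read_at(offset, length)."""
    offset = 0
    while offset < total_size:
        length = min(chunk_size, total_size - offset)
        data = read_at(offset, length)
        if not data:  # stop early if the source returns nothing
            break
        outfile.write(data)
        offset += len(data)
    return offset

# Stand-in for a file inside an image so the sketch runs without pytsk.
source = b"FILE0" * 1000

def fake_read_random(offset, length):
    return source[offset:offset + length]

output = io.BytesIO()
copied = copy_in_chunks(fake_read_random, len(source), output, chunk_size=512)
print(copied)  # 5000
```

With a real image the call would look like copy_in_chunks(fileobject.read_random, fileobject.info.meta.size, outfile), with outfile opened for binary writing.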

Last, we are writing the data we just read into filedata to the output file 'DFIRWizard-output' using the write method that is available to all file objects. We do this by calling the write method on the file object (outfile) and passing it the variable we want to write to the file (filedata). So when all is said and done it looks like outfile.write(filedata)

That's it! Our second version of DFIR Wizard is done! You can try this yourself or download my version from the series Github at: https://github.com/dlcowen/dfirwizard/blob/master/dfirwizard-v2.py

In the third part of this series we will show how to do the same thing against a live system!