Saturday, February 21, 2015

Automating DFIR - How to series on programming libtsk with python Part 3

Hello Reader,
      Before you read any further, make sure you have read parts 1 and 2, as we are not going back over what we've already done.

Part 1 - Accessing an image and printing the partition table
Part 2 - Extracting a file from an image

Following this post the series continues:

Part 4 - Turning a python script into a windows executable
Part 5 - Auto escalating your python script to administrator
Part 6 - Accessing an E01 image and extracting files
Part 7 - Taking in command line options with argparse to specify an image
Part 8 - Hashing a file stored in a forensic image
Part 9 - Recursively hashing all the files in an image
Part 10 - Recursively searching for files and extracting them from an image
Part 11 - Recursively searching for files and extracting them from a live system 
Part 12 - Accessing different file systems
Part 13 - Accessing Volume Shadow Copies  

In the last post in this series we extracted a file from a forensic image, which is something you can start using to extract all sorts of data without having to load another tool. We are going to take a sideways step away from forensic images for one post and show how versatile this library is.

So here is a question to make you think: what is the difference between a raw forensic image (some call this a dd image) of a hard drive and a live hard drive itself?

The answer: the live hard drive is changing, but otherwise all of the structures are the same.

This means the same library (pytsk) we are using to access a forensic image can be used to access the hard drive of a live running system! If you are investigating a system, for whatever reason, and you want to:

A. Get access to locked files
B. Access files the operating system won't show you
C. Extract data without changing the metadata
D. Carve a live system
E. Write your own imaging program
F. Anything else you can dream up!

Then this is the way to do it! Many commercial vendors have offered solutions for doing this as 'triage' products and for the most part we can replicate all of their collection efforts using free and open source libraries.

Do I have your interest? Then let's go!

Let's start with our first program from part 1 and change one thing: the path stored in the imagefile variable.

#!/usr/bin/python
# Sample program or step 1 in becoming a DFIR Wizard!
# No license as this code is simple and free!
import sys
import pytsk3
imagefile = "\\\\.\\PhysicalDrive0"
imagehandle = pytsk3.Img_Info(imagefile)
partitionTable = pytsk3.Volume_Info(imagehandle)
for partition in partitionTable:
  print partition.addr, partition.desc, "%ss(%s)" % (partition.start, partition.start * 512), partition.len
Note: For this to run you must execute it as administrator/root on your system, as you are accessing a raw physical disk. To do this in Windows, right click cmd.exe and choose 'Run as administrator'. To do this in Linux or OS X, make sure to run your script with sudo.

If you were to run this on your own system right now, it would print the partition table of the first drive in sequence. This code is interesting because the path I've given to a physical disk is in the Windows style for accessing raw devices. On Linux, run fdisk -l (or diskutil list on OS X) to find the path to the physical disks on your system.
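If you want one script to pick a sensible starting path on each operating system, here is a minimal sketch. The helper name default_raw_device is made up for this example, and the Linux and OS X paths are only common defaults that I am assuming, so confirm the real device name on your own system first.

```python
import sys

def default_raw_device(platform=None):
    """Return a likely path to the first physical disk (hypothetical helper)."""
    platform = platform or sys.platform
    if platform.startswith("win"):
        # Windows raw device path; identical to "\\\\.\\PhysicalDrive0"
        return r"\\.\PhysicalDrive0"
    if platform == "darwin":
        # Assumption: the first disk on OS X
        return "/dev/disk0"
    # Assumption: first SATA/SCSI disk on Linux; NVMe disks are named differently
    return "/dev/sda"

imagefile = default_raw_device()
print(imagefile)
```

The returned string can then be handed to pytsk3.Img_Info exactly as in the program above.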

Now let's look at that code again and let this sink in. We didn't change any code to make the same program work on a live system than we did on a forensic image. The only thing we changed was the name of the file it was going to open and access like a forensic image, and all the rest of our code worked!

That means that we can write one program that can operate on both live systems and forensic images which is pretty awesome if you stop and think about it. 

Now let's move forward from our sidestep and extend our previous file extraction example to iterate through all the partitions on a live system and extract the $MFT from each one. For this example we are looking for $MFT files, so we only want to extract data from NTFS partitions on our live system. To do this we need to add a line of code right after we print our partition table to check whether the partition description displayed to us contains the word NTFS. We can do this with an 'if' statement and the operator 'in'.

if 'NTFS' in partition.desc:
So we are testing whether our variable 'partition.desc' contains the word 'NTFS'. If it does, then the next part of the code will execute; if it does not, Python will evaluate any other conditionals attached to the if (elif and else) and then move on. Since for this example we are only testing for NTFS, our code will just loop to the next partition if it is not NTFS.
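To see the 'in' test on its own, here is a small sketch that runs without a disk. The description strings are made-up samples in the style of a partition table listing, and is_ntfs is a hypothetical helper. I am also assuming that pytsk3 may hand back bytes rather than text on Python 3, so the helper converts bytes before testing.

```python
# Made-up descriptions standing in for partition.desc values
descriptions = ["Primary Table (#0)", "Unallocated", "NTFS (0x07)", "Linux (0x83)"]

def is_ntfs(desc):
    # Assumption: the description may arrive as bytes, so convert it to text first
    if isinstance(desc, bytes):
        desc = desc.decode("utf-8", "replace")
    # 'in' is True when the substring appears anywhere in the description
    return "NTFS" in desc

ntfs_only = [d for d in descriptions if is_ntfs(d)]
print(ntfs_only)
```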

The next thing we need to do is to take out the hardcoded offset to the beginning of our partition that is in the FS_Info method we called in the previous example from part 2. We need to replace the hard coded offset we used before with the offset of whatever partition we just checked for containing NTFS. So instead of

filesystemObject = pytsk3.FS_Info(imagehandle, offset=65536)

we are going to replace 65536 with the same variable we are printing out in the partition table, partition.start, which gives us the number of sectors into the disk where this partition begins. Then we need to convert that from sectors to an absolute byte offset by multiplying the number of sectors by the sector size. I am going to assume here that you have a sector size of 512; if you don't, change 512 to whatever your sector size is. So I am going to multiply the number contained in partition.start by 512 to get the absolute offset where this partition begins. Lastly, I will wrap the multiplication in parentheses to make it explicit that it is evaluated before the result is passed to FS_Info. The end result looks like this:

filesystemObject = pytsk3.FS_Info(imagehandle, offset=(partition.start*512))
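The sector arithmetic is worth checking by hand. Here is a small sketch with the sector size pulled out into a named constant; partition_offset is a hypothetical helper and the starting sectors are example values. The first one reproduces the 65536 we hard coded in part 2.

```python
SECTOR_SIZE = 512  # assumption: some disks use 4096-byte sectors instead

def partition_offset(start_sector, sector_size=SECTOR_SIZE):
    # Byte offset = starting sector number * bytes per sector
    return start_sector * sector_size

print(partition_offset(128))   # 65536
print(partition_offset(2048))  # 1048576
```
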
Great! Now we are iterating through and accessing all the NTFS partitions on our live system! But wait, if we are doing this for multiple partitions, each file we write out will overwrite the one extracted from the prior partition. The next change we have to make is changing the name of our output file to identify the partition number the file came from and the name of the file we are extracting. To do this I am first making a new variable called outFileName, which will contain the filename of the file we are writing out to. In order to combine the partition number and the filename, I am going to use the '+' operator, which joins two strings together and returns the combination. There is only one small thing left to fix: partition.addr, which stores our partition number, is not stored as a string; it is stored as an integer (a number). So in order to append it to a string using the '+' operator, I need to tell python to treat it or 'cast it' as a string using the str() function. The end result looks like this:

outFileName = str(partition.addr)+fileobject.info.name.name
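I now change our open command that opens our output file for writing to use this variable instead of the hard-coded file name we had before. Since the $MFT is binary data, I open the file in binary mode ('wb') so that Windows does not translate any bytes on the way out:

outfile = open(outFileName, 'wb')

and lastly, since we are going to be reusing this file handle, it would be wise to close it. Note the parentheses; without them the close method is never actually called:

outfile.close()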
and that's it! 
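
If you would rather not have to remember to close the handle, here is one possible variation using a with block, which closes the file for you. The helper names output_name and write_extracted are made up for this example, and the sample bytes are invented stand-ins for real file contents. The file is opened in binary mode ('wb') so the bytes are written exactly as read.

```python
import os
import tempfile

def output_name(partition_number, file_name):
    # str() turns the integer partition number into text so '+' can join them
    return str(partition_number) + file_name

def write_extracted(path, data):
    # 'wb' writes the bytes untouched; the with block closes the file on exit
    with open(path, "wb") as outfile:
        outfile.write(data)

sample = b"FILE0\r\n\x00\x01"  # invented bytes standing in for file contents
target = os.path.join(tempfile.mkdtemp(), output_name(2, "$MFT"))
write_extracted(target, sample)
print("%s %d" % (os.path.basename(target), os.path.getsize(target)))
```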

We now have a program that will iterate through all the partitions on the first disk in your live system and extract the $MFT from every NTFS partition to uniquely named files! The finished code looks like this:

#!/usr/bin/python
# Sample program or step 3 in becoming a DFIR Wizard!
# No license as this code is simple and free!
import sys
import pytsk3
import datetime
imagefile = "\\\\.\\PhysicalDrive0"
imagehandle = pytsk3.Img_Info(imagefile)
partitionTable = pytsk3.Volume_Info(imagehandle)
for partition in partitionTable:
  print partition.addr, partition.desc, "%ss(%s)" % (partition.start, partition.start * 512), partition.len
  if 'NTFS' in partition.desc:
    filesystemObject = pytsk3.FS_Info(imagehandle, offset=(partition.start*512))
    fileobject = filesystemObject.open("/$MFT")
    print "File Inode:",fileobject.info.meta.addr
    print "File Name:",fileobject.info.name.name
    print "File Creation Time:",datetime.datetime.fromtimestamp(fileobject.info.meta.crtime).strftime('%Y-%m-%d %H:%M:%S')
    outFileName = str(partition.addr)+fileobject.info.name.name
    print outFileName
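    # Open in binary mode so the $MFT bytes are written unmodified
    outfile = open(outFileName, 'wb')
    filedata = fileobject.read_random(0,fileobject.info.meta.size)
    outfile.write(filedata)
    # Call close() with parentheses so the handle is released before the next partition
    outfile.close()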
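
One thing to keep in mind is that the code above reads the whole $MFT into memory in a single read_random call, and an $MFT can be large. Here is a sketch of copying in fixed-size chunks instead. The function copy_in_chunks and the FakeFile class are made up for this example; FakeFile only imitates the read_random(offset, length) call we used above so the sketch runs without a disk, and the 1 MiB default chunk size is an arbitrary choice.

```python
import io

CHUNK = 1024 * 1024  # 1 MiB per read; an arbitrary choice

def copy_in_chunks(source, size, outfile, chunk=CHUNK):
    """Copy size bytes from source.read_random(offset, length) to outfile."""
    offset = 0
    while offset < size:
        wanted = min(chunk, size - offset)
        data = source.read_random(offset, wanted)
        if not data:
            break  # stop rather than loop forever on a short read
        outfile.write(data)
        offset += len(data)
    return offset

class FakeFile(object):
    """Stand-in for a pytsk3 file object, used only to exercise the sketch."""
    def __init__(self, payload):
        self.payload = payload

    def read_random(self, offset, length):
        return self.payload[offset:offset + length]

payload = b"\x00\x01\x02\x03" * 1000
sink = io.BytesIO()
copied = copy_in_chunks(FakeFile(payload), len(payload), sink, chunk=1024)
print(copied)
```

In the real program you would pass fileobject, fileobject.info.meta.size and the open output file in place of the stand-ins.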

You can download this post's code at the series Github here: https://github.com/dlcowen/dfirwizard/blob/master/dfirwizard-v3.py

If you wanted to do this against a forensic image, just change the path stored in the variable imagefile back to the forensic image you want to extract from, and all the rest of the code remains the same. In the next part of this series we will make it so you can specify where to extract from, and show how to turn your python script into a standalone executable so you don't need python on the system where you want to run your program.