Thursday, February 19, 2015

Automating DFIR - How to series on programming libtsk with python Part 1

Hello Reader,
              As you can see from the title of this post, I'm starting a series all about automating your workflow when doing DFIR work. It is my belief that our industry as we know it is poised for change due to the work of a few, but mostly, in my opinion, Joachim Metz. For all the time that I've done DFIR work, the biggest lock-in that commercial software had was the ability to work directly against a forensic image. We always had to resort to some commercial tool (whether free, semi-free or paid for) to get at the data we wanted within a forensic image or a live running system. With the large set of free and open source libraries now available, you can write simple code to automate most of the work you were doing within these forensic tools and customize it to your actual need.

This, in my opinion, is a huge shift in how we can do our work. We can build tools that do not require us to load up our dongles, work around how someone else imagines our workflow, and export data just so we can get our favorite one-off tool to work with a specific artifact. This is becoming more of an issue as the community is finding new artifacts faster than the commercial entities can keep up with them, so even if you wanted to stay in the commercial suite you purchased, you are still forced to get out of its box to get a new artifact that could make your case.

In this series of posts I want to show how we can build up from a small Python program that accesses an image to a full-blown forensic utility that pulls all the data from an image and creates its output, all in a way that we can change and customize to our liking. Now you may be saying, hey, there are already examples of how to do this in the projects that made these libraries! You are right, there are, but I am hoping to simplify this for those of you new to programming so we can expand the scope of people out there who can use these libraries and push our industry forward.

For those of you who are serious programmers be warned, I am not going to follow good coding practices in this post as I am trying to get people motivated to try programming. 

This series has progressed quite a ways and you can continue to follow along here:

Part 2 - Extracting a file from an image
Part 3 - Extracting a file from a live system
Part 4 - Turning a python script into a windows executable
Part 5 - Auto escalating your python script to administrator
Part 6 - Accessing an E01 image and extracting files
Part 7 - Taking in command line options with argparse to specify an image
Part 8 - Hashing a file stored in a forensic image
Part 9 - Recursively hashing all the files in an image
Part 10 - Recursively searching for files and extracting them from an image
Part 11 - Recursively searching for files and extracting them from a live system 


What you will need to follow along:

  • An image. I am going to start with this one from a prior challenge: https://mega.co.nz/#!ywxEgQZZ!RawTMjJoR6mJgn4P0sQAdzU5XOedR6ianFRcY_xxvwY (it's a small VHD). In later posts I will be using and providing additional images. You will need to unzip the image and then place it in the same directory where you are going to create your code examples for ease of use.
  • Python with the pytsk3 module installed, since every example below imports it. The examples use the Python 2 print statement.
  • A text editor. I am using ActiveState Komodo as my development environment, but only because I am familiar with it. vi or Notepad will work just as well.

Getting started, access an image

OK, you have all the items listed above, so you are ready! We are going to make a new file called dfirwizard.py.

The first thing we have to do is give the file the normal header and then import the libraries we want to work with. 

#!/usr/bin/python
# Sample program or step 1 in becoming a DFIR Wizard!
# No license as this code is simple and free!
import sys
import pytsk3
So here we are importing the standard Python system library and pytsk3, which will give us all the forensic image access goodness we want. Next we need to tell it where the image is located and then have pytsk open it up for us. I am going to keep this very simple to prevent confusion/errors and hard code the path. As we build up this program we will switch that out so you can pass the name of the image in each time you run your super DFIR wizard programs.

#!/usr/bin/python
# Sample program or step 1 in becoming a DFIR Wizard!
# No license as this code is simple and free!
import sys
import pytsk3
imagefile = "Stage2.vhd"
imagehandle = pytsk3.Img_Info(imagefile)

So we have hardcoded the name of the image in a variable named 'imagefile'. Next we created a pytsk object using the Img_Info class that is built into pytsk3, which will allow us to work with this forensic image. The resulting object is stored in the variable 'imagehandle', which will then be used to access the underlying image.
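
Since we already import sys, here is a rough sketch of how the hard-coded name could later be swapped for a value passed on the command line. The helper name pick_image_path and its fallback behavior are made up for illustration and are not part of pytsk3. Part 7 of this series covers doing this properly with argparse.

```python
import sys

DEFAULT_IMAGE = "Stage2.vhd"  # the hard-coded name used in this post


def pick_image_path(argv, default=DEFAULT_IMAGE):
    """Return the first command-line argument, or the default if none was given.

    pick_image_path is a hypothetical helper, shown for illustration only.
    """
    # argv[0] is the script name, so a user-supplied path would be argv[1].
    if len(argv) > 1:
        return argv[1]
    return default


if __name__ == "__main__":
    imagefile = pick_image_path(sys.argv)
    print("Using image: %s" % imagefile)
```

The value returned here would simply take the place of the "Stage2.vhd" string that we hand to pytsk3.Img_Info.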

Now dfirwizard.py prints nothing even though it worked, so let's add one more simple line that shows us that we do in fact have access to the underlying image. Let's tell pytsk to give us the partition table for this image. To do this we will need another built-in pytsk3 class named Volume_Info. We give Volume_Info the image object we made in the previous example and it returns to us the partition information contained within it, assuming there is a partition table available.

#!/usr/bin/python
# Sample program or step 1 in becoming a DFIR Wizard!
# No license as this code is simple and free!
import sys
import pytsk3
imagefile = "Stage2.vhd"
imagehandle = pytsk3.Img_Info(imagefile)
partitionTable = pytsk3.Volume_Info(imagehandle)

Great, now we just need to do something with the result, which is currently stored in the variable 'partitionTable'. The most obvious thing to do is print it out one partition at a time. I'm using the for loop contained in the pytsk example mmls.py to show this:

#!/usr/bin/python
# Sample program or step 1 in becoming a DFIR Wizard!
# No license as this code is simple and free!
import sys
import pytsk3
imagefile = "Stage2.vhd"
imagehandle = pytsk3.Img_Info(imagefile)
partitionTable = pytsk3.Volume_Info(imagehandle)
for partition in partitionTable:
  print partition.addr, partition.desc, "%ss(%s)" % (partition.start, partition.start * 512), partition.len

So now we are printing the partition table to the command prompt window where you ran this Python script. The variable 'partitionTable' contains a linked list of partition entries that we iterate through with the for loop in the example above, printing specific attributes of each partition. For a full list of the attributes available on each entry, go to the following API reference page from libtsk:
http://www.sleuthkit.org/sleuthkit/docs/api-docs/structTSK__VS__PART__INFO.html
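
If you want to play with the formatting without opening an image, here is a small sketch that builds the same one-line summary from a stand-in object. FakePartition and describe_partition are names I made up for this illustration. Only the four attributes printed in the loop above are modeled, and the 512 byte sector size is the same assumption the loop makes.

```python
from collections import namedtuple

SECTOR_SIZE = 512  # bytes per sector, the value this post assumes

# Stand-in for a pytsk3 partition entry so the sketch runs without an image.
# Only the four attributes printed in this post are modeled.
FakePartition = namedtuple("FakePartition", ["addr", "desc", "start", "len"])


def describe_partition(part, sector_size=SECTOR_SIZE):
    """Build the same one-line summary the loop above prints."""
    desc = part.desc
    # Assumption: the description may arrive as bytes, so decode it if needed.
    if isinstance(desc, bytes):
        desc = desc.decode("utf-8", "replace")
    return "%s %s %ss(%s) %s" % (
        part.addr, desc, part.start, part.start * sector_size, part.len)


if __name__ == "__main__":
    sample = FakePartition(addr=2, desc="NTFS (0x07)", start=128, len=1042432)
    print(describe_partition(sample))
```

Pulling the formatting into a function like this makes it easy to change the output later, for example to write CSV instead of printing.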


When you run dfirwizard.py, and become a DFIR Wizard!, you will see the following output:
F:\downloads\SSFFC-Level2>python dfirwizard.py
0 Primary Table (#0) 0s(0) 1
1 Unallocated 0s(0) 128
2 NTFS (0x07) 128s(65536) 1042432
3 Unallocated 1042560s(533790720) 6017

So what do we see here? Let's break down the NTFS line.

2 NTFS (0x07) 128s(65536) 1042432


  • 2 - Represents the partition number
  • NTFS (0x07) - Represents the partition description, including the type flag for NTFS
  • 128s - Represents the starting sector of the partition
  • (65536) - Represents the byte offset, calculated by multiplying the sector number where the partition starts by 512 bytes, which gives the absolute position within the image where the partition begins.
  • 1042432 - Represents the length of this partition in sectors. If you were to again multiply this number by 512 you would get 533,725,184 bytes, which is 509 megabytes (divide 533,725,184 by 1024 once to get kilobytes, twice to get megabytes) and is the size of the partition found within the image.
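
The arithmetic in the breakdown above can be checked with a few lines of Python. This is just a sketch. The function names are my own, and the 512 byte sector size is the assumption used throughout this post, not something read from the image.

```python
SECTOR_SIZE = 512  # bytes per sector, as assumed throughout this post


def sectors_to_bytes(sectors, sector_size=SECTOR_SIZE):
    """Convert a sector count (or sector offset) to bytes."""
    return sectors * sector_size


def bytes_to_megabytes(num_bytes):
    """Divide by 1024 twice: bytes -> kilobytes -> megabytes."""
    return num_bytes / 1024.0 / 1024.0


if __name__ == "__main__":
    start_offset = sectors_to_bytes(128)
    size_bytes = sectors_to_bytes(1042432)
    print("start offset: %d bytes" % start_offset)
    print("size: %d bytes (%.0f MB)" % (size_bytes, bytes_to_megabytes(size_bytes)))
```

For the NTFS partition this gives a start offset of 65536 bytes and a size of 533725184 bytes, or 509 MB, matching the numbers above.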


So there you have it. You have just accessed a forensic image and printed out the partition table from it with 10 lines of code. This wasn't possible to do so easily prior to pytsk, which provides a simplified interface to libtsk. If you would rather download the code than risk Python indentation errors from copying and pasting it, you can download this sample file here:
https://mega.co.nz/#!j9BUnaZA!uA9WaR6hB3cLcQE7xgGjGnKcCODAHpRPim8O40AWDMw

Familiar with git? I made a GitHub repository for this series that you can access here:
https://github.com/dlcowen/dfirwizard/tree/master


In the next post let's actually grab a file out of an image and show it to you, and then just keep building on from there!

Do you have a specific thing you want to see built or shown in these posts? Leave a comment below and I'll make sure to include it.