Friday, December 11, 2015

How to install dfVFS on Windows without compiling

Hello Reader,
After some testing, James Alwood in our office was able to get dfVFS installed using just the MSI packages from the log2timeline project. This is a big step for us as it severely lowers the bar for who can take advantage of all the advanced capabilities that dfVFS provides versus modules like pytsk on their own. In this first post in the dfVFS series James has written a walkthrough on how to install it and make sure it works.

Digital Forensics Virtual File System (dfVFS) is a flexible virtual file system abstraction that lets you access many different kinds of image formats from Python.  Getting it up and running can be difficult and confusing.  This guide will help you install everything you need to get started using dfVFS on Windows.  I am assuming you already have 32 bit Python 2.7.9 or 2.7.10 installed.


dfVFS has a lot of dependencies; fortunately most of them are packaged together and made available by the log2timeline project.

2) Click on the “Download ZIP” button on the right side of the screen
3) Unzip the file
4) Go to the win32 folder
5) Install the MSIs for the following packages into your Python installation (the package name comes before the version number):

6) Open a Python session in IDLE or from a command prompt
7) Enter:
      >>> import sqlite3
      >>> print sqlite3.sqlite_version
8) If the version printed is below 3.7.8 (fresh Python installs seem to have 3.6.21) go to
9) Download and unzip the file
10) In C:\Python27\DLLs back up the sqlite3.dll and then copy the new sqlite3.dll you just downloaded to this folder
11) If you haven’t already, end your Python session then restart it.  Run the lines from Step 7 again.  You should now be on a more current version of sqlite3
12) All of your dependencies should now be installed
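As a quick sanity check, the version test from steps 7 and 8 can be wrapped into a few lines (`sqlite_version` and `sqlite_version_info` are standard attributes of Python's sqlite3 module):

```python
# Check which SQLite library version Python is actually loading.
# dfVFS wants the underlying sqlite3.dll to be 3.7.8 or newer.
import sqlite3

print(sqlite3.sqlite_version)  # e.g. "3.6.21" on a stock Python 2.7 install

if sqlite3.sqlite_version_info < (3, 7, 8):
    print("sqlite3 is too old - swap in a newer sqlite3.dll")
else:
    print("sqlite3 version is fine for dfVFS")
```

If the old version still prints after you replaced the DLL, make sure you restarted the Python session so the new sqlite3.dll gets loaded.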

Installing dfVFS

With all the dependencies installed we can now get the dfVFS module installed.

1) Go back into the win32 folder in the log2timeline archive from step 4 in part 1
2) Run the dfvfs msi located inside the win32 folder
3) When it is complete dfVFS will be added to your Python installation and ready for use
4) If you would like to test the installation to verify it was successful download the dfVFS zip available on Github:
5) Unzip the archive.  DO NOT use the extract option included in Windows; it makes minor modifications to some of the test files while unpacking them and will cause tests to fail.
6) Open a command prompt and cd into the directory where you unpacked the dfVFS archive
7) Run python from this directory to verify that dfVFS was properly installed

8) If you rename or delete the dfvfs folder in your unzipped dfVFS archive and rerun the tests, the tester will run dfVFS from the installed package in your Python directory

Monday, November 2, 2015

Forensic Lunch Podcast Version

Hello Reader,
          Did you know we have a podcast version of the forensic lunch?

Well we do!

You can directly subscribe to it here:

or you can get it on iTunes, Doubletwist and other fine podcast directories.

All the past and current episodes are up there and I try to get new episodes converted from Youtube videos to MP3 within a couple days of the broadcast or sooner.

If you want to watch the videocast (I think that's the right word) they are all on Youtube here:

Lastly for this post the next episode of the Forensic Lunch is 11/13/15 with:

  • Andrew Case, @attrc, from the Volatility Project talking about Volatility 2.5, new plugins and the winners of this year's Volatility Plugin Contest
  • Yogesh Khatri, from Champlain, talking about SRUM forensics in Windows 8.1+. A truly amazing new artifact 
  • Matt and I talking about our new open source tool Elastic Handler

You can RSVP to watch it live here:

You can watch it on Youtube here:

Expect more posts in this space through the end of the year!

Friday, October 30, 2015

Presenting ElasticHandler - OSDFCon 2015 - Part 1 of 3

Hello Reader,
         It's been too long; there are times I miss the daily blogging when I made myself write every day. Other times I'm just ready to go to sleep and happy not to have the obligation on me. But 5 months without sharing what we are doing is far too long.

Today let me introduce you to what I hope will be a new tool in your arsenal, we call it Elastic Handler. Why? Well it's well known that we are bad at naming things at G-C Partners. We are bad at making cool names involving animals or clever acronyms. To make up for this we work really hard to make useful things.  I think you'll be pleasantly surprised to see how much this could help your day to day DFIR life.

What you will need:

1. Python installed, hopefully if you followed the blog at all this year you have that now
2. Our code from Github:
3.  Elastic Search.

If you don't already have Elastic Search (ES) running or you don't want to dedicate the time to do so, then you and I share a lot in common! When I was first testing this with Matt I didn't want to use the server we had setup internally as that wouldn't be realistic for those just trying to do this for the first time. That's when I remembered this amazing website that has packaged up services like ES and turned them into pre-configured and deployable instances. This website is called BitNami.

So let's say after reading this post you realize you want to use this neat new tool we've put out (it's open source btw!). Then you can just run the ES installer from Bitnami found here: and available for Windows, OSX and Linux!

Once you run through their install wizard everything is configured and ready to go! It really is that simple and it just keeps taking me by surprise how easy it is each time I use one of their packages.

What does it do?

You have what you need to use our new thing, but what does it do? Elastic Handler allows you to take all the tools you currently run and make their output better by doing the following:

1. It allows you to define a mapping from the CSV/TSV/etc. output you are getting now into JSON so that ES can ingest it
2. It allows you to normalize all of the data you are feeding in so that your column names are suddenly the same, allowing cross-report searching and correlation
3. It lets you harness the extreme speed of Elastic Search to do the heavy lifting in your correlation
4. It lets you take a unified view of your data/reports to automate that analysis and create the spreadsheets/etc. that you would have spent a day on previously

The idea here is to let computers do what can be automated so you can spend more time using what makes humans special. What I mean by that is your analyst brain and experience to spot patterns, trends and meaning in the output of the reports.

In the GitHub repository we put out there are several mappings provided for the tools we call out to the most, like the TZWorks tools, ShellBags Explorer, our link parser, Mandiant's Shimcache parser, etc. But the cool thing about this framework is that to bring in another report all you have to do is generate two text files to define your mapping; there is no code involved.
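To make "two text files, no code" concrete, here is a rough sketch of the idea behind such a mapping (the column names, sample report and function are made up for illustration; this is not Elastic Handler's actual code): rename each tool-specific CSV column to a normalized name and emit one JSON document per row for ES to ingest.

```python
import csv
import json

# Hypothetical mapping "file": tool-specific column -> normalized field name
mapping = {
    "Path": "file_path",
    "Modified Time": "timestamp_modified",
    "MD5": "hash_md5",
}

# A fake one-row report standing in for real tool output
report_csv = "Path,Modified Time,MD5\nC:\\Users\\bob\\x.doc,2015-01-02 03:04:05,abc123\n"

def rows_to_json(report_text, mapping):
    # Yield one JSON document per CSV row, renaming columns as we go.
    reader = csv.DictReader(report_text.splitlines())
    for row in reader:
        yield json.dumps(dict((mapping.get(k, k), v) for k, v in row.items()))

for doc in rows_to_json(report_csv, mapping):
    print(doc)
```

Once every tool's output passes through a step like this, a "Modified Time" from one tool and a "Last Written" from another can land in the same normalized field and be searched together.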

How does it work?

To run ElasticHandler you have two choices.

Option 1 - One report at a time
" --host --index --config --report

Option 2 - Batching reports
Included in the github repository is a batch file that shows how to run this against multiple reports at one time. The batch file is called 'index_reports.bat'; you just need to add a line for each report you want to bring in.

Let's go into detail on what each option does.

--host: This is the IP address/hostname of the ES server. If you installed it on your own system you can just reference or localhost

--index: This is the name of the index in Elastic Search where the data from this batch of reports will be stored. If this index does not exist yet, that's ok! ES will just make it for you and store the data there as long as that index is specified.

Think of indexes as cases in FTK or Encase. All of the reports related to one particular investigation should go into one 'index' or 'case'. So if you had reports for one person's image you could make an index called 'bob', and if you later bring in a new case involving Max you would then specify 'max' as the index for all your new reports.

Indexes here are just how you logically store and group together the reports on your ES server. In the end they are all stored on the ES server and you can always specify which one to insert data into or query data from.

--config: This is where the magic happens. The config file defines how the columns in your report output will be mapped into JSON data for Elastic Search to store. Elastic Search's one big requirement is that all data to be ingested has to be in JSON format. The good news is that all you have to do is create a text file that defines these columns and ES will take the data without you having to pre-create a table or file within it. ES will store just about anything; it's up to you to let it know what it's dealing with so you can search it properly.

This leads to the mapping file, which the config file references, that tells ES how to treat each column. Is it text, a date, etc.? I'll write up more on how to create these config files in the next post.
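As a rough picture of the concept (the layout and field names here are invented for illustration, not Elastic Handler's actual file format), a mapping file conceptually pairs each report column with the type ES should index it as:

```
# hypothetical mapping: report column -> Elastic Search field type
Path             string
Modified Time    date
Entry Number     long
MD5              string
```

Typing the date columns as dates is what makes timeline filtering in Kibana work later.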

--report: This is the tool output you want to bring into ES for future correlation and searching. Remember that you have to have a mapping defined in the config file above for the tool before you can successfully turn that CSV/TSV/etc. into something ES can take.

If you just want to test out Elastic Handler, we have sample reports ready to parse, and config files for them, ready to go in the GitHub repo.

How does it make my DFIR life better?

Once you have data ingested you can view it through ES's web interface (not the greatest) or Kibana (a visualization tool, good for timeline filtering). If you really want to take full advantage of what you've just done (normalize, index and store) you need some program logic to correlate all that data into something new!

I've written one such automagic correlation for mapping out which USB drives have been plugged in, and what's on each of those volumes via lnk files and jump lists, into an xlsx file. You can find the script in our GitHub repo under scripts. I've fully commented this code so you can follow along and read it, but I'll also be writing a post explaining how to create your own.
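The shape of that correlation (a simplified sketch with made-up data, not the actual script) is essentially a join on volume serial number between the USB device report and the lnk/jump list report:

```python
# Hypothetical normalized records, as they might look after ES ingestion.
usb_devices = [
    {"volume_serial": "1A2B-3C4D", "device": "SanDisk Cruzer"},
    {"volume_serial": "5E6F-7A8B", "device": "Kingston DataTraveler"},
]
lnk_records = [
    {"volume_serial": "1A2B-3C4D", "target_path": "E:\\docs\\plans.docx"},
    {"volume_serial": "1A2B-3C4D", "target_path": "E:\\docs\\budget.xlsx"},
    {"volume_serial": "9C0D-1E2F", "target_path": "F:\\misc\\notes.txt"},
]

# Group lnk targets by the volume serial recorded inside each lnk file,
# then attach them to the matching USB device entry.
files_by_serial = {}
for rec in lnk_records:
    files_by_serial.setdefault(rec["volume_serial"], []).append(rec["target_path"])

for dev in usb_devices:
    known_files = files_by_serial.get(dev["volume_serial"], [])
    print("%s -> %d known files" % (dev["device"], len(known_files)))
```

In the real tool the two record sets come back from ES queries instead of literals, but the join logic is the same.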

I do plan to write more of these correlation examples, but let me show you what the output looks like based on the sample data we provide in the GitHub repo:

First tab the USB devices found in the image from TZWorks USP tool:

The other tabs then contain what we know is on each volume based on the volume serial number recorded within the lnk files and jump lists:

And all of this takes about 1 second to run and creates an xlsx file with freeze panes applied to the first two rows for scrolling, filters applied to all the columns for easy viewing and more!

You can make the report output fancier and if you wanted to do so I would recommend looking at Ryan Benson's code for writing out xlsx files using xlsxwriter found in HindSight: 
which is where I found the best example of doing it.

What's next?

Try out Elastic Handler, see if you like what it does and look to see what tools we are missing mappings for. You can ingest anything you can get a report output for and you can correlate anything you can find within it. If you want more background you can see our slides from OSDFCon here:

and watch the blog as I supplement this with two more posts regarding how to make your own mapping files and how to create your own correlation scripts. 

Thursday, October 29, 2015

Wait it's October?

Hello Reader,
       It's been 5 months since I last updated the blog and I have so much to share with you! I'm writing this to force myself to follow up with some posts tomorrow that will include:

Our slides from OSDFCon
An overview of our new open source project we put on GitHub ElasticHandler
An overview of our other new open source project on GitHub GCLinkParser
All the videos of the forensic lunch, they've been on youtube though!
An update on the forensic lunch podcast version
and more!

Talk to you tomorrow!

Monday, May 25, 2015

Automating DFIR - How to series on programming libtsk with python Part 13

Hello Reader,
          This is part 13 of a many planned part series. I don't want to commit myself to listing all the planned parts as I feel that curses me to never finish. We've come a long way since the beginning of the series and in this part we solve one of the most persistent issues most of us have, getting easy access to volume shadow copies.

Now before we continue a reminder, don't start on this post! We've come a long way to get to this point and you should start at part 1 if you haven't already!

Part 1 - Accessing an image and printing the partition table
Part 2 - Extracting a file from an image
Part 3  - Extracting a file from a live system
Part 4 - Turning a python script into a windows executable
Part 5 - Auto escalating your python script to administrator
Part 6 - Accessing an E01 image and extracting files
Part 7 - Taking in command line options with argparse to specify an image
Part 8 - Hashing a file stored in a forensic image
Part 9 - Recursively hashing all the files in an image
Part 10 - Recursively searching for files and extracting them from an image
Part 11 - Recursively searching for files and extracting them from a live system
Part 12 - Recursively through multiple file system types

What you will need for this part:

You will need to download and install the pyvshadow MSI found here for Windows. This is the Python library that provides us with volume shadow access. It binds to the libvshadow library:

The vss helper library I found on the plaso project website:

You will need to download the following sample image that has volume shadows on it:!LlRFjbZJ!s0k263ZqKSw_TBz_xOl1m11cs2RhIIDPoZUFt5FuBgc Note this is a 4GB rar file that when uncompressed will become a 25GB raw image

The Concept:

Volume shadow copies are different from any other file system we've dealt with thus far. The volume shadow subsystem wraps around an NTFS partition (I'm not sure how it will be implemented in ReFS) and stores within itself a differential, cluster-based backup of changes that occur on the disk. This means that all views our tools give us of the data contained within the volume shadow copies are emulated file system views based on the differential cluster map stored within the volume shadow system. This is what the libvshadow library does for us: it parses the differential database and creates a view of the data contained within it as a file system that can be parsed.
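One way to picture that "emulated view" (a purely illustrative sketch, nothing to do with the real on-disk format) is a base volume plus a per-snapshot overlay of only the clusters that changed since the snapshot was taken:

```python
# Current state of a tiny 6-cluster volume.
current = ["A", "B", "C2", "D", "E2", "F"]

# A shadow copy stores only the old contents of clusters that changed
# after the snapshot was taken (a differential record, not a full copy).
snapshot_diff = {2: "C1", 4: "E1"}

# Emulating the historical view: overlay the differential on the current data.
historical = [snapshot_diff.get(i, c) for i, c in enumerate(current)]
print(historical)  # the volume as it looked at snapshot time
```

This is why a shadow copy is cheap to store but can only be read back by replaying the differential records against the live volume.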

As with all things technical there is a catch. The base libvshadow was made to work with raw images only. There is of course a way around this, already coded into dfvfs, that I am testing for another part so we can access shadow copies from other types of forensic images.

As per Joachim Metz:
1. VSC is a large subsystem in Windows that can even store file copies on servers
2. VSS (volsnap on-disk storage) is not a file system but a volume system; it lies below NTFS. I opt to read:
Which means I am grossly oversimplifying things. In short, I am describing how a volume system that exists within an NTFS volume is being interpreted as a file system for you by our forensic tools. Do not confuse VSS for actual complete volumes; they are differential cluster-based records.

The Code:

The first thing we have to do is import the new library you just downloaded and installed as well as the helper class found on the Github.

import vss
import pyvshadow

Next, since our code is NTFS specific, we need to extend our last multi-filesystem script to do something special if it detects that it's accessing an NTFS partition:

  print "File System Type Detected .", filesystemObject.info.ftype, "."
  if (str(filesystemObject.info.ftype) == "TSK_FS_TYPE_NTFS_DETECT"):
    print "NTFS DETECTED"
We do this as seen above by comparing the string "TSK_FS_TYPE_NTFS_DETECT" to the converted enumerated value contained in filesystemObject.info.ftype. If the values match then we are dealing with an NTFS partition that we can test to see if it has volume shadow copies.

To do the test we are going to start using our new libraries:

    volume = pyvshadow.volume()
    offset = (partition.start * 512)
    fh = vss.VShadowVolume(args.imagefile, offset)
    count = vss.GetVssStoreCount(args.imagefile, offset)

First we create a pyvshadow object by calling the volume function constructor and storing the object in the variable named volume. Next we declare the offset again to the beginning of our detected NTFS partition. Then we use our vss helper library for two different purposes. The first is to get a volume object that we can work with to access the volume shadows stored within. When we call the VShadowVolume function in our helper vss class we pass it two arguments: the name of the raw image we passed in to the program and the offset to the beginning of the NTFS partition stored within the image.

The second vss helper function we call is GetVssStoreCount, which takes the same two arguments but does something very different. The function, as the name implies, returns the number of shadow copies present on the NTFS partition. The returned value starts at a count of 1 but the actual index of shadow copies starts at 0, meaning whatever value is returned here we will have to treat as count - 1 in our code. The other thing to know, based on my testing, is that you may get more volume shadows returned than are available on the native system. This is because when libvshadow parses the database it shows all available instances, including those volume shadow copies that have been deleted by the user or system but still partially exist within the database.
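That count-versus-index detail is easy to trip over; here is a quick sketch of the loop bounds using a made-up count of 3 (no pyvshadow calls, just the indexing logic):

```python
count = 3   # what GetVssStoreCount would report: 3 shadow copies exist
vstore = 0  # store indexes start at 0

visited = []
while (vstore < count):     # same bound the walkthrough below uses
    visited.append(vstore)  # this is where volume.get_store(vstore) would go
    vstore = vstore + 1

print(visited)  # indexes 0, 1 and 2 - count - 1 is the last valid index
```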

Not all NTFS partitions contain volume shadow copies so we need to test the count variable to see if it contains a result:

    if (count):

If it does contain a result (the function will set the count variable to undefined if there are no volume shadows on the partition) then we need to do two things. The first is to keep track of which volume shadow we are working with, which always begins at 0. The second is to take our pyvshadow volume object and have it open the fh volume object we created in the code above:

      vstore = 0

Once we have the pyvshadow volume object working with our vss helper class fh volume object we can start getting to the good stuff:

      while (vstore < count):
        store = volume.get_store(vstore)
        img = vss.VShadowImgInfo(store)

We are using a while loop here to iterate through all of the available shadow copies using the n - 1 logic we discussed before (starting from 0, count - 1 is the last valid index). Next we ask the pyvshadow volume object to return a view of the volume shadow copy we are working with (whatever value vstore is currently set to) from the get_store function, and store it in the 'store' variable.

We will then take the object stored in the 'store' variable and call the vss helper class function VShadowImgInfo, which will return an ImgInfo object that we can pass into pytsk and that will work with our existing code:

        vssfilesystemObject = pytsk3.FS_Info(img)
        vssdirectoryObject = vssfilesystemObject.open_dir(path=dirPath)
        print "Directory:","vss",str(vstore),dirPath
        vstore = vstore + 1

So we are now working with our volume shadow copy as we would any other pytsk object. We are changing what we print in the directory line to include which shadow copy we are working with. Next we are changing the parentPath variable we pass into the directoryRecurse function from [] (the empty list we used in our prior examples) to a list with two members. The first member is the string 'vss' and the second is the string version of which shadow copy we are currently about to iterate and search. This is important so that we can uniquely export each file that matches the search expression without overwriting the file exported from a prior shadow copy.

Lastly we need to increment the vstore value for the next round of the while loop.
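The effect of that parentPath change (a sketch of the path-building idea using a hypothetical helper, not directoryRecurse's exact internals) is that each match gets a unique export location per shadow copy:

```python
import os

def export_path(parent_parts, filename):
    # Build the on-disk path for an exported file from its parentPath list.
    return os.path.join(*(parent_parts + [filename]))

print(export_path(["vss", "0"], "secret.rtf"))  # e.g. vss/0/secret.rtf
print(export_path(["vss", "1"], "secret.rtf"))  # same file, different copy, no collision
print(export_path([], "secret.rtf"))            # the old behaviour: everything lands here
```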

Only one more thing to do. We need to search the actual file system on this NTFS partition and define an else condition to continue to handle all the other file systems that our libtsk version supports:

      #Capture the live volume
      directoryObject = filesystemObject.open_dir(path=dirPath)
      print "Directory:",dirPath
    else:
      directoryObject = filesystemObject.open_dir(path=dirPath)
      print "Directory:",dirPath

We don't have to change any other code! That's it! We now have a program that will search, hash and export like before but now has volume shadow access without any other programs, procedures or drivers.

Running the program against the sample image looks like this:
E:\development>python -i image.dd -t raw -o rtf.csv -e -s .*rtf
Search Term Provided .*rtf
Raw Type
0 Primary Table (#0) 0s(0) 1
Partition has no supported file system
1 Unallocated 0s(0) 2048
Partition has no supported file system
2 NTFS (0x07) 2048s(1048576) 204800
File System Type Detected . TSK_FS_TYPE_NTFS_DETECT .
WARNING:root:Error while trying to read VSS information: pyvshadow_volume_open_file_object: unable to open volume. libvshadow_io_handle_read_volume_header: invalid volume identifier.
3 NTFS (0x07) 206848s(105906176) 52219904
File System Type Detected . TSK_FS_TYPE_NTFS_DETECT .
Directory: vss 0 /
Directory: vss 1 /
Directory: vss 2 /
Directory: /
4 Unallocated 52426752s(26842497024) 2048
Partition has no supported file system

The resulting exported data directory structure looks like this: 
You can access the Github of all the code here:

and the code needed for this post here:

Next part lets try to do this on a live system!

Sunday, May 10, 2015

Automating DFIR - How to series on programming libtsk with python Part 12

Hello Reader,
      How has a month passed since this last entry? To those reading this I do intend to continue this series for the foreseeable future so don't give up! In this part we will talk about accessing non NTFS partitions and the file system support available to you in pytsk. 

Now before we continue a reminder, don't start on this post! We've come a long way to get to this point and you should start at part 1 if you haven't already!

Part 1 - Accessing an image and printing the partition table
Part 2 - Extracting a file from an image
Part 3  - Extracting a file from a live system
Part 4 - Turning a python script into a windows executable
Part 5 - Auto escalating your python script to administrator
Part 6 - Accessing an E01 image and extracting files
Part 7 - Taking in command line options with argparse to specify an image
Part 8 - Hashing a file stored in a forensic image
Part 9 - Recursively hashing all the files in an image
Part 10 - Recursively searching for files and extracting them from an image
Part 11 - Recursively searching for files and extracting them from a live system 

Following this post the series continues

Part 13 - Accessing Volume Shadow Copies 

In this series so far we've focused on NTFS because that's where most of us spend our time investigating. However, the world is not Windows alone, and luckily for us the sleuthkit libraries that pytsk binds to are way ahead of us. Here is the full list of file systems that the sleuthkit library supports:

  • NTFS 
  • FAT12 
  • FAT16 
  • FAT32 
  • exFAT
  • UFS1 (FreeBSD, OpenBSD, BSDI ...)
  • UFS1b (Solaris - has no type)
  • UFS2 - FreeBSD, NetBSD.
  • Ext2 
  • Ext3 
  • SWAP 
  • RAW 
  • ISO9660 
  • HFS 
  • Ext4 
  • YAFFS2 

Now that is what the current version of the sleuthkit supports; pytsk3, however, is compiled against an older version. I've tested the following file systems to work with pytsk3:

  • NTFS
  • FAT12
  • FAT16
  • FAT32
  • EXT2
  • EXT3
  • EXT4
  • HFS

Based on my testing I know that exFAT does not appear to be supported in this binding, and from the testing of reader Hans-Peter Merkel I know that YAFFS2 is also not currently supported. I'm sure in the future when the binding is updated these file systems will come into scope.

So now that we know what we can expect to work, let's change our code from part 10 to work against any supported file system, and allow it to work with multiple image types to boot.

What you will need:
I'm using a couple of sample images from the CFReDS project for this part as they have different partitions for different variants of the same file system type. For instance, the ext sample image we will use has ext2, ext3, and ext4 partitions all on one small image. Pretty handy!

The first thing we will need to change is how we are opening our images to support multiple image types. We are going to be working with raw and e01 images all the time, and keeping two separate programs to work with each seems dumb. Let's change our code to allow us to specify which kind of image we are working with; in the future we may automate that as well!

We need to add a new required command line option where we specify the type of image we are going to be working with:

argparser.add_argument(
        '-t', '--type',
        dest='imagetype',
        required=True,
        help='Specify image type e01 or raw')

We are defining a new flag (-t or --type) to pass in the type of image we are dealing with. We are then storing our input into the variable imagetype and making it required.

Now we need to test our input and call the proper Image Info class to deal with it. First let's deal with the e01 format. We are going to move all the pyewf specific code into this if block:

if (args.imagetype == "e01"):
    filenames = pyewf.glob(args.imagefile)
    ewf_handle = pyewf.handle()

Next we are going to define the code to work with raw images:

elif (args.imagetype == "raw"):
    print "Raw Type"
    imagehandle = pytsk3.Img_Info(url=args.imagefile)

One last big change to make and all the rest of our code will work:

for partition in partitionTable:
  print partition.addr, partition.desc, "%ss(%s)" % (partition.start, partition.start * 512), partition.len
  try:
    filesystemObject = pytsk3.FS_Info(imagehandle, offset=(partition.start*512))
  except:
    print "Partition has no supported file system"
    continue
  print "File System Type Detected ", filesystemObject.info.ftype

We are moving our FS_Info call to open the file system into a try/except block so that if the partition type is not supported our program won't exit on error. Why do we have to test each partition? Because 1. we can't trust the partition description to always tell us what file system is in it (Windows 8, for instance, changed them), and 2. the only method tsk makes available to us to determine the file system is within the FS_Info class. So we will see if pytsk supports opening any partition we find, and if it does not we will print an error to the user. If we do support the file system type then we will print the type detected and the rest of our code will work with no changes needed!

Let's see what this looks like on each of the example images I linked at the beginning of the post.

E:\development>python -i dfr-01-fat.dd -t raw
Raw Type
0 Primary Table (#0) 0s(0) 1
Partition has no supported file system
1 Unallocated 0s(0) 128
Partition has no supported file system
2 DOS FAT12 (0x01) 128s(65536) 16384
File System Type Detected  TSK_FS_TYPE_FAT16
Directory: /
3 DOS FAT16 (0x06) 16512s(8454144) 65536
File System Type Detected  TSK_FS_TYPE_FAT16
Directory: /
4 Win95 FAT32 (0x0b) 82048s(42008576) 131072
File System Type Detected  TSK_FS_TYPE_FAT32
Directory: /
5 Unallocated 213120s(109117440) 1884033
Partition has no supported file system

E:\development>python -i dfr-01-ext.dd -t raw
Raw Type
0 Primary Table (#0) 0s(0) 1
Partition has no supported file system
1 Unallocated 0s(0) 61
Partition has no supported file system
2 Linux (0x83) 61s(31232) 651175
File System Type Detected  TSK_FS_TYPE_EXT2
Directory: /
Cannot retrieve type of Bellatrix.txt
3 Linux (0x83) 651236s(333432832) 651236
File System Type Detected  TSK_FS_TYPE_EXT3
Directory: /
4 Linux (0x83) 1302472s(666865664) 651236
File System Type Detected  TSK_FS_TYPE_EXT4
Directory: /
5 Unallocated 1953708s(1000298496) 143445
Partition has no supported file system

E:\development>python -i dfr-01-osx.dd -t raw
Raw Type
0 Safety Table 0s(0) 1
Partition has no supported file system
1 Unallocated 0s(0) 40
Partition has no supported file system
2 GPT Header 1s(512) 1
Partition has no supported file system
3 Partition Table 2s(1024) 32
Partition has no supported file system
4 osx 40s(20480) 524360
File System Type Detected  TSK_FS_TYPE_HFS_DETECT
Directory: /
5 osxj 524400s(268492800) 524288
File System Type Detected  TSK_FS_TYPE_HFS_DETECT
Directory: /
6 osxcj 1048688s(536928256) 524288
File System Type Detected  TSK_FS_TYPE_HFS_DETECT
Directory: /
7 osxc 1572976s(805363712) 524144
File System Type Detected  TSK_FS_TYPE_HFS_DETECT
Directory: /
8 Unallocated 2097120s(1073725440) 34
Partition has no supported file system

There you go! We can now search, hash and extract from most things that come our way. In the next post we are going back to Windows and dealing with Volume Shadow Copies!

Follow the github repo here:

Sunday, April 26, 2015

National CCDC 2015 Red Team Debrief

Hello Reader,
          Here is this year's Red Team debrief. If you have questions please leave them below.


Friday, April 3, 2015

Forensic Lunch 4/3/15 - Devon Kerr - WMI and DFIR and Automating DFIR

Hello Reader,

We had another great Forensic Lunch!

Guests this week:
Devon Kerr talking about his work at Mandiant/FireEye and his research into WMI for both IR and attacker usage.

Matthew and I going into the Automating DFIR series and our upcoming talk at CEIC
You can watch the show on Youtube:

or below!

                                Sunday, March 22, 2015

                                Automating DFIR - How to series on programming libtsk with python Part 11

                                Hello Reader,
                                      I had a bit of a break thanks to a long overdue vacation but I'm back and the code I'll be talking about today has been up on the github repository for almost 3 weeks, so if you ever want to get ahead go there as I write the code before I try to explain it! Github repository is here:

                                Now before we continue a reminder, don't start on this post! We've come a long way to get to this point and you should start at part 1 if you haven't already!

                                Part 1 - Accessing an image and printing the partition table
                                Part 2 - Extracting a file from an image
                                Part 3  - Extracting a file from a live system
                                Part 4 - Turning a python script into a windows executable
                                Part 5 - Auto escalating your python script to administrator
                                Part 6 - Accessing an E01 image and extracting files
                                Part 7 - Taking in command line options with argparse to specify an image
                                Part 8 - Hashing a file stored in a forensic image
                                Part 9 - Recursively hashing all the files in an image
                                Part 10 - Recursively searching for files and extracting them from an image

                                Following this post the series continues:

                                Part 12 - Accessing different file systems
                                Part 13 - Accessing Volume Shadow Copies  

                                In this post we are going to augment the script from part 10, which searched for and extracted files from all the NTFS partitions in an image, so that it does the same against all the NTFS partitions on a live system. You can obviously tweak this for any other file system, but we will get to that in later posts in this series.

                                The first thing we need is a way to figure out what partitions exist on a live system in a cross-platform way, so our future code can be tweaked to run anywhere. For this I chose the python library psutil, which can provide a wealth of information about the system it's running on, including information about available disks and partitions. You can read all about it here:

                                To bring it into our program we need to call the import function again:

                                import psutil

                                and then because we are going to work against a live running system again we need our old buddy admin

                                import admin

                                which if you remember from part 5 will auto escalate our script to administrator just in case we forgot to run it as such.

                                We are going to strip out the functions we used to find all the parts of a forensic image and replace them with our code to test for administrative access:

                                if not admin.isUserAdmin():
                                    admin.runAsAdmin()
                                Next we replace the functions we called to get a partition table from a forensic image with a call to psutil, which returns a listing of partitions that we iterate through. The code looks like the following, which I will explain:

                                partitionList = psutil.disk_partitions()
                                for partition in partitionList:
                                  imagehandle = pytsk3.Img_Info('\\\\.\\'+partition.device.strip("\\"))
                                  if 'NTFS' in partition.fstype:

                                So here, instead of calling pytsk for a partition table, we are calling psutil.disk_partitions, which will return a list of partitions that are available to the local system. I much prefer this method to trying to iterate through all volume letters, as we get back just those partitions that are available as well as what file system they are recognized as running. Our list of active partitions will be stored in the variable partitionList. Next we iterate through the partitions using the for operator, storing each partition returned into the partition variable. Then we create a pytsk3 Img_Info object for each partition returned, but only continue if psutil recognized the partition as NTFS.
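To make the device-path manipulation concrete, here is a small standalone sketch. The 'C:\' value is a hypothetical example of what partition.device typically returns on Windows; stripping the trailing backslash and prepending the raw-device prefix yields the \\.\C: style path that pytsk3 expects:

```python
# Hypothetical example of what psutil reports for a Windows system drive.
device = 'C:\\'

# Strip the trailing backslash and prepend the raw-device prefix \\.\
raw_path = '\\\\.\\' + device.strip('\\')

print(raw_path)  # \\.\C:
```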

                                The next thing we are changing is the try/except block in our recursive directory function. Why? I found in my testing that live systems react much differently than forensic images in setting certain values in libtsk. So rather than using the entry's meta type to determine if I'm dealing with a regular file, I am using its name type, which seems to always be set regardless of whether it's a live system or a forensic image. I'm testing to see if I can capture the type of the file and its size here, as there are a lot of interesting special files that only appear at run time and will throw an error if you try to get their size.

                                try:
                                    f_type = entryObject.info.name.type
                                    size = entryObject.info.meta.size
                                except Exception as error:
                                    print "Cannot retrieve type or size of", entryObject.info.name.name
                                    print error.message
                                    continue

                                So in the above code I'm getting the type of the file (lnk, regular, etc.) and its size, and if I can't, I'm handling the error and printing it out before continuing on. You will see errors; live systems are an interesting place to do forensics.

                                I am now going to make a change I alluded to earlier in the series. We are going to buffer our reads and writes so we don't crash out of our program by trying to read a massive file into memory. This wasn't a problem in our first examples, as we were working from small test images I made, but now that we are dealing with real systems and real data we need to handle our data with care.

                                Our code looks as follows:

                                BUFF_SIZE = 1024 * 1024
                                offset = 0
                                md5hash = hashlib.md5()
                                sha1hash = hashlib.sha1()
                                if args.extract == True:
                                    if not os.path.exists(outputPath):
                                        os.makedirs(outputPath)
                                    extractFile = open(outputPath + entryObject.info.name.name, 'w')
                                while offset < entryObject.info.meta.size:
                                    available_to_read = min(BUFF_SIZE, entryObject.info.meta.size - offset)
                                    filedata = entryObject.read_random(offset, available_to_read)
                                    md5hash.update(filedata)
                                    sha1hash.update(filedata)
                                    offset += len(filedata)
                                    if args.extract == True:
                                        extractFile.write(filedata)
                                if args.extract == True:
                                    extractFile.close()

                                First we need to determine how much data we want to read or write at one time from a file. Following several other examples I've found, I'm setting that amount to 1 megabyte at a time by setting the variable BUFF_SIZE equal to 1024*1024, or one megabyte. Next we need to keep track of where we are in the file we are dealing with; we do that by creating a new variable called offset and setting it to 0 to start with.

                                You'll notice that we are creating our hash objects, directories and file handles before we read in any data. That is because we want to do all of these things one time, prior to iterating through the contents of a file. If a file is a gigabyte in size then our read loop will execute 1,024 times, and we want just one hash and one output file to be created.

                                Next we start a while loop which will continue to execute until our offset is greater than or equal to the size of our file, meaning we've read all the data within it. Now files are not guaranteed to be allocated in 1 meg chunks, so to deal with that we are going to take advantage of a python function called min. min returns the smaller of the two values presented, which in our code is the size of the buffer compared to the remaining data left to read (the size of the file minus our current offset). Whichever value is smaller will be stored in the variable available_to_read.

                                After we know how much data we want to read in this execution of our while loop, we read it as before from our entryObject, passing in the offset to start from and how much data to read, storing the data read into the variable filedata. We then call the update function provided by our hashing objects. One of the nice things about the hash objects provided by python's hashlib is that if you provide additional data to an already instantiated object, it will just continue to build the hash rather than you having to read all the data in at once.

                                Next we are incrementing our offset by adding to itself the length of data we just read so we will skip past it on the next while loop execution. Finally we write the data out to our output file if we elected to extract the files we are searching for.
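The buffered-read pattern described above can be demonstrated on its own with an in-memory byte string standing in for the file contents (the data value here is made up for illustration). Hashing a megabyte at a time with update() yields the same digest as hashing everything in one shot:

```python
import hashlib

BUFF_SIZE = 1024 * 1024
data = b'A' * (3 * BUFF_SIZE + 512)  # stand-in for file contents, just over 3 MB

md5hash = hashlib.md5()
offset = 0
while offset < len(data):
    # Read at most one buffer's worth, or whatever is left of the file.
    available_to_read = min(BUFF_SIZE, len(data) - offset)
    md5hash.update(data[offset:offset + available_to_read])
    offset += available_to_read

# Incremental updates build the same hash as reading everything at once.
print(md5hash.hexdigest() == hashlib.md5(data).hexdigest())  # True
```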

                                I've added one last bit of code to help me catch any other weirdness that may seep through.

                                print "This went wrong", entryObject.info.name.name, f_type

                                This is an else to catch any condition that does not match one of our existing if statements.

                                That's it! You now have a super DFIR Wizard program that will go through all the active NTFS partitions on a running system and pull out and hash whatever files you want!

                                You can find the complete code here:

                                In the next post we will talk about parsing partitions types other than NTFS and then go into volume shadow copy access!

                                Friday, March 20, 2015

                                Forensic Lunch 3/20/15 - James Carder and Eric Zimmerman

                                Hello Reader!,
                                           We had another great Forensic Lunch! This broadcast we had:

                                James Carder of the Mayo Clinic, @carderjames, talking all about automating your response process to separate the random attacks from the sophisticated ones. You can hear James talk about this and much more at the SANS DFIR Summit, where he'll be a panelist! If you want to work with James, Mayo Clinic is hiring.

                                Mayo Clinic Infosec and IR Jobs:
                                Contact James Carder:

                                Special Agent Eric Zimmerman of the FBI, @EricRZimmerman, talking about his upcoming in-depth Shellbags talk at the SANS DFIR Summit as well as his new tool called Registry Explorer. RE and Eric's research into Windows registries will be continued in the next broadcast. Whether you are interested in registries from a research, academic or investigative perspective this is a must see, and FREE, tool!

                                Eric's Blog:
                                Eric's Github:
                                Registry Explorer:

                                You can watch the broadcast here on Youtube:

                                Or in the embedded player below:

                                Tuesday, March 3, 2015

                                Automating DFIR - How to series on programming libtsk with python Part 10

                                Hello Reader,
                                If you just found this series I have good news! There is way more of it to read and you should start at Part 1. See the links to all the parts below:

                                Part 1 - Accessing an image and printing the partition table
                                Part 2 - Extracting a file from an image
                                Part 3  - Extracting a file from a live system
                                Part 4 - Turning a python script into a windows executable
                                Part 5 - Auto escalating your python script to administrator
                                Part 6 - Accessing an E01 image and extracting files
                                Part 7 - Taking in command line options with argparse to specify an image
                                Part 8 - Hashing a file stored in a forensic image
                                Part 9 - Recursively hashing all the files in an image

                                Following this post the series continues:

                                Part 11 - Recursively searching for files and extracting them from a live system 
                                Part 12 - Accessing different file systems  
                                Part 13 - Accessing Volume Shadow Copies 

                                For those of you who are up to date let's get going! In this part of the series we are going to take our recursive hashing script and make it even more useful. We are going to allow the user to search for the kind of files they want to hash with a regular expression enabled search and give them the option to extract those files as they are found back to their original path under your output directory. Ready? Let's go!

                                First we will need to import two more libraries, so many libraries! The good news is these are still standard python libraries, so there is nothing new to install. The first library we will bring in is called 'os', which gives us OS-related functions and will map the proper function based on the OS you are running on. The second library, called 're', will provide us with regular expression support when evaluating our search criteria. We bring in those libraries as before with an import command:

                                import os
                                import re

                                Now we need to add two more command line arguments to let our user take advantage of the new code we are about to write:

                                argparser.add_argument(
                                    '-s', '--search',
                                    dest='search',
                                    action='store',
                                    type=str,
                                    default='.*',
                                    help='Specify search parameter e.g. *.lnk'
                                )
                                argparser.add_argument(
                                    '-e', '--extract',
                                    dest='extract',
                                    action='store_true',
                                    default=False,
                                    help='Pass this option to extract files found'
                                )

                                Our first new argument lets the user provide a search parameter. We are storing the search parameter in the variable args.search, and if the user does not provide one we default to .*, which will match anything.

                                The second argument is using a different kind of variable than the rest of our options. We are setting the variable args.extract as a True or False value with the store_true option under action. If the user provides this argument then the variable will be True and the files matched will be extracted; if the user does not then the value will be False and the program will only hash the files it finds.
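As a quick sanity check of how these two kinds of options behave, here is a minimal standalone parser; the argparser name and the surrounding setup are illustrative, and only the two arguments mirror the ones discussed above:

```python
import argparse

argparser = argparse.ArgumentParser(description='DFIR Wizard options sketch')
argparser.add_argument('-s', '--search', default='.*',
                       help='Specify search parameter e.g. *.lnk')
argparser.add_argument('-e', '--extract', action='store_true',
                       help='Pass this option to extract files found')

# With both options supplied:
args = argparser.parse_args(['-e', '-s', '.*jpg'])
print(args.search)       # .*jpg
print(args.extract)      # True

# With no options, the defaults kick in:
defaults = argparser.parse_args([])
print(defaults.search)   # .*
print(defaults.extract)  # False
```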

                                It's always good to show the user that the argument we received from them is doing something, so let's add two lines to see if we have a search term and print it:

                                if not args.search == '.*':
                                    print "Search Term Provided", args.search

                                Remember that .* is our default search term, so if the value stored in args.search is anything other than .* our program will print out the search term provided; otherwise it will just move on.

                                Our next changes all happen within our directoryRecurse function. First we need to capture the full path where the file we are looking at exists. We do this by combining the partition number and the full path that led to this file, to make sure it's unique between partitions.

                                outputPath = './%s/%s/' % (str(partition.addr), '/'.join(parentPath))
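A quick illustration with made-up values (a partition address of 2 and path components ['Users', 'dave']) shows what the format string produces:

```python
# Hypothetical values: partition.addr and the accumulated parent path list.
partition_addr = 2
parentPath = ['Users', 'dave']

outputPath = './%s/%s/' % (str(partition_addr), '/'.join(parentPath))
print(outputPath)  # ./2/Users/dave/
```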
                                Next we go into the else-if statement we wrote in the prior post to handle regular files with non-zero lengths. We add a new line of code at the beginning of the block that gets executed, to do our regular expression search as follows:

                                elif f_type == pytsk3.TSK_FS_META_TYPE_REG and entryObject.info.meta.size != 0:
                                    searchResult = re.match(args.search, entryObject.info.name.name)
                                You can see we are using the re library here and calling the match function it provides. We provide two arguments to the match function: the first is the search term the user provided, and the second is the file name of the regular, non-zero-sized file we are inspecting. If the regular expression provided by the user is a match then a match object will be returned and stored in the searchResult variable; if there is not a match then the variable will contain no data. We write a conditional to test this result next:

                                if not searchResult:
                                    continue
                                This allows us to skip any file that did not match the search term provided. If the user did not specify a search term, our default value of .* will kick in and everything will be a match.
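The skip logic can be checked in isolation. The jpg names below are taken from the sample output later in the post (notes.txt is a made-up non-matching example); the search terms mirror the user-supplied and default values:

```python
import re

filenames = ['ItsRob.jpg', 'notes.txt', 'RobToGo.jpg']

# A user-supplied pattern only keeps the matching files...
matched = [name for name in filenames if re.match('.*jpg', name)]
print(matched)  # ['ItsRob.jpg', 'RobToGo.jpg']

# ...while the default pattern .* matches every file name.
matched_all = [name for name in filenames if re.match('.*', name)]
print(matched_all == filenames)  # True
```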

                                Our last modification revolves around extracting the files if our user selected that option. The code looks like this:

                                if args.extract == True:
                                    if not os.path.exists(outputPath):
                                        os.makedirs(outputPath)
                                    extractFile = open(outputPath + entryObject.info.name.name, 'w')

                                The first thing we are doing is checking to see if the user passed the extract option, causing args.extract to be set to True. If they did then we will extract the file; if not we skip it. If it is True, the first thing we do is make use of the os library's path.exists function. This allows us to check whether the output directory we want to create (set in the outputPath variable above) already exists. If it does we can move on; if it does not then we call another os library function named makedirs. makedirs is nice because it will recursively create the path you specify, so you don't have to loop through all the directories in between if they don't exist.
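The exists/makedirs pattern can be tried safely inside a temporary directory; the nested path below is illustrative:

```python
import os
import tempfile

base = tempfile.mkdtemp()
outputPath = os.path.join(base, '2', 'Users', 'dave')  # none of these exist yet

if not os.path.exists(outputPath):
    os.makedirs(outputPath)  # creates every missing directory in the chain

print(os.path.isdir(outputPath))  # True
```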

                                Now that our output path exists, it's time to extract the file. We are modifying our old extractFile variable, and now we are appending our outputPath to the filename we want to create. This will place the file we are extracting into the directory we have created. Next we write the data out to it as before and then close the handle, since we will be reusing it.

                                If I were to run this program against the Level5 image we've been working with and specify both the extraction flag and a search term of .*jpg, it would look like this:

                                C:\Users\dave\Desktop>python -i SSFCC-Level5.E01 -e -s .*jpg
                                Search Term Provided .*jpg
                                0 Primary Table (#0) 0s(0) 1
                                1 Unallocated 0s(0) 8064
                                2 NTFS (0x07) 8064s(4128768) 61759616
                                Directory: /
                                Directory: /$Extend/$RmMetadata/$Txf
                                Directory: /$Extend/$RmMetadata/$TxfLog
                                Directory: /$Extend/$RmMetadata
                                Directory: //$Extend
                                match  BeardsBeardsBeards.jpg
                                match  ILoveBeards.jpg
                                match  ItsRob.jpg
                                match  NiceShirtRob.jpg
                                match  OhRob.jpg
                                match  OnlyBeardsForMe.jpg
                                match  RobGoneWild.jpg
                                match  RobInRed.jpg
                                match  RobRepresenting.jpg
                                match  RobToGo.jpg
                                match  WhatchaWantRob.jpg
                                Directory: //$OrphanFiles
                                On my desktop there would be a folder named 2, and underneath it the full path to each file that matched the search term.

                                That's it! Part 9 had a lot going on but now that we've built the base for our recursion it gets easier from here. The code for this part is located on the series Github here:

                                Next in part 11 we will do the same thing from live systems but allow our code to enumerate all the physical disks present rather than hardcoding an option.