Friday, October 30, 2015

Presenting ElasticHandler - OSDFCon 2015 - Part 1 of 3

Hello Reader,
         It's been too long. There are times I miss the daily blogging, when I made myself post every day. Other times I'm just ready to go to sleep and happy not to have the obligation on me. But five months without sharing what we are doing is far too long.

Today let me introduce you to what I hope will be a new tool in your arsenal; we call it Elastic Handler. Why? Well, it's well known that we are bad at naming things at G-C Partners. We are bad at making cool names involving animals or clever acronyms. To make up for this we work really hard to make useful things. I think you'll be pleasantly surprised to see how much this could help your day-to-day DFIR life.

What you will need:

1. Python installed; hopefully, if you followed the blog at all this year, you have that now
2. Our code from GitHub:
3. Elastic Search

If you don't already have Elastic Search (ES) running, or you don't want to dedicate the time to do so, then you and I have a lot in common! When I was first testing this with Matt, I didn't want to use the server we had set up internally, as that wouldn't be realistic for those just trying to do this for the first time. That's when I remembered an amazing website that has packaged up services like ES and turned them into pre-configured, deployable instances. This website is called BitNami.

So let's say that after reading this post you realize you want to use this neat new tool we've put out (it's open source, btw!). Then you can just run the ES installer from BitNami, found here: and available for Windows, OSX and Linux!

Once you run through their install wizard everything is configured and ready to go! It really is that simple and it just keeps taking me by surprise how easy it is each time I use one of their packages.

What does it do?

You now have what you need to use our new tool, but what does it do? Elastic Handler allows you to take all the tools you currently run and make their output better by doing the following:

1. It allows you to define a mapping from the CSV/TSV/etc. output you are getting now into JSON so that ES can ingest it
2. It allows you to normalize all of the data you are feeding in so that your column names are suddenly the same, allowing cross-report searching and correlation
3. It lets you harness the extreme speed of Elastic Search to do the heavy lifting in your correlation
4. It lets you take a unified view of your data/reports to automate that analysis and create spreadsheets/etc. that you would have spent a day on previously
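To make the first two points concrete, here is a minimal sketch of the mapping-and-normalization idea. This is not ElasticHandler's actual code, and the column names are illustrative; the real tool defines its mappings in external config files rather than in Python.

```python
import csv
import io
import json

# Illustrative mapping: tool-specific CSV headers -> normalized field names.
SHIMCACHE_MAP = {"Last Modified": "timestamp", "Path": "file_path"}

def normalize(report_text, column_map, delimiter=","):
    """Yield JSON-ready dicts with unified field names from a delimited report."""
    reader = csv.DictReader(io.StringIO(report_text), delimiter=delimiter)
    for row in reader:
        # Rename any column found in the map; pass unknown columns through.
        yield {column_map.get(col, col): value for col, value in row.items()}

report = "Last Modified,Path\n2015-10-30 09:00:00,C:\\Windows\\evil.exe\n"
docs = list(normalize(report, SHIMCACHE_MAP))
print(json.dumps(docs[0]))
```

Once every tool's report goes through a mapping like this, a shimcache row and an lnk-file row both end up with the same 'timestamp' and 'file_path' keys, which is what makes cross-report searching possible.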

The idea here is to let computers do what can be automated so you can spend more time using what makes humans special. What I mean by that is your analyst brain and experience, to spot patterns, trends and meaning in the output of the reports.

In the GitHub repository we put out, there are several mappings provided for the tools we call out to the most, like TZWorks tools, ShellBags Explorer, our lnk parser, Mandiant's ShimCacheParser, etc. But the cool thing about this framework is that to bring in another report, all you have to do is generate two text files to define your mapping; there is no code involved.

How does it work?

To run ElasticHandler you have two choices.

Option 1 - One report at a time
" --host --index --config --report

Option 2 - Batching reports
Included in the GitHub repository is a batch file that shows how to run this against multiple reports at one time. The batch file is called 'index_reports.bat'; you just need to add a line for each report you want to bring in.
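If batch files aren't your thing, the same idea is easy to sketch in Python: build one command line per report and run them in a loop. The script name, config names and report paths below are illustrative placeholders, not the repository's actual file names.

```python
# One (config, report) pair per line, just like the rows of index_reports.bat.
reports = [
    ("configs/shimcache.config", "reports/shimcache.csv"),
    ("configs/lnk.config", "reports/lnk_report.tsv"),
]

commands = []
for config, report in reports:
    commands.append([
        "python", "elastichandler.py",  # hypothetical script name
        "--host", "localhost", "--index", "bob",
        "--config", config, "--report", report,
    ])

for cmd in commands:
    print(" ".join(cmd))
    # import subprocess; subprocess.call(cmd)  # uncomment to actually ingest
```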

Let's go into detail on what each option does.

--host: This is the IP address/hostname of the ES server. If you installed it on your own system, you can just reference 127.0.0.1 or localhost.

--index: This is the name of the index in Elastic Search where all of this batch of reports' data will be stored. If this index does not exist yet, that's ok! ES will just make it for you and store the data there as long as that index is specified.

Think of indexes as cases in FTK or EnCase. All of the reports related to one particular investigation should go into one 'index' or 'case'. So if you had reports for one person's image, you could make an index called 'bob', and if you later bring in a new case involving Max, you would then specify 'max' as the index for all your new reports.

Indexes here are just how you logically store and group together the reports on your ES server. In the end they are all stored on the ES server, and you can always specify which index to insert data into or query data from.

--config: This is where the magic happens. The config file defines how the columns in your report output will be mapped into JSON data for Elastic Search to store. Elastic Search's one big requirement is that all data to be ingested has to be in JSON format. The good news is that all you have to do is create a text file that defines these columns, and ES will take the data without you having to pre-create a table or file within it. ES will store just about anything; it's up to you to let it know what it's dealing with so you can search it properly.
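For a feel of what that JSON looks like by the time it reaches ES, here is a sketch of the bulk-API payload format: each document is preceded by an action line naming the index (and, in the ES versions of this era, a type). The index and type names here are illustrative.

```python
import json

def to_bulk_lines(index, doc_type, docs):
    """Build Elastic Search bulk-API lines: an action line, then the document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

docs = [{"timestamp": "2015-10-30 09:00:00", "file_path": "C:\\Windows\\evil.exe"}]
payload = to_bulk_lines("bob", "shimcache", docs)
print(payload)
# POST this payload to http://localhost:9200/_bulk; if the index 'bob'
# does not exist yet, ES creates it on the fly.
```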

This leads to the mapping file, which the config file references, and which tells ES how to treat each column. Is it text, a date, etc.? I'll write more on how to create these config files in the next post.

--report: This is the tool output you want to bring into ES for future correlation and searching. Remember that you have to have a mapping defined in the config file above for the tool before you can successfully turn that CSV/TSV/etc. into something ES can take.

If you just want to test out Elastic Handler, we have sample reports ready to parse, and config files for them ready to go, in the GitHub repo.

How does it make my DFIR life better?

Once you have data ingested, you can view it through ES's web interface (not the greatest) or Kibana (a visualization tool, good for timeline filtering). If you really want to take full advantage of what you've just done (normalize, index and store), you need some program logic to correlate all that data into something new!
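Here is a hypothetical example of why the normalization pays off at this stage: because every ingested report now shares field names like 'file_path' and 'timestamp', one ES query body can span shimcache, lnk, shellbag and every other report in the index at once. The index name and field values are made up.

```python
import json

# One query, every report: 'file_path' means the same thing in all of them.
query = {
    "query": {"match": {"file_path": "evil.exe"}},
    "sort": [{"timestamp": {"order": "asc"}}],
}
# e.g. POST http://localhost:9200/bob/_search with this body
print(json.dumps(query))
```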

I've written one such automagic correlation that maps out which USB drives have been plugged in, and what's on each of the plugged-in volumes according to lnk files and jump lists, into an xlsx file. You can find the script in our GitHub repo under scripts. I've fully commented the code so you can follow along and read it, but I'll also be writing a post explaining how to create your own.
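The heart of that correlation can be sketched in a few lines: match the volume serial numbers from a USB device report (e.g. TZWorks usp output) against the serials recorded inside lnk files and jump lists. All field names and values below are made up for illustration, not ElasticHandler's actual schema.

```python
usb_devices = [
    {"device": "SanDisk Cruzer", "volume_serial": "A1B2-C3D4"},
    {"device": "Kingston DT101", "volume_serial": "E5F6-0718"},
]
lnk_targets = [
    {"target_path": "E:\\payroll.xlsx", "volume_serial": "A1B2-C3D4"},
    {"target_path": "F:\\notes.txt", "volume_serial": "E5F6-0718"},
    {"target_path": "E:\\resume.docx", "volume_serial": "A1B2-C3D4"},
]

# Group lnk targets by volume serial, then attach each group to the device
# whose volume carries that serial.
files_by_serial = {}
for lnk in lnk_targets:
    files_by_serial.setdefault(lnk["volume_serial"], []).append(lnk["target_path"])

correlated = {
    usb["device"]: files_by_serial.get(usb["volume_serial"], [])
    for usb in usb_devices
}
for device, files in correlated.items():
    print(device, "->", files)
```

The real script does the grouping with ES queries instead of in-memory dicts and writes the result to xlsx, but the join logic is the same.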

I do plan to write more of these correlation examples, but for now let me show what the output looks like based on the sample data we provide in the GitHub repo:

First tab: the USB devices found in the image, from the TZWorks USP tool:

The other tabs then contain what we know is on each volume, based on the volume serial numbers recorded within the lnk files and jump lists:

And all of this takes about one second to run and create an xlsx file with freeze panes applied to the first two rows for scrolling, filters applied to all the columns for easy viewing, and more!

You can make the report output fancier, and if you want to do so, I would recommend looking at Ryan Benson's code for writing out xlsx files using xlsxwriter, found in Hindsight: which is where I found the best example of doing it.

What's next?

Try out Elastic Handler, see if you like what it does, and look to see what tools we are missing mappings for. You can ingest anything you can get report output for, and you can correlate anything you can find within it. If you want more background, you can see our slides from OSDFCon here:

and watch the blog as I supplement this with two more posts regarding how to make your own mapping files and how to create your own correlation scripts.