Daily Blog #246: Sunday Funday 2/23/14 Winner!


Hello Reader,
Judging by the answers I received, there is a need for further explanation of how indexers deal with unallocated space. I've posted the winning answer from Darren Windham, but I'm also going to reach out to Jon Stewart to see if he'd be willing to write up his answer. Things are not as simple as they first appear once you begin searching! That said, here is this week's winning answer.

The Challenge:

You have 10 GB of unallocated space from a drive that you need to index and search. Explain how an indexing program handles unstructured data for tokenizing and what other technologies must be used to handle encoded data.

The Winning Answer:

Darren Windham
When parsing an unstructured set of data, you must first locate tokens, or some indicators that an artifact of interest may be present. In a forensic examination this could be a keyword expression, a URL, an email address, a file header marking a file to recover, or other encoded data.
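To make the idea of "locating tokens" concrete, here is a minimal sketch in Python, assuming the unallocated space has already been read into a bytes object. The pattern names, the regular expressions, and the `index_tokens` helper are all hypothetical and deliberately simple; real forensic tools use far more thorough expressions and handle many more formats. The point is only that, with no file system to lean on, the tool scans raw bytes and records each hit with its offset so it can be indexed and reviewed later.

```python
import re
from collections import defaultdict

# Illustrative patterns only (hypothetical); production tools are far more thorough.
PATTERNS = {
    "email": re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "url": re.compile(rb"https?://[\x21-\x7e]+"),
    # A few well-known file signatures ("magic numbers") that mark carvable files.
    "header_jpeg": re.compile(rb"\xff\xd8\xff"),
    "header_pdf": re.compile(rb"%PDF-"),
    "header_zip": re.compile(rb"PK\x03\x04"),
}


def index_tokens(data: bytes, keywords: tuple = ()) -> dict:
    """Map each token type to a list of (offset, matched bytes) found in raw data."""
    patterns = dict(PATTERNS)
    for kw in keywords:
        # Keywords are matched literally and case-insensitively (ASCII letters only).
        patterns["keyword:" + kw.decode("ascii")] = re.compile(re.escape(kw), re.IGNORECASE)
    index = defaultdict(list)
    for kind, pattern in patterns.items():
        for m in pattern.finditer(data):
            index[kind].append((m.start(), m.group()))
    return dict(index)


# A tiny stand-in for a chunk of unallocated space.
blob = b"\x00\x00junk jdoe@example.com \x00http://example.com/a\x00%PDF-1.4\x00"
idx = index_tokens(blob, keywords=(b"JUNK",))
for kind, hits in idx.items():
    print(kind, hits)
```

Recording the offset, not just the match, is what lets an examiner go back to the surrounding bytes and decide whether the hit actually matters.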


Once these tokens have been located, they can be indexed and examined in further detail to determine whether they are in fact relevant to your investigation. One tool that automates some of this is bulk_extractor by Simson Garfinkel. Because it does not parse the file system, it can process multiple parts of the disk in parallel, depending on the number of processor cores available, and it also detects compressed or encoded data and searches it recursively.
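The "searches it recursively" part is what answers the encoded-data half of the challenge: a plain keyword or regex scan never sees an email address that sits inside a gzip stream or a base64 blob. The sketch below is not how bulk_extractor is implemented; it is a hypothetical, single-threaded illustration (the `scan` function, the 16-character base64 threshold, and the depth limit are all assumptions) of the general technique: find a candidate encoded region, decode it, and run the same scanners over the decoded bytes.

```python
import base64
import binascii
import gzip
import re
import zlib

EMAIL = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
GZIP_MAGIC = b"\x1f\x8b\x08"
# Hypothetical heuristic: runs of 16+ base64-alphabet characters are candidate blobs.
BASE64_RUN = re.compile(rb"[A-Za-z0-9+/]{16,}={0,2}")


def scan(data, path="raw", depth=0, max_depth=3):
    """Yield (path, offset, email) tuples, descending into gzip and base64 layers."""
    for m in EMAIL.finditer(data):
        yield (path, m.start(), m.group())
    if depth >= max_depth:
        return  # stop runaway recursion on deeply nested or adversarial data
    # Try to inflate a gzip stream at every gzip signature found.
    start = data.find(GZIP_MAGIC)
    while start != -1:
        try:
            # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer;
            # bytes after the end of the stream are simply ignored.
            inner = zlib.decompressobj(16 + zlib.MAX_WBITS).decompress(data[start:])
        except zlib.error:
            inner = b""  # false-positive signature or corrupt stream
        if inner:
            yield from scan(inner, f"{path}>gzip@{start}", depth + 1, max_depth)
        start = data.find(GZIP_MAGIC, start + 1)
    # Try to decode each base64-looking run.
    for m in BASE64_RUN.finditer(data):
        try:
            inner = base64.b64decode(m.group(), validate=True)
        except binascii.Error:
            continue  # not valid base64 (for example, bad length or padding)
        yield from scan(inner, f"{path}>base64@{m.start()}", depth + 1, max_depth)


# Neither address appears as plain text in the blob itself.
blob = (b"\x00" * 8
        + gzip.compress(b"contact alice@example.org now", mtime=0)
        + b"\x00" * 4
        + base64.b64encode(b"mail bob@example.net please")
        + b"\x00")
for hit in scan(blob):
    print(hit)
```

The path string (for example `raw>gzip@8`) records how each hit was reached, which matters when you have to explain in a report where an artifact came from. Splitting the input into overlapping chunks and handing them to worker processes would be the natural next step toward the parallelism described above.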


No matter which tool or search engine you use to parse the unstructured data, you still need some idea of what you are looking for, and of the best methods for finding the kinds of artifacts you expect. In cases where you don't get the expected results, a different tool or search pattern may be needed to put some order to the data.
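One common reason a search comes back empty is that the text is stored in an encoding the search pattern doesn't cover. Windows, for example, stores many strings as UTF-16LE, so a plain ASCII search for a keyword will skip right over them. The sketch below assumes you want to check ASCII, UTF-16LE, and UTF-16BE; the `keyword_patterns` and `find_keyword` helpers are hypothetical and only illustrate building several byte patterns from one search term.

```python
import re


def keyword_patterns(keyword):
    """Compile one case-insensitive byte pattern per encoding for a keyword."""
    encodings = ("ascii", "utf-16-le", "utf-16-be")
    return {enc: re.compile(re.escape(keyword.encode(enc)), re.IGNORECASE) for enc in encodings}


def find_keyword(data, keyword):
    """Return (encoding, offset) hits for a keyword, sorted by offset."""
    hits = []
    for enc, pattern in keyword_patterns(keyword).items():
        hits.extend((enc, m.start()) for m in pattern.finditer(data))
    return sorted(hits, key=lambda hit: hit[1])


# The same word appears once as ASCII and once as UTF-16LE.
data = b"login password=hunter2 " + "Password".encode("utf-16-le") + b"\xff"
print(find_keyword(data, "password"))  # [('ascii', 6), ('utf-16-le', 23)]
```

An ASCII-only search would report just the first hit. Be aware that a UTF-16LE string preceded by a null byte can also produce a spurious UTF-16BE match one byte earlier, so multi-encoding hits still need review.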
