Daily Blog #33: Web 2.0 Forensics Part 1

Web 2.0 Forensics Part 1 by David Cowen - Hacking Exposed Computer Forensics Blog

Hello Reader,
                 I've finished two series, I've never even finished one in the last 5 years so I think this daily blog experiment is working. Thanks to all of you that are following along, I know it can be hard to keep up daily and for those that do (I compulsively watch pageviews) it does help me to keep going on with the dailies.

Today we begin a new series on 'web 2.0' forensics. I don't mean to use buzzwords but 'web 2.0' has come to represent a combination of technologies that have changed how custodians/suspects are accessing data from web services and how the systems we analyze are storing it. It is the aspect of retrieval of these asynchronous transactions that we will be talking about over the next blog posts. Based on the responses I got from the last Sunday Funday challenge I took it that many of you don't feel comfortable with these artifacts and how they get created so let's get into it so you can start getting more evidence!

The key technology that allowed web pages to update sections of their page without refreshing the contents of the entire page being viewed is AJAX. AJAX or Asynchronous Javascript and XML first introduced in 2006 standardized a mechanism allowing javascript executed within the browser to make a request to the webserver, receive the request, parse it and update the content on the page all seamless to the user. It is this technology that allowed many webmail systems to present a more fluid experience to theirs users and totally ruined the day of many a forensic examiner.

Before AJAX it was easy to write a carver to recognize the javascript in cached pages found all over the unallocated space of the disk recovering scores of webmail views. I wrote my first such Enscript back in 2002 and it became one of my favorite ways of finding data exfiltration. After AJAX all of the webmail views are being delivered via updates to single page loads all of which where occuring in memory and not being committed to the disk, this was a sad day. Suddenly the evidence we were all relying on was thought to be gone and unreachable.

Then someone started looking at the network traffic and what was being viewed and found the data structure of the XML/JSON requests being sent back and forth (I don't know who founded this research and if you do please comment below). They found these fragments in memory and more importantly in the Pagefile and Hiberfil! Now we don't have the same length of time back as we did when we had glorious cached pages being written to disk but we can again recover webmail and no one can complain about that.

If you remember one of the challenge questions was where we can recover JSON fragments from Gmail. The pagefile and hiberfil (and active memory of course but I'm looking at past activity recovery) before Windows Vista used to be the only locations, but now with shadow copies there's more! If you've heard me talk I've mentioned that Shadow Volume Copies contain more data then most people expect. In fact they also contain hiberfile and pagefile for each backup! That means for a shadow copy enabled disk you have by default weekly snapshots of possible JSON recovery available. If you are not extracting and searching this data (remember hiberfil is compressed and will not be searchable unless extracted and decompressed or a tool specifically supports it in the volume shadow) you are missing evidence.

Before we go on I actually got a second answer to the contest, while he didn't win a prize (late submission) he did give a different answer that I wanted to highlight.

Seth Ludwig writes:

In response to your blog post:
For a Windows 7 system:
1. Describe the Gmail JSON format and how you would recover it
A typical gmail JSON capture might look like the following:
,["gn","gsi test502"]
,["ft","Send photos easily from Gmail with
Google\'s \u003ca href\u003d\"http://
Recovering the JSON data could be achieved using a variety of forensics tools, both commercial and opensource, to carve for the files with the embedded JSON. (Encase, IEF, Helix3, etc).

2. Describe where in the disk you would expect to find Gmail JSON fragments.
Sometimes you simply cannot find them. The reason that this data is sometimes written to disk is largely because of browser bugs or lack of proper support for the no-cache HTML meta tag. This data isn't supposed to be written to disk in the first place, but due to various bugs it sometimes is. When the files are cached, you will find them named "mail[somenumber]", and is mainly located in Temporary Internet Files or other caches of unidentified data. Often you will be able to find these files in unallocated space. Additionally, you will find other files in the same places named "mail[somenumber].htm". There's often some JSON as described above contained within them.
Other possible and more likely locations:
Memory dumps
Hiberfil.sys (remember to decompress)

3. Which services popular in forensic investigations utilize JSON
Facebook, Twitter, Gmail, Skype, Google Talk, Yahoo Messenger and many others.

4. Provide a carve signature for the header and footer of a Gmail JSON
It's 1AM. You win this round.

5. Describe what Gmail's JSON would reveal to you
Utilizing JSON files, one has the potential to retrieve the following information:
Server name
Account Quota
Message List (Thread)
Conversation Summary
Message Information/Index
Message Body
Message Attachments
GMail Data Packet header
Thread Summary
End of Thread List
GMail Version

That's enough for today, hoepfully I've gotten you thinking. In the next post on Tuesday we will go into JSON data structures and how services use/store the data and how you can recover it.

Stay tuned for tomorrow's saturday reading and more importantly this Sunday Funday where you can win a free ticket to PFIC!

Post a Comment