Thursday, June 15, 2017

Analyzing Documents

I've noticed over time that a lot of the write-ups that get posted online regarding malware or downloaders delivered via email attachments (i.e., spear phishing campaigns) focus on what happens after the malicious payload is activated...the URL contacted, the malware downloaded, etc.  However, few seem to dig into the document itself, and there's a great deal that can be gleaned from those documents that can add to the threat intel picture.  If you're not looking at everything involved in the incident, if you're not (as Jesse Kornblum said) using all the parts of the buffalo, then you're very likely missing critical elements of the threat intel picture.

Here's an example from MS...much of the information in the post focuses on the embedded macro and the subsequent decrypted Cerber executable file, but there's nothing available regarding the document itself.

Keep in mind that different file formats (LNK, OLE, etc.) will contain different information.  And what I'm referring to here isn't about running through analysis steps that take a great deal of time; rather, what I'm going to show you are a few simple steps you can use to derive even more information from the attachment/wrapper documents themselves.

I took a look at a couple of documents (Doc 1, Doc 2) recently, and wanted to share my process and see if others might find it useful.  Both of these OLE-format documents have hashes available (or you can download and compute the hashes yourself), and they were also found on VirusTotal:

VirusTotal analysis for Doc 1
VirusTotal analysis for Doc 2

The VT analysis for both files includes a comment that the file was used to deliver PupyRAT.

Tools
Tools I'll be using for this analysis include my own oledmp.pl and wmd.pl.

Doc 1 Analysis
Running oledmp.pl against the file, we see:

Fig. 1: Doc 1 oledmp.pl output

That's a lot of streams in this OLE file.  One of the first things we see are the dates for the Root Entry and the 'directories' (MS has referred to the OLE file format as "a file system within a file", and they're right), which are 1 Jan 2017.  According to VT, the first time this file was submitted was 1 Jan 2017, at approx. 20:29:43 UTC...so what that tells us is that it's likely that one of the first folks to receive the document submitted it less than 14 hrs after the file was last modified.
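As an aside, if you want to do something similar in Python, the olefile module will list the streams and the directory entry time stamps.  This is just a sketch (olefile isn't part of the original analysis, and "doc1.doc" is simply a placeholder file name):

import olefile

ole = olefile.OleFileIO("doc1.doc")                # placeholder name for the document
for entry in ole.listdir(streams=True, storages=True):
    # getmtime() returns the modification time recorded in the OLE directory entry, or None
    print("%-40s  mtime: %s" % ("/".join(entry), ole.getmtime(entry)))
ole.close()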

Continuing with oledmp.pl, we can view the contents of the various streams in a hex dump format, and we see that stream number 20 contains a macro.  Using oledmp.pl with the argument "-d 20", we can view the contents of that stream; in the output, we see what appear to be two base64-encoded Powershell commands.  Copying and decoding both gives us one command that downloads PupyRAT to the system, and a second command that appears to be some form of shell code.  Some of the variable names ($Qsc, $zw5) appear to be unique, so searching for those via Google leads us to this Hybrid-Analysis write-up, which provides some insight into what the shell code may do.

Interestingly enough, the same search reveals that, per this Reverse.IT link, both encoded Powershell commands were used in another document, as well.
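For reference, decoding these commands isn't complicated...Powershell's -EncodedCommand argument takes base64 over UTF-16LE text, so it's a two-step decode.  Here's a quick Python sketch of the round trip (the "Write-Host" string is just a stand-in, not one of the actual commands from the document):

import base64

cmd = 'Write-Host "hi"'                                          # stand-in for the real command
encoded = base64.b64encode(cmd.encode("utf-16-le")).decode()     # what -EncodedCommand expects
print(encoded)
print(base64.b64decode(encoded).decode("utf-16-le"))             # back to the script text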

Moving on, here's an excerpt of the output from wmd.pl, when run against this document:

--------------------
Summary Information
--------------------
Title        :
Subject      :
Authress     : Windows User
LastAuth     : Windows User
RevNum       : 2
AppName      : Microsoft Office Word
Created      : 01.01.2017, 06:51:00
Last Saved   : 01.01.2017, 06:51:00
Last Printed :

Notice the dates...they line up with the previously-identified dates (see Fig. 1).
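If you want to cross-check those fields without wmd.pl, the olefile module exposes the same Summary Information properties.  Again, this is just a sketch, and the file name is a placeholder:

import olefile

ole = olefile.OleFileIO("doc1.doc")
meta = ole.get_metadata()
print("Author     :", meta.author)
print("LastAuth   :", meta.last_saved_by)
print("RevNum     :", meta.revision_number)
print("AppName    :", meta.creating_application)
print("Created    :", meta.create_time)
print("Last Saved :", meta.last_saved_time)
ole.close()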


Doc 2 Analysis
Following the same process we did with doc 1, we can see very similar output from oledmp.pl with doc 2:

Fig. 2: Doc 2 oledmp.pl output

One of the first things we can see is that this document was created within about 24 hrs of doc 1.

In the case of doc 2, stream 16 contains the data we're looking for...extracting and decoding the base64-encoded Powershell commands, we see that the commands themselves (PupyRAT download, shell code) are different.  Conducting a Google search for the variables used in the shell code command, we find this Hybrid-Analysis write-up, as well as this one from Reverse.IT.

Here's an excerpt of the output from wmd.pl, when run against this document:

--------------------
Summary Information
--------------------
Title        : HealthSecure User Registration Form
Subject      : HealthSecure User Registration Form
Authress     : ArcherR
LastAuth     : Windows User
RevNum       : 2
AppName      : Microsoft Office Word
Created      : 02.01.2017, 06:49:00
Last Saved   : 02.01.2017, 06:49:00
Last Printed : 20.06.2013, 06:27:00

--------------------
Document Summary Information
--------------------
Organization : ACC

Remember, this is a sample pulled down from VirusTotal, so there's no telling what happened with the document between the time it was created and submitted to VT.  I made the 'authress' information bold, in order to highlight it.

Summary
While this analysis may not appear to be of significant value, it does form the basis for developing a better intelligence picture, as it goes beyond the more obvious aspects of most analysis in phishing cases (i.e., the command to download PupyRAT, as well as the analysis of the PupyRAT malware itself).  Valuable information can be derived from the document format used to deliver the malware, regardless of whether it's an MSOffice document or a Windows shortcut/LNK file.  When developing an intel picture, we need to be sure to use all the parts of the buffalo.

Wednesday, May 31, 2017

Use of LNK Files...Again

I've discussed LNK files before...here, and more recently, here.  Windows shortcut files are an artifact that has been available on Windows systems for so long that there are likely a great many analysts who understand the category this artifact falls into, but may not know a great deal about the nature of the artifact itself.  That is to say that it's likely that a good many training and certification courses tend to gloss over some of the more esoteric aspects of the Windows shortcut file as an artifact.

Most analysts likely understand the Windows shortcut, or "LNK" file to be an artifact of "user knowledge of files", as the most commonly understood activity that leads to the automatic creation of these files is a user double-clicking on a file via the Windows Explorer shell.

We also know that understanding the format of these files can be very important.  The LNK file format is used as the basis for individual streams within JumpLists, and there are several Registry keys that contain value data that follows the LNK file format.  In fact, LNK files are composed, at least in part, of shell items...so now we're getting into the format of the individual building blocks of many artifacts on Windows systems.

Let's take a step back for a moment, and consider the most common means of "producing" Windows shortcut files.  From the standpoint of developing and interpreting evidence, let's say that a user plugs a USB device into a laptop, opens the contents of the device in Windows Explorer, and then double-clicks an MS Word document.  At that point, a shortcut file is created in the user's Recent folder that contains not only a series of shell items that comprise the path to the file, but also a string that refers to the file.  However, the LNK file also contains information about the system on which it was created, and it's this aspect of the file format that is missed or forgotten, due to the fact that in most cases, this embedded information isn't very interesting.  However, if an LNK file is not created as an artifact of, say, a malware infection, but is instead used by an attacker to infect other systems, then the LNK file will contain information about the system on which it was developed, which would NOT be the victim system.  This is when the contents of the LNK file become very interesting.

TrendMicro's TrendLabs recently published a blog article that gives a very good example of when these files can be extremely interesting.  The article describes the reported increase in the use of LNK files as weaponized attachments by the group identified as "APT 10".  I find this absolutely fascinating, not because something that has been part of Windows from the very beginning has been turned against users and employed as an attack vector, but because of the information and metadata that seems to be left on the floor when these files simply are not parsed or dug into...at least not to any extent that is discussed publicly.  It seems that most organizations may miss or discount the value of Windows shortcut file contents, and not incorporate them into their overall intelligence picture.

This JPCERT blog post describes "Evidence of Attackers' Development Environment Left in Shortcut Files", and this NViso blog post discusses "Tracking Threat Actors Through .LNK Files".  Both are great at describing what can be done, providing illustrations and examples.

Sunday, April 09, 2017

Getting Started

Not long ago, I gave some presentations at a local high school on cybersecurity, and one of the questions that was asked was, "how do I get started in cybersecurity?"  Given that my alma mater will establish a minor in cybersecurity this coming fall, I thought that it might be interesting to put some thoughts down, in hopes of generating a discussion on the topic.

So, some are likely going to say that in today's day and age, you can simply Google the answer to the question, because this topic has been discussed many times previously.  That's true, but it's a blessing as much as it is a curse; there are many instances in which multiple opinions are shared, and at the end of the thread, there's no real answer to the question.  As such, I'm going to share my thoughts and experience here, in hopes that it will start a discussion that others can refer to.  I'm hoping to provide some insight to anyone looking to "get in" to cybersecurity, whether you're an upcoming high school or college graduate, or someone looking to make a career transition.

During my career, I've had the opportunity to be a "gate keeper", if you will.  As an incident responder, I was asked to vet resumes that had been submitted in hopes of filling a position on our team.  To some degree, it was my job to receive and filter the resumes, passing what I saw as the most qualified candidates on to the next phase.  I've also worked with a pretty good number of analysts and consultants over the years.

The world of cybersecurity is pretty big and there are a lot of roads you can follow; there's pen testing, malware reverse engineering, DFIR, policy, etc.  There are both proactive and reactive types of work.  The key is to pick a place to start.  This doesn't mean that you can't do more than one...it simply means that you need to decide where you want to start...and then start. Pick some place, and go from there.  You may find that you're absolutely fascinated by what you're learning, or you may decide that where you started simply is not for you.  Okay, no problem.  Pick a new place and start over.

When it comes to reviewing resumes, I tend not to focus on certifications, nor on the actual degree that someone has.  Don't get me wrong, there are a lot of great certifications out there.  The issue I have with certifications is that when most folks return from the course(s) to obtain the certification, there's nothing that holds them accountable for using what they learned.  I've seen analysts go off to a 5 or 6 day training course in DFIR of Windows systems, which cost $5K - $6K (just for the course), and come back unable to determine time stomping via the MFT (they compared the file system last modification time to the compile time in the PE header).

I am, however, interested to see that someone does have a degree.  This is due to the fact that having a degree pretty much guarantees a minimum level of education, and it also gives insight into your ability to complete tasks.  A four (or even two) year degree is not going to be a party everyday, and you're likely going to end up having to do things you don't enjoy.

And why is this important?  Well, the (apparently) hidden secret of cybersecurity is that at some point, you're going to have to write.  That's right. No matter what level of proficiency you develop at something, it's pretty useless if you can't communicate and share it with others.  I'm not just talking about sharing your findings with your team mates and co-workers (hint, "LOL" doesn't count as "communication"), I'm also talking about sharing your work with clients.

Now, I have a good bit of experience with writing throughout my career.  I wrote in the military (performance reviews, reports, course materials, etc.), as part of my graduate education (to include my thesis), and I've been writing almost continually since I started in infosec.  So...you have to be able to write.  A great way to get experience writing is to...well...write.  Start a blog.  Write something up, and share it with someone you trust to actually read it with a critical eye, not just hand it back to you with a "looks good".  Accept that what you write is not going to be perfect, every time, and use that as a learning experience.

Writing helps me organize my thoughts...if I were to just start talking after I completed my analysis, what came out of my mouth would not be nearly as structured, nor as useful, as what I could produce in writing.  And writing does not have to be the sole source of communication; I very often find it extremely valuable to write something down first, and then use that as a reference for a conversation, or better yet, a conference presentation.

So, my recommendations for getting started in the cybersecurity field are pretty simple:
1. Pick some place to start.  If you have to, reach out to someone for advice/help.
2. Start.  If you have to, reach out to someone for advice/help.
3. Write about what you're doing.  If you have to, reach out to someone for advice/help.

There are plenty of free resources available that provide access to what you need to get started: online blog posts, podcasts/videos, presentations, books (yes, books online and in the library), etc.  There are free images available for download as part of DFIR challenges (if that's what you're interested in doing).  There are places you can go to find out about malware, download samples, or even run samples in virtual environments and emulators.  In fact, if you're viewing this blog post online, then you very likely have everything you need to get started.  If you're interested in DFIR analysis or malware RE, you do not need to have access to big expensive commercial tools to conduct analysis...that's just an excuse for paralysis.

There is a significant reticence to sharing in this "community", and it's not simply isolated to folks who are new to the field.  There are a lot of folks who have worked in this industry for quite a while who will not share experiences or findings.  And there is no requirement to share something entirely new, that no one's seen before.  In fact, there's a good bit of value in sharing something that may have been discussed previously; it shows that you understand it (or are trying to), and it can offer visibility and insight to others ("oh, that thing that was happening five years ago is coming back...like bell bottoms...").

The take-away from all of this is that when you're ready to put your resume out there and apply for a position in cybersecurity, you're going to have some experience in the work, have visible experience writing that your potential employer can validate, and you're going to know people in the field.

Understanding What The Data Is Telling You

Not long ago, I was doing some analysis of a Windows 2012 system and ran across an interesting entry in the AppCompatCache data:

SYSVOL\Users\Admin\AppData\Roaming\badfile.exe  Sat Jun  1 11:34:21 2013 Z

Now, we all know that the time stamp associated with entries in the AppCompatCache is the file system last modification time, derived from the $STANDARD_INFORMATION attribute.  So, at this point, all I know about this file is that it existed on the system at some point, and given that it's now 2017, it is more than just a bit odd, albeit not impossible, that that is the correct file system modification date.

Next stop, the MFT...I parsed it and found the following:

71516      FILE Seq: 55847 Links: 1
[FILE],[BASE RECORD]
.\Users\Admin\AppData\Roaming\badfile.exe
    M: Sat Jun  1 11:34:21 2013 Z
    A: Mon Jan 13 20:12:31 2014 Z
    C: Thu Mar 30 11:40:09 2017 Z
    B: Mon Jan 13 20:12:31 2014 Z
  FN: msiexec.exe  Parent Ref: 860/48177
  Namespace: 3
    M: Thu Mar 30 11:40:09 2017 Z
    A: Thu Mar 30 11:40:09 2017 Z
    C: Thu Mar 30 11:40:09 2017 Z
    B: Thu Mar 30 11:40:09 2017 Z
[$DATA Attribute]
File Size = 1337856 bytes

So, this is what "time stomping" of a file looks like, and this also helps validate that the AppCompatCache time stamp is the file system last modification time, extracted from one of the MFT record attributes.  At this point, there's nothing to specifically indicate when the file was executed but now, we have a much better idea of when the file appeared on the system. The bad guy most likely used the GetFileTime() and SetFileTime() API calls to perform the time stomping, which we can see by going to the timeline:

Mon Jan 13 20:12:31 2014 Z
  FILE                       - .A.B [152] C:\Users\Admin\AppData\Roaming\
  FILE                       - .A.B [56] C:\Windows\explorer.exe\$TXF_DATA
  FILE                       - .A.B [1337856] C:\Users\Admin\AppData\Roaming\badfile.exe
  FILE                       - .A.B [2391280] C:\Windows\explorer.exe\
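As an aside, the check illustrated by this timeline is simple enough to automate...if the $STANDARD_INFORMATION last modification time pre-dates the $FILE_NAME times, the entry deserves a closer look.  A minimal Python sketch (this rule of thumb is mine, and it's an indicator, not proof):

from datetime import datetime

def looks_time_stomped(si_mtime, fn_mtime):
    # $SI times are what SetFileTime() modifies; $FN times generally are not,
    # so an $SI modification time earlier than the $FN time is a common tell
    return si_mtime < fn_mtime

si = datetime(2013, 6, 1, 11, 34, 21)     # $SI M time from the record above
fn = datetime(2017, 3, 30, 11, 40, 9)     # $FN M time from the record above
print(looks_time_stomped(si, fn))         # True...dig deeper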

Fortunately, the system I was examining was Windows 2012, and as such, had a well-populated AmCache.hve file, from which I extracted the following:

File Reference: da2700001175c
LastWrite     : Thu Mar 30 11:40:09 2017 Z
Path          : C:\Users\Admin\AppData\Roaming\badfile.exe
Company Name  : Microsoft Corporation
Product Name  : Windows Installer - Unicode
File Descr    : Windows® installer
Lang Code     : 1033
SHA-1         : 0000b4c5e18f57b87f93ba601e3309ec01e60ccebee5f
Last Mod Time : Sat Jun  1 11:34:21 2013 Z
Last Mod Time2: Sat Jun  1 11:34:21 2013 Z
Create Time   : Mon Jan 13 20:12:31 2014 Z
Compile Time  : Thu Mar 30 09:28:13 2017 Z

From my timeline, as well as from previous experience, the LastWrite time for the key in the AmCache.hve corresponds to the first time that badfile.exe was executed on the system.

What's interesting is that the Compile Time value from the AmCache data is, in fact, the compile time extracted from the header of the PE file.  Yes, this value is easily modified, as it is simply a bunch of bytes in the file that do not affect the execution of the file itself, but it is telling in this case.
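If you want to pull that compile time yourself rather than taking it from the AmCache data, the pefile module (my tooling choice here, not something used in the original analysis) will hand it to you directly; "badfile.exe" is the file name from this case:

import datetime
import pefile

pe = pefile.PE("badfile.exe")
ts = pe.FILE_HEADER.TimeDateStamp                 # seconds since the Unix epoch
print(datetime.datetime.utcfromtimestamp(ts))     # compare against the AmCache Compile Time value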

So, on the surface, while it may first appear as if the badfile.exe had been on the system for four years, it turns out that by digging a bit deeper into the data, we can see that wasn't the case at all.

The take-aways from this are:
1.  Do not rely on a single data point (AppCompatCache) to support your findings.

2.  Do not rely on the misinterpretation of a single data point as the foundation of your findings.  Doing so is more akin to forcing the data to fit your theory of what happened.

3. The key to analysis is to know the platform you're analyzing, and to know your data...not only what is available, but also its context.

4.  During analysis, always look to artifact clusters.  There will be times when you do not have access to all of the artifacts in the cluster, so you'll want to validate the reliability and fidelity of the artifacts that you do have.

Saturday, April 08, 2017

Understanding File and Data Formats

When I started down my path of studying techniques and methods for computer forensic analysis, I'll admit that I didn't start out using a hex editor...that was a bit daunting and more than a little overwhelming at the time.  Sure, I'd heard and read about those folks who did, and could, conduct a modicum of analysis using a hex editor, but at that point, I wasn't seeing "blondes, brunettes, and redheads...".  Over time and with a LOT of practice, however, I found that I could pick out certain data types within hex data.  For example, within a hex dump of data, over the years my eyes have started picking out repeating patterns of data, as well as specific data types, such as FILETIME objects.
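A FILETIME object, for example, is just a 64-bit count of 100-nanosecond intervals since 1 Jan 1601, stored little-endian...once you know that, decoding one out of a hex dump is a few lines of code.  A quick Python sketch (the example bytes are the well-known value for the Unix epoch):

import struct
from datetime import datetime, timedelta

def filetime_to_dt(raw8):
    # 64-bit little-endian count of 100-ns ticks since 1601-01-01 UTC
    ticks = struct.unpack("<Q", raw8)[0]
    return datetime(1601, 1, 1) + timedelta(microseconds=ticks // 10)

print(filetime_to_dt(bytes.fromhex("00803ED5DEB19D01")))   # 1970-01-01 00:00:00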

Something that's come out of that is the understanding that knowing the structure or format of specific data types can provide valuable clues and even significant artifacts.  For example, understanding the structure of Event Log records (binary format used for Windows NT, 2000, XP, and 2003 Event Logs) has led to the ability to parse for records on a binary level and completely bypass limitations imposed by using the API.  The first time I did this, I found several valid records in a *.evt file that the API "said" shouldn't have been there.  From there, I have been able to carve unstructured blobs of data for such records.
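For those binary Event Log records, the hook is the record structure itself: each record starts with a 4-byte size, followed by the "LfLe" magic number, and ends with that same size value repeated.  That's enough to carve candidates out of unstructured data.  A bare-bones Python sketch (it only validates the record framing; parsing the individual fields is left out):

import struct

def carve_evt_records(blob):
    hits = []
    pos = blob.find(b"LfLe")
    while pos != -1:
        start = pos - 4                                    # a record begins with its size DWORD
        if start >= 0:
            size = struct.unpack("<I", blob[start:start + 4])[0]
            end = start + size
            # a valid record repeats the size DWORD at its very end
            if 0x38 <= size <= len(blob) - start and blob[end - 4:end] == blob[start:start + 4]:
                hits.append((start, size))
        pos = blob.find(b"LfLe", pos + 4)
    return hits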

Back when I was part of the IBM ISS ERS Team, an understanding of the structure of Windows Registry hive files led us to being able to determine the difference between credit card numbers being stored "in" Registry keys and values, and being found in hive file slack space.  The distinction was (and still is) extremely important.

Developing an understanding of data structures and file formats has led to findings such as Willi Ballenthin's EVTXtract, as well as the ability to parse Registry hive files for deleted keys and values, both of which have proven to be extremely valuable during a wide variety of investigations.

Other excellent examples of this include Decalage's parsers for the OLE file format, James Habben's parsing of Prefetch files, and Mari's parsing of data deleted from SQLite databases.

Other examples of what understanding data structures has led to includes parsing Windows shortcuts/LNK files that were sent to victims of phishing campaigns.  This NViso blog post discusses tracking threat actors through the .lnk file they sent their victims, and this JPCert blog post from 2016 discusses finding indications of an adversary's development environment through the same resource.

Now, I'm not suggesting that every analyst needs to be intimately familiar with file formats, and be able to parse them by hand using just a hex editor.  However, I am suggesting that analysts should at least become aware of what is available in various formats (or ask someone), and understand that many of the formats can provide a great deal of data that will assist you in your investigation.

Sunday, March 26, 2017

Links, Updates

LNK Attachments
Through my day job, we've seen a surge in spam campaigns lately where Windows shortcuts/LNK files were sent to the targets as email attachments.  A good bit of the focus has been the embedded commands within the LNK files, and how those commands have been obfuscated in order to avoid detection or analysis.  There is some great work being done in this area (discovery, analysis, etc.) but at the same time, a good bit of data and some potential intelligence is being left "on the floor", in that the LNK files themselves are not being parsed for embedded data.

DFIR analysts are probably most often familiar with LNK files being used either to maintain malware persistence when a system is infected, or to indicate that a user opened a file on their system.  In instances such as these, the MS API for creating LNK files embeds information from the local system within the binary contents of the LNK file.  However, if a user is sent an LNK file, that file must have been created on another system altogether...which means that unless a specific script was used to create the LNK file on a non-Windows system, or to modify the embedded information, we can assume that the embedded information (MAC address, system NetBIOS name, volume serial number, SID) was from the system on which the LNK file was created.
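Pulling that embedded information out doesn't require anything exotic.  As one example, the TrackerDataBlock defined in the MS-SHLLINK format holds the NetBIOS name of the machine on which the LNK file was created, and the 'droid' file GUID is a version 1 UUID whose node field is that machine's MAC address.  A minimal Python sketch (the offsets follow the published structure; "sample.lnk" is a placeholder):

import struct

def lnk_tracker_info(path):
    data = open(path, "rb").read()
    sig = struct.pack("<I", 0xA0000003)            # TrackerDataBlock signature (MS-SHLLINK)
    idx = data.find(sig)
    if idx == -1:
        return None
    # after the signature: Length (4), Version (4), MachineID (16),
    # Droid (two GUIDs, 32 bytes), DroidBirth (two GUIDs, 32 bytes)
    machine_id = data[idx + 12:idx + 28].split(b"\x00")[0].decode("ascii", "replace")
    file_droid = data[idx + 44:idx + 60]           # second GUID of the Droid pair
    mac = ":".join("%02x" % b for b in file_droid[-6:])   # v1 UUID node field = MAC address
    return machine_id, mac

print(lnk_tracker_info("sample.lnk"))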

I'd blogged on this before (yes, eight years ago), and while researching that blog, had found this reference to %LnkGet% at Microsoft.

I recently ran across this fascinating write-up from NVISO Labs regarding the analysis of an LNK file shipped embedded within an old-style (re: OLE) MSWord document.  While the write-up contains hex excerpts of the LNK file, and focuses on the command used to launch bitsadmin.exe to download malware, what it does not do is extract embedded artifacts (SID, system NetBIOS name, MAC address, volume serial number) from the binary contents of the LNK file.

I know...so what?  Who cares?  How can you even use this information?  Well, if you're maintaining case notes in a collaboration portal, you can search for various findings across engagements, long after those engagements have closed out or analysts have left (retired, moved on, etc.), developing a "bigger picture" view of activity, as well as maintaining intelligence from those engagements.  For example, keeping case notes across engagements will allow a perhaps less experienced analyst to see what's been done on previous engagements, and illustrate what further work can be done (just-in-time learning).  Of course, then there's correlating multiple engagements with marked similarities (intel gathering).  Or, something to consider is that there are Windows Event Log records that include the NetBIOS name of the remote system when a login occurs, and you might be able to correlate that information with what's seen embedded in LNK files (intel collection/development).

MS-SHLLINK: Shell Link Binary File Format

AutoRun - ServiceStartup
Adam's found yet another autostart location within the Registry, this one being the ServiceStartup key.  I haven't seen a system with this key populated, but it's definitely something to look out for, as this would make a great RegRipper plugin, or a great addition to the malware.pl plugin.

However, while I was looking around at a Software hive to see if I could find a ServiceStartup key with any values, I ran across the following key that looked interesting:

HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\WSMAN\AutoRestartList

Digging into the key a bit, I found a couple of interesting articles (here, and here).  This is something I plan to keep a close eye on, as it looks as if it could be very interesting.
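As a side note, this is also the kind of thing that's easy to sweep for across a stack of offline Software hives.  A quick sketch using Willi Ballenthin's python-registry module (my tooling choice, not a RegRipper plugin), pointed at the key path above:

from Registry import Registry

def dump_key(hive_path, key_path):
    reg = Registry.Registry(hive_path)
    try:
        key = reg.open(key_path)
    except Registry.RegistryKeyNotFoundException:
        print("[not found] %s" % key_path)
        return
    print("[%s]  LastWrite: %s" % (key_path, key.timestamp()))
    for v in key.values():
        print("  %s = %s" % (v.name(), v.value()))

dump_key("SOFTWARE", r"Microsoft\Windows\CurrentVersion\WSMAN\AutoRestartList")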

Analysis
The guys from Carbon Black recently shared some really fascinating analysis describing how "a malicious Excel document was used to create a PowerShell script, which then used the Domain Name System (DNS) to communicate with an Internet Command and Control (C2) server."

This is really fascinating stuff, not only to see stuff you've never seen before, but to also see how such things can be discovered (how would I instrument or improve my environment to detect this?), as well as analyzed.

Imposter Syndrome
I was catching up on blog post reading recently, and came across a blog post that centered on how the imposter syndrome caused one DFIR guy to sabotage himself.  Not long after reading that, I read James' post about his experiences at BSidesSLC, and his offer to help other analysts...and it sounded to me like James was offering a means for DFIR folks who are interested to overcome their own imposter syndrome.

On a similar note, not too long ago, I was asked to give some presentations on cyber-security at a local high school, and during one of the sessions, one of the students had an excellent question...how does someone with no experience apply for an entry-level position that requires experience?  To James' point, I recommended that the students with an interest in cyber-security start working on projects and blogging about them, sharing what they did, and what they found.  There are plenty of resources available, including system images that can be downloaded and analyzed, malware that can be downloaded and analyzed, etc.  Okay, I know folks are going to say, "yeah, but others have already done that..."...really?  I've looked at a bunch of the images that are available for download, and one of the things I don't see is responses...yes, there are some, but not many.  So, download the image, conduct and document your analysis, and write it up.  So what if others have already done this?  The point is you're sharing your experiences, findings, how you conducted your analysis, and most importantly, you're showing others your ability to communicate through the written word.  This is very important, because pretty much every job in the adult world includes some form of written communications, either through email, reports, etc.

Here's what I would look for if I were looking to fill a DFIR analysis position...even if the image had already been analyzed by others, I'd look for completeness of analysis based on currently available information, and how well it was communicated.  I'd also look for things like, was any part of the analysis taken a step or two further?  Did the "analyst" attempt to do all of the work themselves, or was there collaboration?  I'd want to see (or hear, during the interview) justification for the analysis steps along the way, not to second guess, but in order to understand thought processes.  I'd also want to see if there was a reliance on commercial tools, or if the analyst was able to incorporate (or better yet, modify) open source tools.

Not all of these aspects are required, but these are things I'd look for.  These are also things I'd look at when bringing new analysts onto a team, and mentoring them.

PowerShell
I posted recently about some interesting Powershell findings that pertain to Powershell version 5.  I found this while examining a Windows 10 system, but if someone has updated the Powershell installation on their Windows 7 and 8 systems, they should also be able to take advantage of the artifacts.

However, something popped up recently that simply reiterated the need to clearly identify and understand the version of Windows that you're examining...Powershell downgrade attacks.

This SecurityAffairs article provides an example of how Powershell has been used.

Wednesday, March 15, 2017

Incorporating AmCache data into Timeline Analysis

I've had an opportunity to examine some Windows 10 systems lately, and recently got a chance to examine a Windows 2012 server system.  While I was preparing to examine the Windows 2012 system, I extracted a number of files from the image, in order to incorporate the data in those files into a timeline for analysis.  I also grabbed the AmCache.hve file (I've blogged about this file previously...), and parsed it using the amcache.pl RegRipper plugin.  What I'm going to do in this post is walk through an example of something I found after the initial analysis.

From the ShimCache data from the system, I found the following reference:

SYSVOL\Users\Public\Downloads\badfile.exe  Fri Jan 13 11:16:40 2017 Z

Now, we all know that the time stamp associated with the entry in the ShimCache data is the file system last modification time of the file (NOT the execution time), and that if you create a timeline, this data would be best represented by an entry that includes "M..." to indicate the context of the information.

I then looked at the output of the amcache.pl plugin to see if there was an entry for this file, and I found the following:

File Reference: 720000150e1
LastWrite     : Sun Jan 15 07:53:53 2017 Z
Path          : C:\Users\Public\Downloads\badfile.exe
Company Name  : FileManger
Product Name  : Fileppp
File Descr    : FileManger
Lang Code     : 0
SHA-1         : 00002861a7c280cfbb10af2d6a1167a5961cf41accea
Last Mod Time : Fri Jan 13 11:16:40 2017 Z
Last Mod Time2: Fri Jan 13 11:16:40 2017 Z
Create Time   : Sun Jan 15 07:53:26 2017 Z
Compile Time  : Fri Jan 13 03:16:40 2017 Z

We know from Yogesh's research that the "File Reference" is the file reference number from the MFT; that is, the sequence number and the MFT record number.  In the above output, the "LastWrite" entry is the LastWrite time for the key with the name referenced in the "File Reference" entry.  You'll also notice some additional information that could be pretty useful...some of it (Lang Code, Product Name, File Descr) comes from values that I added to the plugin today (I also updated the plugin repository on GitHub).

You'll also notice that there are a few time stamps, in addition to the key LastWrite time.  I thought that it would be interesting to see what effect those time stamps would have on a timeline; so, I wrote a new plugin (amcache_tln.pl, also uploaded to the repository today) that would allow me to add that data to my timeline.  After adding the AmCache.hve time stamp data to my timeline, I went looking for the entries related to the file in question, and found the following:

Sun Jan 15 07:53:53 2017 Z
  AmCache       - Key LastWrite   - 720000150e1:C:\Users\Public\Downloads\badfile.exe
  REG                 User - [Program Execution] UserAssist - C:\Users\Public\Downloads\badfile.exe (1)


Sun Jan 15 07:53:26 2017 Z
  AmCache       - ...B  720000150e1:C:\Users\Public\Downloads\badfile.exe
  FILE              - .A.B [286208] C:\Users\Public\Downloads\badfile.exe


Fri Jan 13 11:16:40 2017 Z
  FILE               - M... [286208] C:\Users\Public\Downloads\badfile.exe
  AmCache       - M...  720000150e1:C:\Users\Public\Downloads\badfile.exe

Fri Jan 13 03:16:40 2017 Z
  AmCache       - PE Compile time - 720000150e1:C:\Users\Public\Downloads\badfile.exe

Clearly, a great deal more analysis and testing needs to be performed, but this timeline excerpt illustrates some very interesting findings.  For example, the AmCache entries for the M and B dates line up with those from the MFT.

Something else that's very interesting is that the AmCache key LastWrite time appears to correlate to when the file was executed by the user.

For the sake of being complete, let's take the parsed MFT entry for the file:

86241      FILE Seq: 114  Links: 1
[FILE],[BASE RECORD]
.\Users\Public\Downloads\badfile.exe
    M: Fri Jan 13 11:16:40 2017 Z
    A: Sun Jan 15 07:53:26 2017 Z
    C: Fri Feb 10 11:37:25 2017 Z
    B: Sun Jan 15 07:53:26 2017 Z
  FN: badfile.exe  Parent Ref: 292/1
  Namespace: 3
    M: Sun Jan 15 07:53:26 2017 Z
    A: Sun Jan 15 07:53:26 2017 Z
    C: Sun Jan 15 07:53:26 2017 Z
    B: Sun Jan 15 07:53:26 2017 Z
[$DATA Attribute]
File Size = 286208 bytes

We know we have the right file...if we convert the MFT record number (86241) to hex, and prepend it with the sequence number (also converted to hex), we get the file reference number from the AmCache.hve file.  We also see that the creation date for the file is the same in both the $STANDARD_INFORMATION and $FILE_NAME attributes from the MFT record, and they're also the same as the value extracted from the AmCache.hve file.
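That's a quick check to make:

rec_no, seq_no = 86241, 114
print("%x / %x" % (seq_no, rec_no))    # 72 / 150e1...both pieces are visible in '720000150e1'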

There definitely needs to be more research and work done, but it appears that the AmCache data may be extremely valuable with respect to files that no longer exist on the system, particularly if (and I say "IF") the key LastWrite time corresponds to the first time that the file was executed.  Review of data extracted from a Windows 10 system illustrated similar findings, in that the key LastWrite time for a specific file reference number correlated to the same time that an "Application Popup/1000" event was recorded in the Application Event Log, indicating that the application had an issue; four seconds later, events (EVTX, file system) indicated an application crash.  I'd like to either work an engagement where process creation information is also available, or conduct testing and analysis of a Win2012 or Win10 system that has Sysmon installed, as it appears that this data may indicate/correlate to a program execution finding.

Now, clearly, the AmCache.hve file can contain a LOT of data, and you might not want it all.  You can minimize what's added to the timeline by using the reference to the "Public\Downloads" folder, for example, as a pivot point.  You can run the plugin and pipe the output through the find command to get just those entries that include files in the "Public\Downloads" folder in the following manner:

rip -r amcache.hve -p amcache_tln | find "public\downloads" /i

An alternative is to run the plugin, output all of the entries to a file, and then use the type command to search for specific entries:

rip -r amcache.hve -p amcache_tln > amcache.txt
type amcache.txt | find "public\downloads" /i

Either one of these two methods will allow you to minimize the data that's incorporated into a timeline and create overlays, or simply create micro-timelines solely from data within the AmCache.hve file.

Oh, and hey...more information on language ID codes can be found here and here.

Addendum: Additional Sources
So, I'm not the first one to mention the use of AmCache.hve entries to illustrate program execution...others have previously mentioned this artifact category:
Digital Forensics Survival Podcast episode #020
Willi's amcache.py parser

Saturday, February 25, 2017

Powershell Stuff, Windows 10

I recently had some Windows 10 systems come across my desk, and as part of my timeline creation process, I extracted the Windows Event Log files of interest and ran them through my regular sequence of commands.  While I was analyzing the system timeline, I ran across some interesting entries, specifically "Microsoft-Windows-Powershell/4104" events; these events appeared to contain long strings of encoded text.  Many of the events were clustered together, up to 20 at a time, and as I scrolled through these events, I saw strings (not encoded) that made me think that I was looking at activity of interest to my exam.  Further, the events themselves were clustered 'near' other events of interest, to include indications that a Python script 'compiled' as a Windows executable had been run in order to steal credentials.  This seemed like something worth looking into, so during a team call I posed the question, "...has anyone seen this...", to the group, and got a response; one of my teammates pointed me toward Matt Dunwoody's block-parser.py script.  My own research had revealed that FireEye, CrowdStrike, and even Microsoft had talked about these events, and what they mean.

From the FireEye blog post (authored by Matt Dunwoody):

Script block logging records blocks of code as they are executed by the PowerShell engine, thereby capturing the full contents of code executed by an attacker, including scripts and commands. Due to the nature of script block logging, it also records de-obfuscated code as it is executed. For example, in addition to recording the original obfuscated code, script block logging records the decoded commands passed with PowerShell’s -EncodedCommand argument, as well as those obfuscated with XOR, Base64, ROT13, encryption, etc., in addition to the original obfuscated code. Script block logging will not record output from the executed code. Script block logging events are recorded in EID 4104. Script blocks exceeding the maximum length of an event log message are fragmented into multiple entries. A script is available to parse script block logs and reassemble fragmented blocks (see reference 5).

While not available in PowerShell 4.0, PowerShell 5.0 will automatically log code blocks if the block’s contents match on a list of suspicious commands or scripting techniques, even if script block logging is not enabled. These suspicious blocks are logged at the “warning” level in EID 4104, unless script block logging is explicitly disabled. This feature ensures that some forensic data is logged for known-suspicious activity, even if logging is not enabled, but it is not considered to be a security feature by Microsoft. Enabling script block logging will capture all activity, not just blocks considered suspicious by the PowerShell process. This allows investigators to identify the full scope of attacker activity. The blocks that are not considered suspicious will also be logged to EID 4104, but with “verbose” or “information” levels.

While the blog post does not specify what constitutes 'suspicious', this was a pretty good description of what I was seeing.  This Windows Powershell blog post contains information similar to Matt's post, but doesn't use the term "suspicious", instead stating that "PowerShell automatically logs script blocks when they have content often used by malicious scripts."  So, pretty interesting stuff.

After updating my Python installation with the appropriate dependencies (thanks to Jamie for pointing me to an install binary I needed, as well as some further assistance with the script), I ran the following command:

block-parser.py -a -f blocks.txt Microsoft-Windows-Powershell%4Operational.evtx

That's right...you run the script directly against a .evtx file; hence, the need to ensure that you have the right dependencies in place in your Python installation (most of which can be installed easily using 'pip').  The output file "blocks.txt" contains the decoded script blocks, which turned out to be very revealing.  Rather than looking at long strings of clear and encoded text, broken up across multiple events, I could now point to a single, unified file containing the script blocks that had been run at a particular time, really adding context and clarity to my timeline and helping me build a picture of the attacker activity, providing excellent program execution artifacts.  The decoded script blocks contained some 'normal', non-malicious stuff, but also contained code that performed credential theft, and in one case, privilege escalation code, both of which could be found online, verbatim.
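If getting block-parser.py's dependencies sorted out isn't an option, the same reassembly can be sketched out with python-evtx (that's my substitution, not Matt's script)...group the 4104 fragments by ScriptBlockId and put them back together in MessageNumber order:

import xml.etree.ElementTree as ET
from collections import defaultdict
import Evtx.Evtx as evtx

NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

def reassemble_script_blocks(evtx_path):
    blocks = defaultdict(dict)                     # ScriptBlockId -> {MessageNumber: text}
    with evtx.Evtx(evtx_path) as log:
        for rec in log.records():
            root = ET.fromstring(rec.xml())
            if root.findtext("e:System/e:EventID", namespaces=NS) != "4104":
                continue
            data = {d.get("Name"): (d.text or "") for d in root.findall("e:EventData/e:Data", NS)}
            num = int(data.get("MessageNumber") or 1)
            blocks[data.get("ScriptBlockId")][num] = data.get("ScriptBlockText", "")
    # stitch each block's fragments back together, in order
    return {bid: "".join(parts[n] for n in sorted(parts)) for bid, parts in blocks.items()}

for bid, text in reassemble_script_blocks("Microsoft-Windows-Powershell%4Operational.evtx").items():
    print("=== %s ===" % bid)
    print(text)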

It turns out that "Microsoft-Windows-Powershell/4100" events are also very interesting, particularly when they follow an identified 4104 event, as these events can contain error messages indicating that the Powershell script didn't run properly.  This can be critical in determining such things as process execution (the script executed, but did it do so successfully?), as well as the window of compromise of a system.

Again, all of this work was done across what's now up to half a dozen Windows 10 Professional systems.  Many thanks to Kevin for the nudge in the right direction, and to Jamie for her help with the script.

Additional Reading
Matt's DerbyCon 2016 Presentation