Analyzing Malware by Example---Sample 2
Goal:
Our goal is again to find out what kind of malware this is with regard to its behaviour. The last sample was a downloader. Let's see what we have this time.
Download the sample: sample.zip
The password is "infected".
Caution! This is live malware!
Your first step is again to check the file type, so you know which tools are appropriate for further analysis. You already learned some techniques in the last preparatory exercise. Find out the file type first, then go on reading here.
You probably used TrID, the file command on Linux or your hex editor. If you found out that this is a Word document, you got it right.
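If you prefer to script the check, the following is a minimal sketch of what such tools do for this format: compare the first bytes of the file with the well-known 8-byte signature of OLE2 (Compound File Binary) documents. The helper names are made up for this example, and the code is written for Python 3. A match only tells you that the container is OLE2; it does not yet prove that it is a Word document.

```python
# Signature found at the start of OLE2 / Compound File Binary documents.
OLE_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def looks_like_ole(data):
    """Return True if the given bytes start with the OLE2 signature."""
    return data[:len(OLE_MAGIC)] == OLE_MAGIC


def file_looks_like_ole(path):
    """Read only the first bytes of a file and test them (hypothetical helper)."""
    with open(path, "rb") as handle:
        return looks_like_ole(handle.read(len(OLE_MAGIC)))
```

Call file_looks_like_ole with the path to the sample on your analysis system. Reading the bytes does not execute anything in the document.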
Install Python
First install Python 2.7 on your static analysis system. You will need it for a bunch of analysis tools. If you have Linux, Python is probably preinstalled. You can check by typing python on the command line. If the interactive interpreter opens, you are done. If you get a message saying that the python command could not be found, you need to set it up.
Download Python 2.7 from the official Python website: scroll to the Download section and choose the download for your operating system. E.g. if you use 32-bit Windows, choose Windows x86 MSI Installer (2.7); for 64-bit choose Windows X86-64 MSI Installer (2.7). Run the installer and follow the prompts. After that, check again whether the python command works. If it doesn't, you need to add Python to the PATH variable:
- Right click on My Computer
- Click on Properties
- Click Advanced Settings, then Environment Variables
- A list of variables will appear. In the upper list choose PATH and modify.
- At the end of the path, add the following: ;C:\WINDOWS\system32;C:\WINDOWS;C:\Python27
- Click OK
Now open your command prompt. If you had one open previously, you need to close it and open it again.
Type python to verify that everything is correct. Type exit() to leave the interactive interpreter.
Extracting Metadata
Before you move on to Macro code extraction, get an overview of the file first. Most file types have so-called metadata. That is additional information saved in the file, which is usually used by programs that process files of this type.
Download the most recent ZIP of oletools from here:
https://bitbucket.org/decalage/oletools/downloads
These are Python tools, which you use from the command line. A description of each tool is here:
http://www.decalage.info/en/book/export/html/79
Install oletools on your system. If you are on Windows, execute install.bat.
Open your command prompt and verify that we have identified the correct file type of our sample with:
python oleid.py <path-to-sample>
The program will print the following information:
+-------------------------------+-----------------------+
| Indicator | Value |
+-------------------------------+-----------------------+
| OLE format | True |
| Has SummaryInformation stream | True |
| Application name | Microsoft Office Word |
| Encrypted | False |
| Word Document | True |
| VBA Macros | True |
| Excel Workbook | False |
| PowerPoint Presentation | False |
| Visio Drawing | False |
| ObjectPool | False |
| Flash objects | 0 |
+-------------------------------+-----------------------+
Oleid confirms that the file was created by MS Word. It also tells us that the file has the OLE format.
Oleid also tells us that the document is not encrypted and that it contains VBA Macros. So extracting the Macros is one possible way to analyse this file, but we should do the easiest things first.
If you are on Linux, execute the file command. It is able to show metadata for some file formats, including MS Office documents.
file <path-to-sample>
Your output will look like this:
238bd6216c533984173a80c5675bd76f18100ec2c0cf462e24fe82d28305a674: Composite Document File V2 Document, Little Endian, Os: Windows, Version 5.1, Code page: 1251, Author: Admin, Template: Normal.dotm, Last Saved By: Normal.d aka punsh, Revision Number: 21, Name of Creating Application: Microsoft Office Word, Total Editing Time: 19:00, Create Time/Date: Fri Jul 25 17:42:00 2014, Last Saved Time/Date: Tue Sep 2 17:41:00 2014, Number of Pages: 44, Number of Words: 36773, Number of Characters: 209612, Security: 0
On Windows and Linux use olemeta.py.
python olemeta.py <path-to-sample> > metadata.txt
The part > metadata.txt writes the output of the command into a text file, so you can handle it more easily.
The output is quite long in this case. I will shorten it here, so we can discuss some parts of the data first.
Properties from SummaryInformation stream:
- codepage: 1251
- title: ''
- subject: ''
- author: 'Admin'
- keywords: ''
- comments: None
- template: 'Normal.dotm'
- last_saved_by: '\xc2\xeb\xe0\xe4\xe8\xec\xe8\xf0 aka punsh'
- revision_number: '21'
- total_edit_time: 1140L
- last_printed: None
- create_time: datetime.datetime(2014, 7, 25, 16, 42)
- last_saved_time: datetime.datetime(2014, 9, 2, 16, 41)
- num_pages: 44
- num_words: 36773
- num_chars: 209612
- creating_application: 'Microsoft Office Word'
- security: 0
Properties from DocumentSummaryInformation stream:
- codepage_doc: 1251
- lines: 1746
- paragraphs: 491
- scale_crop: False
- company: 'Microsoft'
- links_dirty: False
- chars_with_spaces: 245894
- shared_doc: False
- hlinks: None
- hlinks_changed: False
- version: 786432
Research is the Key, Interpreting the Metadata
The very first part shown by the Linux file command, Composite Document File V2 Document, is actually the file type. You can find more about it here:
https://msdn.microsoft.com/en-us/library/dd942138.aspx
When you do any research like this, prefer reading the documentation written by the organization or person who created the file type over third-party explanations. It is usually more accurate, though there are exceptions.
Very detailed information about the file format can be found in its specification: download.microsoft.com/download/0/B/E/0BE8BDD7-E5E8-422A-ABFD-4342ED7AD886/WindowsCompoundBinaryFileFormatSpecification.pdf
This is usually only interesting for people who develop software that has to parse the format. But since you are on the way to becoming a reverse engineer, file format specifications have useful information for you too.
Little Endian (only shown by file) is a byte order: multi-byte values are stored so that the least significant byte sits at the lowest address. It is common for Microsoft Windows formats.
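To make the byte order concrete, here is a small sketch using Python's standard struct module (Python 3). The value 0x12345678 is an arbitrary example and is not taken from the sample.

```python
import struct

value = 0x12345678

# "<" selects little endian, ">" big endian; "I" is a 4-byte unsigned integer.
little = struct.pack("<I", value)  # least significant byte first
big = struct.pack(">I", value)     # most significant byte first

print(little.hex())  # 78563412
print(big.hex())     # 12345678
```

When you see 78 56 34 12 in your hex editor inside a little-endian format, the value it represents is 0x12345678.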
The Windows version (only shown by file) is given as 5.1. Microsoft's pages are again a good place to look for more information. See:
https://msdn.microsoft.com/en-us/library/windows/desktop/ms724832%28v=vs.85%29.aspx
Version 5.1 corresponds to Windows XP, so this file was most likely saved on a Windows XP system. This gives you a hint about which operating systems might be targeted by the malware.
The code page (codepage and codepage_doc in olemeta) is listed as 1251. Again, use Google and you will find that 1251 is Windows-1251, a character encoding for Cyrillic script, which is used for Russian, Bulgarian, Serbian Cyrillic and others.
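The code page also tells you how to read metadata values that are not plain ASCII. In the olemeta output above, last_saved_by is shown as escaped bytes. A minimal sketch, assuming these bytes are encoded with code page 1251 as the codepage property suggests (Python 3):

```python
# Raw last_saved_by value exactly as shown in the olemeta output.
raw = b"\xc2\xeb\xe0\xe4\xe8\xec\xe8\xf0 aka punsh"

# "cp1251" is Python's name for the Windows-1251 codec.
decoded = raw.decode("cp1251")

# decoded is now 'Владимир aka punsh'
```

Under that assumption the first part of the value is the Cyrillic name "Владимир" (Vladimir). The file command apparently did not render these bytes in its output, so it is worth comparing the output of both tools.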
The author is listed as "Admin", which is probably the user name. This does not tell us much in this case, nor does the template "Normal.dotm".
The Last Saved By value "Normal.d aka punsh", however, is useful because it is a very specific name. You can use such details to connect this sample with other malware. E.g. if you find another document with the entry "Last Saved By: Normal.d aka punsh", it was most probably written by the same author. Google for the string and you will find several hash entries for files that contain the same string. How is that useful? If you write signatures for antivirus products, you want to kill many birds with one stone. You want a signature that covers lots of variants of the same type to keep the antivirus database small and the performance good. And of course a smart and lazy person will prefer to have less work to do. So this information is already enough to collect variants.
Revision, editing time and creation time help to check whether this is a newer or older version of the same malware. If you have several samples, you can create a timeline of changes.
Last but not least there are some statistics about the file, which are probably self-explanatory. They give you an idea of the document's size. If you get a file like this and notice that it has 44 MS Word pages with a total editing time of 19 minutes, that is certainly suspicious.
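A quick calculation shows why these numbers do not fit together. The sketch below uses the values from the olemeta output and assumes that total_edit_time is given in seconds, which matches the 19:00 reported by file.

```python
# Values copied from the olemeta output above.
total_edit_seconds = 1140  # assumption: olemeta reports seconds
num_pages = 44
num_words = 36773

edit_minutes = total_edit_seconds / 60.0
words_per_minute = num_words / edit_minutes
pages_per_minute = num_pages / edit_minutes

print("Editing time: %.0f minutes" % edit_minutes)     # Editing time: 19 minutes
print("Words per minute: %.0f" % words_per_minute)     # Words per minute: 1935
```

Nobody types close to two thousand words per minute, so the content was most likely pasted or generated rather than written by hand.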
This is actually a lot of information we got with so little effort.
Keep in mind that you won't have to google so much about the basics anymore once you are familiar with the output.
Easy Things First
Download a hex editor, if you haven't already. It does not really matter which one, e.g. HxD is nice.
Open the sample in your hex editor and browse through it. It is one of the first things I always do. Often you will find some very interesting things there, which save a lot of additional work in the long run.
At offset 0xc6c you will find the start of an area that looks like this:
Firstly: the upper part shows part of the message that the user also sees when opening the document. It is written in German. You could use Google Translate, but since this happens to be my mother tongue, let me translate it for you: "Click on Macro, choose security and click on Low"
These are the instructions for the user to turn on VBA Macros. The malware makes the user curious by claiming that the full content of the document is only visible after doing so. This social engineering trick enables the document to execute its malicious code.
-----------------------------------------------------------------------------------------------------------------------------------------------
Note: If you ever encounter this in a sample, don't feel tempted to open the document in an unsafe environment. You might think it is safe if you don't turn on Macros. But malware authors are bitches and they will bite the reverse engineers who don't take proper care. Sometimes they will throw in strings just to lure malware analysts into doing stupid things or to keep them on the wrong track.
At this point you only have assumptions, which you will verify with each additional step. Never let your guard down. Always assume the worst, which in this case is that the malware might work even with Macros disabled.
-----------------------------------------------------------------------------------------------------------------------------------------------
Now look at the lower part.
There is a pattern of '&H??' with '?' being 0-9 or A-F. These are hex values! Let's decode them to text. Copy the whole area with this pattern to a text editor. I recommend Notepad++ if you are on Windows. Use the replace function of your editor to remove the "&H" parts (by replacing with nothing). Now go ahead and either write a decoder for the hex string or use a decoding tool. E.g. Notepad++ has a plugin for hex to ASCII conversion that you can use. Save the resulting file.
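If you want to write the decoder yourself, here is one possible sketch in Python 3. It assumes the copied text consists of '&H' followed by one or two hex digits per byte, as described above. The function name and the short sample string are made up for illustration; it works directly on the copied text, so removing the "&H" parts first is not necessary.

```python
import re


def decode_vba_hex(text):
    """Turn a run of VBA-style hex literals (&H4D&H5A...) into raw bytes."""
    hex_values = re.findall(r"&H([0-9A-Fa-f]{1,2})", text)
    return bytes(bytearray(int(value, 16) for value in hex_values))


sample = "&H4D&H5A&H90&H00"
print(decode_vba_hex(sample))  # b'MZ\x90\x00'
```

To dump the real payload, read the text you copied from the hex editor, pass it to decode_vba_hex and write the result to a new file opened in binary mode ("wb"). Do this on your analysis system only and never execute the dumped file there by accident.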
You probably already noticed that the beginning of the dumped file looked like this:
Once you are a bit familiar with reverse engineering, you will quickly recognize this as the beginning of a Portable Executable (PE) file. A PE file starts with a small but complete MS-DOS application, the DOS stub, which prints the standard message "This program cannot be run in DOS mode." It is there for compatibility reasons. The 'MZ' is the magic number for a DOS executable.
The actual magic number for the PE ('PE\0\0') comes later. Its offset is stored in the DOS header at offset 0x3C.
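Both magic numbers can be checked with a few lines of Python 3. This sketch reads the 4-byte little-endian field at offset 0x3C of the DOS header (called e_lfanew in the PE documentation), which holds the offset of the PE signature. The function name and the synthetic test buffer are made up for illustration.

```python
import struct


def is_pe(data):
    """Check for the 'MZ' DOS magic and the 'PE\\0\\0' signature it points to."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew: 4-byte little-endian offset of the PE signature, stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"


# Synthetic buffer for demonstration only, not a real executable.
fake = bytearray(0x84)
fake[0:2] = b"MZ"
struct.pack_into("<I", fake, 0x3C, 0x80)
fake[0x80:0x84] = b"PE\x00\x00"
print(is_pe(bytes(fake)))  # True
```

Run is_pe on the bytes of your dumped file to confirm what you saw in the hex editor.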
Of course you can also run the "file" command or TrID on the dumped file.
The malware type of our Word sample is a dropper. In contrast to a downloader, it already carries the malware with it, writes it to disk and runs it upon execution. It might also perform steps to persist the dropped file in the system, e.g., adding it to an autostart entry in the registry. This dropper carries its PE malware as a hex string.
If you look at the code with olevba.py, you can find out more about the inner workings of the Word file. Now that we have a good idea about the file's behaviour it will be much easier to interpret the code. I leave that to you as an exercise. Verify the things we already found. Try to find the location where it drops the file and the part of the VBA code that extracts and decodes the hex string. How does the malware locate the beginning of the hex string?
The dropped file was a PE file. We will dive into PE analysis within the next tutorials.
Meanwhile you can look up the SHA256 of the dumped file on VirusTotal or other analysis sites, or upload the file to a sandbox analysis system, e.g.:
https://www.hybrid-analysis.com/
https://malwr.com/
Stay safe!