Analyzing Malware By Example: Part 3
Goals
This time we will take a look at a PE file sample. We want to find out what this sample is capable of and what it is likely used for.
Sample Download
sample.zip
The password is "infected".
Caution! This is live malware!
File Format
You already know how to find that out. Use the file command, or look at the sample with a hex editor. Both ways should tell you what kind of file we have here (if you don't know how, read the previous tutorials again).
This time we examine a PE file sample. PE stands for Portable Executable format. This format includes Windows EXE, DLL, SYS files, among others. About 80 per cent of all malware file samples have this format. That means having knowledge about this format is very important for a malware analyst. The most important document of any file format is the specification:
http://download.microsoft.com/download/e/b/a/eba1050f-a31d-436b-9281-92cdfeae4b45/pecoff.doc
Now that you know that this is a PE file, you can search for an appropriate tool or program to view the file. There are lots of free file format viewers for PE files. We will use at least two of them during the next tutorials.
PE Structure Visualization
Download PortexAnalyzer.
PortexAnalyzer is a command line tool that allows you to view PE format information. You need Java 7 or higher to run it.
Run the following command:
java -jar PortexAnalyzer.jar -p out.png -o report.txt <sample>
The process to create out.png will take a while. Let it run. You are done when you get this output in the command prompt:
PortEx Analyzer
Creating report file...
Writing header reports...
Writing section reports...
Writing analysis reports...
Done!
picture successfully created and saved to <path-to-out.png>
First, let's take a look at the image that the tool created. It visualizes the content of the PE file as it is on disk.
These are actually three images. The one on the left displays a byteplot, the one in the middle the entropy, and the one on the right the PE file structure. We will ignore the byteplot and the entropy image for now. They will be part of another tutorial. In this tutorial you will learn the basics about the structure of PE files.
A very basic build-up of a PE looks like this:
 __________________________
|                          |
|         Headers          |
|__________________________|
|                          |
|         Several          |
|         Sections         |
|__________________________|
|                          |
|         Overlay          |
|__________________________|
You have several headers at the beginning. They tell the operating system how to deal with the file, where to find the contents, what parts of the file must be loaded into memory, and more.
The actual content of the file is in its sections. The sections may contain code, data, imports, exports, resources (like icons), etc.
Everything that is just appended, but not covered by the headers and thus not loaded into memory, is called overlay.
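As a rough sketch of that definition, the overlay can be located by finding where the raw data of the last section ends and checking whether the file continues beyond that point. The function name and the tuple-based input are assumptions made for illustration. The sketch also ignores the header size and file alignment, which a real parser would take into account.

```python
def overlay_range(sections, file_size):
    """Return (offset, size) of appended data, or None if there is none.

    sections: iterable of (pointer_to_raw_data, size_of_raw_data) tuples,
    as listed in the section table. Hypothetical helper for illustration.
    """
    # The highest file offset that is still covered by any section.
    end_of_sections = max(
        (pointer + size for pointer, size in sections if size > 0),
        default=0,
    )
    if end_of_sections >= file_size:
        return None
    return end_of_sections, file_size - end_of_sections


# Made-up section layout, not taken from the sample.
layout = [(0x200, 0x1000), (0x1200, 0x400)]
print(overlay_range(layout, 0x1600))  # file ends with the last section
print(overlay_range(layout, 0x1700))  # 0x100 bytes are appended
```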
In the image of our sample you can see the location of headers, sections and special contents. The MSDOS header, COFF file header, optional header and the section table are the headers. They have a colored background. The sections are shown in different grey tones and are listed by their names, which are .text, .sdata, .rsrc and .reloc in this case (see legend). Section names are arbitrary, but they often tell you what might be inside, so take them only as a hint. By convention, .rsrc contains the resources (e.g. messages, icons, images), .text contains the code of the executable, and .reloc holds relocations (if you don't know what that is, don't worry for now). If you match this with the special contents shown in the image, you will see that it fits. The green rectangles that stand for resources are indeed in the .rsrc section, and one purple rectangle for relocation information is in the .reloc section. Debug information is placed in the .sdata section.
The entry point is marked by a single red rectangle. It determines the start of code execution, thus, usually points to a section containing code. Our entry point is indeed in the .text section, which confirms that this section contains code.
Note: Malware authors can change the section names to fool you, so don't rely only on them. They are just an indicator that you use to guide your next steps.
The yellow rectangles mark imports. In this case the imports are also in the .text section. One section can have several different contents and sometimes contents that belong together are in several sections. This is quite normal.
To sum up what we found in this picture: This executable has four sections, with the first one being the largest on disk. It has at least one import and a large amount of resources. It does not have appended data (so-called overlay). The start of code execution is in the first section. The section names seem to match what is inside them. The file does not look like it is packed (we will cover these kinds of files and how to detect them in later tutorials).
PortexAnalyzer Report
Now we will take a look at the actual contents. Open report.txt, which was created by PortexAnalyzer. Don't be overwhelmed by the large amount of information. The PE format is complicated. I will explain the sections of the report briefly and we will take a look at some specific parts that are interesting for now.
You will see several tables in the report. Most of the tables for the headers have a field name, a value, and a file offset. All header fields are sorted by the file offset, which is the physical location of the field within the file on disk. You use this file offset, e.g., if you need to modify the value of a field within a hex editor. The meaning of the value depends on the field. Some are addresses, some are flags, some are signatures, some are a date. The raw value is given within the table, additional explanations or formatting of certain values are above the table.
MSDOS Header
The very first part of the report describes the MSDOS Header. It is only relevant for the MSDOS application that is prepended to every PE file and won't be of interest most of the time. There is only one value that the loader uses while reading the file: At file offset 0x3c is the physical address of the PE file signature, which is 'PE\0\0'. PortexAnalyzer describes this value as "PE signature offset"; you will also find the term "e_lfanew" in other applications. You can verify this value by opening the file in a hex editor and looking at offset 0x3c.
I marked the value for you. Note the values are in little endian format, so this is indeed the value 0x0080 and not 0x8000.
Now check offset 0x80 and you will find the PE signature there, which is 0x50450000 or "PE\0\0" (note: strings are not little endian).
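The same check can be done with a few lines of code instead of a hex editor. The following is a minimal sketch, not what PortexAnalyzer does internally: the function name is made up, and the byte buffer is a synthetic stand-in that only mimics the two values discussed above.

```python
import struct


def find_pe_signature(data):
    """Return the file offset of the PE signature or raise ValueError."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not an MSDOS/PE file")
    # "<I" reads a little-endian unsigned 32-bit integer at offset 0x3c.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    # The signature is a plain byte string, so no byte swapping applies here.
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("no PE signature at offset 0x%x" % e_lfanew)
    return e_lfanew


# Synthetic buffer for illustration: MZ magic, e_lfanew = 0x80, PE signature.
fake = bytearray(0x100)
fake[0:2] = b"MZ"
struct.pack_into("<I", fake, 0x3C, 0x80)
fake[0x80:0x84] = b"PE\0\0"
print(hex(find_pe_signature(bytes(fake))))  # 0x80
```

To try it on a real file, read the file in binary mode inside your analysis VM and pass the bytes to the function.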
COFF File Header
This part of the report starts with details about the values that were found for the time date stamp, the machine type, and the characteristics. These are also the most interesting parts of this header. The time date stamp is the number of seconds since 00:00 January 1, 1970. It denotes the time the file was created. Our sample was probably created in August 2013. This field is sometimes used by viruses to mark files as infected, e.g., they might set a future date or a date so far in the past that it cannot be correct. In any case, impossible dates are suspicious.
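Converting the raw value by hand is tedious, so here is a small sketch of the conversion and of a plausibility check. The example value is made up and not read from the sample, and the cutoff year is an arbitrary assumption for this sketch, so adjust it to your needs.

```python
from datetime import datetime, timezone

# Assumption for this sketch: anything older than this year is implausible.
EARLIEST_PLAUSIBLE_YEAR = 1993


def coff_timestamp_to_utc(seconds):
    """Interpret the time date stamp as seconds since 1970-01-01 00:00 UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def is_implausible(seconds, now=None):
    """True if the date lies in the future or suspiciously far in the past."""
    now = now or datetime.now(timezone.utc)
    stamp = coff_timestamp_to_utc(seconds)
    return stamp > now or stamp.year < EARLIEST_PLAUSIBLE_YEAR


example = 1375315200  # made-up value for illustration
print(coff_timestamp_to_utc(example).isoformat())  # 2013-08-01T00:00:00+00:00
print(is_implausible(example))
```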
Deprecated or reserved fields and flags are also typical infection markers. PortexAnalyzer always mentions if fields are deprecated or reserved. Note: An old file might use deprecated fields simply because it is old.
The machine type specifies the architecture this file runs on. 64-bit PE files will not run on 32-bit operating systems, but 32-bit PE files can be executed on both 32-bit and 64-bit systems. Our sample is a 32-bit PE. If it were a 64-bit PE, you would have to take care to choose a compatible operating system for dynamic analysis.
The characteristics confirm the 32-bit architecture.
Characteristics would also tell us if our sample is a dynamic-link library (DLL) or not. For our sample the DLL flag is not set, meaning this is most probably an EXE. What does that mean?
Generally there are two PE file types: EXE and DLL. An EXE file is an application that can be executed on its own. A DLL, on the other hand, provides data or code for other PE files, e.g., functions that other files can run, or images, data and fonts that other files can use. Don't confuse the PE types with their extension. PE file extensions can be: .exe, .dll, .sys, .fon, .drv, and many more. The file extension only gives us a clue about the file's purpose.
DLL files usually have exports. If a PE imports something from a DLL, the contents are loaded dynamically at runtime into the process space of the caller.
Note: The distinction between EXE and DLL is solely a semantic one. Practically files can be hybrids, they may export functions, but also be able to be executed on their own.
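The machine type and the characteristics flags discussed in this section can also be decoded by hand. The sketch below assumes you already know the offset of the PE signature. The constants come from the PE/COFF specification, while the function name, the dictionary keys and the example header values are made up for illustration.

```python
import struct

# Values from the PE/COFF specification (only a small subset).
MACHINE_NAMES = {0x014C: "Intel 386 (32-bit)", 0x8664: "x64 (64-bit)"}
IMAGE_FILE_EXECUTABLE_IMAGE = 0x0002
IMAGE_FILE_32BIT_MACHINE = 0x0100
IMAGE_FILE_DLL = 0x2000


def parse_coff_header(data, pe_offset):
    """Decode the 20-byte COFF File Header that follows the PE signature."""
    (machine, section_count, timestamp, _symbol_table, _symbol_count,
     optional_header_size, characteristics) = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4)
    return {
        "machine": MACHINE_NAMES.get(machine, hex(machine)),
        "sections": section_count,
        "timestamp": timestamp,
        "optional_header_size": optional_header_size,
        "is_32bit": bool(characteristics & IMAGE_FILE_32BIT_MACHINE),
        "is_dll": bool(characteristics & IMAGE_FILE_DLL),
        "is_executable_image": bool(
            characteristics & IMAGE_FILE_EXECUTABLE_IMAGE),
    }


# Synthetic header with made-up values, not copied from the sample.
fake = b"PE\0\0" + struct.pack(
    "<HHIIIHH", 0x014C, 4, 1375315200, 0, 0, 0xE0, 0x0102)
info = parse_coff_header(fake, 0)
print(info["machine"], "DLL" if info["is_dll"] else "EXE")
```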
Optional Header
This header consists of three parts.
The standard fields describe, among others, the address of the entry point. We already saw the location of that entry point in the visualization. The Windows-specific fields include, among others, a major and a minor operating system version. The version is a minimum requirement. The version of our sample is 4.0 (major.minor), which means you need at least Windows 95 (see:
http://en.wikipedia.org/wiki/List_of_Microsoft_Windows_versions).
Scroll down to the data directory.
The data directory describes where the special contents of the file are, including imports, exports, resources, certificates.
In our sample the data directory specifies the location of the CLR runtime header.
This is an important finding, because it means we have a .NET executable. Before we start to believe this hint, let's look for more clues. If you scroll down to the Imports, you will find mscoree.dll, which is the Microsoft .NET Runtime Execution Engine.
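Checking for the CLR runtime header entry is easy to script. The following sketch relies on the layout given in the PE/COFF specification: the data directory starts 96 bytes into a PE32 optional header (112 bytes for PE32+), every entry is 8 bytes long, and the CLR runtime header is entry number 14. The function name and the synthetic test buffer are assumptions for illustration.

```python
import struct

CLR_RUNTIME_HEADER_INDEX = 14  # position in the data directory, per the spec


def clr_runtime_header(data, pe_offset):
    """Return (rva, size) of the CLR runtime header entry, or None."""
    optional_header = pe_offset + 4 + 20  # skip signature and COFF header
    (magic,) = struct.unpack_from("<H", data, optional_header)
    if magic == 0x10B:      # PE32
        directory = optional_header + 96
    elif magic == 0x20B:    # PE32+
        directory = optional_header + 112
    else:
        raise ValueError("unknown optional header magic 0x%x" % magic)
    # NumberOfRvaAndSizes sits directly in front of the data directory.
    (entry_count,) = struct.unpack_from("<I", data, directory - 4)
    if entry_count <= CLR_RUNTIME_HEADER_INDEX:
        return None
    rva, size = struct.unpack_from(
        "<II", data, directory + 8 * CLR_RUNTIME_HEADER_INDEX)
    return (rva, size) if rva and size else None


# Synthetic PE32 optional header with a made-up CLR entry.
pe = 0x80
opt = pe + 24
fake = bytearray(opt + 96 + 16 * 8)
struct.pack_into("<H", fake, opt, 0x10B)
struct.pack_into("<I", fake, opt + 92, 16)
struct.pack_into("<II", fake, opt + 96 + 8 * 14, 0x2008, 0x48)
print(clr_runtime_header(bytes(fake), pe) is not None)  # True
```

A non-empty entry is only a hint, just like the entry in the report, so the import of mscoree.dll remains a useful second check.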
Why is that finding important?
CIL vs Machine Code
.NET is a framework for developers that allows interoperability between several languages and runs on different architectures. The code of .NET languages is not compiled to machine code, but to an intermediate representation called Common Intermediate Language (CIL). The CIL is executed by a virtual machine, the Common Language Runtime (CLR). That allows the program to be independent of the platform or architecture, but only if a VM is available. This concept is quite similar to Java bytecode and its execution by the Java Virtual Machine (JVM).
Often, if you compile code, machine code is created and saved in the binary. A lot of information is lost during the compilation process, which makes it impossible to retrieve the original code again. The only reliable way to convert machine code into something human readable is disassembly, a process that creates assembly language.
However, the compilation to intermediate code (CIL or Java bytecode) preserves much more information than compilation to machine code. That makes it possible to use decompilers. A decompiler attempts to reverse compilation, and creates a high-level language representation of the code. The decompiled code is not the original code, because information is still lost. But it is much easier to read and understand than assembly.
That makes analysis of .NET files very easy.
Section Table
Before we start to look for decompilers, let's finish our brief overview of the PE report. Look at the Section Table. Here you see the section names, their physical location and size, their virtual location and size (which is relevant when the file is loaded into memory), their entropies, and their permissions. Here we can verify that the .text section is the only section that contains executable code. The .sdata section is the only writeable section.
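To see where the permissions in the report come from, here is a sketch that decodes a single 40-byte section table entry and turns the characteristics flags into a short permission string. The field layout and flag values follow the PE/COFF specification. The function name, the output format and the example entry are made up for illustration.

```python
import struct

# Section flags from the PE/COFF specification.
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000


def parse_section_entry(entry):
    """Decode one 40-byte section table entry."""
    (name, virtual_size, virtual_address, raw_size, raw_pointer,
     _reloc_ptr, _line_ptr, _reloc_count, _line_count,
     flags) = struct.unpack("<8sIIIIIIHHI", entry)
    permissions = "".join([
        "r" if flags & IMAGE_SCN_MEM_READ else "-",
        "w" if flags & IMAGE_SCN_MEM_WRITE else "-",
        "x" if flags & IMAGE_SCN_MEM_EXECUTE else "-",
    ])
    return {
        # Names are padded with null bytes up to 8 bytes.
        "name": name.rstrip(b"\0").decode("utf-8", errors="replace"),
        "physical": (raw_pointer, raw_size),
        "virtual": (virtual_address, virtual_size),
        "permissions": permissions,
    }


# Made-up entry for a readable and executable code section.
entry = struct.pack("<8sIIIIIIHHI", b".text", 0x1000, 0x2000,
                    0x1000, 0x200, 0, 0, 0, 0, 0x60000020)
print(parse_section_entry(entry))
```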
Resources
The Resources part lists location, size, language, and type of all resources in the file. PortexAnalyzer also applies file type signature scanning to the resources and lists found signatures at the end. Our sample has icons, version information and a manifest. We will take a look at these resources with another tool.
Debug Information
The debug information is quite interesting here:
Age: 1
GUID: cfeb530e-d299-4abd-a0c5-1546781bb06a
File: C:\Users\Alessandro\Desktop\Force Op Minecraft [Beta]\Force Op Minecraft [Beta]\obj\x86\Debug\Force Op Minecraft [Beta].pdb
The author of this sample is probably not aware of the information he gives away. Hi, Alessandro!
The path is actually the location to the file that contains debug information. It does not only tell us the user name, but also the name of the project and that it was saved in a folder on the desktop.
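If you collect many samples, pulling these clues out of the PDB path by script can save time. The sketch below only splits the path string shown above. The function name and the returned keys are made up, and the assumption that the folder after "Users" is the user name only holds for default Windows profile locations.

```python
from pathlib import PureWindowsPath


def pdb_path_clues(pdb_path):
    """Extract user name, project name and folders from a PDB path string."""
    path = PureWindowsPath(pdb_path)
    parts = path.parts
    lowered = [part.lower() for part in parts]
    user = None
    if "users" in lowered and lowered.index("users") + 1 < len(parts) - 1:
        user = parts[lowered.index("users") + 1]
    return {
        "user": user,
        "project": path.stem,        # file name without the .pdb extension
        "folders": list(parts[1:-1]),
    }


clues = pdb_path_clues(
    r"C:\Users\Alessandro\Desktop\Force Op Minecraft [Beta]"
    r"\Force Op Minecraft [Beta]\obj\x86\Debug\Force Op Minecraft [Beta].pdb"
)
print(clues["user"], "|", clues["project"])
```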
This project seems to be a hacking tool that is supposed to give you higher privileges on Minecraft servers. I can tell you right at this point that these kinds of hacking programs that want you to enter your credentials are most often stealers and do not work as advertised. The author thinks this project is in Beta. Beta is a development stage of a usable product. You usually have three of them: Alpha, Beta and Release. Alpha is the very first pre-release that users can test for bugs. It does not yet contain all of the features that are seen as essential for release.
A Beta version has all features, but still needs to be tested for bugs. The release version is the final version.
So the author of our sample thinks all features are ready.
PEiD Signatures
PEiD was a popular tool for packer and compiler identification. You can still find lots of signature databases for PEiD on the internet. The original database of PEiD is used by default by PortexAnalyzer, but you can use your own.
The signature scanner found a Microsoft Visual C# v7.0 / Basic .NET signature, which confirms again that we have a .NET file.
Hashes
The Hash section provides MD5 and SHA256 hashes of the whole file and the sections. It can be worthwhile to look them up in VirusTotal or with Google.
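You can compute the same kind of hashes yourself with a few lines of code. This is a minimal sketch using Python's standard hashlib module. The function names are made up, and the section variant simply assumes that you pass the physical offset and size from the section table.

```python
import hashlib


def hashes(data):
    """Return the MD5 and SHA256 hex digests of a byte string."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def section_hashes(data, raw_pointer, raw_size):
    """Hash only the bytes of one section as they are stored on disk."""
    return hashes(data[raw_pointer:raw_pointer + raw_size])


# Stand-in bytes for illustration; use the file content of the sample instead.
content = b"MZ" + bytes(62)
print(hashes(content)["sha256"])
print(section_hashes(content, 0, 2)["md5"])
```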
Resource Hacker
Now let's look at the meat. At this point we will switch to Windows, because the following tools are only available for Windows. Use a VM! Although we just do static analysis for now (not executing the binary), you might accidentally execute it.
And if you think you have this already figured out and it is just a stealer that will do only harm if you enter credentials, you will most probably get an infection at some point. This program might be bundled with other functionality and you cannot know it at this point.
Download ResHacker to take a look at the resources in this file:
http://www.angusj.com/resourcehacker/
It is not maintained anymore, but still a very good tool. The alternative tool mentioned on this website is in Beta at the moment. You can try this as well, though.
I won't explain much about this tool. Just try it, look at the icons you find there, the manifest and the version information.
CIL Decompilation
Download the free .NET decompiler ILSpy:
http://ilspy.net/
Another CIL decompiler you might already have heard of is .NET Reflector, but you would have to pay for it. We will use only free tools for these tutorials. Also: In my experience ILSpy is much more robust when it comes to weird files. .NET Reflector provides more features, though.
Open the sample in ILSpy. Look at its buildup.
Usually it is a good idea to search for the entry point, which is the Main method. An experienced eye will see that this sample was created by using a GUI Builder. With a GUI Builder you can place GUI elements on a canvas without writing any code. The GUI Builder generates most of the code for you. The only thing the programmer has to write is the logic that happens after user-interaction, e.g., if buttons are pressed. So instead of searching the Main method, we will look for the Button click methods. Open Force_Op_Minecraft_ -> Form1. You see a listing of variables and methods. Button1_Click up to Button3_Click seem interesting. If you open them you will see that they only enable different timers. Look at Timer1_Click to Timer3_Click. They only update the progress of a progress bar.
Now we have to check that we don't miss anything. On the left side, right-click on _Button1 and click on Analyze. This will open the Analyzer window. Check both Read By and Assigned By. The only user-created code (not generated by the GUI Builder) that uses this button is the Button1_Click method and the InitializeComponent method. Check the body of InitializeComponent. You will find the strings that are set for the buttons and labels. There is indeed a text field for the username and one for a password, suggesting that this should be a stealer. Button1 shall "Connect Username", Button2 shall "Connect Server", and the last button does the actual "Hack". But all this application does when you press these buttons is simulate progress with each timer tick. It seems this application is unfinished. And it is not malicious, because the basic functionality is missing. This is actually a prototype, an application that demonstrates the look and feel that the finished product shall have.
Final Thoughts
The GUI builder usage, the readily readable code, the wrong understanding of the term "Beta" and the debug information tell us that the author of this code was very inexperienced at the time of creation (this was created in 2013, so we cannot tell anything about Alessandro's progress since then).
If you run the sample and take a look at the actual GUI, you will see this:
I actually chose this application for this tutorial, because it is not protected at all. This way we covered some basics about PE files and .NET. The next parts will teach you how to deal with obfuscated samples.