Outline:
1. Introduction
2. Main layout of a PE
2.1 MS-DOS stub and PE signature
2.2 COFF File Header
2.3 Optional Header
2.4 The Section Table
2.5 A brief section overview
3. The .rsrc section
1. Introduction
It started with an idea. I wanted to create a file binder, maybe also a steganography program for .exe files. But the program shouldn't be a sloppy one; it should embed the data in a way that conforms to the PE format, so that the manipulation cannot be detected. However, that is not the only reason to learn about PE files. If you want to analyse or detect malware, this knowledge will help you a lot. You don't need to know every detail of the PE format if you don't write programs yourself, but you will need a brief overview. Even with the help of PE analysing tools you have to know where to look for your information.
The PE format is described in the Microsoft PE and COFF Specification, which is free to download. PE is the acronym for Portable Executable. PE files are not only .exe but also .dll files for a 32-bit or 64-bit architecture. The 64-bit format has some minor differences and is called PE32+, whereas the 32-bit format is called PE32.
Very old .exe files for 16-bit architectures use the NE (New Executable) format. But this is history now: NE files were used for MS-DOS and Windows 1.0 up to Windows 3.x, so let's talk about PE instead.
The PE specification has 97 pages. I will spare you reading all of it by giving you this little overview. However, if you want to know the details for programming, you will have to use the spec. You will also need a good hex editor to compare your results with the actual data in the file.
2. Main layout of a PE
This is the main layout for a typical .exe file:
2.1 MS-DOS stub and PE signature
As you can see, the first part is only for MS-DOS. The MS-DOS stub contains code that can run in MS-DOS. The default stub prints the message "This program cannot be run in DOS mode".
After the stub there is a signature that identifies the file as PE. The signature is "PE\0\0", or in hex: 0x50450000. You get the file offset of the PE signature by reading the 4-byte value at file offset 0x3c (this location is always the same). This way you can check whether the file is actually a PE.
Example output of a hex editor. The PE signature is highlighted. You also see the output message of the MS-DOS stub.
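The signature lookup described above can be sketched in a few lines of Python. This is only a minimal sketch: the function name find_pe_signature is made up for illustration, and the check for the "MZ" bytes at the start of the MS-DOS header is an extra sanity check that the text above does not mention.

```python
import struct

def find_pe_signature(data: bytes) -> int:
    """Return the file offset of the PE signature, or raise ValueError."""
    # Assumption: a valid file starts with the MS-DOS magic bytes "MZ".
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("missing MS-DOS header")
    # The 4-byte little-endian value at offset 0x3c is the signature offset.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    return pe_offset

# Tiny synthetic file: 0x80 bytes of MS-DOS header and stub, then the signature.
sample = bytearray(0x80)
sample[0:2] = b"MZ"
struct.pack_into("<I", sample, 0x3C, 0x80)
sample += b"PE\0\0"
print(hex(find_pe_signature(bytes(sample))))  # 0x80
```

For a real file you would pass in the bytes read from disk (open(path, "rb").read()) instead of the synthetic sample.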
If you come across the term PE File Header somewhere, here is the definition from the spec, so you know what is meant:
The PE file header consists of a MS‑DOS stub, the PE signature, the COFF file header, and an optional header
2.2 COFF File Header
Directly following the PE signature is the COFF File Header, which gives some general information about the file. You can use my PEAnalyser to see some examples. This tool is a result of my research. By writing a program that shows the details of the PE format, I made sure I understood it thoroughly.
You might think that 97 pages of specification are more than enough, but some information is still missing.
COFF File Header example output:
----------------
COFF header info
----------------
characteristics:
* Image only, Windows CE, and Windows NT and later.
* Image only.
* Application can handle > 2 GB addresses.
machine type: x64
number of sections: 5
size of optional header: 240
time date stamp: Thu Nov 18 17:08:29 CET 2010
As you can see, you get information about the architecture the file was built for (machine type), the date and time the file was created, the number of sections, the size of the header that follows right after (the Optional Header), as well as some characteristics of the PE. The characteristics are indicated by flags set in a 2-byte value. The meaning of each flag is well explained in the spec.
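To make the field layout more concrete, here is a rough Python sketch that decodes the 20-byte COFF File Header. The function name, the dictionary keys and the short flag descriptions are my own choices for illustration, and the two lookup tables list only a few of the values defined in the spec.

```python
import struct
from datetime import datetime, timezone

MACHINE_TYPES = {0x014C: "x86", 0x8664: "x64"}
# A few of the flag bits in the 2-byte characteristics value.
CHARACTERISTICS = {
    0x0001: "relocations stripped",
    0x0002: "executable image",
    0x0020: "can handle > 2 GB addresses",
    0x2000: "DLL",
}

def parse_coff_header(data: bytes, pe_offset: int) -> dict:
    # The 20-byte COFF header starts right after the 4-byte PE signature.
    (machine, n_sections, timestamp, _symtab_ptr, _n_symbols,
     opt_header_size, flags) = struct.unpack_from("<HHIIIHH", data, pe_offset + 4)
    return {
        "machine": MACHINE_TYPES.get(machine, hex(machine)),
        "number_of_sections": n_sections,
        "time_date_stamp": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "size_of_optional_header": opt_header_size,
        "characteristics": [name for bit, name in CHARACTERISTICS.items()
                            if flags & bit],
    }

# Synthetic header that mirrors the example output above.
header = b"PE\0\0" + struct.pack("<HHIIIHH", 0x8664, 5, 1290096509, 0, 0, 240, 0x0023)
info = parse_coff_header(header, 0)
print(info["machine"], info["number_of_sections"], info["characteristics"])
```

The two skipped fields (pointer to symbol table and number of symbols) belong to COFF debugging information and are normally zero in images.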
2.3 Optional Header
After the COFF Header comes the Optional Header. Although it is called "optional", you will find that header in every PE file (it is only optional for object files).
Optional Header example output:
--------------------
Optional header info
--------------------
Standard fields
...............
address of entry point: 188096 (0x2dec0)
address of base of code: 4096 (0x1000)
magic number: 523 --> PE32+ executable
major linker version: 8 (0x8)
minor linker version: 0 (0x0)
size of code: 186368 (0x2d800)
size of initialized data: 105984 (0x19e00)
size of unitialized data: 0 (0x0)
Windows specific fields
.......................
checksum: 0 (0x0)
dll characteristics:
* Terminal Server aware.
file alignment in bytes: 512 (0x200)
image base: 4194304 (0x400000), default for Windows NT, 2000, XP, 95, 98 and Me
loader flags (reserved, must be zero): 0 (0x0)
major image version: 0 (0x0)
major operating system version: 4 (0x4)
major subsystem version: 5 (0x5)
minor image version: 0 (0x0)
minor operating system version: 0 (0x0)
minor subsystem version: 2 (0x2)
number of rva and sizes: 16 (0x10)
section alignment in bytes: 4096 (0x1000)
size of headers (MS DOS stub, PE header, and section headers): 1024 (0x400)
size of heap commit: 4096 (0x1000)
size of heap reserve: 1048576 (0x100000)
size of image in bytes: 307200 (0x4b000)
size of stack commit: 4096 (0x1000)
size of stack reserve: 1048576 (0x100000)
subsystem: The Windows character subsystem
win32 version value (reserved, must be zero): 0 (0x0)
Data directories
................
virtual_address/size
import table: 28(0x1c)/28
resource table: 16(0x10)/16
exception table: 8(0x8)/8
IAT: 192(0xc0)/192
This header contains a lot of information, so that programs reading the file know how to treat it correctly.
It has a magic number, which tells you whether it is a PE32 or a PE32+ file. The distinction between PE32 and PE32+ is important for reading the PE, because some header field offsets and sizes differ. It contains standard fields and Windows-specific fields: version numbers, DLL characteristics, alignments, sizes and addresses.
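The following sketch shows why the magic number matters when reading the header. It uses the image base as an example of a field whose offset and size differ between the two formats. The function name is hypothetical, and the offsets (24 and 28) are taken from the spec, not from the text above.

```python
import struct

MAGIC_NAMES = {0x10B: "PE32", 0x20B: "PE32+"}

def read_magic_and_image_base(data: bytes, opt_offset: int):
    """opt_offset is the file offset where the Optional Header starts."""
    (magic,) = struct.unpack_from("<H", data, opt_offset)
    if magic == 0x20B:
        # PE32+: 8-byte image base at offset 24 (there is no BaseOfData field).
        (image_base,) = struct.unpack_from("<Q", data, opt_offset + 24)
    elif magic == 0x10B:
        # PE32: 4-byte image base at offset 28, after the BaseOfData field.
        (image_base,) = struct.unpack_from("<I", data, opt_offset + 28)
    else:
        raise ValueError(f"unexpected magic number 0x{magic:x}")
    return MAGIC_NAMES[magic], image_base

# Synthetic start of a PE32+ Optional Header: magic 523, image base 0x400000.
opt = bytearray(32)
struct.pack_into("<H", opt, 0, 523)
struct.pack_into("<Q", opt, 24, 0x400000)
print(read_magic_and_image_base(bytes(opt), 0))  # ('PE32+', 4194304)
```

In a real file the Optional Header starts 24 bytes after the offset of the PE signature (4 bytes of signature plus 20 bytes of COFF File Header).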
2.4 The Section Table
The Section Table (Section Headers) follows the Optional Header. Sections are the main content of a PE: initialized data, executable code, resources, imports, exports and so on. The Section Table tells you which sections are present, where to find them, and what size and name they have. It also tells you whether a section is writable, readable or executable (the characteristics of the section).
Section Table example output:
-------------
Section Table
-------------
entry number 1:
...............
characteristics:
* The section contains executable code.
* The section can be executed as code.
name: .text
number of line numbers: 0 (0x0)
number of relocations: 0 (0x0)
pointer to line numbers: 0 (0x0)
pointer to raw data: 1024 (0x400)
pointer to relocations: 0 (0x0)
size of raw data: 186368 (0x2d800)
virtual address: 4096 (0x1000)
virtual size: 185900 (0x2d62c)
entry number 2:
...............
characteristics:
* The section contains initialized data.
name: .rdata
number of line numbers: 0 (0x0)
number of relocations: 0 (0x0)
pointer to line numbers: 0 (0x0)
pointer to raw data: 187392 (0x2dc00)
pointer to relocations: 0 (0x0)
size of raw data: 74752 (0x12400)
virtual address: 192512 (0x2f000)
virtual size: 74282 (0x1222a)
entry number 3:
...............
characteristics:
* The section contains initialized data.
* The section can be written to.
name: .data
number of line numbers: 0 (0x0)
number of relocations: 0 (0x0)
pointer to line numbers: 0 (0x0)
pointer to raw data: 262144 (0x40000)
pointer to relocations: 0 (0x0)
size of raw data: 2560 (0xa00)
virtual address: 270336 (0x42000)
virtual size: 11696 (0x2db0)
entry number 4:
...............
characteristics:
* The section contains initialized data.
name: .pdata
number of line numbers: 0 (0x0)
number of relocations: 0 (0x0)
pointer to line numbers: 0 (0x0)
pointer to raw data: 264704 (0x40a00)
pointer to relocations: 0 (0x0)
size of raw data: 18432 (0x4800)
virtual address: 282624 (0x45000)
virtual size: 17928 (0x4608)
entry number 5:
...............
characteristics:
* The section contains initialized data.
name: .rsrc
number of line numbers: 0 (0x0)
number of relocations: 0 (0x0)
pointer to line numbers: 0 (0x0)
pointer to raw data: 283136 (0x45200)
pointer to relocations: 0 (0x0)
size of raw data: 1024 (0x400)
virtual address: 303104 (0x4a000)
virtual size: 784 (0x310)
In case you wonder why some values are always 0 in this example: the line-number fields (number of line numbers, pointer to line numbers) are deprecated, and the relocation fields (number of relocations, pointer to relocations) are only used in object files. That is why PE images usually set all four values to zero.
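A Section Table like the one above could be read roughly as follows. Each entry is 40 bytes long. The function name, the dictionary keys and the selection of flags are assumptions made for this sketch; see the spec for the full list of section characteristics.

```python
import struct

SECTION_FLAGS = {
    0x00000020: "contains executable code",
    0x00000040: "contains initialized data",
    0x20000000: "can be executed as code",
    0x40000000: "can be read",
    0x80000000: "can be written to",
}
ENTRY_FORMAT = "<8sIIIIIIHHI"  # one 40-byte section header

def parse_section_table(data: bytes, table_offset: int, number_of_sections: int):
    sections = []
    for i in range(number_of_sections):
        (name, virtual_size, virtual_address, raw_size, raw_pointer,
         _reloc_ptr, _lineno_ptr, _n_relocs, _n_linenos, flags) = struct.unpack_from(
            ENTRY_FORMAT, data, table_offset + i * 40)
        sections.append({
            # Names shorter than 8 bytes are padded with null bytes.
            "name": name.rstrip(b"\0").decode("ascii", errors="replace"),
            "virtual_size": virtual_size,
            "virtual_address": virtual_address,
            "size_of_raw_data": raw_size,
            "pointer_to_raw_data": raw_pointer,
            "characteristics": [text for bit, text in SECTION_FLAGS.items()
                                if flags & bit],
        })
    return sections

# Synthetic entry with the .text values from the example output above.
table = struct.pack(ENTRY_FORMAT, b".text", 0x2D62C, 0x1000, 0x2D800, 0x400,
                    0, 0, 0, 0, 0x20000020)
for section in parse_section_table(table, 0, 1):
    print(section["name"], hex(section["pointer_to_raw_data"]),
          section["characteristics"])
```

In a real file, table_offset is the offset of the Optional Header plus the "size of optional header" value from the COFF File Header, and number_of_sections also comes from the COFF File Header.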
2.5 A brief section overview
.text: executable code
.data: global initialized data
.rdata: global read-only data
.edata: export tables
.idata: import tables
.pdata: exception handling information
.xdata: exception information, free format
.reloc: information for relocation of library files
.rsrc: resources of the executable
.drectve: linker options
.bss: uninitialized data, free format
Those are not all possible sections, but the most important ones. In this paper I will only go into detail on the .rsrc section. Others may follow in the next one.
3. The .rsrc section
This section contains the resources of a PE. Resources are, for instance, icons, menus, dialogs, version information and font information, but they can also be arbitrary data. The reason I chose the resource section first is that it is a good candidate for adding data to the PE file.
The resource section has a multiple-level binary-sorted tree structure. The tree can have up to 2^31 levels, but by convention Windows uses three levels: type, name and language.
The value in each node is either an ID or the address of a string. If you get an ID for the type, you can look up its meaning here:
http://msdn.microsof...v=vs.85%29.aspx (you won't find that in the PE spec)
Or just use this list:
resource-type:
1: cursor
2: bitmap
3: icon
4: menu
5: dialog box
6: string table entry
7: font directory
8: font
9: accelerator table
10: application defined resource (raw data)
11: message table entry
12: group cursor
14: group icon
16: version information
17: dlginclude
19: plug and play resource
20: VXD
21: animated cursor
22: animated icon
23: HTML
24: side-by-side assembly manifest
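As a rough illustration of how one level of the tree is stored, the sketch below reads the entries of a single resource directory and maps a type ID to its meaning with a shortened copy of the list above. The names are hypothetical. The layout (a 16-byte directory header followed by 8-byte entries, with the high bit marking "this is a string" or "this points to a subdirectory") is taken from the spec rather than from the text above, so treat it as an assumption to verify.

```python
import struct

# Shortened copy of the resource-type list above.
RESOURCE_TYPES = {1: "cursor", 2: "bitmap", 3: "icon",
                  14: "group icon", 16: "version information"}

def read_directory_entries(section: bytes, dir_offset: int) -> list:
    """Read the entries of one directory node; offsets are section-relative."""
    # The last two 16-bit fields of the 16-byte header count the entries.
    n_named, n_ids = struct.unpack_from("<HH", section, dir_offset + 12)
    entries = []
    for i in range(n_named + n_ids):
        first, second = struct.unpack_from("<II", section, dir_offset + 16 + 8 * i)
        entries.append({
            # High bit set: the value is the offset of a name string, not an ID.
            "is_named": bool(first & 0x80000000),
            "id_or_name_offset": first & 0x7FFFFFFF,
            # High bit set: the offset points to another directory, not a leaf.
            "is_subdirectory": bool(second & 0x80000000),
            "offset": second & 0x7FFFFFFF,
        })
    return entries

# Synthetic root directory with a single ID entry for type 14.
root = struct.pack("<IIHHHH", 0, 0, 0, 0, 0, 1) + struct.pack("<II", 14, 0x80000018)
for entry in read_directory_entries(root, 0):
    print(RESOURCE_TYPES.get(entry["id_or_name_offset"], "unknown"), entry)
```

The same function can be called again with the returned offset to read the name level and then the language level.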
The language IDs should be specified somewhere too. I could find some information here:
http://msdn.microsof...y/cc194810.aspx
and here:
http://www.science.c...ocale-Codes.asp
The first link also shows an example of a tree in the resource section (this image is from there):
If you want to see more examples or even edit resources (like changing icons or dialog messages of .exe files), you can download a free resource editor like Resource Hacker. I wasn't able to find one for Linux. The reason might be that Windows provides an API for editing PE resources, and I couldn't find anything similar for Linux. If you know of anything, please tell me.
Here is an example with Resource Hacker and the Minecraft.exe:
You can see the three levels here. The first is the type, i.e. "Icon Group" for the example that I show in the image. "Icon Group" is not saved as a string, but as ID "14" in this case. Resource Hacker matches the IDs to their meanings. The name is 1. The language is 1024.
The leaves of the tree in the resource section hold the address of the actual resource data. The format of the raw data depends on the resource type. You can find information about the formats here:
http://msdn.microsof...v=vs.85%29.aspx
The tree leaves are where a mistake in the specification cost me two days of work. All the addresses that the tree itself uses to point to the next node are so-called RVAs (relative virtual addresses). That means they are not offsets within the file, but offsets within the current section. I just took the "pointer to raw data" value in the Section Table to determine the beginning of the .rsrc section (this pointer is an absolute file offset) and added the RVA value of the current node to get to the next node. The leaves also have a value called Data RVA in the specification, which is supposed to point to the resource data. But the value I got there was even bigger than the file itself. It couldn't be an RVA of that kind, then.
Of course the first thing you do is look for mistakes in the code and re-read the specification over and over. That took up the first day. Since the code got all the other values right, I suspected that there was something else to do. I used my hex editor and searched for the correct address of the resource. I knew the content of the resource from Resource Hacker, but that program didn't tell me the address. Then I tried arbitrary things to find out how this large value could be turned into the correct address.
It took me the second day to finally find the correct solution: the so-called Data RVA is not relative to the start of the section like the other values. It is a virtual address of the same kind as the virtual address that the Section Table holds for the beginning of the .rsrc section. I computed the section-relative address (the address I actually use to get to the resource data) by subtracting the virtual address of the .rsrc section from the so-called "Data RVA":
address = data_RVA - rsrc_section_virtual_address
This address is used like all the other RVA values for moving through the tree nodes.
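Put into code, the subtraction described above (Data RVA minus the virtual address of the .rsrc section) plus the section's "pointer to raw data" gives the position of the resource data in the file. This is only a sketch with a made-up function name. The section values are the ones from the Section Table example in 2.4, while the Data RVA of 0x4a0f0 is a hypothetical value chosen for illustration.

```python
def data_rva_to_file_offset(data_rva: int, section_virtual_address: int,
                            pointer_to_raw_data: int) -> int:
    """Turn the Data RVA of a resource leaf into an offset within the file."""
    # Offset of the resource data relative to the start of the .rsrc section.
    offset_in_section = data_rva - section_virtual_address
    if offset_in_section < 0:
        raise ValueError("Data RVA lies before the start of the section")
    # The section starts at "pointer to raw data" within the file.
    return pointer_to_raw_data + offset_in_section

# .rsrc from the example: virtual address 0x4a000, pointer to raw data 0x45200.
# The Data RVA 0x4a0f0 is a hypothetical value for illustration.
print(hex(data_rva_to_file_offset(0x4A0F0, 0x4A000, 0x45200)))  # 0x452f0
```

A Data RVA as large as 0x4a0f0 can look bigger than the file itself, because the file positions of the sections are packed more tightly (file alignment 512) than their virtual addresses (section alignment 4096).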
That is all you need to know about this section for now. If you have any questions, just ask me.
Deque