Author Topic: Translating bytes into assembly  (Read 2454 times)

Offline seci

  • Serf
  • *
  • Posts: 22
  • Cookies: 8
  • Seci :D
    • View Profile
    • www.evilzone.org
Translating bytes into assembly
« on: May 23, 2011, 12:44:25 am »
This is a continuation of the discussion at http://evilzone.org/projects-and-discussion/calculating-entrypoint-and-mapping-it-to-byte-array/, which I will be updating soon with what I have learned. I have now managed to read all of the PE headers, calculate the entry point, and find the bytes at the entry point.

Now I want to translate these bytes into something meaningful! So, how on earth do I do this? If someone could point me to tutorials, guides, or books (not too big :() I would be very grateful!

I have a rough understanding of it: an instruction can be anything from ~2 bytes to like 14 or 17 or something, and its length is determined by the early bytes of the instruction. But there are no good tables or descriptions of how to actually do this. I tried looking at the Intel opcode instruction reference PDF, but it's like 1500 pages of nonsense :(

Offline Tsar

  • Peasant
  • *
  • Posts: 126
  • Cookies: 10
  • turing-recognizable
    • View Profile
Re: Translating bytes into assembly
« Reply #1 on: May 23, 2011, 03:40:13 am »
Quote
I have a rough understanding of it: an instruction can be anything from ~2 bytes to like 14 or 17 or something, and its length is determined by the early bytes of the instruction. But there are no good tables or descriptions of how to actually do this. I tried looking at the Intel opcode instruction reference PDF, but it's like 1500 pages of nonsense :(

You are going to need some kind of list of all the instructions and their respective binary/hex encodings. ca0s made an "ASM to Hex" program that you could use if you don't have the hex/binary.

As for the part I underlined, this sounds like you will need something similar to Huffman's algorithm.
http://en.wikipedia.org/wiki/Huffman_coding
http://www.cs.duke.edu/csed/poop/huff/info/

Essentially, the decoder follows a tree byte by byte to see whether the current byte is followed by anything. Each node could hold the instruction associated with that opcode, and you know you have a complete instruction when you reach a node with no children.
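To make the tree idea concrete, here is a minimal sketch in Python. It is a plain prefix tree (trie) lookup rather than Huffman coding itself, and the `OPCODES` table, `build_trie` and `match` are names made up for illustration. The table only holds a handful of real x86 opcodes that take no operands; a real decoder needs far more entries and has to handle prefixes and operands as well.

```python
# Hypothetical mini table: opcode byte sequence -> mnemonic.
# Only a few operand-less x86 opcodes are listed here.
OPCODES = {
    b"\x90": "nop",
    b"\xc3": "ret",
    b"\xcc": "int3",
    b"\x0f\x05": "syscall",
    b"\x0f\x31": "rdtsc",
    b"\x0f\xa2": "cpuid",
}


def build_trie(table):
    """Build a nested-dict trie keyed by byte value."""
    root = {}
    for seq, name in table.items():
        node = root
        for byte in seq:
            node = node.setdefault(byte, {})
        node["name"] = name  # marks the end of a complete opcode
    return root


def match(trie, data, offset=0):
    """Walk the trie from data[offset]; return (mnemonic, length) or None."""
    node = trie
    pos = offset
    while pos < len(data) and data[pos] in node:
        node = node[data[pos]]
        pos += 1
        if "name" in node:
            return node["name"], pos - offset
    return None


if __name__ == "__main__":
    trie = build_trie(OPCODES)
    print(match(trie, b"\x0f\xa2\xc3"))             # two-byte opcode, then ret
    print(match(trie, b"\x0f\xa2\xc3", offset=2))   # the ret on its own
```

The 0x0F byte has children but no mnemonic of its own, which is exactly the "is it followed by anything" check from the paragraph above.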

I'm not going to lie, it will be pretty complex, but I think it is doable with enough effort.

iMorg

  • Guest
Re: Translating bytes into assembly
« Reply #2 on: May 23, 2011, 03:52:17 am »
Create something of a "reverse" assembler.

Take large chunks of the code (not the whole file at once) and decode the opcodes into commands based on the target instruction set.

Tokenization could make the process go faster if you tokenize the opcodes before decoding them.

Offline ca0s

  • VIP
  • Sir
  • *
  • Posts: 432
  • Cookies: 53
    • View Profile
    • ka0labs #
Re: Translating bytes into assembly
« Reply #3 on: May 23, 2011, 06:12:27 pm »
Quote
Create something of a "reverse" assembler.

Disassembler? :P
You are going to need a DB in your program containing every possible opcode and its hex value, and also how many bytes each one takes as instruction arguments. I thought about doing something like that some time ago, but it is hard work.
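A rough sketch of what such a "DB" could look like, in Python. `TABLE` and `disassemble` are hypothetical names, the code assumes 32-bit mode, and it only covers a few opcodes that have no ModR/M byte, so treat it as an illustration of the table-driven idea and not a working disassembler.

```python
# opcode byte -> (text template, number of operand bytes that follow)
# Assumption: 32-bit mode, and only opcodes without a ModR/M byte.
TABLE = {
    0x55: ("push ebp", 0),
    0x5D: ("pop ebp", 0),
    0x68: ("push 0x{:x}", 4),
    0x6A: ("push 0x{:x}", 1),   # imm8 is really sign-extended; shown raw here
    0x90: ("nop", 0),
    0xB8: ("mov eax, 0x{:x}", 4),
    0xC3: ("ret", 0),
    0xCC: ("int3", 0),
}


def disassemble(code):
    """Return a list of (offset, text) pairs for the bytes in `code`."""
    out = []
    pos = 0
    while pos < len(code):
        entry = TABLE.get(code[pos])
        if entry is None:
            # Unknown opcode: emit the raw byte and move on by one.
            out.append((pos, "db 0x{:02x}".format(code[pos])))
            pos += 1
            continue
        template, nbytes = entry
        operand = code[pos + 1 : pos + 1 + nbytes]
        if len(operand) < nbytes:
            out.append((pos, "(truncated)"))
            break
        # x86 immediates are stored little-endian.
        value = int.from_bytes(operand, "little")
        out.append((pos, template.format(value)))
        pos += 1 + nbytes
    return out


if __name__ == "__main__":
    for offset, text in disassemble(bytes.fromhex("55b82a0000005dc3")):
        print("{:04x}  {}".format(offset, text))
```

The operand byte count stored next to each opcode is what lets the loop find where the next instruction starts.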

Offline Huntondoom

  • Baron
  • ****
  • Posts: 856
  • Cookies: 17
  • Visual C# programmer
    • View Profile
Re: Translating bytes into assembly
« Reply #4 on: May 23, 2011, 08:37:12 pm »
If the instructions are 2 to 14 bytes long, then try this:
take the total number of bytes minus the number of bytes used for setting the length,
then divide by each length from 2 to 14 and see which gives a whole number.
Since the length of those instructions is set to a certain number,
the total number of bytes should be a multiple of that number.
But you have to exclude the first bytes from the total, since they set the whole length.
At least, this is what I think.
I have no further knowledge of this stuff :P

Offline seci

  • Serf
  • *
  • Posts: 22
  • Cookies: 8
  • Seci :D
    • View Profile
    • www.evilzone.org
Re: Translating bytes into assembly
« Reply #5 on: May 23, 2011, 08:55:03 pm »
Quote
If the instructions are 2 to 14 bytes long, then try this:
take the total number of bytes minus the number of bytes used for setting the length,
then divide by each length from 2 to 14 and see which gives a whole number.
Since the length of those instructions is set to a certain number,
the total number of bytes should be a multiple of that number.
But you have to exclude the first bytes from the total, since they set the whole length.
At least, this is what I think.
I have no further knowledge of this stuff :P

If only it were that simple :(

Basically, the length of an instruction is not stored anywhere you can simply read it. You have to work it out from the byte values.
First there is an optional instruction prefix of 0 to 4 bytes, which modifies the function of the instruction. Then comes the opcode itself, which is required and is 1 to 3 bytes long. After that there is an optional 1-byte field, the ModR/M byte, that modifies the way the instruction works, followed by another optional 1-byte field, the SIB byte, that modifies it further. Then there is something called the displacement (optional, 1, 2 or 4 bytes), which I am not 100% sure about yet; I am reading up on it right now. Last comes the immediate data, which is also optional and is 1, 2 or 4 bytes.

In total, an instruction is at least 1 byte and at most 15 bytes long. More on this later.
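As a partial illustration of the layout described above, here is a small Python sketch of two of the steps: counting legacy prefix bytes, and working out how many bytes a ModR/M byte plus any SIB byte and displacement take up. The function names are made up, and the ModR/M logic assumes 32-bit addressing with no 0x67 address-size prefix. Whether an opcode has a ModR/M byte or an immediate at all still has to come from an opcode table, which is not shown.

```python
# Legacy prefix bytes: lock/rep, segment overrides, operand/address size.
LEGACY_PREFIXES = {
    0xF0, 0xF2, 0xF3,                    # lock, repne, rep
    0x26, 0x2E, 0x36, 0x3E, 0x64, 0x65,  # segment overrides
    0x66, 0x67,                          # operand-size, address-size
}


def count_prefixes(code, pos=0):
    """Count consecutive prefix bytes starting at code[pos]."""
    n = 0
    while pos + n < len(code) and code[pos + n] in LEGACY_PREFIXES:
        n += 1
    return n


def modrm_length(code, pos):
    """Bytes used by the ModR/M byte at code[pos] plus SIB and displacement.

    Assumes 32-bit addressing (no 0x67 prefix in effect).
    """
    modrm = code[pos]
    mod = modrm >> 6
    rm = modrm & 0b111
    length = 1                    # the ModR/M byte itself
    if mod == 3:                  # register operand: nothing follows
        return length
    base = rm
    if rm == 4:                   # rm == 100 means a SIB byte follows
        length += 1
        base = code[pos + 1] & 0b111
    if mod == 1:
        length += 1               # 8-bit displacement
    elif mod == 2:
        length += 4               # 32-bit displacement
    elif base == 5:               # mod == 0 and rm/base == 101: disp32 only
        length += 4
    return length


if __name__ == "__main__":
    # 8B 45 08 is "mov eax, [ebp+8]": opcode 8B, then ModR/M 45 and a disp8.
    insn = bytes.fromhex("8b4508")
    print(1 + modrm_length(insn, 1))   # opcode byte + ModR/M part
```

The displacement is just the constant offset that gets added to the address, like the 8 in `[ebp+8]`.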


I have been reading tons and tons of research over the past few days and have learned a lot. I may write a fairly large guide/tutorial on PE/COFF, covering reading the headers and reversing PEs from the bottom up, and use it to close the two topics I have on this so far.
« Last Edit: May 23, 2011, 09:02:22 pm by seci »

Offline Tsar

  • Peasant
  • *
  • Posts: 126
  • Cookies: 10
  • turing-recognizable
    • View Profile
Re: Translating bytes into assembly
« Reply #6 on: May 23, 2011, 10:36:31 pm »
Nothing to add that I haven't already, but I thought I would mention that last night I had dreams about trying to solve this problem for hours. lol

iMorg

  • Guest
Re: Translating bytes into assembly
« Reply #7 on: May 24, 2011, 01:28:42 am »

Quote
Disassembler? :P
You are going to need a DB in your program containing every possible opcode and its hex value, and also how many bytes each one takes as instruction arguments. I thought about doing something like that some time ago, but it is hard work.

Wow. I must have been really tired when posting that lmao.

It's similar to writing an emulator (not sure if you have or not): you already have the file loaded, now you just need to decode the opcodes into their correct instructions. Check out QEMU; it's open source and has some sophisticated instruction-decoding code in it.