Simple demonstration of inline ASM efficiency
Comparing decryption time in C versus ASM
Introduction
I was doing a little assignment for school not so long ago: a simple exercise to practice inline ASM by translating a C function into ASM. It took a few minutes and I moved on. Today I was working on something a lot bigger in ASM and wondered whether programming directly in ASM is more efficient, performance-wise, than a high-level language like C. I decided to reuse the code from that old exercise to make a small demonstration.
The Code
The code is really simple. The program decrypts a string encrypted using a Caesar cipher with a shift of 4. The shift is applied directly to the character codes, with no wrap-around, so to get a 'b' in clear text you'll see an 'f' in the encrypted string.
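To make the shift concrete, here is a minimal sketch of the idea in plain C. The helper name shift_string and its parameters are made up for this illustration and are not part of the programs below; a positive shift encrypts and a negative one decrypts.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the original program: writes src with
 * every character code moved by `shift` into dst, then terminates dst.
 * dst must have room for strlen(src) + 1 bytes. */
void shift_string(const char *src, char *dst, int shift)
{
    size_t i = 0;
    while (src[i] != '\0') {
        dst[i] = (char)(src[i] + shift);  /* raw shift, no wrap-around */
        i++;
    }
    dst[i] = '\0';
}
```

Calling shift_string("b", buf, 4) leaves "f" in buf, and calling it again on that result with a shift of -4 gives back "b".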
C version: (decrypt_c.c)
/*************************************************
* Author: Satan911
* Description: Simple demonstration of inline ASM efficiency
* Date: April 2011
**************************************************/
#include <stdio.h>
char encrypted_message[25]="Wexer=55$D$Izmp~sri2svk";
char decrypted_message[25];
void decrypt() {
    /* decrypted_message[i] = encrypted_message[i] - 4; */
    int i = 0;
    while(encrypted_message[i] != '\0')
    {
        decrypted_message[i] = encrypted_message[i] - 4;
        i++;
    }
}
int main(void) {
    /* To test performance */
    int j = 0;
    while(j < 100000000)
    {
        decrypt();
        j++;
    }
    printf("Encrypted message: \t%s\nDecrypted message: \t%s\n",encrypted_message, decrypted_message);
    return 0;
}
Pastebin (with syntax highlighting):
http://pastebin.com/9Up2DrN6
With inline ASM: (decrypt_asm.c)
Might wanna check the Pastebin below for proper indenting.
/*************************************************
* Author: Satan911
* Description: Simple demonstration of inline ASM efficiency
* Date: April 2011
**************************************************/
#include <stdio.h>
char encrypted_message[25]="Wexer=55$D$Izmp~sri2svk";
char decrypted_message[25];
void decrypt() {
    /* decrypted_message[i] = encrypted_message[i] - 4; */
    asm(
        "xor %ecx, %ecx\n\t"                        /* %ecx = 0 (used as i here) */
        "xor %eax, %eax\n\t"                        /* %eax = 0 */
        "bouclefor:\n\t"                            /* Start of the loop */
        "movb encrypted_message(%ecx), %dl\n\t"     /* Move encrypted_message[i] into the %dl register */
        "cmp %dl, %al\n\t"                          /* Compare %dl with %al (which is 0) */
        "je fin\n\t"                                /* Jump to fin: if %dl == 0 (end of string) */
        "sub $4, %dl\n\t"                           /* %dl = encrypted_message[i] - 4 */
        "movb %dl, decrypted_message(%ecx)\n\t"     /* decrypted_message[i] = %dl */
        "incl %ecx\n\t"                             /* %ecx += 1 (i++) */
        "jmp bouclefor\n\t"                         /* Jump back to bouclefor: (while loop in C) */
        "fin:\n\t"
        "movb %dl, decrypted_message(%ecx)\n\t"     /* %dl is 0 here, so this writes the \0 at the end of the string */
    );
}
int main(void) {
    /* To test performance */
    int j = 0;
    while(j < 100000000)
    {
        decrypt();
        j++;
    }
    printf("Encrypted message: \t%s\nDecrypted message: \t%s\n",encrypted_message, decrypted_message);
    return 0;
}
Pastebin (with syntax highlighting):
http://pastebin.com/AFAD8AzP
Note: The ASM syntax used here is AT&T syntax. It works great with GCC, and it's also the kind of ASM GCC produces when it compiles a program (we'll look at that output later). Also, the C code could be written differently, but I tried to make it as similar as I could to the ASM code. I think they are pretty much identical now.
If you read the code you are probably wondering why I would call decrypt() on the message 100000000 times. It's because this decryption is really simple, and if you only run it once you won't notice any difference between the C and ASM versions. That's a technique we actually use in software development to check the efficiency of a function over time.
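The "run it many times" trick can also be done from inside the program. The following is only a sketch using the standard clock() function; time_repeated and count_call are names I made up for illustration, and count_call is just a stand-in workload.

```c
#include <time.h>

/* Hypothetical stand-in workload so the sketch compiles on its own. */
volatile unsigned long call_count = 0;

void count_call(void)
{
    call_count++;
}

/* Hypothetical helper: calls fn `iterations` times and returns the CPU
 * time consumed, in seconds, as reported by clock(). */
double time_repeated(void (*fn)(void), long iterations)
{
    clock_t start = clock();
    for (long i = 0; i < iterations; i++) {
        fn();
    }
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

In either program above, something like time_repeated(decrypt, 100000000) could replace the while loop in main, assuming you want the measurement printed by the program itself instead of by an external tool.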
Decryption Time
The time command is used to time a command / program or give resource usage.
So I compiled both versions using the same command and then ran both with time. The results are pretty clear: the C version took almost 3x as long as the ASM version to decrypt the message 100000000 times. But why?
I'll try to explain the 'why' a little bit here. First, here's the ASM code generated by GCC for the C version of the program.
# gcc -S -O decrypt_c.c
-S tells GCC to stop after generating the ASM code, and -O turns on optimization.
This is a short version only showing the decrypt() function. See the Pastebin link for the whole code.
.file "decrypt_c.c"
.text
.globl decrypt
.type decrypt, @function
decrypt:
pushl %ebp
movl %esp, %ebp
pushl %ebx
movzbl encrypted_message, %edx
testb %dl, %dl
je .L4
movl $0, %eax
movl $decrypted_message, %ebx
movl $encrypted_message, %ecx
.L3:
subl $4, %edx
movb %dl, (%ebx,%eax)
addl $1, %eax
movzbl (%ecx,%eax), %edx
testb %dl, %dl
jne .L3
.L4:
popl %ebx
popl %ebp
ret
.size decrypt, .-decrypt
.section .rodata.str1.4,"aMS",@progbits,1
.align 4
.LC0:
.string "Encrypted message: \t%s\nDecrypted message: \t%s\n"
.text
Pastebin:
http://pastebin.com/kr9WgnKi
Basically a compiler works this way:
Source code -> ASM code -> Machine code -> Executable
(Of course there are more steps than that but you get the idea)
I won't go through the whole ASM code because it would take a while, but the code generated by GCC (even optimized) is still bigger and a bit more complicated than the code I wrote. Also consider that my ASM code could be even shorter, but the version you saw is a bit easier to understand.
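For anyone who finds the listing hard to follow, here is roughly what GCC's loop looks like when written back as C. This is only my reading of the assembly above, not something GCC emits, and decrypt_rotated is a made-up name. The idea is that the first character is tested once before the loop, and the loop then checks its condition at the bottom.

```c
/* Same buffers as in decrypt_c.c. */
char encrypted_message[25] = "Wexer=55$D$Izmp~sri2svk";
char decrypted_message[25];

/* Hypothetical function: the generated listing translated back to C. */
void decrypt_rotated(void)
{
    int i = 0;
    char c = encrypted_message[0];             /* movzbl encrypted_message, %edx */
    if (c == '\0')                             /* testb %dl, %dl / je .L4 */
        return;
    do {
        decrypted_message[i] = (char)(c - 4);  /* subl $4, %edx / movb %dl, (%ebx,%eax) */
        i++;                                   /* addl $1, %eax */
        c = encrypted_message[i];              /* movzbl (%ecx,%eax), %edx */
    } while (c != '\0');                       /* testb %dl, %dl / jne .L3 */
}
```

Like the original C version, this never writes the final \0 itself; it relies on decrypted_message being a zero-initialized global.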
Conclusion
Even if the compilers we use now are far more efficient than what we had a few years ago, they are still not perfect, and a human can still write shorter and more efficient ASM. Don't get me wrong, nobody would write big programs in ASM just to save a few seconds. But this whole thread is just a proof of concept to show that it can be worthwhile to use inline ASM for some functions like the one I showed you.
That's about it. If you have any questions I'll try my best to answer. I tried to make this as clear as I could for anyone to read and understand and I hope you enjoyed it.