Author Topic: Simple demonstration of inline ASM efficiency (Read 2420 times)

Satan911 · « **on:** April 06, 2011, 01:45:15 am »

Simple demonstration of inline ASM efficiency
Comparing decryption time in C versus ASM

Introduction

So I was doing a little assignment for school not so long ago. It was a simple exercise to practice inline ASM by translating a C function into ASM. It took a few minutes and I moved on. Today I was working on something a lot bigger in ASM and wondered whether programming directly in ASM is more efficient, performance-wise, than a high-level language like C. I decided to use the code from that old exercise to make a small demonstration.

The Code

The code is really simple. The program decrypts a string that was encrypted with a Caesar-style cipher with a shift of 4: every character code is moved up by 4 to encrypt and down by 4 to decrypt. So for a 'b' in the clear text you'll see 'f' in the encrypted string.
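As a minimal sketch of that shift (shift_bytes is a hypothetical helper, not part of the original programs), the same routine can encrypt with +4 and decrypt with -4. The shift is applied to the raw character codes, so digits and punctuation move too, not just letters.

```c
#include <stddef.h>

/* Hypothetical helper (not from the original post): shift every byte of the
 * NUL-terminated string `in` by `shift` and store the result, including the
 * terminator, in `out`. `out` must have room for strlen(in) + 1 bytes. */
void shift_bytes(const char *in, char *out, int shift)
{
    size_t i = 0;
    while (in[i] != '\0') {
        out[i] = (char)(in[i] + shift);
        i++;
    }
    out[i] = '\0';
}
```

Calling shift_bytes("b", out, 4) stores "f" in out, and shift_bytes("Wexer=55$D$Izmp~sri2svk", out, -4) stores "Satan911 @ Evilzone.org".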

C version: (decrypt_c.c)

Code:

/*************************************************
 * Author: Satan911
 * Description: Simple demonstration of inline ASM efficiency
 * Date: April 2011
 **************************************************/


#include <stdio.h>

char encrypted_message[25]="Wexer=55$D$Izmp~sri2svk";
char decrypted_message[25];


void decrypt(void) {
/* decrypted_message[i] = encrypted_message[i] - 4; */
    int i = 0;
    while(encrypted_message[i] != '\0')
    {
      decrypted_message[i] = encrypted_message[i] - 4;
      i++;
    }
    /* No '\0' is written here: decrypted_message is a global, so it starts out zero-filled */
}

int main(void) {

    /* To test performance */
    int j = 0;
    while(j < 100000000)
    {
      decrypt();
      j++;
    }

    printf("Encrypted message: \t%s\nDecrypted message: \t%s\n",encrypted_message, decrypted_message);

    return 0;
}

Pastebin (with syntax highlighting): http://pastebin.com/9Up2DrN6

With inline ASM: (decrypt_asm.c) - Might wanna check the Pastebin below for proper indenting

Code:

/*************************************************
 * Author: Satan911
 * Description: Simple demonstration of inline ASM efficiency
 * Date: April 2011
 **************************************************/

#include <stdio.h>


char encrypted_message[25]="Wexer=55$D$Izmp~sri2svk";
char decrypted_message[25];


void decrypt(void) {
/* decrypted_message[i] = encrypted_message[i] - 4; */
    /* Written for 32-bit x86 (on a 64-bit system, build with -m32).
       Extended asm is used so the modified registers can be declared,
       which is why the register names need a doubled %% */
    asm volatile(
    "xor %%ecx, %%ecx\n\t"                      /* %ecx = 0 (Used as i here) */
    "xor %%eax, %%eax\n\t"                      /* %eax = 0, so %al holds '\0' */

    "1:\n\t"                                    /* Start of the loop (local label, safe if the function gets inlined) */
    "movb encrypted_message(%%ecx), %%dl\n\t"   /* %dl = encrypted_message[i] */
    "cmp %%dl, %%al\n\t"                        /* Compare %dl and %al */
    "je 2f\n\t"                                 /* Jump forward to 2: if %dl == 0 (end of string) */
    "sub  $4, %%dl\n\t"                         /* %dl = %dl - 4 */
    "movb %%dl, decrypted_message(%%ecx)\n\t"   /* decrypted_message[i] = encrypted_message[i] - 4 */
    "incl %%ecx\n\t"                            /* %ecx += 1 (i++) */
    "jmp 1b\n\t"                                /* Jump back to 1: (while loop in C) */

    "2:\n\t"
    "movb %%dl, decrypted_message(%%ecx)\n\t"   /* Store the final \0 at the end of the string */
    :                                           /* no outputs */
    :                                           /* no inputs */
    : "eax", "ecx", "edx", "cc", "memory"       /* registers, flags and memory the asm modifies */
    );
}

int main(void) {

    /* To test performance */
    int j = 0;
    while(j < 100000000)
    {
      decrypt();
      j++;
    }

    printf("Encrypted message: \t%s\nDecrypted message: \t%s\n",encrypted_message, decrypted_message);

    return 0;
}

Pastebin (with syntax highlighting): http://pastebin.com/AFAD8AzP

Note: The ASM here uses AT&T syntax. It works great with GCC, and it's also the syntax GCC emits when it compiles a program (we'll use that later). The C code could have been written differently, but I tried to keep it as close as I could to the ASM code. They are nearly identical now; the one difference is that the ASM version also copies the terminating \0, while the C version relies on decrypted_message being a zero-filled global.
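For anyone new to AT&T syntax: the source operand comes first and the destination second, immediates are prefixed with $ and registers with %. Here is a minimal sketch using GCC's extended asm. sub4 is a hypothetical helper, not part of the original programs, and the plain C fallback is only there so the snippet still compiles on non-x86 targets.

```c
/* Hypothetical helper: subtract 4 from a single byte.
 * On x86 with GCC or a compatible compiler this uses one AT&T-syntax
 * instruction; elsewhere it falls back to plain C. */
char sub4(char c)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* "subb $4, %0": immediate source first, destination second.
     * "+q" lets the compiler pick a byte-addressable register for c and
     * marks it as both read and written; "cc" says the flags change. */
    __asm__("subb $4, %0" : "+q"(c) : : "cc");
#else
    c = (char)(c - 4);
#endif
    return c;
}
```

With operands declared this way the compiler chooses the register itself, so sub4('f') returns 'b' without the asm naming any register by hand.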

If you read the code you are probably wondering why I call decrypt() 100000000 times. It's because this decryption is really simple, and if you only run it once you won't notice any difference between the C and ASM versions. Repeating a function many times and timing the whole run is a common technique in software development for measuring how efficient that function is.
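The same idea can be done from inside the program. This is a rough sketch, assuming the standard clock() function is precise enough for the job; time_calls and bench_decrypt are hypothetical names that do not appear in the original programs.

```c
#include <time.h>

/* Hypothetical workload for this sketch: the same shift-by-4 decryption */
static const char bench_src[] = "Wexer=55$D$Izmp~sri2svk";
static char bench_dst[sizeof bench_src];

void bench_decrypt(void)
{
    int i = 0;
    while (bench_src[i] != '\0') {
        bench_dst[i] = (char)(bench_src[i] - 4);
        i++;
    }
    bench_dst[i] = '\0';
}

/* Hypothetical helper: call fn() `iterations` times and return the CPU
 * time consumed, in seconds, as measured by clock(). */
double time_calls(void (*fn)(void), long iterations)
{
    clock_t start = clock();
    for (long i = 0; i < iterations; i++) {
        fn();
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Something like time_calls(bench_decrypt, 100000000) would then cover only the loop, not program startup or the printf. Keep in mind that an optimizing compiler may notice that every call does the same work and simplify the loop, so it is worth checking the generated ASM before trusting the numbers.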

Decryption Time

The time command runs a command or program and reports how long it took, along with its resource usage.

So I compiled both versions using the same command and then ran both with time. The results are pretty clear: the C version took almost 3x as long as the ASM version to decrypt the message 100000000 times. But why?

I'll try to explain the 'why' a little bit here. First, here's the ASM code generated by GCC for the C version of the program.

# gcc -S -O decrypt_c.c
-S stops GCC after it generates the ASM code, and -O turns on optimization

This is a short version only showing the decrypt() function - See the Pastebin link for the whole code

Code:

    .file    "decrypt_c.c"
    .text
.globl decrypt
    .type    decrypt, @function
decrypt:
    pushl    %ebp
    movl    %esp, %ebp
    pushl    %ebx
    movzbl    encrypted_message, %edx
    testb    %dl, %dl
    je    .L4
    movl    $0, %eax
    movl    $decrypted_message, %ebx
    movl    $encrypted_message, %ecx
.L3:
    subl    $4, %edx
    movb    %dl, (%ebx,%eax)
    addl    $1, %eax
    movzbl    (%ecx,%eax), %edx
    testb    %dl, %dl
    jne    .L3
.L4:
    popl    %ebx
    popl    %ebp
    ret
    .size    decrypt, .-decrypt
    .section    .rodata.str1.4,"aMS",@progbits,1
    .align 4
.LC0:
    .string    "Encrypted message: \t%s\nDecrypted message: \t%s\n"
    .text

Pastebin: http://pastebin.com/kr9WgnKi

Basically a compiler works this way:
Source code -> ASM code -> Machine code -> Executable

(Of course there are more steps than that but you get the idea)

I won't go through the whole ASM code because it would take a while, but the code generated by GCC (even optimized) is still bigger and a bit more complicated than the code I wrote. Also, my ASM code could be even shorter, but the version you saw is a bit easier to understand.

Conclusion

Even if the compilers we use now are way more efficient than what we had a few years ago, they are still not perfect, and a human brain is still capable of writing shorter and more efficient ASM. Don't get me wrong, nobody would write big programs in ASM just to save a few seconds. But this whole thread is a proof of concept to show that it can be worthwhile to use inline ASM for some functions like the one I showed you.

That's about it. If you have any questions I'll try my best to answer. I tried to make this as clear as I could for anyone to read and understand and I hope you enjoyed it.

gh0st · « **Reply #1 on:** April 06, 2011, 02:11:27 am »

Interesting article, bro. So the summary is:
ASM is as powerful as C?

Polynomial · « **Reply #2 on:** April 06, 2011, 02:16:24 am »

As far as I can tell, the primary reason for this being slow in C is that you used a math operation (d[ i ] -= 4) that performs software integer bounds checks. Disassemble the two resulting binaries to see how it worked. Also, did you compile with gcc -O3 on both?

gh0st - I'd say asm is more powerful than C, simply because you can do things like sysenter/sysexit. However, since you can inline asm into C it's probably best to use C for the most part.

Tsar · « **Reply #3 on:** April 25, 2011, 10:32:06 am »

Quote from: gh0st on April 06, 2011, 02:11:27 am

Interesting article, bro. So the summary is:
ASM is as powerful as C?

Computers are built with layers of abstractions on top of each other.

At the lowest level (for a programmer at least) everything is binary. Using binary strings, the processor is given instructions on which transistors to switch (1/0, on/off). Assembly language gives these instructions a more readable format. Essentially, an assembly (ASM) instruction represents a string of bits in binary. What a C compiler does is take a language and the set of instructions the user defines in it (your code), compile that into a list of assembly instructions, and then into a binary file for execution. The reason inline ASM can sometimes make things more efficient is that the compiler doesn't always pick the most efficient way, so a human can specify inline assembly to fine-tune what the compiler does.

The Compile Process:

C/C++
|
v
Assembly
|
v
BINARY

Anyways, I think I explained that semi-correctly and somewhat clearly. Hope that helps.
