[ASM] Number printing algorithm

Recommended Posts

Decrius    100
So, been tinkering with ASM a bit, and tried writing an ASM procedure that prints out readable numbers (especially ones bigger or equal to 10). It works :), I'm happy with that, but I still want to post it here and read what you guys think of it. This is the first program I wrote while not being guided by some example. I tried following the _cdecl convention. The only think I didn't grasp is why they decrease the stack pointer (esp) by 4 * number_of_local_variables before going into the actual procedure...in this code the number of local variables is unknown...I just decrease the stack pointer whenever I figure I need a new variable. Is that wrong? Also, the base pointer (ebp) is remembered, but wouldn't it also be possible not using the ebp at all? Just relying on the esp (where you very much watch for pushes or pops on the way) would do just the same trick, doesn't it? Anyways, here's the code, thanks for any comments :-)
global _main
extern _putchar

section .text

_main:
push 789
call _print_number  ; void print_number(unsigned int i);

ret

_print_number:          ; print out readable numbers
push ebp
mov  ebp, esp

push eax            ; save registers
push ebx
push ecx
push edx

mov eax, [ebp + 8]  ; take argument
mov ebx, 10         ; divider is 10
xor ecx, ecx        ; counter is 0

cmp eax, 10
jge print_number_push_remainder
jmp print_number_put_number

print_number_push_remainder:
xor edx, edx        ; high significant bytes are 0, low significant bytes are in eax
div ebx             ; divide by 10

push edx            ; push remainder
inc ecx

cmp eax, 10         ; keep going until quotient (eax) is smaller then 10
jge print_number_push_remainder

print_number_put_number:
add eax, 48         ; '0' is at place 48 in ASCII

push ecx            ; remember counter
push eax            ; print out quotient (or argument if is lower then 10), highest significant number of argument
call _putchar
pop ecx

cmp ecx, 0
je print_number_exit

print_number_put_number_loop:
pop eax             ; pop remainder in reverse order
dec ecx

add eax, 48         ; '0' is at place 48 in ASCII

push ecx            ; remember counter
push eax            ; print out remainder, lower significant number then the previous print
call _putchar
pop ecx

cmp ecx, 0          ; keep printing until we had all pushed remainders
jne print_number_put_number_loop

print_number_exit:
pop edx             ; restore registers
pop ecx
pop ebx
pop eax

leave
ret



Share on other sites
Nypyren    12063
Your EBP question is a good one. Yes, compilers know how to just rely on the ESP. In MSVC the option for this is called "Omit Base Pointer". I'm going to use MSVC6 as the example, since each compiler can do things slightly differently if they want.

MSVC does this automatically by tracing its control flow graph. Each node (called a 'basic block') in the graph is a sequence of instructions which start with a label (function start or jump destination) and do not contain branches until the last instruction. Since the instructions inside a node must always be executed in order, you can compute the stack offset difference from the start to the end of the node, and it will always be the same.

The compiler generates code so that no matter how the control flow graph is executed, the stack offset will always be the same value when entering a particular node, AND that the stack offset is back at zero at the exit of any 'return' block.

To simplify this job, the compiler puts ALL local variables in one clump at the beginning (sub esp, ###), then only moves the stack for function call arguments from that point on. Usually the compiler will not use the stack for anything else (like your digits in your number-printer).

The compiler can also use 'lazy' stack management. If it calls two functions in a row, and is using a calling convention where the caller pops arguments, it will wait until the last possible moment before popping everything (with a single 'add esp, ###' instruction). The 'last possible moment' in this case is at the end of a basic block which leads into a successor block with a smaller stack offset.

For your particluar algorithm, you can find the upper limit to the number of digits you print out. The maximum value of a 32-bit number is 4294967295. That's a maximum of 10 digits of storage you would need to set aside (you can print the negative sign first without wasting a local variable on it). You could just allocate a fixed number of local variables on the stack for that many digits and you wouldn't need to push anything extra on the stack during your algorithm.

Share on other sites
Decrius    100
Thanks, useful response :-)

Quote:
 Original post by NypyrenThe compiler can also use 'lazy' stack management. If it calls two functions in a row, and is using a calling convention where the caller pops arguments, it will wait until the last possible moment before popping everything (with a single 'add esp, ###' instruction). The 'last possible moment' in this case is at the end of a basic block which leads into a successor block with a smaller stack offset.

Yes, exactly, that's what I was tinkering with too ;-), I noticed how you can keep parameters pushed on the stack when using my function for example, something like:

push 1234call _print_numberpush 5678call _print_numberadd esp, 8

Quote:
 Original post by NypyrenFor your particluar algorithm, you can find the upper limit to the number of digits you print out. The maximum value of a 32-bit number is 4294967295. That's a maximum of 10 digits of storage you would need to set aside (you can print the negative sign first without wasting a local variable on it). You could just allocate a fixed number of local variables on the stack for that many digits and you wouldn't need to push anything extra on the stack during your algorithm.

Nice one, is the performance profit here that we subtract from the esp only once, and not per number? In which case I can also instead of popping, just moving the data and at the end add 10 * 4 to the esp.

Also, I could also not add to the esp, since 'leave' includes 'mov esp, ebp', so whatever I do with the esp, I can restore it with one mov.

Beside that, is adding or subtracting to and from the esp only simple arithmetics, or does subtracting from the esp also have effects like 'reserving
space' (whatever that may include, perhaps looking for a new segment since the current one is full?).

Share on other sites
Nypyren    12063
Quote:
 Original post by DecriusNice one, is the performance profit here that we subtract from the esp only once, and not per number? In which case I can also instead of popping, just moving the data and at the end add 10 * 4 to the esp.

Right, it's just an optimization for # of instructions. Just make sure your stack offset is consistent at each of your labels! :)

Quote:
 Also, I could also not add to the esp, since 'leave' includes 'mov esp, ebp', so whatever I do with the esp, I can restore it with one mov.

This is true in your case. However, when the MSVC compiler uses the 'omit base pointer' option, this allows it to use EBP as a general-purpose register. This lets it leave more data in registers longer instead of accessing memory. Registers are WAY faster than memory access.

Since the EBP isn't saved in this case, the compiler has to manually put ESP "back where it found it" before it returns. Whenever EBP is used as a general purpose register, the compiler can't use the ENTER or LEAVE instructions.

Also, some fairly simplistic debuggers and call-stack viewers try to show you the call stack by walking down the stack using EBP pointers (since each EBP pointer typically points to the next one down the stack if it's used to save stack frames). Omitting EBP can screw up these debuggers, but most good debuggers don't care about EBP and can brute-force the stack looking for pointers that fall within the range of addresses where code lives.

Quote:
 Beside that, is adding or subtracting to and from the esp only simple arithmetics, or does subtracting from the esp also have effects like 'reservingspace' (whatever that may include, perhaps looking for a new segment since the current one is full?).

All the processor does is change the register's value and tweak some flags (like ZF). The idea of 'reserving space' is something that the programmers agree on. Theoretically you could make a language that doesn't use a stack and use the ESP register for something totally different.

As far as stack overflows go, the x86 lets you mark different chunks (Pages) of memory with different 'protection' (Read, Write, Execute, etc). There is an additional type called a 'Guard' which does something special: Any time code tries to read or write that memory, it raises a special hardware exception. Operating systems (at least Windows NT kernels) use this feature to detect overflows, assuming that any access to that memory was caused by the stack growing too large.

Share on other sites
Decrius    100
Thanks Nypyren ;-)

I just finished Fibonacci-numbers with it:

extern _putcharglobal _print_charglobal _print_numbersection .text_print_char:             ; void print_char(unsigned char);    push ebp    mov  ebp, esp    sub esp, 16          ; 1 local variable + 3 registers = 16    mov [esp + 12], eax  ; save registers    mov [esp + 8], ecx    mov [esp + 4], edx    mov eax, [ebp + 8]    mov [esp + 0], eax    call _putchar    mov edx, [esp + 4]   ; restore registers    mov ecx, [esp + 8]    mov eax, [esp + 12]    leave    ret_print_number:          ; void print_number(unsigned int);    push ebp    mov  ebp, esp    sub esp, 56         ; 10 local variables (max. 10 positions for a 32-bit number) + 4 registers = 56    mov [esp + 52], eax ; save registers    mov [esp + 48], ebx    mov [esp + 44], ecx    mov [esp + 40], edx    mov eax, [ebp + 8]  ; take argument    mov ebx, 10         ; divider is 10    xor ecx, ecx        ; counter is 0    cmp eax, 10    jge print_number_push_remainder    jmp print_number_put_numberprint_number_push_remainder:    xor edx, edx        ; high significant bytes are 0, low significant bytes are in eax    div ebx             ; divide by 10    mov [esp + (4 * ecx)], edx ; store remainder    inc ecx    cmp eax, 10         ; keep going until quotient (eax) is smaller then 10    jge print_number_push_remainderprint_number_put_number:    add eax, 48         ; '0' is at place 48 in ASCII    push ecx            ; remember counter    push eax            ; print out quotient (or argument if is lower then 10), highest significant number of argument    call _putchar    add esp, 4    pop ecx    cmp ecx, 0    je print_number_exitprint_number_put_number_loop:    dec ecx    mov eax, [esp + (4 * ecx)] ; pop remainder in reverse order    add eax, 48         ; '0' is at place 48 in ASCII    push ecx            ; remember counter    push eax            ; print out remainder, lower significant number then the previous print    call _putchar    add esp, 4    pop ecx    cmp ecx, 0          ; keep printing until we had all pushed remainders    jne print_number_put_number_loopprint_number_exit:    mov edx, [esp + 40] ; restore registers    mov ecx, [esp + 44]    mov ebx, [esp + 48]    mov eax, [esp + 52]    leave    ret

global _mainextern _print_charextern _print_numbersection .text_main:    mov edx, 22         ; n number of fibonacci numbers displayed    ;;;;;;;;;;;;;;;;    sub esp, 4    mov dword [esp + 0], 0    call _print_number    mov dword [esp + 0], 0Ah    call _print_char    mov dword [esp + 0], 1    call _print_number    mov dword [esp + 0], 0Ah    call _print_char    mov eax, 0          ; variable A    mov ebx, 1          ; variable B    xor ecx, ecx        ; counter    sub edx, 2          ; since we already printed the first twoiterate:    add eax, ebx        ; eax += ebx    mov dword [esp + 0], eax    call _print_number    mov dword [esp + 0], 0Ah    call _print_char    inc ecx    cmp ecx, edx    jg exit             ; exit when we done all iterations    ;;;;;;;;;;;;;;;;    add ebx, eax        ; ebx += eax    mov dword [esp + 0], ebx    call _print_number    mov dword [esp + 0], 0Ah    call _print_char    inc ecx    cmp ecx, edx    jle iterate         ; jump back if we haven't done all iterations yetexit:    mov dword [esp + 0], 0Ah    call _print_char    add esp, 4    ret

:-D