Sign in to follow this  

[ASM] Number printing algorithm

This topic is 3101 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

So, been tinkering with ASM a bit, and tried writing an ASM procedure that prints out readable numbers (especially ones bigger or equal to 10). It works :), I'm happy with that, but I still want to post it here and read what you guys think of it. This is the first program I wrote while not being guided by some example. I tried following the _cdecl convention. The only think I didn't grasp is why they decrease the stack pointer (esp) by 4 * number_of_local_variables before going into the actual procedure...in this code the number of local variables is unknown...I just decrease the stack pointer whenever I figure I need a new variable. Is that wrong? Also, the base pointer (ebp) is remembered, but wouldn't it also be possible not using the ebp at all? Just relying on the esp (where you very much watch for pushes or pops on the way) would do just the same trick, doesn't it? Anyways, here's the code, thanks for any comments :-)
global _main
extern _putchar

section .text

_main:
    push 789
    call _print_number  ; void print_number(unsigned int i);
    add esp, 4

    ret

_print_number:          ; print out readable numbers
    push ebp
    mov  ebp, esp

    push eax            ; save registers
    push ebx
    push ecx
    push edx

    mov eax, [ebp + 8]  ; take argument
    mov ebx, 10         ; divider is 10
    xor ecx, ecx        ; counter is 0

    cmp eax, 10
    jge print_number_push_remainder
    jmp print_number_put_number

print_number_push_remainder:
    xor edx, edx        ; high significant bytes are 0, low significant bytes are in eax
    div ebx             ; divide by 10

    push edx            ; push remainder
    inc ecx

    cmp eax, 10         ; keep going until quotient (eax) is smaller then 10
    jge print_number_push_remainder

print_number_put_number:
    add eax, 48         ; '0' is at place 48 in ASCII

    push ecx            ; remember counter
    push eax            ; print out quotient (or argument if is lower then 10), highest significant number of argument
	call _putchar
    add esp, 4
    pop ecx

    cmp ecx, 0
    je print_number_exit

print_number_put_number_loop:
    pop eax             ; pop remainder in reverse order
    dec ecx

    add eax, 48         ; '0' is at place 48 in ASCII

    push ecx            ; remember counter
    push eax            ; print out remainder, lower significant number then the previous print
    call _putchar
    add esp, 4
    pop ecx

    cmp ecx, 0          ; keep printing until we had all pushed remainders
    jne print_number_put_number_loop

print_number_exit:
    pop edx             ; restore registers
    pop ecx
    pop ebx
    pop eax

    leave
    ret

Share this post


Link to post
Share on other sites
Your EBP question is a good one. Yes, compilers know how to just rely on the ESP. In MSVC the option for this is called "Omit Base Pointer". I'm going to use MSVC6 as the example, since each compiler can do things slightly differently if they want.

MSVC does this automatically by tracing its control flow graph. Each node (called a 'basic block') in the graph is a sequence of instructions which start with a label (function start or jump destination) and do not contain branches until the last instruction. Since the instructions inside a node must always be executed in order, you can compute the stack offset difference from the start to the end of the node, and it will always be the same.

The compiler generates code so that no matter how the control flow graph is executed, the stack offset will always be the same value when entering a particular node, AND that the stack offset is back at zero at the exit of any 'return' block.

To simplify this job, the compiler puts ALL local variables in one clump at the beginning (sub esp, ###), then only moves the stack for function call arguments from that point on. Usually the compiler will not use the stack for anything else (like your digits in your number-printer).

The compiler can also use 'lazy' stack management. If it calls two functions in a row, and is using a calling convention where the caller pops arguments, it will wait until the last possible moment before popping everything (with a single 'add esp, ###' instruction). The 'last possible moment' in this case is at the end of a basic block which leads into a successor block with a smaller stack offset.


For your particluar algorithm, you can find the upper limit to the number of digits you print out. The maximum value of a 32-bit number is 4294967295. That's a maximum of 10 digits of storage you would need to set aside (you can print the negative sign first without wasting a local variable on it). You could just allocate a fixed number of local variables on the stack for that many digits and you wouldn't need to push anything extra on the stack during your algorithm.

Share this post


Link to post
Share on other sites
Thanks, useful response :-)

Quote:
Original post by Nypyren
The compiler can also use 'lazy' stack management. If it calls two functions in a row, and is using a calling convention where the caller pops arguments, it will wait until the last possible moment before popping everything (with a single 'add esp, ###' instruction). The 'last possible moment' in this case is at the end of a basic block which leads into a successor block with a smaller stack offset.


Yes, exactly, that's what I was tinkering with too ;-), I noticed how you can keep parameters pushed on the stack when using my function for example, something like:

push 1234
call _print_number

push 5678
call _print_number

add esp, 8


Quote:
Original post by Nypyren
For your particluar algorithm, you can find the upper limit to the number of digits you print out. The maximum value of a 32-bit number is 4294967295. That's a maximum of 10 digits of storage you would need to set aside (you can print the negative sign first without wasting a local variable on it). You could just allocate a fixed number of local variables on the stack for that many digits and you wouldn't need to push anything extra on the stack during your algorithm.


Nice one, is the performance profit here that we subtract from the esp only once, and not per number? In which case I can also instead of popping, just moving the data and at the end add 10 * 4 to the esp.

Also, I could also not add to the esp, since 'leave' includes 'mov esp, ebp', so whatever I do with the esp, I can restore it with one mov.

Beside that, is adding or subtracting to and from the esp only simple arithmetics, or does subtracting from the esp also have effects like 'reserving
space' (whatever that may include, perhaps looking for a new segment since the current one is full?).

Share this post


Link to post
Share on other sites
Quote:
Original post by Decrius
Nice one, is the performance profit here that we subtract from the esp only once, and not per number? In which case I can also instead of popping, just moving the data and at the end add 10 * 4 to the esp.


Right, it's just an optimization for # of instructions. Just make sure your stack offset is consistent at each of your labels! :)


Quote:

Also, I could also not add to the esp, since 'leave' includes 'mov esp, ebp', so whatever I do with the esp, I can restore it with one mov.


This is true in your case. However, when the MSVC compiler uses the 'omit base pointer' option, this allows it to use EBP as a general-purpose register. This lets it leave more data in registers longer instead of accessing memory. Registers are WAY faster than memory access.

Since the EBP isn't saved in this case, the compiler has to manually put ESP "back where it found it" before it returns. Whenever EBP is used as a general purpose register, the compiler can't use the ENTER or LEAVE instructions.

Also, some fairly simplistic debuggers and call-stack viewers try to show you the call stack by walking down the stack using EBP pointers (since each EBP pointer typically points to the next one down the stack if it's used to save stack frames). Omitting EBP can screw up these debuggers, but most good debuggers don't care about EBP and can brute-force the stack looking for pointers that fall within the range of addresses where code lives.


Quote:

Beside that, is adding or subtracting to and from the esp only simple arithmetics, or does subtracting from the esp also have effects like 'reserving
space' (whatever that may include, perhaps looking for a new segment since the current one is full?).


All the processor does is change the register's value and tweak some flags (like ZF). The idea of 'reserving space' is something that the programmers agree on. Theoretically you could make a language that doesn't use a stack and use the ESP register for something totally different.

As far as stack overflows go, the x86 lets you mark different chunks (Pages) of memory with different 'protection' (Read, Write, Execute, etc). There is an additional type called a 'Guard' which does something special: Any time code tries to read or write that memory, it raises a special hardware exception. Operating systems (at least Windows NT kernels) use this feature to detect overflows, assuming that any access to that memory was caused by the stack growing too large.

Share this post


Link to post
Share on other sites
Thanks Nypyren ;-)

I just finished Fibonacci-numbers with it:

extern _putchar
global _print_char
global _print_number

section .text

_print_char: ; void print_char(unsigned char);
push ebp
mov ebp, esp

sub esp, 16 ; 1 local variable + 3 registers = 16

mov [esp + 12], eax ; save registers
mov [esp + 8], ecx
mov [esp + 4], edx

mov eax, [ebp + 8]
mov [esp + 0], eax
call _putchar

mov edx, [esp + 4] ; restore registers
mov ecx, [esp + 8]
mov eax, [esp + 12]

leave
ret

_print_number: ; void print_number(unsigned int);
push ebp
mov ebp, esp

sub esp, 56 ; 10 local variables (max. 10 positions for a 32-bit number) + 4 registers = 56

mov [esp + 52], eax ; save registers
mov [esp + 48], ebx
mov [esp + 44], ecx
mov [esp + 40], edx

mov eax, [ebp + 8] ; take argument
mov ebx, 10 ; divider is 10
xor ecx, ecx ; counter is 0

cmp eax, 10
jge print_number_push_remainder
jmp print_number_put_number

print_number_push_remainder:
xor edx, edx ; high significant bytes are 0, low significant bytes are in eax
div ebx ; divide by 10

mov [esp + (4 * ecx)], edx ; store remainder
inc ecx

cmp eax, 10 ; keep going until quotient (eax) is smaller then 10
jge print_number_push_remainder

print_number_put_number:
add eax, 48 ; '0' is at place 48 in ASCII

push ecx ; remember counter
push eax ; print out quotient (or argument if is lower then 10), highest significant number of argument
call _putchar
add esp, 4
pop ecx

cmp ecx, 0
je print_number_exit

print_number_put_number_loop:
dec ecx
mov eax, [esp + (4 * ecx)] ; pop remainder in reverse order

add eax, 48 ; '0' is at place 48 in ASCII

push ecx ; remember counter
push eax ; print out remainder, lower significant number then the previous print
call _putchar
add esp, 4
pop ecx

cmp ecx, 0 ; keep printing until we had all pushed remainders
jne print_number_put_number_loop

print_number_exit:
mov edx, [esp + 40] ; restore registers
mov ecx, [esp + 44]
mov ebx, [esp + 48]
mov eax, [esp + 52]

leave
ret


global _main
extern _print_char
extern _print_number

section .text

_main:
mov edx, 22 ; n number of fibonacci numbers displayed

;;;;;;;;;;;;;;;;

sub esp, 4

mov dword [esp + 0], 0
call _print_number

mov dword [esp + 0], 0Ah
call _print_char

mov dword [esp + 0], 1
call _print_number

mov dword [esp + 0], 0Ah
call _print_char

mov eax, 0 ; variable A
mov ebx, 1 ; variable B
xor ecx, ecx ; counter
sub edx, 2 ; since we already printed the first two

iterate:
add eax, ebx ; eax += ebx

mov dword [esp + 0], eax
call _print_number

mov dword [esp + 0], 0Ah
call _print_char

inc ecx
cmp ecx, edx
jg exit ; exit when we done all iterations

;;;;;;;;;;;;;;;;

add ebx, eax ; ebx += eax

mov dword [esp + 0], ebx
call _print_number

mov dword [esp + 0], 0Ah
call _print_char

inc ecx
cmp ecx, edx
jle iterate ; jump back if we haven't done all iterations yet

exit:
mov dword [esp + 0], 0Ah
call _print_char

add esp, 4

ret


:-D

Share this post


Link to post
Share on other sites

This topic is 3101 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this