Teach me assembly

12 comments, last by Flambergeman 16 years, 11 months ago
Now that I've fixed my problem with printing things on screen and am beginning to understand assembly in general, I'll post all my further questions in this thread to keep things centralized. I'm using the GNU assembler (GAS) and C++ here. Moderators, tell me if there's any problem with this! OK, let's go: I have a question about this function that I just wrote:

int System::out::print( int p ){
	asm(
//	"sub %esp, 4\n\t"
	"mov %eax, 4\n\t"
	"mov %ebx, 1\n\t"
	"lea %ecx, [%ebp+8]\n\t"
	"mov %edx, 1\n\t"
	"int 0x80\n\t"
//	"add %esp, 4"
	   );
	return 0;
}

This function prints an int on screen. I need to sort out two things: (1) Are the commented-out lines above really necessary? The function works the same with and without them. I need more explanation of the stack pointer and its relationship to memory allocation. (2) How do I get the size of a value? Here the number is always 4 bytes long, but if I want to write a second function that receives a class type, I don't know how to calculate the size of that object. Thanks in advance!!
Did you check out PC Assembly Language as Icefox suggested in the other thread?

One trick to learning assembly language is to write up some simple routines in C or C++ and then have the compiler generate the assembler for you.

"sub %esp, 4\n\t" - this makes space on the stack for what in C would be called a local variable

"add %esp, 4" - this cleans it up

I think you've got the operand order reversed. Also, when you use an immediate value in GAS, you need to prefix it appropriately. IIRC, with a $ sign.

"subl $4,%esp\n"
"addl $4,%esp\n"

And in GAS, instructions are sometimes suffixed to indicate the operand size - b for byte, w for word and l for long (or double word). Here's a reference that might help: AT&T Syntax versus Intel Syntax

You always need to keep the stack balanced between function calls. For every push you must pop, for every sub, a corresponding add. This ensures the function is able to return to the proper address.

To get the size of a structure or a class you have to determine it yourself as you write the code.


"I thought what I'd do was, I'd pretend I was one of those deaf-mutes." - the Laughing Man
Popping after a function is only required if the target function is cdecl. If it's stdcall, you don't have to pop. COM functions and most Win32 library functions are stdcall. LIBC functions are usually cdecl. varargs functions are ALWAYS cdecl (as far as I've ever seen).

A helpful pattern to speed up popping after function calls:

(I'm using VC inline assembly syntax here which is nearly identical to Intel syntax)

push d
push c
push b
push a
call Fn1
add esp, 0x10 // change this depending on your processor


Compilers will further optimize by delaying those "add esp, 0x10" such that they are placed immediately before "joins" in the control flow graph (and where the different predecessors have varying stack offsets), but if you're handwriting assembly I doubt you'll want to bother keeping track of where those should go.

(Function 1 : one argument: "Param0")
(Block 1 : Stack Offset 0)
push esi
push edi
xor eax, eax
mov ax, [esp+0xC]  (Param0 = Frame+4 - Stack Offset -8 = +0xC)
test ax, ax
jne L1
(Branch Exit : Stack Offset -8, successors: Blocks 2 and 5)
(Block 2 : Stack Offset -8)
push eax
call Fn2
(Block 3 : Stack Offset -0xC)
push eax // ax returned by first function, otherwise wouldn't be necessary
call Fn3
(Block 4 : Stack Offset -0x10)
(Stack Offset Constraint for Successor == -8, adjusting stack)
add esp, 8
(Block 5 : Stack Offset -8)
L1:
pop edi
pop esi
ret
(Function Exit: Stack Offset 0)


Kind of a pain to keep track of when it gets more complicated.

Some macro assemblers let you define structure data types and should have some kind of "sizeof" equivalent. There are no actual instructions that deal with structure data types like this - the macro assembler or compiler determines the constant size of the struct at compile/assemble time and just sticks the number in the output code.
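On the C++ side of the original question, the compiler already knows the size of any complete type, and sizeof exposes it as a compile-time constant. A minimal sketch follows; the Pixel struct and the byte_count helper are hypothetical names used only for illustration, and the fixed-width members are chosen so the size is predictable on common platforms.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical class standing in for "a class type" passed to a print function.
struct Pixel {
    std::int32_t x;
    std::int32_t y;
};

// sizeof is evaluated at compile time; no instruction computes it at run time.
template <typename T>
constexpr std::size_t byte_count() {
    return sizeof(T);
}

// Assumption: the compiler inserts no padding between two 32-bit members.
static_assert(sizeof(Pixel) == 2 * sizeof(std::int32_t),
              "assumes no padding between two 32-bit members");
```

Because the result is a constant, it could be passed to hand-written assembly as an immediate in the same way the literal 1 is loaded into edx in the original function.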

[Edited by - Nypyren on April 18, 2007 11:48:21 PM]
Quote:Original post by Nypyren
Popping after a function is only required if the target function is cdecl. If it's stdcall, you don't have to pop. COM functions and most Win32 library functions are stdcall. LIBC functions are usually cdecl. varargs functions are ALWAYS cdecl (as far as I've ever seen).


Why would the calling convention matter? Taking up stack space is taking up stack space, and you only have X amount.

Quote:A helpful pattern to speed up popping after function calls:

(I'm using VC inline assembly syntax here which is nearly identical to Intel syntax)

push d
push c
push b
push a
call Fn1
add esp, 0x10 // change this depending on your processor


It would be helpful to explain where the 0x10 comes from and how it would need to change ;) Also, this assumes caller-cleanup of the stack, which AFAIK isn't exactly universal.
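One reading of the 0x10, sketched under the assumption of 32-bit x86 where each push of a register-sized argument moves esp by 4 bytes: four pushes occupy 4 x 4 = 16 = 0x10 bytes, which a cdecl caller removes after the call. The names kSlotBytes and caller_cleanup_bytes are hypothetical.

```cpp
#include <cstddef>

// Assumption: 32-bit x86, every argument is pushed as one 4-byte stack slot.
constexpr std::size_t kSlotBytes = 4;

// Bytes a cdecl caller must add back to esp after the call returns.
constexpr std::size_t caller_cleanup_bytes(std::size_t pushed_args) {
    return pushed_args * kSlotBytes;
}

// push d, push c, push b, push a -> four slots -> add esp, 0x10
static_assert(caller_cleanup_bytes(4) == 0x10, "four 4-byte pushes");
```

With a stdcall callee the caller adds nothing; the callee removes its own arguments when it returns (for four 4-byte arguments, with ret 16).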

[Edited by - Zahlman on April 19, 2007 9:54:32 PM]
Another thing that should be mentioned is stack alignment. Consider a stack alignment of 16. Now a function foo() calls a function bar(), so the return address is pushed onto the stack. As a result, the stack is no longer aligned inside bar(), so bar() has to adjust the stack pointer.
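The arithmetic behind that adjustment can be sketched as follows, assuming a power-of-two alignment, a stack that grows downward, and a 4-byte return address as on 32-bit x86. The helper names align_down and realign_bytes are hypothetical.

```cpp
#include <cstdint>

// Round a stack pointer value down to a multiple of `alignment`.
// Assumption: `alignment` is a power of two and the stack grows downward.
constexpr std::uint32_t align_down(std::uint32_t sp, std::uint32_t alignment) {
    return sp & ~(alignment - 1u);
}

// Bytes bar() would subtract from esp to reach an aligned address again.
constexpr std::uint32_t realign_bytes(std::uint32_t sp, std::uint32_t alignment) {
    return sp - align_down(sp, alignment);
}
```

For example, if esp is 0x1000 in foo() just before the call, the pushed return address leaves it at 0xFFC inside bar(), and subtracting realign_bytes(0xFFC, 16) brings it to 0xFF0.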
Quote:Original post by LessBread
Did you check out PC Assembly Language as Icefox suggested in the other thread?

I've now read it more carefully and I think I understand. You push values onto the stack to pass parameters to functions, and use sub esp to make room for local variables. OK, I just need to practice now :)

Quote:Original post by LessBread
One trick to learning assembly language is to write up some simple routines in C or C++ and then have the compiler generate the assembler for you.

Yeah, I'm doing this.

Quote:Original post by LessBread
"sub %esp, 4\n\t" - this makes space on the stack for what in C would be called a local variable

"add %esp, 4" - this cleans it up

I think you've got the operand order reversed. Also, when you use an immediate value in GAS, you need to prefix it appropriately. IIRC, with a $ sign.

"subl $4,%esp\n"
"addl $4,%esp\n"

And in GAS, instructions are sometimes suffixed to indicate the operand size - b for byte, w for word and l for long (or double word). Here's a reference that might help: AT&T Syntax versus Intel Syntax

I'm using Intel syntax.

Quote:Original post by LessBread
You always need to keep the stack balanced between function calls. For every push you must pop, for every sub, a corresponding add. This ensures the function is able to return to the proper address.

I already know that. It's just difficult to read the code and notice that a pop, or an add to esp, is missing.

Quote:Original post by LessBread
To get the size of a structure or a class you have to determine it yourself as you write the code.

I'll read more about arrays tomorrow.


Nypyren
I'll always use the stdcall convention for my functions. It's faster, and I won't need printf()-like parameters. The number of parameters will always be fixed.

As for how to pass parameters to functions, I've decided to always use the push instruction, because my functions can have more than 4 parameters, so loading some of them into registers doesn't solve the whole problem. So I won't use any registers to pass parameters; I'll use only the stack. Do you agree?
I want things that work really fast here.

nmi
Noted :)
Quote:Original post by Zahlman
Why would the calling convention matter?

Quote:Also, this assumes caller-cleanup of the stack, which AFAIK isn't exactly universal.


You answered your own question. Though, the stack stuff in the OP's code doesn't really have anything to do with calling conventions, so that's probably why you said the first part. Only the second part is where calling convention matters.




To OP, I don't think that function is doing what you want. What it does is print one byte at the address of the value of the integer p. If you wanted to print an integer, you would first have to convert it to a string, then print the string.

The parameter that goes into ecx for the system call is the starting address of a string of bytes that will be interpreted as characters. It will not print the number in ecx.
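The conversion step described above can be sketched in C++ like this. It is only an illustration: to_decimal and print_int are hypothetical names, and std::fwrite stands in for the write system call that the original function issues through int 0x80.

```cpp
#include <cstdio>
#include <string>

// Convert an int to its decimal text form, one digit at a time.
std::string to_decimal(int value) {
    // Work in unsigned so the most negative int does not overflow on negation.
    unsigned int magnitude = value < 0 ? 0u - static_cast<unsigned int>(value)
                                       : static_cast<unsigned int>(value);
    std::string digits;
    do {
        // Peel off the lowest digit and put it in front of the ones found so far.
        digits.insert(digits.begin(), static_cast<char>('0' + magnitude % 10u));
        magnitude /= 10u;
    } while (magnitude != 0u);
    if (value < 0) {
        digits.insert(digits.begin(), '-');
    }
    return digits;
}

// Hand the characters to stdout: a start address plus a byte count,
// the same pair the system call takes in ecx and edx.
void print_int(int value) {
    const std::string text = to_decimal(value);
    std::fwrite(text.data(), 1, text.size(), stdout);
}
```

Calling print_int(65) writes the two characters '6' and '5', whereas writing the raw byte 65 shows up as 'A'.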

[Edited by - nicksterdomus on April 19, 2007 11:25:54 PM]
Quote:Original post by Flambergeman
I'll always use the stdcall convention for my functions. It's faster, and I won't need printf()-like parameters. The number of parameters will always be fixed.


That will definitely make life easier! :) Just make sure if you use other people's code that you know whether it's cdecl or stdcall.
Also, I believe lea is the wrong instruction to use in the line
"lea %ecx, [%ebp+8]\n\t"

If I am converting this instruction to ATT syntax correctly, then this instruction would add the contents of ebp and 8, then put that value into ecx. What you want to do is add the contents of ebp and 8, which forms an address, then load the value stored at that address into ecx. You can use the mov instruction to do that.
"mov %ecx, [%ebp+8]"
Quote:Original post by nicksterdomus
To OP, I don't think that function is doing what you want. What it does is print one byte at the address of the value of the integer p. If you wanted to print an integer, you would first have to convert it to a string, then print the string.

I know. I'm passing the number 65 as the parameter, so the console prints an 'A'. The int is just for testing for now. I'll change the parameters of the out::print() methods to accept classes later.
Quote:Original post by nicksterdomus
The parameter that goes into ecx for the system call is the starting address of a string of bytes that will be interpreted as characters. It will not print the number in ecx.

I know that too. That's why I use 'lea' and not 'mov'.
Quote:Original post by nicksterdomus
Also, I believe lea is the wrong instruction to use in the line

"lea %ecx, [%ebp+8]\n\t"

No, lea is correct here. mov doesn't work. I tried it in my previous post.
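The lea versus mov disagreement can be restated in C++ terms, as a rough analogy rather than what a compiler would emit: lea yields the address of p (like &p), mov yields its value (like p), and the write system call expects an address in ecx. The function names are hypothetical, and the comment about which byte gets printed assumes a little-endian machine such as x86.

```cpp
#include <cstdio>

// Analogy for "lea ecx, [ebp+8]": the address where the parameter lives.
const int* address_of_param(const int& p) {
    return &p;
}

// Analogy for "mov ecx, [ebp+8]": the value stored at that address.
int value_of_param(const int& p) {
    return p;
}

// Write one byte starting at the parameter's address, like the original
// print(): on little-endian x86, p == 65 sends the byte 0x41 ('A').
void print_first_byte(const int& p) {
    std::fwrite(address_of_param(p), 1, 1, stdout);
}
```

Passing the value where an address is expected would make the kernel treat 65 as a pointer, which is consistent with the report that the mov version prints nothing.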
