Help with assembly

(Wasn't getting help in the Beginner's forum.) OK, I made a simple function using VC++ 6.0's inline assembler, and I don't understand how the code could possibly go any faster than what I'm looking at right now. Yet it appears to be twice as slow as x = i + i (I'm calling ADD(i, i) 1,000,000 times):
inline int ADD(int x, int y)
{
     __asm
     {
          mov eax, x
          add eax, y
          mov x, eax
     }
     return x;
}
Is there a way to streamline this? I'm literally an hour old to assembly. [edited by - uber_n00b on May 31, 2004 4:21:16 PM]

In cases like this the compiler has an advantage over you. It needs to treat your code as a black box, so it adds some wrapper code around it. That doesn't happen with compiler-generated code.
Assembly doesn't have any advantage with this code, or with most code for that matter. Although people's opinions differ on this, the compiler will almost always generate better code than you do. The people who write compilers know almost everything there is to know about every instruction, the pipeline, etc. Do you really think you can do it better?

1) I agree with twanvl: for simple functions like that, there isn't really any advantage to using assembly language. In fact, it can prevent the compiler from optimising code around the function, because the compiler can't easily predict what impact the assembly code will have.


2) As you'll no doubt learn, most of the time you shouldn't need to use assembly language unless a profiler (VTune, TrueTime, etc.) has shown the function to be taking a major share of the time in your application. BTW, you should never compare the performance of C/C++ debug builds to assembly code either, since debug C/C++ builds don't have any optimisations enabled.


3) __declspec(naked) and __forceinline may be of interest to you if you really want to do that code as assembly language.


Simon O'Connor
Game Programmer &
Microsoft DirectX MVP
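
The advice about measuring optimised builds can be tried with a small timing harness. This is only a minimal sketch, assuming a standard C compiler and clock() from <time.h>; the names add_plain, time_loop and ITERATIONS are made up for illustration and are not from the thread. Build it with optimisations enabled before drawing any conclusions from the numbers.

```c
#include <time.h>

#define ITERATIONS 1000000  /* same call count as in the original post */

/* Plain C version of ADD(); the compiler is free to inline it. */
static int add_plain(int x, int y)
{
    return x + y;
}

/* Runs the loop once and returns the accumulated result, so the work
   cannot be thrown away as dead code. Elapsed CPU time in seconds is
   written to *seconds. */
static unsigned time_loop(int use_function, double *seconds)
{
    volatile unsigned sink = 0;  /* volatile: every update must really happen */
    clock_t start = clock();
    int i;

    for (i = 0; i < ITERATIONS; ++i) {
        if (use_function)
            sink += (unsigned)add_plain(i, i);
        else
            sink += (unsigned)(i + i);
    }

    *seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    return sink;
}
```

Call time_loop(0, &t) and time_loop(1, &t) and print both times. With only a million iterations the timer resolution may be too coarse to show a difference, so a larger ITERATIONS value is probably needed in practice.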

I'm going to pop in here and just say a few things about this. First, your ASM code assumes that we work in a single-pipeline world. I am going to quote Roby Joehanes: "Today's computers have more than one pipelines. This is referred as multi-pipeline processor. The old Pentium has 2 pipelines, so, it's like having two separate processors (but of course not equal). If we have two pipelines, the processor can execute two instructions in parallel. If each pipelines has 5 stages, we're effectively pump up the performance up to 10 times. Running two or more instructions in parallel needs a precaution: These instructions must be independent to each other in order to be able to be executed in parallel. For example:

mov bx, ax
mov cx, bx
This instructions cannot be run in parallel. Why? Because the second instruction needs the outcome of the first instruction, i.e. the value of BX is determined by the result of the first instruction. Look at the next example:

mov bx, ax
mov cx, ax
This program can be run in parallel because now both of them only depends on AX (which is assumed already set way ago). We know that both excerpts mean the same thing. But the second example is faster because they can run in parallel. Therefore, the instruction "ordering" can make difference because of multi-pipelining.

Therefore, if you want to speed up your code, sometimes it's worth to reorder instructions so that many of them can be run in parallel." I think that says it as well as I have ever seen it said. Oh, and the mov instruction is somehow superfluous. I don't remember exactly where, but I remember seeing something about what you are doing, and the writer said that you should never use mov, because there are other instructions that can move and add or multiply, etc. without that extra instruction.
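
The dependency idea in the quote carries over to C as well. A minimal sketch with made-up function names: sum_one_chain makes every addition wait for the previous one, while sum_two_chains keeps two independent accumulators that a CPU with more than one pipeline may be able to overlap. Whether the second version is actually faster depends on the CPU and on what the optimiser already does, so treat it as something to measure, not a guarantee.

```c
#include <stddef.h>

/* One dependency chain: each addition needs the previous total. */
static unsigned sum_one_chain(const unsigned *data, size_t n)
{
    unsigned total = 0;
    size_t i;

    for (i = 0; i < n; ++i)
        total += data[i];
    return total;
}

/* Two dependency chains: the additions into "even" and "odd" do not
   depend on each other, so they are candidates for parallel execution. */
static unsigned sum_two_chains(const unsigned *data, size_t n)
{
    unsigned even = 0, odd = 0;
    size_t i;

    for (i = 0; i + 1 < n; i += 2) {
        even += data[i];
        odd  += data[i + 1];
    }
    if (i < n)               /* leftover element when n is odd */
        even += data[i];
    return even + odd;
}
```

Both functions return the same sum; unsigned arithmetic is used so that wraparound on large inputs is well defined.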

For illustrative purposes, I compiled this with Visual C++ 6.0:
#include <stdio.h>

__inline int ADD(int x, int y);

int main()
{
    int i = 5;
    i = ADD(i, i);
    printf("i is: %d\n", i);
}

__inline int ADD(int x, int y)
{
    __asm
    {
        mov eax, x
        add eax, y
    }
    /* no return statement: the result is left in EAX, which is where the caller looks for it */
}

I then disassembled the executable and extracted these pertinent lines of assembly:

:00401000 55              push ebp                     ; save the frame pointer
:00401001 8BEC            mov ebp, esp                 ; create new frame pointer using the stack pointer
:00401003 51              push ecx                     ; wtf??? if anyone knows why ECX is pushed, please let me know
:00401004 C745FC05000000  mov [ebp-04], 00000005       ; i = 5

/* We're in the inline ADD() now */

:0040100B 8B45FC          mov eax, dword ptr [ebp-04]  ; move i into the EAX register
:0040100E 0345FC          add eax, dword ptr [ebp-04]  ; add i (5) to the EAX register

/* We're out of the inline ADD() */

:00401011 50              push eax                     ; eax holds the current value of i
; Normally, the code should now do something like:
;     mov [ebp-04], eax
; C functions return values in the EAX register.
; i = ADD(i,i) should result in ADD doubling i and storing the result in EAX (and it does).
; The "i = " will then cause the EAX value to be copied into i. However, the compiler has
; noticed that i is used in the following printf statement and never again, so the current
; EAX value is pushed (in order to pass it to printf) and never copied into the memory
; location representing i.

* Possible StringData Ref from Data Obj ->"i is: %d"

:00401012 6830604000      push 00406030                ; push pointer to "i is: %d"
:00401017 E814000000      call 00401030                ; call printf
:0040101C 83C408          add esp, 00000008            ; take previously mentioned pointer and i value off the stack
:0040101F 8BE5            mov esp, ebp                 ; point stack pointer back at the beginning of the local frame
:00401021 5D              pop ebp                      ; restore the old frame pointer
:00401022 C3              ret                          ; we're done

With the default C calling convention (cdecl), arguments to a function are pushed onto the stack in reverse order. On Intel architectures, the stack grows downward (from higher to lower addresses), so "sub esp, 8" moves the stack pointer down 8 bytes, and the "add esp, 8" after the printf call moves it back up to discard the two pushed arguments. The return value of a function is placed in EAX. I hope this helps you to understand what is going on in your code a little better.

-steven
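
The calling sequence described above can be mimicked with a toy model. This is a simulation under stated assumptions, not real stack manipulation: ToyStack, toy_init, toy_push and toy_call_add are hypothetical names, the stack is an array of 32-bit slots, and esp is an index that decreases on every push to mirror a downward-growing x86 stack under cdecl.

```c
#include <stdint.h>

#define TOY_SLOTS 16

/* Toy stack: no overflow checks, this is a sketch only. */
typedef struct {
    int32_t slot[TOY_SLOTS];
    int esp;                 /* index of the top of the stack */
} ToyStack;

/* An empty stack starts with esp just past the highest slot. */
static void toy_init(ToyStack *s)
{
    s->esp = TOY_SLOTS;
}

/* Like x86 push: move esp down one slot, then store the value. */
static void toy_push(ToyStack *s, int32_t value)
{
    s->esp -= 1;
    s->slot[s->esp] = value;
}

/* Models "x = ADD(a, b)": arguments are pushed right to left, the
   callee reads them, and the caller removes them afterwards.
   The local variable eax plays the role of the EAX register. */
static int32_t toy_call_add(ToyStack *s, int32_t a, int32_t b)
{
    int32_t eax;

    toy_push(s, b);          /* last argument is pushed first */
    toy_push(s, a);          /* first argument ends up at the lower address */
    eax = s->slot[s->esp] + s->slot[s->esp + 1];   /* callee computes a + b */
    s->esp += 2;             /* caller cleanup, like "add esp, 8" */
    return eax;
}
```

After toy_call_add returns, esp is back where it started, which is the point of the "add esp, 8" in the listing.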

Guest Anonymous Poster
I suspect the reason they are pushing ecx is because it's one of the "protected" registers within the framework of the compiler. That is, the compiler assumes that no called function will ever alter the value of ecx, allowing it to use the register as a handy place to put an often-referenced integer variable.

This fits in nicely with the Windows API, which protects ecx, esi, edi, ebp, and esp, while all other registers (eax, ebx, and edx) can be modified arbitrarily during API calls.

I'm almost positive that ecx is also used as the "this" pointer in C++ class method calls (using the implied "thiscall" argument-passing method, except for non-class functions and static members).

So possibly the compiler would have to save that if you are inside a member method and you use inline assembly, to prevent the this pointer from turning to garbage.

Correct me if I'm wrong, though.

[edited by - pjcast on June 1, 2004 12:59:17 AM]

quote:

Sigh that is how I figured it was: VC++ adding extra code in. I noticed this since a blank function call took the exact same amount of time more.



You don't need to "figure". You can check for yourself.

You can turn on "generate listing file" and find the assembly generated for a specific function by searching in that file.

Or you can put a breakpoint in a function in the debugger and, when it's hit, switch to assembly view, or mixed view, and see what the compiler generated.
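
As a concrete way to check for yourself, a small test file can be compiled with a listing option and then inspected. This is a sketch, assuming the command-line compilers named in the comment: /FAs is the Microsoft switch that writes an assembly listing interleaved with the source, and -S is the GCC equivalent. The file name add_demo.c and the function names are made up for illustration.

```c
/* add_demo.c (hypothetical file name)
 *
 * MSVC:  cl /c /O2 /FAs add_demo.c    writes add_demo.asm
 * GCC:   gcc -O2 -S add_demo.c        writes add_demo.s
 *
 * Search the listing for add_ints and double_it to see what the
 * compiler generated for each function.
 */

int add_ints(int x, int y)
{
    return x + y;
}

int double_it(int i)
{
    return add_ints(i, i);   /* look in the listing to see whether this call was inlined */
}
```

Comparing the listing from an optimised build with one from a debug build shows how much of the overhead comes from the build settings and how much from the code itself.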
