Advertisement Jump to content
Sign in to follow this  
fir

(x86) jumps in assembly

This topic is 1867 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

I wanted to optymize some procedure in assembly 

(it was something like minmax(a,b,c) small code 

to get smallest one and biggest one of the three ints given)

 

i think the most important way of optymizing codes like

here is to decrease number of jumps, but I wander 

what is most costly - are the inconditional jumps costly?

are the conditional jumps fails costly? If i got a jump 

'tree; that jumps between 5 conditional ways each one 

is 3 conditionals long is thismore costly than having

4 conditonal ways each one 3 ifs long? how to optymize

jumps in x86/x64 asm? what elements do decrease?

 

 

 

 

 

Share this post


Link to post
Share on other sites
Advertisement
The expensive jumps are the conditional jumps that are hard to predict. The CPU uses some heuristics to try to predict if a conditional jump will be taken or not.

But for this particular piece of code you can do everything with conditional moves and no jumps.

Share this post


Link to post
Share on other sites

The expensive jumps are the conditional jumps that are hard to predict. The CPU uses some heuristics to try to predict if a conditional jump will be taken or not.

But for this particular piece of code you can do everything with conditional moves and no jumps.

 

where I could find this code, someone would like to show?

afaik gcc generated such way code here

 

//input ints eax ebx edx

// output eax:  min, ebx:- max

 

cmp edx, eax

jg L1

 swap edx, eax   //here three movs for swap (*)

L1:

  cmp ebx, edx

  jg  L2

  cmp eax, ebx

  cmovg eax, ebx      

  mov ebx, edx

L2::

 

sorry if i mistake something in rewriting this code (this is taken down from my notes) it has only two conditional jumps so it is quite optymized I think, can it be optymized? (I am notso good in asm)

 

(*)  mov ecx, edx        

  mov edx, aex 

  mov eax, ecx

Edited by fir

Share this post


Link to post
Share on other sites

Here's what gcc 4.8.1 has to say to this similar code (which doesn't pass non-const refs and uses the default comparator:

#include <algorithm>
int main(int argc, char**)
{
    int a = 5; int b = argc;
    auto min_max = std::minmax(a, b);

    printf("%d%d", min_max.first, min_max.second);
    return 0;
}

(using argc solely so compiler doesn't optimize out the minmax)

 

Output is not much different from what I'd expect (the complete main function):

mov    $0x5,%eax
movl   $0x409084,(%esp)
mov    %eax,%edx
cmp    $0x5,%ebx
cmovge %ebx,%edx
cmovge %eax,%ebx
mov    %edx,0x8(%esp)
mov    %ebx,0x4(%esp)
call   0x4075b0 <printf(char const*, ...)
xor    %eax,%eax
mov    -0x4(%ebp),%ebx
leave
ret

But the thing is, this code works with every type, not just int. Including types that need some weird custom comparator (like in your sample). Even if it's 2 cycles slower in one or the other case, I'd prefer it for its simplicity and universality (biggest advantage if you ask me).

Edited by samoth

Share this post


Link to post
Share on other sites

Oh wow, I've completely overlooked the "little" detail that it's 3 values, not 2. blink.png

 

Yes, the code isn't quite so nice with 3 or more elements (although it certainly works correctly, of course). Stepping through it, it appears that it's iterating the initializer_list which isn't well-optimized. The actual min/max is still a mov+cmovge. Surprising, though. I would have expected an initializer_list to be pretty much trivially optimizable (seeing as its size is known).

 

The output is fine again if you use two binary minmax operations followed by a min and max, but then it's equivalent to the 4 ternary operators in your first snippet, so one could really just as well use that one. minmax doesn't make the intent any more clear or the code more concise, in that case sad.png

Edited by samoth

Share this post


Link to post
Share on other sites

On g++ 4.7.1 std::minmax compiles to something that is full of conditional jumps. I tried it in a synthetic test and my code is about 3 times faster. Of course synthetic tests are not very relevant, and you might get different results with different compilers, on different computers or inside different programs.

is is the code I used for the comparison:

 

would it be easy to test the asm kode I give against it (I mean code i gave with 2 jumps against the one witout jumps)?

I would be curious which one is faster..

 

what is  input and output registers in the asm provedure you show (sadly I am not experienced in this calling convention)

Edited by fir

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

GameDev.net is your game development community. Create an account for your GameDev Portfolio and participate in the largest developer community in the games industry.

Sign me up!