# (x86) jumps in assembly

This topic is 1867 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

I wanted to optymize some procedure in assembly

(it was something like minmax(a,b,c) small code

to get smallest one and biggest one of the three ints given)

i think the most important way of optymizing codes like

here is to decrease number of jumps, but I wander

what is most costly - are the inconditional jumps costly?

are the conditional jumps fails costly? If i got a jump

'tree; that jumps between 5 conditional ways each one

is 3 conditionals long is thismore costly than having

4 conditonal ways each one 3 ifs long? how to optymize

jumps in x86/x64 asm? what elements do decrease?

##### Share on other sites
The expensive jumps are the conditional jumps that are hard to predict. The CPU uses some heuristics to try to predict if a conditional jump will be taken or not.

But for this particular piece of code you can do everything with conditional moves and no jumps.

##### Share on other sites

The expensive jumps are the conditional jumps that are hard to predict. The CPU uses some heuristics to try to predict if a conditional jump will be taken or not.

But for this particular piece of code you can do everything with conditional moves and no jumps.

where I could find this code, someone would like to show?

afaik gcc generated such way code here

//input ints eax ebx edx

// output eax:  min, ebx:- max

cmp edx, eax

jg L1

swap edx, eax   //here three movs for swap (*)

L1:

cmp ebx, edx

jg  L2

cmp eax, ebx

cmovg eax, ebx

mov ebx, edx

L2::

sorry if i mistake something in rewriting this code (this is taken down from my notes) it has only two conditional jumps so it is quite optymized I think, can it be optymized? (I am notso good in asm)

(*)  mov ecx, edx

mov edx, aex

mov eax, ecx

Edited by fir

##### Share on other sites

Also note that C++11 even has a ready-to-use std::minmax, which is likely as good as the compiler is able to do it, for any type.

##### Share on other sites

Here's what gcc 4.8.1 has to say to this similar code (which doesn't pass non-const refs and uses the default comparator:

#include <algorithm>
int main(int argc, char**)
{
int a = 5; int b = argc;
auto min_max = std::minmax(a, b);

printf("%d%d", min_max.first, min_max.second);
return 0;
}

(using argc solely so compiler doesn't optimize out the minmax)

Output is not much different from what I'd expect (the complete main function):

mov    $0x5,%eax movl$0x409084,(%esp)
mov    %eax,%edx
cmp    \$0x5,%ebx
cmovge %ebx,%edx
cmovge %eax,%ebx
mov    %edx,0x8(%esp)
mov    %ebx,0x4(%esp)
call   0x4075b0 <printf(char const*, ...)
xor    %eax,%eax
mov    -0x4(%ebp),%ebx
leave
ret

But the thing is, this code works with every type, not just int. Including types that need some weird custom comparator (like in your sample). Even if it's 2 cycles slower in one or the other case, I'd prefer it for its simplicity and universality (biggest advantage if you ask me).

Edited by samoth

##### Share on other sites

samoth, can you try with three arguments, which is what this thread is about?

##### Share on other sites

Oh wow, I've completely overlooked the "little" detail that it's 3 values, not 2.

Yes, the code isn't quite so nice with 3 or more elements (although it certainly works correctly, of course). Stepping through it, it appears that it's iterating the initializer_list which isn't well-optimized. The actual min/max is still a mov+cmovge. Surprising, though. I would have expected an initializer_list to be pretty much trivially optimizable (seeing as its size is known).

The output is fine again if you use two binary minmax operations followed by a min and max, but then it's equivalent to the 4 ternary operators in your first snippet, so one could really just as well use that one. minmax doesn't make the intent any more clear or the code more concise, in that case

Edited by samoth

##### Share on other sites

On g++ 4.7.1 std::minmax compiles to something that is full of conditional jumps. I tried it in a synthetic test and my code is about 3 times faster. Of course synthetic tests are not very relevant, and you might get different results with different compilers, on different computers or inside different programs.

is is the code I used for the comparison:

would it be easy to test the asm kode I give against it (I mean code i gave with 2 jumps against the one witout jumps)?

I would be curious which one is faster..

what is  input and output registers in the asm provedure you show (sadly I am not experienced in this calling convention)

Edited by fir

• ### What is your GameDev Story?

In 2019 we are celebrating 20 years of GameDev.net! Share your GameDev Story with us.

• 10
• 11
• 13
• 9
• 11
• ### Forum Statistics

• Total Topics
634091
• Total Posts
3015433
×