(x86) jumps in assembly

Started by
15 comments, last by Adam_42 10 years, 4 months ago

I wanted to optymize some procedure in assembly

(it was something like minmax(a,b,c) small code

to get smallest one and biggest one of the three ints given)

i think the most important way of optymizing codes like

here is to decrease number of jumps, but I wander

what is most costly - are the inconditional jumps costly?

are the conditional jumps fails costly? If i got a jump

'tree; that jumps between 5 conditional ways each one

is 3 conditionals long is thismore costly than having

4 conditonal ways each one 3 ifs long? how to optymize

jumps in x86/x64 asm? what elements do decrease?

Advertisement
The expensive jumps are the conditional jumps that are hard to predict. The CPU uses some heuristics to try to predict if a conditional jump will be taken or not.

But for this particular piece of code you can do everything with conditional moves and no jumps.

The expensive jumps are the conditional jumps that are hard to predict. The CPU uses some heuristics to try to predict if a conditional jump will be taken or not.

But for this particular piece of code you can do everything with conditional moves and no jumps.

where I could find this code, someone would like to show?

afaik gcc generated such way code here

//input ints eax ebx edx

// output eax: min, ebx:- max

cmp edx, eax

jg L1

swap edx, eax //here three movs for swap (*)

L1:

cmp ebx, edx

jg L2

cmp eax, ebx

cmovg eax, ebx

mov ebx, edx

L2::

sorry if i mistake something in rewriting this code (this is taken down from my notes) it has only two conditional jumps so it is quite optymized I think, can it be optymized? (I am notso good in asm)

(*) mov ecx, edx

mov edx, aex

mov eax, ecx

Here's a C++ implementation that is probably hard to beat:


void compute_min_max(int &min, int &max, int x, int y, int z) {
  min = x < y ? x : y;
  max = x > y ? x : y;
  min = min < z ? min : z;
  max = max > z ? max : z;
}

It produces this assembly code (clang-500.2.76 with -O3):


.globl __Z15compute_min_maxRiS_iii
.align 4, 0x90
__Z15compute_min_maxRiS_iii:            ## @_Z15compute_min_maxRiS_iii
.cfi_startproc
## BB#0:
pushq %rbp
Ltmp2:
.cfi_def_cfa_offset 16
Ltmp3:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Ltmp4:
.cfi_def_cfa_register %rbp
cmpl %ecx, %edx
movl %ecx, %eax
cmovlel %edx, %eax
movl %eax, (%rdi)
cmovgel %edx, %ecx
movl %ecx, (%rsi)
movl (%rdi), %eax
cmpl %r8d, %eax
cmovgl %r8d, %eax
movl %eax, (%rdi)
movl (%rsi), %eax
cmpl %r8d, %eax
cmovgel %eax, %r8d
movl %r8d, (%rsi)
popq %rbp
ret
.cfi_endproc

As you see, there are no jumps at all.

Also note that C++11 even has a ready-to-use std::minmax, which is likely as good as the compiler is able to do it, for any type.

Also note that C++11 even has a ready-to-use std::minmax, which is likely as good as the compiler is able to do it, for any type.

No, that would make sense but in my experience you can do better than the library function very often, especially if the function is not very popular (like in this case). You just have to test.

On g++ 4.7.1 std::minmax compiles to something that is full of conditional jumps. I tried it in a synthetic test and my code is about 3 times faster. Of course synthetic tests are not very relevant, and you might get different results with different compilers, on different computers or inside different programs.

This is the code I used for the comparison:

void compute_min_max(int &min, int &max, int x, int y, int z) {
  auto min_max = std::minmax({x, y, z});
  min = min_max.first;
  max = min_max.second;
}

Here's what gcc 4.8.1 has to say to this similar code (which doesn't pass non-const refs and uses the default comparator:


#include <algorithm>
int main(int argc, char**)
{
    int a = 5; int b = argc;
    auto min_max = std::minmax(a, b);

    printf("%d%d", min_max.first, min_max.second);
    return 0;
}

(using argc solely so compiler doesn't optimize out the minmax)

Output is not much different from what I'd expect (the complete main function):


mov    $0x5,%eax
movl   $0x409084,(%esp)
mov    %eax,%edx
cmp    $0x5,%ebx
cmovge %ebx,%edx
cmovge %eax,%ebx
mov    %edx,0x8(%esp)
mov    %ebx,0x4(%esp)
call   0x4075b0 <printf(char const*, ...)
xor    %eax,%eax
mov    -0x4(%ebp),%ebx
leave
ret

But the thing is, this code works with every type, not just int. Including types that need some weird custom comparator (like in your sample). Even if it's 2 cycles slower in one or the other case, I'd prefer it for its simplicity and universality (biggest advantage if you ask me).

samoth, can you try with three arguments, which is what this thread is about?

Oh wow, I've completely overlooked the "little" detail that it's 3 values, not 2. blink.png

Yes, the code isn't quite so nice with 3 or more elements (although it certainly works correctly, of course). Stepping through it, it appears that it's iterating the initializer_list which isn't well-optimized. The actual min/max is still a mov+cmovge. Surprising, though. I would have expected an initializer_list to be pretty much trivially optimizable (seeing as its size is known).

The output is fine again if you use two binary minmax operations followed by a min and max, but then it's equivalent to the 4 ternary operators in your first snippet, so one could really just as well use that one. minmax doesn't make the intent any more clear or the code more concise, in that case sad.png

On g++ 4.7.1 std::minmax compiles to something that is full of conditional jumps. I tried it in a synthetic test and my code is about 3 times faster. Of course synthetic tests are not very relevant, and you might get different results with different compilers, on different computers or inside different programs.

is is the code I used for the comparison:

would it be easy to test the asm kode I give against it (I mean code i gave with 2 jumps against the one witout jumps)?

I would be curious which one is faster..

what is input and output registers in the asm provedure you show (sadly I am not experienced in this calling convention)

This topic is closed to new replies.

Advertisement