Branchless code

Started by
13 comments, last by Russell 21 years, 5 months ago
Branchless code has the potential to run faster than the equivalent code that uses conditionals, or so I''ve heard. I was wondering about how people go about getting rid of conditional code, since it would seem to make a significant difference in some cases (such as frequently used code). One idea that came to mind is to use an array of function pointers instead of a conditional. For example, instead of something like (in my chess program): if (side_to_move == WHITE) { // do white stuff here } else { // do black stuff here } You could do something like: // Forgive me if my syntax for using function pointers is wrong typedef ... Function; Function do_stuff[] = { DoWhiteStuff, DoBlackStuff }; do_stuff[side_to_move](); That would eliminate the potential branch misprediction penalty, and only cost a call and ret instruction. So as long as the call+ret instructions don''t cost more than 10-20 cycles that you would suffer for a branch misprediction, you win here. In thinking about this, it sounded similar to the OOP refactoring technique known as "replace conditional with polymorphism". Is that essentially what is happening when I replace a conditional with a lookup table of function pointers? Anyway, I was wondering if someone more knowledgable than myself about these things could shed some light on which method would be more efficient, and about the real worth of branchless code. Is it worth a great amount of extra work to achieve branchless code? Or is it only going to provide minimal speedups? Thanks, Russell
Advertisement
I''m not really knowledgeable about this, but I''ll tak an educated guess, and if someone knows more than I do, I''d love to hear it.

For the example situation where there are only two cases, I would suspect the if statement is faster. With an array of function pointers, there will be an array reference, which I would guess, takes at least as long as a conditional in the if. Regardless, in the situation with only two cases, it depends on what is more readable. I personally find the if statement easier to read.

However, if you had something like

switch(side_to_move)
{
case white:
...
case red:
...
case blue:
...
}

where you have a lot of case statements, then I would go with the function pointers, because in that case, they will be faster, especially if the switch statement is rather large. 2 or 3 I wouldn''t worry about, 20 or 30, I''d worry. That''s an extreme though.

My opinion, if you only have one conditional, don''t bother changing your code, it''s probably not worth it. If you have a lot of conditionals that are called frequently, then you probably would change to function pointers if speed is an issue.

________________________
Grab your sword, and get prepared for the SolidSteel RPG engine, coming soon...(i.e. when it''s done)
______________________________"Man is born free, and everywhere he is in chains" - J.J. Rousseau
The time saved by not using the if...else is not worth or justify the obscurity of the function pointer solution.


ECKILLER
ECKILLER
Only optimize places where you need increased efficiency. The example the original poster gave works in theory, aside from a few syntax issues. The trade-off is a taking a possible branch-prediction failure and turning it into an indirect jump. Determing whether or not it''s worth it is the difficlt part, but it works something like this.

x -- the time the original code took to execute
y -- the time addition incurred by a branch-prediction failure
z -- the time of an indirect jump
A -- the percent of the time you expect a branch-prediction failure to occur

So if x+y*A >= z, then you can argue that in most cases the function-pointer method is more efficient. You wouldn''t want to apply this method to anything but some portion of code that is executed repeatedly though, as the average case argument falls apart. You would never want to do this on code that will be executed only a few times or simply not very often. The other drawback to this is that since C and C++ do not support heterogenous arrays, the function signature is limited to one case. This may or may not pose a problem for your design.
quote:
the time the original code took to execute


By this I meant the time it takes for a regular if-statement to be executed, which is just a direct jump or executed the next instruction of code, not the code contained in the if-statement itself. Anyway, it depends mostly on the percentage of time that you expect a branch-predition failure to occur.
Calls/ret *also* need to be predicted, just like branches. If the CPU can''t predict a call, because the call will depend on the value of a register (or variable), then it ends up being as expensive (if not more) than a mispredicted branch.

The same applies to the ret too, so don''t think about messing up stack values to go around this.

Unless the code inside if (color == WHITE) {} else {} is dead simple, don''t think for a second that it will run faster without the branch. It most likely won''t. Your best bet is to move the conditionals outside of the inner loop (assuming your code is inside an inner loop). If that code isn''t inside an inner loop, then you are most likely trying to optimize the wrong code.

As always, profile before you decide to optimize. Think before you code

Function pointers aren''t the solution -- they''ll make things much worse.

They won''t be predicted properly (I don''t think branch prediction algorithms even try to make predictions for indirect branches -- or maybe they do?)

Spend your effort on minimizing the amount of conditions to test and maximizing the amount of work done within the conditional code blocks.

---
Bart
----Bart
two refactorings

replace conditional with polymorphism

replace conditional with visitor see more information

[edited by - petewood on November 19, 2002 3:22:21 AM]
Like the lowermost AP said, compiler can''t predict the jump until it has fetched the address from a register.. This efficiently halts the pipeline and is much harder to predict than a normal conditional branch which can only go to two fixed addresses.

No; branchless code really means you don't jump - at all!

You'll probably need to do this kind of thing at assembly level (to derive any befit from it, anyhow). It's fairly common in MMX code for example. The basic principle is that you have multiple answers to a formula (say, you wish to get the absolute value of an integer, where you would have the original number and it's negative). You'd then generate a mask that when anded would eliminate the incorrect answer(s).


    //eg: get absolute value of integer// start with number in eax:xor  edx, edxmov  ecx, eaxtest eax, eaxsetl dlneg  ecxdec  edx // if eax is less than zero, mask is 0, otherwise it's 0xFFFFFFFFxor  ebx, ebxtest ecx, ecxsetl bl	// generate mask for ecxand  eax, edx // mask eaxdec  ebxand  ecx, ebx // mask ecxor   eax, ecx // combine answers, only positive answer can result   


Of course, as I'm sure someone will point out this could be done faster by doing using the P6 instruction CMOVcc - but this is just to illustrate the idea! as you can see this has no jumps or calls in it - my guess is that this should execute in about 6 cycles or so.


  mov    ecx, eaxneg    ecxcmovge eax, ecx  


Hope that gives you an idea of what it's about

[EDIT: I'll try to actually put the correct code in this time!]

[edited by - NickB on November 19, 2002 5:25:57 AM]

This topic is closed to new replies.

Advertisement