Jump to content
  • Advertisement
Sign in to follow this  
popsoftheyear

Conditional in a loop

This topic is 3671 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

If I have a really performance intensive loop, one would normally say not to ever, ever use a conditional in the loop if you can help it. The thing is, with jump predictability, doesn't it end up being free if it is guaranteed to be the same every single time. What I mean is, if you have a loop:
for (int Index = 0; Index < 1000000; Index++) {
if (SSE2_Was_Detected) PrefetchIntrinsic;
// Do some stuff here, hopefully on memory that was prefetched a few iterations ago
}
Won't that be effectively free? Or am I missing something? I know that it is also usually recommended to have a separate loop, but I'd rather have the instruction prefetch pipeline take care of it for free than double the amount of code I'm going to have (this decision will effect hundreds of specialized functions) if I can. Cheers -Scott

Share this post


Link to post
Share on other sites
Advertisement
The instruction load will most likely (depending on CPU) end up being free, but the conditional will still have to be tested every loop iteration.

If you know the condition ahead of time, you test that, *then* do your loop.

Share this post


Link to post
Share on other sites
It's far from trivial, and there is no reliable generic answer.

With threaded code, each instruction is essentially a branch. But as it turns out, the branch prediction will vary a lot (!pdf).

In simplest terms, you won't have any quarantee. You'll need to test the actual run-time of executable/CPU pair.

You might want to consider generating separate code for each case, and operating exclusively on top of that.

Share this post


Link to post
Share on other sites
Compilers can sometimes detect "loop invariants" and transform the code such that the conditional is checked outside the loop.

Share this post


Link to post
Share on other sites
An aggressive compiler will generally unswitch that loop and turn it into:

if (sse2)
for
{
prefetch
...
}
else
for
{
...
}


It depends on your specific compiler whether or not that will happen.

As for whether or not the branch is free from a prediction POV, it should be 100% predicted not taken. However there are other considerations, like if it aliases with another branch, if there are too many branches close together, lots of other processor specific issues.

Share this post


Link to post
Share on other sites
Quote:
Original post by popsoftheyear
If I have a really performance intensive loop, one would normally say not to ever, ever use a conditional in the loop if you can help it. The thing is, with jump predictability, doesn't it end up being free if it is guaranteed to be the same every single time.

Never free. In the absolute best case, it is one extra instruction that must be processed. Of course, if that's the only cost, it's probably negligible.
(But if the outcome is guaranteed to be the same every time, why do you have the conditional at all?)

More realistically though, there are a few other hidden costs. Even if the branch is taken (or not taken) every time, which allows the branch predictor to predict correctly, you're still inhibiting optimization. The compiler can't reorder instructions across a potential branch, which will usually limit efficiency. Similarly, the CPU will try to reorder instructions on the fly, but again, across branches it runs into some limitations.

But of course, it depends on the compiler. It may be able to restructure the code to get rid of the branch.

Share this post


Link to post
Share on other sites
If SSE2_Was_Detected can be resolved to a constant at compile time, then yes it is effectively free. In either case though [as outRider was kind enough to point out] the compiler will likely perform a loop unswitching pass, and you'll get it pretty much for free. Either way, if you have a compiler that doesn't suck, you won't be able to tell the difference with anything less precise than a system simulator.

Oh, and the issue of not reordering beyond branch boundaries, the next generation of processors will likely be supporting out of order commit of instructions that'll allow for branches to speculatively be explored several thousand instructions deep, using a checkpointing method for memory and registers. In short, they will be reordering intructions beyond jump boundaries, even many jumps deep, and just assuming they were guessed right until proven otherwise. Pretty neat, really, if you're into architecture stuff [SUN's 'Rock' is supposedly the first silicon to actually support this feature, though knowing sun.....].

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

We are the game development community.

Whether you are an indie, hobbyist, AAA developer, or just trying to learn, GameDev.net is the place for you to learn, share, and connect with the games industry. Learn more About Us or sign up!

Sign me up!