Conditional in a loop

Started by
5 comments, last by Drigovas 15 years, 8 months ago
If I have a really performance intensive loop, one would normally say not to ever, ever use a conditional in the loop if you can help it. The thing is, with jump predictability, doesn't it end up being free if it is guaranteed to be the same every single time. What I mean is, if you have a loop:
for (int Index = 0; Index < 1000000; Index++) {
if (SSE2_Was_Detected) PrefetchIntrinsic;
// Do some stuff here, hopefully on memory that was prefetched a few iterations ago
}
Won't that be effectively free? Or am I missing something? I know that it is also usually recommended to have a separate loop, but I'd rather have the instruction prefetch pipeline take care of it for free than double the amount of code I'm going to have (this decision will effect hundreds of specialized functions) if I can. Cheers -Scott
Advertisement
The instruction load will most likely (depending on CPU) end up being free, but the conditional will still have to be tested every loop iteration.

If you know the condition ahead of time, you test that, *then* do your loop.
--Michael Fawcett
It's far from trivial, and there is no reliable generic answer.

With threaded code, each instruction is essentially a branch. But as it turns out, the branch prediction will vary a lot (!pdf).

In simplest terms, you won't have any quarantee. You'll need to test the actual run-time of executable/CPU pair.

You might want to consider generating separate code for each case, and operating exclusively on top of that.
Compilers can sometimes detect "loop invariants" and transform the code such that the conditional is checked outside the loop.
An aggressive compiler will generally unswitch that loop and turn it into:

if (sse2)	for	{		prefetch		...	}else	for	{		...	}


It depends on your specific compiler whether or not that will happen.

As for whether or not the branch is free from a prediction POV, it should be 100% predicted not taken. However there are other considerations, like if it aliases with another branch, if there are too many branches close together, lots of other processor specific issues.
Quote:Original post by popsoftheyear
If I have a really performance intensive loop, one would normally say not to ever, ever use a conditional in the loop if you can help it. The thing is, with jump predictability, doesn't it end up being free if it is guaranteed to be the same every single time.

Never free. In the absolute best case, it is one extra instruction that must be processed. Of course, if that's the only cost, it's probably negligible.
(But if the outcome is guaranteed to be the same every time, why do you have the conditional at all?)

More realistically though, there are a few other hidden costs. Even if the branch is taken (or not taken) every time, which allows the branch predictor to predict correctly, you're still inhibiting optimization. The compiler can't reorder instructions across a potential branch, which will usually limit efficiency. Similarly, the CPU will try to reorder instructions on the fly, but again, across branches it runs into some limitations.

But of course, it depends on the compiler. It may be able to restructure the code to get rid of the branch.
If SSE2_Was_Detected can be resolved to a constant at compile time, then yes it is effectively free. In either case though [as outRider was kind enough to point out] the compiler will likely perform a loop unswitching pass, and you'll get it pretty much for free. Either way, if you have a compiler that doesn't suck, you won't be able to tell the difference with anything less precise than a system simulator.

Oh, and the issue of not reordering beyond branch boundaries, the next generation of processors will likely be supporting out of order commit of instructions that'll allow for branches to speculatively be explored several thousand instructions deep, using a checkpointing method for memory and registers. In short, they will be reordering intructions beyond jump boundaries, even many jumps deep, and just assuming they were guessed right until proven otherwise. Pretty neat, really, if you're into architecture stuff [SUN's 'Rock' is supposedly the first silicon to actually support this feature, though knowing sun.....].

This topic is closed to new replies.

Advertisement