Clamping a value to a range in C++, quickly

Started by
13 comments, last by ZQJ 16 years, 4 months ago
I need some advice about how to speed up an operation I do a lot as part of a big loop: I need to take a float value and clamp it to a range. This is in C++. Here's what I do right now:

for (int i = 0; i < 512; ++i)
{
    float value = somearray;
    float clamp_at = 0.2;
    if (value > clamp_at)
        value = clamp_at;
    if (value < -clamp_at)
        value = -clamp_at;

    //... other code here ...
}
Is there some faster way to get this done? It needs to be portable across 32-bit, 64-bit, little endian, and big endian platforms... so I think that means assembly is probably out. Thanks in advance.
Advertisement
I'm not saying assembly is going to be better or not in this case, but assembly is never out in a case like this ... you would just need to have it be conditional ... and to provided the greatest benefit you would do it in a way that only makes the decision once ... like a function point, or an if outside the loop at least:

if(is x386)
386 optimized assembly
else if (is ...)
etc.
else
strait C++ version(current code maybe)

but of course better would be something like:

having constructed the architecure specific class at program load time, so that when you do:

MyArchOptimizedFunction(xxx)

it calls the right one directly
The compiler might be smart enough to optimize this anyway but probably not: you don't need to check if the value is < -clamp if the first check passed. Try this instead, should be a bit faster:
inline float clamp(float x, float a, float b){    return x < a ? a : (x > b ? b : x);}// in your loopconst float clamp_at = 0.2f;const float value = clamp(somearray, -clamp_at, clamp_at);

Game Programming Blog: www.mattnewport.com/blog

You can make a minor improvement by making the second 'if' an 'else if'. You might gain additionally if you make the 0.2 a "const float" if you don't intend on modifying it ... but only if your compiler is exceptionally talented at optimizations! Also, declare the clamp variable outside the loop, and if you want to be very picky also declare a neg_clamp_at value.

Honestly, this is not code that needs to be optimized. It takes virtually no time for the processor to execute, even if it is in a critical loop.

Regarding the previous post, using the trinary operator probably is not faster since a value is assigned EVERY time; here, an assignment only occurs if the clamp is needed (which is likely not often?).

const float clamp_at = 0.2;const float neg_clamp_at = -clamp_at;for (int i = 0; i < 512; ++i){    float value = somearray;    if (value > clamp_at)    {        value = clamp_at;    }    else if (value < neg_clamp_at)    {        value = neg_clamp_at;    }    //... other code here ...}
Hey all, thanks for the quick suggestions. This bit of code is used several times in a critical area. It sounds like my original guess wasn't missing something really huge and obvious. Nevertheless, the implementation AIDev recommended in his last post took my benchmark from 247 ms avg to 203 ms avg (lower is better). With optimizations turned on, it does 120 ms avg. Thanks again!
I'd go with mattnewport's suggestion or something like it.
Conditional assignments (? : operator) will probably perform better here. (Since branches hinder the compiler and CPU pipeline a lot in the ability to reorder and schedule instructions)
Quote:Original post by AIDev
Regarding the previous post, using the trinary operator probably is not faster since a value is assigned EVERY time; here, an assignment only occurs if the clamp is needed (which is likely not often?).

The expensive parts of the code on a modern processor will be fetching the value from the array (which may result in a cache miss) and the conditionals (which will hinder optimal instruction scheduling). An additional assignment is cheap by comparison - it's probably just a register move at worst and would likely be optimized out here anyway. If the target processor supports an fsel type instruction (like PowerPC) then the branch here could be avoided completely and the compilers I've worked with that can make that type of optimization are more likely to do it with a conditional than with an if statement.

To optimize this code significantly though you'd want to know more context for the loop - there's only so much you can do to optimize a clamp operation in isolation but often there's a smarter way to optimize at a higher level.

Game Programming Blog: www.mattnewport.com/blog

My suggestion is:

const float sup = 0.2;const float inf = -0.2;float values[512];for (int i = 0; i < 512; ++i){    values = std::min(sup, std::max(inf, somearray));}for (int i = 0; i < 512; ++i){    const float & value = values(i];    //... other code here ...}


Hopefully, the min and max will contain compiler intrinsics for the relevant platform, resulting in very few operations. The loop being small, the compiler might decide to unfold it so that any pipeline latency issues disappear, and perhaps even use SSE all by itself.

Depending on what the other code is, you may also benefit from keeping everything in a single loop.
We might be able to speed it up more if we could see what you're doing with 'value'. Afterall, if this is an inner loop, then quite possibly all the code in this loop has the same potential for speed gains.
"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms
std::min/max looks good. On x86, what you want generated is FCOMI/FCMOV (FPU) or PFMIN (3dNow) or MINSS (SSE). I think more important than the ternary or if question is whether you have enabled SSE instructions in the code generation options. The Intel compiler is probably better at this as well.

Oh, just thought of another eevil trick. You can mask off the sign bit, compare magnitude via integer instructions, use carry to select the clamp_at or current value (<- hard, but possible to get a C compiler to generate this kind of code), then or-in the sign bit again.
However, this kind of dirty business is probably no longer worthwhile since the advent of FCOMI/FCMOV and the multimedia min instructions.
E8 17 00 42 CE DC D2 DC E4 EA C4 40 CA DA C2 D8 CC 40 CA D0 E8 40E0 CA CA 96 5B B0 16 50 D7 D4 02 B2 02 86 E2 CD 21 58 48 79 F2 C3

This topic is closed to new replies.

Advertisement