Clamping a floating-point number between 0 and 1

What's generally the best/fastest way? Cheers, zed

The shortest way to clamp a floating-point number, in terms of code, is max(0.0f, min(1.0f, x)), unless you have access to a specialized primitive. If profiling tells you that this operation is too slow, then you can try these optimizations:
  • Attempt to eliminate some cases (for instance, numbers which are always between 0 and 1, or numbers that are always greater than 0, and so on) to reduce the number of operations.
  • Attempt to reduce the number of floating-point numbers you have to process.
  • Attempt to use vector (SIMD) operations to clamp several numbers at the same time.
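
A minimal C++ sketch of the min/max form and of the last suggestion (clamping several numbers at once). The names clamp01 and clamp01_array are made up for illustration, and the four-wide path assumes an x86 target with SSE; on anything else the code falls back to the scalar loop.

```cpp
#include <algorithm>
#include <cstddef>

// Assumption: SSE is detected through the usual GCC/Clang and MSVC macros.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define CLAMP_HAVE_SSE 1
#else
#define CLAMP_HAVE_SSE 0
#endif

// Scalar clamp to [0, 1] using the min/max composition.
inline float clamp01(float x) {
    return std::max(0.0f, std::min(1.0f, x));
}

// Hypothetical helper: clamps n floats in place. Handles four values per
// iteration when SSE is available, then finishes the remainder one by one.
inline void clamp01_array(float* data, std::size_t n) {
    std::size_t i = 0;
#if CLAMP_HAVE_SSE
    const __m128 zero = _mm_setzero_ps();
    const __m128 one = _mm_set1_ps(1.0f);
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(data + i);        // unaligned load of 4 floats
        v = _mm_max_ps(_mm_min_ps(v, one), zero); // MINPS then MAXPS, no branches
        _mm_storeu_ps(data + i, v);
    }
#endif
    for (; i < n; ++i) {
        data[i] = clamp01(data[i]);
    }
}
```

Whether the batch version is actually faster depends on the data layout and the compiler, so it is worth profiling before committing to it.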

I use the following statement (in C++):

float clamped = value > 1.0f ? 1.0f : (value < 0.0f ? 0.0f : value);

It may be the case that max and min are actually implemented this way (I'm not sure) and thus the composition turns out to be something similar to what I posted...
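
For what it's worth, std::min(a, b) is specified as a compare-and-select (it returns b if b < a, otherwise a), so the composition is indeed close to the conditional form. Here is a small sketch that puts the two side by side; the function names are made up for illustration. The two agree for ordinary values, but as far as I can tell they differ for NaN: every comparison with NaN is false, so the conditional form passes NaN through, while the min/max form (with this argument order, and without fast-math style flags) returns 1.0f.

```cpp
#include <algorithm>
#include <cmath>

// The conditional form from the post above.
inline float clamp01_ternary(float value) {
    return value > 1.0f ? 1.0f : (value < 0.0f ? 0.0f : value);
}

// The min/max composition from the earlier reply.
inline float clamp01_minmax(float value) {
    // std::min(1.0f, value) evaluates (value < 1.0f) ? value : 1.0f,
    // so a NaN input falls through to 1.0f.
    return std::max(0.0f, std::min(1.0f, value));
}
```

If NaN inputs are possible, it is probably worth deciding which of the two behaviours you want rather than leaving it to the argument order.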

The std::min/max approach is so very nice and simple, and it's great to make a template out of it. However, VC2005 (release mode, default cflags) ends up generating the following:

004010F8 fldz
004010FA add esp,4
004010FD fst dword ptr [esp+2Ch]
00401101 lea ecx,[esp+24h]
00401105 fld1
00401107 fst dword ptr [esp+28h]
0040110B fcomp dword ptr [esp+24h]
0040110F fnstsw ax
00401111 test ah,41h
00401114 je main+3Ah (40111Ah)
00401116 lea ecx,[esp+28h]
0040111A fcomp dword ptr [ecx]
0040111C fnstsw ax
0040111E test ah,41h
00401121 jne main+47h (401127h)
00401123 lea ecx,[esp+2Ch]
00401127 fld dword ptr [ecx]
This is very slow and does not use capabilities of newer architectures.

With /arch:SSE2 it becomes:

00401908 fld dword ptr [esp+38h]
0040190C xorps xmm0,xmm0
0040190F fld1
00401911 movss xmm1,dword ptr [__real@3f800000 (402114h)]
00401919 add esp,4
0040191C fcomip st,st(1)
0040191E fstp st(0)
00401920 movss dword ptr [esp+3Ch],xmm0
00401926 movss dword ptr [esp+38h],xmm1
0040192C lea eax,[esp+34h]
00401930 ja main+46h (401936h)
00401932 lea eax,[esp+38h]
00401936 comiss xmm0,dword ptr [eax]
00401939 jbe main+4Fh (40193Fh)
0040193B lea eax,[esp+3Ch]
0040193F movss xmm0,dword ptr [eax]
It's now using FCOMI which is a big win, but the SSE parts are absurd. This code is just laughable.

Now using
f = (f < 0.0f)? 0.0f : f;
f = (f > 1.0f)? 1.0f : f;
it generates:

00401908 fld dword ptr [esp+40h]
0040190C add esp,4
0040190F fldz
00401911 fcomip st,st(1)
00401913 fstp st(0)
00401915 jbe main+2Ch (40191Ch)
00401917 xorps xmm0,xmm0
0040191A jmp main+42h (401932h)
0040191C movss xmm0,dword ptr [esp+3Ch]
00401922 movss xmm1,dword ptr [__real@3f800000 (402114h)]
0040192A comiss xmm0,xmm1
0040192D jbe main+42h (401932h)
0040192F movaps xmm0,xmm1
This is not much better. The approach of cignox1 appears to generate the same code.

Since poorly-predicted conditional branches are very expensive, it is worth going to considerable trouble. If ToohrVyk's vectorization is not possible, then at least the SSE MINSS and MAXSS instructions should be used. VC2005 is apparently not studly enough to manage that by itself, so using the _mm_min_ss etc. intrinsics is advisable.
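
A rough sketch of what the intrinsic version could look like. The function name clamp01_sse is made up, and the code assumes an x86 target where <xmmintrin.h> is available; otherwise it falls back to std::min/std::max.

```cpp
#include <algorithm>

// Assumption: SSE is detected through the usual GCC/Clang and MSVC macros.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define CLAMP_HAVE_SSE 1
#else
#define CLAMP_HAVE_SSE 0
#endif

// Branch-free scalar clamp to [0, 1].
inline float clamp01_sse(float x) {
#if CLAMP_HAVE_SSE
    __m128 v = _mm_set_ss(x);
    v = _mm_min_ss(v, _mm_set_ss(1.0f)); // MINSS: lowest lane = min(x, 1)
    v = _mm_max_ss(v, _mm_set_ss(0.0f)); // MAXSS: lowest lane = max(that, 0)
    return _mm_cvtss_f32(v);             // extract the lowest lane
#else
    return std::max(0.0f, std::min(1.0f, x));
#endif
}
```

Usage is simply f = clamp01_sse(f); it is worth checking the disassembly to confirm that the compiler keeps the value in an XMM register instead of bouncing it through memory.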

If SSE is not available, then standard bit bashing applies. For the < 0 check, AND with a mask populated with the complement of the IEEE-754 sign bit (yes, this turns -0 into 0). For > 1, construct a second mask from the carry bit of the subtraction of 0x3F800000 from the float's representation; use it to select between the 1.0f constant and the previous result.

As a side note, on sm_10 graphics chipsets from NVIDIA, the simple approach compiles to just:
    max.f32 $x, $x, 0.0f;
    min.f32 $x, $x, 1.0f;

The comparison-based approach would quite possibly compile to something like this at best, and possibly to something more complex:
    setp.lt.f32 $p, $x, 0.0f;
    @$p mov.f32 $x, 0.0f;
    setp.gt.f32 $p, $x, 1.0f;
    @$p mov.f32 $x, 1.0f;

I suspect that most shader languages compile to something similar, which makes the min/max-based approach the better choice.

Sorry for the delay in replying.
Thanks everyone, that's pretty much what I've already got. I thought there might be some special tomfoolery I don't know about that I could apply to the 'special cases' of 0 and 1.
