Clamping a floating-point number between 0 and 1

Started by
5 comments, last by zedz 16 years, 1 month ago
What's generally the best/fastest way? Cheers, zed
The shortest way to clamp a floating-point number, in terms of code, is max(0f, min(1f, x)), unless you have access to a specialized primitive. If profiling tells you that this operation is too slow, then you can try optimizations:
  • Attempt to eliminate some cases (for instance, numbers which are always between 0 and 1, or numbers that are always greater than 0, and so on) to reduce the number of operations.
  • Attempt to reduce the number of floating-point numbers you have to process.
  • Attempt to use vector (SIMD) operations to clamp several numbers at the same time.
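Here is a rough sketch of the vectorization idea from the last bullet. It assumes an x86 target with SSE and uses the `<xmmintrin.h>` intrinsics `_mm_max_ps` and `_mm_min_ps` to clamp four floats per instruction. The function name `clamp01_array` is just illustrative. Any leftover elements are handled with the scalar two-ternary form.

```cpp
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics: _mm_loadu_ps, _mm_max_ps, _mm_min_ps, ...

// Clamp n floats in place to [0, 1], four at a time with SSE,
// then finish any leftover elements with scalar code.
void clamp01_array(float* data, std::size_t n) {
    const __m128 zero = _mm_setzero_ps();
    const __m128 one  = _mm_set1_ps(1.0f);
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(data + i);         // unaligned load of 4 floats
        v = _mm_min_ps(_mm_max_ps(v, zero), one);  // max(., 0) then min(., 1)
        _mm_storeu_ps(data + i, v);                // unaligned store back
    }
    for (; i < n; ++i) {                           // scalar tail (0 to 3 elements)
        float x = data[i];
        x = (x < 0.0f) ? 0.0f : x;
        x = (x > 1.0f) ? 1.0f : x;
        data[i] = x;
    }
}
```

If the data can be padded to a multiple of four floats (and aligned), the scalar tail and the unaligned loads can be dropped.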
I use the following instruction (in C++):

float clamped = value > 1.0f ? 1.0f : (value < 0.0f ? 0.0f : value);

It may be that max and min are actually implemented this way (I'm not sure), in which case the composition ends up as something similar to what I posted...
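On the question of how max and min are implemented, the C++ standard specifies that `std::min(a, b)` returns the smaller argument and `std::max(a, b)` returns the larger one. Each returns its first argument when the two are equivalent, which is exactly what the ternaries below do. The helper names `min_f`, `max_f`, `clamp01_minmax` and `clamp01_ternary` are hypothetical and only for illustration. Composing the helpers gives nested ternaries with the same shape as the one-liner above.

```cpp
// Two-argument equivalents of std::min / std::max: each returns the
// first argument when neither compares less than the other.
inline float min_f(float a, float b) { return (b < a) ? b : a; }
inline float max_f(float a, float b) { return (a < b) ? b : a; }

// max(0, min(1, x)) expands to two nested comparisons/selects.
inline float clamp01_minmax(float x) { return max_f(0.0f, min_f(1.0f, x)); }

// The hand-written one-liner from the post above, for comparison.
inline float clamp01_ternary(float value) {
    return value > 1.0f ? 1.0f : (value < 0.0f ? 0.0f : value);
}
```

The two forms agree for ordinary inputs. They can differ for NaN, where the min/max composition returns 1.0f and the one-liner returns NaN. They can also differ in the sign of a zero result when the input is -0.0f.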
    #include <algorithm>
    #include <cassert>

    template <typename T>
    T clamp(T min, T value, T max)
    {
        assert(min <= max);
        return std::max(min, std::min(max, value));
    }

    float clamped = clamp(0.0f, value, 1.0f);

That's how I'd do it.
The std::min/std::max approach is very nice and simple, and it's great to make a template out of it. However, VC2005 (release mode, default compiler flags) ends up generating the following:
    004010F8  fldz
    004010FA  add         esp,4
    004010FD  fst         dword ptr [esp+2Ch]
    00401101  lea         ecx,[esp+24h]
    00401105  fld1
    00401107  fst         dword ptr [esp+28h]
    0040110B  fcomp       dword ptr [esp+24h]
    0040110F  fnstsw      ax
    00401111  test        ah,41h
    00401114  je          main+3Ah (40111Ah)
    00401116  lea         ecx,[esp+28h]
    0040111A  fcomp       dword ptr [ecx]
    0040111C  fnstsw      ax
    0040111E  test        ah,41h
    00401121  jne         main+47h (401127h)
    00401123  lea         ecx,[esp+2Ch]
    00401127  fld         dword ptr [ecx]
This is very slow (each comparison goes through FCOMP, FNSTSW, TEST and a conditional branch) and does not use the capabilities of newer architectures.

With /arch:SSE2 it becomes:
    00401908  fld         dword ptr [esp+38h]
    0040190C  xorps       xmm0,xmm0
    0040190F  fld1
    00401911  movss       xmm1,dword ptr [__real@3f800000 (402114h)]
    00401919  add         esp,4
    0040191C  fcomip      st,st(1)
    0040191E  fstp        st(0)
    00401920  movss       dword ptr [esp+3Ch],xmm0
    00401926  movss       dword ptr [esp+38h],xmm1
    0040192C  lea         eax,[esp+34h]
    00401930  ja          main+46h (401936h)
    00401932  lea         eax,[esp+38h]
    00401936  comiss      xmm0,dword ptr [eax]
    00401939  jbe         main+4Fh (40193Fh)
    0040193B  lea         eax,[esp+3Ch]
    0040193F  movss       xmm0,dword ptr [eax]
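It's now using FCOMIP, which is a big win, but the SSE part is absurd: both constants are written to the stack, and the result is loaded through a pointer chosen by two branches. This code is just laughable.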

Now, using

    f = (f < 0.0f) ? 0.0f : f;
    f = (f > 1.0f) ? 1.0f : f;

it generates:
    00401908  fld         dword ptr [esp+40h]
    0040190C  add         esp,4
    0040190F  fldz
    00401911  fcomip      st,st(1)
    00401913  fstp        st(0)
    00401915  jbe         main+2Ch (40191Ch)
    00401917  xorps       xmm0,xmm0
    0040191A  jmp         main+42h (401932h)
    0040191C  movss       xmm0,dword ptr [esp+3Ch]
    00401922  movss       xmm1,dword ptr [__real@3f800000 (402114h)]
    0040192A  comiss      xmm0,xmm1
    0040192D  jbe         main+42h (401932h)
    0040192F  movaps      xmm0,xmm1
This is not much better. The approach of cignox1 appears to generate the same code.

Since poorly predicted conditional branches are very expensive, it is worth going to considerable trouble to avoid them. If ToohrVyk's vectorization is not possible, then at least the SSE MINSS and MAXSS instructions should be used. VC2005 is apparently not studly enough to manage that by itself, so using the _mm_min_ss/_mm_max_ss intrinsics is advisable.
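Here is a minimal scalar sketch of the MINSS/MAXSS suggestion. It assumes an x86 compiler that provides `<xmmintrin.h>`, and `clamp01_sse` is a made-up name.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Branch-free clamp of one float to [0, 1] using MAXSS and MINSS.
inline float clamp01_sse(float x) {
    __m128 v = _mm_set_ss(x);               // x in the low lane, zeros above
    v = _mm_max_ss(v, _mm_setzero_ps());    // MAXSS: low lane = max(x, 0)
    v = _mm_min_ss(v, _mm_set_ss(1.0f));    // MINSS: low lane = min(that, 1)
    float result;
    _mm_store_ss(&result, v);               // write the low lane back out
    return result;
}
```

MAXSS returns its second operand when either input is NaN, so with this operand order a NaN input comes out as 0.0f. Whether the value stays in an XMM register across the call, rather than round-tripping through memory, depends on inlining and compiler flags. It is worth checking the disassembly as above.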

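If SSE is not available, then standard bit bashing applies. For the < 0 check, AND the float's bit pattern with a mask in which every bit is the complement of the IEEE-754 sign bit: all ones for non-negative values, all zeros for negative ones (yes, this turns -0 into 0). For > 1, construct a second mask from the carry (borrow) bit of subtracting 0x3F800000, the representation of 1.0f, from the float's representation. Use it to select between the 1.0f constant and the previous result. This works because, once the sign bit is clear, the integer ordering of the bit patterns matches the ordering of the float values.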
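The bit-bashing recipe can be sketched as follows, assuming IEEE-754 binary32 floats. `clamp01_bits` is a hypothetical name, and `std::memcpy` is used for the float/integer reinterpretation because it is well defined. The borrow is read from bit 31 of the unsigned difference. This works because, after the first step, the bit pattern is below 2^31, so the difference never wraps far enough to confuse the two cases.

```cpp
#include <cstdint>
#include <cstring>

// Branch-free clamp to [0, 1] using integer operations on the
// IEEE-754 single-precision bit pattern (assumes 32-bit IEEE floats).
inline float clamp01_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);      // well-defined type punning

    // < 0 check: sign - 1 is all ones when the sign bit is 0 and all zeros
    // when it is 1, i.e. every bit is the complement of the sign bit.
    // Negative values, including -0.0f, become +0.0f.
    const std::uint32_t sign = bits >> 31;
    bits &= sign - 1u;

    // > 1 check: 0x3F800000 is 1.0f. Bit 31 of (bits - one) is the borrow,
    // which is 1 exactly when bits < one, i.e. when the value is below 1.0f.
    const std::uint32_t one = 0x3F800000u;
    const std::uint32_t borrow = (bits - one) >> 31;
    const std::uint32_t keep = 0u - borrow;   // all ones if value < 1.0f
    bits = (bits & keep) | (one & ~keep);     // select value or 1.0f

    float result;
    std::memcpy(&result, &bits, sizeof result);
    return result;
}
```

As a side effect of working on the bit pattern, +infinity and positive NaNs end up as 1.0f, while -infinity and negative NaNs end up as 0.0f.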
As a side note, on sm_10 graphics chips from NVIDIA, the simple approach ends up as just:
    max.f32 $x, $x, 0.0f;
    min.f32 $x, $x, 1.0f;


The comparison-based approach would probably compile to something like this at best, and possibly to something more complex:
    setp.lt.f32 $p, $x, 0.0;
    @$p mov.f32 $x, 0.0f;
    setp.gt.f32 $p, $x, 1.0;
    @$p mov.f32 $x, 1.0f;


I suspect that most shader languages behave similarly, which makes the min/max-based approach the better choice there.
Sorry for the delay in replying.
Thanks everyone, that's pretty much what I've already got. I thought there might be some special tomfoolery I didn't know about that I could apply to the 'special cases' of 0 and 1.

