
# Avoiding floor calls when handling non-normalized uvs


## Recommended Posts

I am working on a piece of code where I need to deal with UVs that are not necessarily in the 0 to 1 range. As an example, sometimes I will get a UV with a u component of 1.2. To handle this I am implementing wrapping, which causes tiling, by doing the following:

    u -= floor(u);
    v -= floor(v);

This turns 1.2 into 0.2, which is the desired result. It also handles negative cases, such as -0.4 becoming 0.6.

However, these calls to floor are rather slow. I have profiled my application using Intel VTune and I am spending a huge number of cycles just doing this floor operation. After some background reading, I came up with the following function, which is a bit faster but still leaves a lot to be desired (I am still incurring type conversion penalties, etc.):

    inline int fasterfloor( const float x ) { return x > 0 ? (int) x : (int) x - 1; }

I have seen a few tricks that are accomplished with inline assembly, but nothing that seems to work exactly correctly or give any significant speed improvement. Does anyone know any tricks for handling this kind of scenario?

##### Share on other sites
I'm not quite sure what you're trying to achieve, but it looks like you could use the fmod function.
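For reference, a rough sketch of what wrapping with `std::fmod` might look like. Note that `std::fmod` returns a result with the sign of its first argument, so negative inputs need one extra step to land in the tiling range. The name `wrap01_fmod` is hypothetical.

```cpp
#include <cmath>

// Hypothetical helper: wrap a coordinate using std::fmod.
inline float wrap01_fmod(float u) {
    float r = std::fmod(u, 1.0f);  // keeps the sign of u, so -0.4 stays -0.4
    if (r < 0.0f) {
        r += 1.0f;                 // shift negative remainders up by one period
    }
    return r;
}
```

Whether this is faster than `u - floor(u)` depends on the compiler and runtime library, so it is worth profiling before adopting it.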

That said, your fasterfloor() function converts floats to integer, which is much slower than regular arithmetic operations or int-to-float conversion.

There's not much more to it. This is why there's specialized HW (the GPU) to handle massive floating point operations, instead of being done in the CPU. I suggest using SSE2 to do those operations.

One more thing: are you sure you're not bandwidth limited? It may not be the conversion that's bottlenecking you, but rather how many variables you're fetching per second. Data locality would help a lot.

Cheers
Dark Sylinc

##### Share on other sites
it definitely looks like you're trying to achieve an fmod(x, 1)
however, don't use fmodf. it is atrociously slow in all standard compiler-implementations I have seen.

if you have access to SSE2, you could use something like:

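    __forceinline float Frac(const float &x)
    {
        __m128 scalar = _mm_load_ss(&x);
        __m128 floored = scalar;
        // float -> int -> float round trip; this truncates toward zero,
        // so a negative input yields a negative fraction
        floored = _mm_cvtsi32_ss(floored, _mm_cvttss_si32(floored));
        scalar = _mm_sub_ss(scalar, floored);
        return scalar.m128_f32[0]; // m128_f32 is MSVC-specific
    }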

note that this won't work for fp values outside the [INT_MIN, INT_MAX] bounds.
but I suppose you don't care much about these cases? the fraction would be zero anyway (way before reaching values greater than 2 billion).
if you absolutely need to handle the case, you can force the result to zero by performing an SSE logical AND with a mask obtained from an fp compare with a large enough value, above which you consider precision loss is large enough to assume frac == 0.