Avoiding floor calls when handling non-normalized uvs

This topic is 2849 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

I am working on a piece of code where I need to deal with uvs that are not necessarily in the 0 to 1 range. As an example, sometimes I will get a uv with a u component that is 1.2. In order to handle this I am implementing a wrapping which causes tiling by doing the following:

u -= floor(u)
v -= floor(v)

Doing this causes 1.2 to become 0.2, which is the desired result. It also handles negative cases, such as -0.4 becoming 0.6.

However, these calls to floor are rather slow. I have profiled my application using Intel VTune and I am spending a huge number of cycles just doing this floor operation. Having done some background reading on the issue, I have come up with the following function, which is a bit faster but still leaves a lot to be desired (I am still incurring type conversion penalties, etc.):

int inline fasterfloor( const float x ) { return x > 0 ? (int) x : (int) x - 1; }

I have seen a few tricks that are accomplished with inline assembly, but nothing that seems to work exactly correct or have any significant speed improvement. Does anyone know any tricks for handling this kind of scenario?
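One thing worth checking with the truncation trick: as written, fasterfloor returns -1 for 0.0f and -3 for -2.0f, because it subtracts one for every non-positive input, including exact integers. A minimal sketch of a variant that only steps down when truncation actually rounded up is shown here. The names floor_to_int and wrap01 are hypothetical, and the sketch assumes the input fits in the range of int.

```cpp
#include <cmath>

// Truncate toward zero, then subtract one only when the truncated value
// ended up above x (x negative and not already an integer).
// Assumption: x fits in the range of int.
inline int floor_to_int(float x)
{
    const int i = static_cast<int>(x);
    return i - (static_cast<float>(i) > x ? 1 : 0);
}

// Hypothetical helper: wrap a texture coordinate into [0, 1).
// For tiny negative inputs the addition can round up to exactly 1.0f.
inline float wrap01(float u)
{
    return u - static_cast<float>(floor_to_int(u));
}
```

This still pays for the float-to-int conversion, so it addresses correctness rather than speed.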

I'm not quite sure what you're trying to achieve, but it looks like you could use the fmod function.
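A caveat if fmod is used for this: std::fmod keeps the sign of its first argument, so fmod(-0.4, 1) gives -0.4 rather than the 0.6 that the floor-based wrap produces. A small sketch of the adjustment, with wrap01_fmod as a hypothetical name:

```cpp
#include <cmath>

// Wrap into [0, 1) with std::fmod.
// std::fmod(u, 1.0f) lies in (-1, 1) and has the sign of u,
// so negative results are shifted up by one.
inline float wrap01_fmod(float u)
{
    float f = std::fmod(u, 1.0f);
    if (f < 0.0f) {
        f += 1.0f;
    }
    return f;
}
```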

That said, your fasterfloor() function converts floats to integer, which is much slower than regular arithmetic operations or int-to-float conversion.

There's not much more to it. This is why there's specialized HW (the GPU) to handle massive floating point operations, instead of being done in the CPU. I suggest using SSE2 to do those operations.

One more thing: are you sure you're not bandwidth limited? It may not be the conversion that's bottlenecking you, but rather how much data you're fetching per second. Better data locality would help a lot.

Cheers
Dark Sylinc

it definitely looks like you're trying to achieve an fmod(x, 1)
however, don't use fmodf. it is atrociously slow in all standard compiler-implementations I have seen.

if you have access to SSE2, you could use something like:


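#include <xmmintrin.h>  // SSE scalar intrinsics

// Returns x - trunc(x). The result keeps the sign of x, like fmod(x, 1).
__forceinline float Frac(const float &x)
{
    __m128 scalar = _mm_load_ss(&x);                                     // load x into lane 0
    __m128 truncated = _mm_cvtsi32_ss(scalar, _mm_cvttss_si32(scalar));  // float -> int (toward zero) -> float
    scalar = _mm_sub_ss(scalar, truncated);                              // x - trunc(x)
    return _mm_cvtss_f32(scalar);                                        // extract lane 0 without the MSVC-only m128_f32 member
}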



note that this won't work for fp values outside the [INT_MIN, INT_MAX] bounds.
but I suppose you don't care much about these cases? the fraction would be zero anyway (a float runs out of fractional bits long before it reaches 2 billion).
if you absolutely need to handle that case, you can force the result to zero by performing an SSE logical AND with a mask obtained from an fp compare against a large enough value, above which you consider the precision loss large enough to assume frac == 0.
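A possible shape for that masked variant is sketched below. The name frac_guarded and the cutoff of 2^23 are assumptions for illustration: 2^23 is the magnitude at which a 32-bit float no longer has any fractional bits. Like Frac, it keeps the sign of x.

```cpp
#include <xmmintrin.h>  // SSE scalar intrinsics

// Like Frac, but returns 0 when |x| >= 2^23 (no fractional bits left),
// which also covers values outside the int range.
inline float frac_guarded(float x)
{
    const __m128 v     = _mm_set_ss(x);
    const __m128 limit = _mm_set_ss(8388608.0f);                  // 2^23
    const __m128 absx  = _mm_andnot_ps(_mm_set_ss(-0.0f), v);     // clear the sign bit of lane 0
    const __m128 small = _mm_cmplt_ss(absx, limit);               // lane 0 all-ones where |x| < 2^23
    const __m128 t     = _mm_cvtsi32_ss(v, _mm_cvttss_si32(v));   // truncate toward zero
    const __m128 f     = _mm_sub_ss(v, t);                        // x - trunc(x)
    return _mm_cvtss_f32(_mm_and_ps(f, small));                   // zero the result when out of range
}
```

For the tiling use case, a negative result would still need 1 added to match the floor-based wrap.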
