Floor function definition

8 comments, last by JohnnyCode 10 years, 8 months ago

Hi.

Is it possible to return the closest integer to a floating-point number by defining a function that uses only the +, -, *, / operations?

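float round(float x) {
  // 12582912 = 1.5 * 2^23. Floats near this value are spaced exactly 1 apart,
  // so the addition rounds away the fractional part of x.
  x += 12582912.0f;
  x -= 12582912.0f;
  return x;
}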
I am not sure of the exact range of inputs for which that works, but I think it's the best you can do.

EDIT: Removed a meaningless ".5f" at the end of the first constant. That bit is beyond the precision of a float, so the code compiles to the same thing whether you put ".0f" or ".5f" there.
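For what it's worth, here is a small sketch for probing the range question. It assumes IEEE 754 single precision with the default round-to-nearest, ties-to-even mode and no fast-math style optimizations. The names magic_round and keeps_integer are made up for illustration, and the volatile is an addition of mine to stop the compiler from carrying the sum in higher precision. Under those assumptions the trick appears to hold for |x| up to 2^22 = 4194304, and halfway cases go to the even neighbour (0.5 gives 0, while 1.5 and 2.5 both give 2).

```c
/* Illustrative copy of the function above, renamed so it does not clash
   with round() from <math.h>. */
float magic_round(float x) {
    /* volatile forces the sum to be stored as a 32-bit float (an assumption
       about how to keep the compiler from using extra precision). */
    volatile float sum = x + 12582912.0f;
    return sum - 12582912.0f;
}

/* Returns 1 if magic_round leaves the integer-valued input n unchanged,
   0 otherwise. Handy for finding where the trick stops working. */
int keeps_integer(float n) {
    return magic_round(n) == n;
}
```

For example, keeps_integer(4194304.0f) is true but keeps_integer(4194305.0f) is not, because the sum 16777217 lies past 2^24, where floats are spaced 2 apart. On the negative side, magic_round(-4194304.5f) comes back unrounded, because the sum drops below 2^23, where floats are spaced 0.5 apart.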

I must admit I do not see how this could work, though. Could you elaborate on the definition a bit, in particular on the two constants? Thanks a lot.

Let's start with a number that is smallish (say, less than 4 million). The constant is designed so that the sum of x and the constant has an exponent large enough that the 24 bits of precision in a float represent exactly the integers (meaning the distance between consecutive floats in this range is 1). That sum is where the rounding happens, because the bits that don't fit are discarded, hopefully with some reasonable rounding rules. Subtracting the magic constant brings the number back to its original range, but we have lost the bits beyond the integer part.

EDIT: Removed some nonsense about adding 0.5.
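To make the two steps visible, here is a minimal sketch that exposes the intermediate sum. MAGIC_F, round_via_magic and sum_out are names invented for this example, and it assumes IEEE 754 floats with the default round-to-nearest mode. The volatile is there only so the sum really gets stored as a 32-bit float.

```c
/* MAGIC_F and the function name are illustrative, not from the thread.
   The value is the constant from the post: 1.5 * 2^23. */
static const float MAGIC_F = 12582912.0f;

/* Add-then-subtract with the intermediate sum handed back to the caller.
   sum_out may be a null pointer if the caller does not need it. */
float round_via_magic(float x, float *sum_out) {
    volatile float sum = x + MAGIC_F;  /* fractional bits are rounded away here */
    float stored = sum;
    if (sum_out) *sum_out = stored;
    return stored - MAGIC_F;           /* exact for in-range x: both are integers */
}
```

With x = 2.7f the stored sum is 12582915, already an integer, and subtracting the constant gives 3. With x = -2.7f the sum is 12582909 and the result is -3.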

What would the constant be for 64-bit floats? I guess it would be the same, since the exponent has the same range in 64 bit, but it does not seem to work.

I could truncate the float by shifting the significant bits to the right (bias)-(exponent unbiased) times and then setting the exponent to the bias. But I cannot afford bitwise operations, only algebraic ones. How could I do this with algebraic operations? Thanks a lot.
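One possible way to get a floor out of the rounding trick, sketched under the assumption that a single comparison is acceptable in addition to +, -, *, / (that relaxes the stated constraint, so take it as an illustration only). The name floor_via_magic is hypothetical. The idea is that round-to-nearest lands within 0.5 of x, so if the rounded value is greater than x, it overshot by exactly one step.

```c
/* floor_via_magic is a hypothetical name. Valid only while x stays in the
   range where the magic-constant rounding works (roughly |x| <= 2^22). */
float floor_via_magic(float x) {
    volatile float sum = x + 12582912.0f;  /* round to the nearest integer */
    float r = sum - 12582912.0f;
    /* (r > x) is 1 when rounding went up and 0 otherwise. This comparison is
       the one step that is not plain +, -, *, /. */
    return r - (float)(r > x);
}
```

For instance, 2.7f rounds to 3, which is greater than 2.7, so the result is 2. For -2.7f the rounded value -3 is not greater than the input, so it is returned as is.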

Is this some kind of arbitrary challenge? What's the actual problem, and why can you only use +,-,*,/?

I need to do the operation on the GPU, taking a float and producing a float, so I cannot use things like modulo or bitwise ops. Using only standard operations would also make this definable as a proper math function, but that is not my concern.

GPUs can do floor natively, so using the built-in floor function will most likely be much faster than emulating it yourself using a bunch of arithmetic.

Is there a reason you're avoiding the built-in implementation?

The constant for a double would be 3*2^51 = 6755399441055744.0
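A sketch of the double version, assuming IEEE 754 binary64 and the default rounding mode. The name magic_round_double is made up here. The float constant does not carry over because a double near 12582912 still has plenty of fractional bits. With a 53-bit significand the spacing only reaches 1 in the range from 2^52 to 2^53, which is where 3*2^51 sits.

```c
/* magic_round_double is an illustrative name. 6755399441055744 = 3 * 2^51,
   and doubles between 2^52 and 2^53 are spaced exactly 1 apart. */
double magic_round_double(double x) {
    /* volatile keeps the sum in 64-bit precision (my assumption about how
       to guard against extended-precision registers). */
    volatile double sum = x + 6755399441055744.0;
    return sum - 6755399441055744.0;
}
```

As with the float version, halfway cases go to the even integer, so 2.5 gives 2 and 3.5 gives 4.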

Alvaro, you rule the world. The posted hack works like a charm for both 32-bit and 64-bit IEEE floats.

