Safe C++ Float To Fixed Point Conversion

9 comments, last by SiCrane 13 years, 3 months ago
I have some double precision floating point numbers which I would like to convert to fixed point in the form (signed int) + (unsigned int) * 2^-32. I would also like to emit a warning or error of some kind when the double cannot be represented exactly in this form.

I am not sure how to do this in a safe and relatively portable manner. The obvious method is to just cast it to an integer directly, but I am worried about all sorts of corner cases such as rounding and precision issues, NaNs, etc.
I trust exceptions about as far as I can throw them.
isnan() is standard C (C99, in <math.h>), and C++11 provides std::isnan in <cmath>; any reasonably current compiler should have it.

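As for safely rounding:

#include <cmath>

// Renamed from round(), which clashes with the round() declared by <cmath>.
double round_half_up(double d)
{
    return std::floor(d + 0.5);
}


That should do you fine, right?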
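One caveat worth checking with the floor(d + 0.5) formula: it rounds halfway cases toward positive infinity, whereas std::round from <cmath> (C++11) rounds them away from zero, so the two disagree on negative halves. A small sketch comparing them; the function names here are illustrative only:

```cpp
#include <cmath>

// The formula from the post: halfway cases go toward positive infinity.
double round_half_up(double d)
{
    return std::floor(d + 0.5);
}

// std::round (C++11): halfway cases go away from zero.
double round_half_away(double d)
{
    return std::round(d);
}
```

Both return 3.0 for 2.5, but for -2.5 the first returns -2.0 and the second returns -3.0, so which one counts as "safe" depends on the rounding rule you actually want.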

Precision issues come from the arithmetic, not from rounding or casting to integers, so unfortunately they are really hard to work around.
If you need low, constant precision on large numbers or high, constant precision on small numbers, the best way is to make a wrapper class around a large signed integer. When you want the floating-point value, just cast to a float and divide by pow(10, precision).
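A minimal sketch of that wrapper idea, assuming a 64-bit signed integer as the "large signed integer" and a decimal scale; the class name DecimalFixed and its members are hypothetical, not from any library. Note that this scales by powers of ten, which is not the same as the 2^-32 binary scale asked about in the question:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical wrapper: the represented value is raw / 10^precision.
class DecimalFixed
{
public:
    DecimalFixed(std::int64_t raw, int precision)
        : raw_(raw), precision_(precision)
    {
        // Assumption: keep the scale small enough that 10^precision fits in int64_t.
        assert(precision >= 0 && precision <= 18);
    }

    // Conversion back to floating point; this division may round.
    double to_double() const
    {
        return static_cast<double>(raw_) / std::pow(10.0, precision_);
    }

    std::int64_t raw() const { return raw_; }

private:
    std::int64_t raw_;
    int precision_;
};
```

For example, DecimalFixed(12345, 2) stands for 123.45. All arithmetic on raw() stays exact integer arithmetic, and rounding only enters when to_double() is called.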



I'm not worried about precision issues as a result of math. What I'm worried about is the same float value might round to different answers depending on the internal details of the FPU and the whims of the compiler.
Well, I don't know how you'd detect that. I assume you'd need to make separate cases for each compiler/FPU set, which would be a major pain and require tons of research and ultimately would probably be so inefficient that you'd be better off ignoring it.

Or you could use the method I described.
Would this work?


#include <cstdint>
#include <cmath>
#include <cassert>

typedef int32_t sint;
typedef uint32_t uint;

void DoubleToInts(double arg, sint& whole, uint& frac)
{
    whole = (sint) floor(arg);

    const double fracbias = 4294967296.0;
    const double r = arg - whole;

    assert(r >= 0.0 && r < 1.0);
    frac = (uint) floor(r * fracbias);

    assert(whole + (frac / fracbias) == arg);
}



That is the very nature of floating point. You are guaranteed a minimum precision for everything. For all operations you can perform the same operation multiple times and get different results, but within the proper precision it will be the same. Go read and learn about floating point precision.

For conversion to/from storage, the precision is at least 6 decimal digits for floats. The actual value can differ by implementation and is given by either FLT_DIG (the C-style constant) or numeric_limits<float>::digits10 (the C++ version of the same thing).

Anything outside that precision is up to the implementation.

So back to your question about identifying whether a value can be stored directly in your data entry: just look at the text version. If it contains more than the specified number of significant digits, you can emit your warning. So for a float, 1.2345 is okay; 1.234567 is not. 1234.56 is acceptable; 1234.567 should generate a warning. 0.000123456 is acceptable; 0.0001234567 is warning material.
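The digit-count test described above could be sketched roughly as follows. The helper name fits_float_digits is made up for illustration, and the approach (push the text through a float, print it back with FLT_DIG significant digits, compare) is one assumption about how to automate "look at the text version". It tests decimal digit count only, which is a different question from exact representability in a binary fixed-point format:

```cpp
#include <cfloat>
#include <cstdio>
#include <cstdlib>
#include <limits>

static_assert(std::numeric_limits<float>::digits10 == FLT_DIG,
              "both constants name the same quantity");

// Hypothetical helper: true if the decimal text survives
// text -> float -> text when printed with FLT_DIG significant digits.
bool fits_float_digits(const char* text)
{
    const double wanted = std::strtod(text, nullptr);
    const float stored = std::strtof(text, nullptr);

    char buffer[64];
    std::snprintf(buffer, sizeof buffer, "%.*g", FLT_DIG,
                  static_cast<double>(stored));

    // If digits were lost, the reprinted text parses to a different value.
    return std::strtod(buffer, nullptr) == wanted;
}
```

With the examples from the post, fits_float_digits("1234.56") is true and fits_float_digits("1234.567") is false.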



I've already read a lot about floating point precision. Why can't you read my posts?
Right back at you.


"frob" said:
That is the very nature of floating point. You are guaranteed a minimum precision for everything. For all operations you can perform the same operation multiple times and get different results, but within the proper precision it will be the same.
As mentioned, the rules are already established. The system operates to very specific, well known, and easily discoverable precision. Anything outside that precision is completely outside your control. That applies to rounding (the one you are concerned about) just as much as any other operation.
C++ conversion from a floating point type to an integral type is well-defined as long as the truncated value fits inside the range of the integral type: it always truncates (discards the fractional part). If the value doesn't fit inside the range of the integral type, the behavior is undefined.
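To make that rule concrete, here is a small sketch of a truncating cast guarded by a range test, assuming a 32-bit target type. The name checked_trunc is hypothetical; the bounds are chosen so that everything whose truncated value fits in int32_t is accepted:

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper: performs the cast only when it is well-defined.
// Returns false (leaving out untouched) for NaN, infinities and
// values whose truncated result would not fit in int32_t.
bool checked_trunc(double x, std::int32_t& out)
{
    if (std::isnan(x) || x <= -2147483649.0 || x >= 2147483648.0)
        return false;

    out = static_cast<std::int32_t>(x); // discards the fractional part
    return true;
}
```

Here 2.9 converts to 2 and -2.9 converts to -2, while 2147483648.0 and NaN are rejected instead of invoking undefined behavior.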
So would it be appropriate to just use assert(fabs(x) < 0x80000000); ?
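A few observations on that check, followed by one possible alternative as a sketch rather than a definitive answer. The comparison is false for NaN, so the assert would fire for NaN too, but it disappears when NDEBUG is defined, and it also rejects exactly -2^31, which does fit. The hypothetical function below (double_to_fixed is not from the thread) returns false instead of asserting. It relies on floor and on scaling by a power of two via std::ldexp both being exact for in-range inputs, so the exactness test reduces to asking whether arg * 2^32 is an integer:

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper: splits arg into whole + frac * 2^-32.
// Returns false if arg is NaN, out of range, or needs bits below 2^-32;
// whole and frac are left untouched in that case.
bool double_to_fixed(double arg, std::int32_t& whole, std::uint32_t& frac)
{
    // NaN fails both comparisons, so it is rejected here as well.
    if (!(arg >= -2147483648.0 && arg < 2147483648.0))
        return false;

    // Scaling by 2^32 only changes the exponent, so no rounding occurs.
    const double scaled = std::ldexp(arg, 32);
    if (std::floor(scaled) != scaled)
        return false; // arg is not a multiple of 2^-32

    const double whole_d = std::floor(arg); // in [-2^31, 2^31 - 1]
    // Difference of two integers below 2^63; the result is an integer
    // in [0, 2^32), which a double represents exactly.
    const double frac_d = scaled - std::ldexp(whole_d, 32);

    whole = static_cast<std::int32_t>(whole_d);
    frac = static_cast<std::uint32_t>(frac_d);
    return true;
}
```

For instance, 1.5 yields whole = 1 and frac = 0x80000000, -0.25 yields whole = -1 and frac = 0xC0000000, and 0.1 is rejected because its double representation has bits below 2^-32.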

This topic is closed to new replies.
