Safe C++ Float To Fixed Point Conversion

I have some double precision floating point numbers, which I would like to convert to fixed point of the form (signed int) + (unsigned int) * 2^-32. Also, I would like to emit a warning or error of some kind in the case where the double cannot be exactly represented in this form.

I am not sure how to do this in a safe and relatively portable manner. The obvious method is to just cast it to an integer directly, but I am worried about all sorts of corner cases such as rounding and precision issues, NaNs, etc.

Share on other sites
isnan() is standard C (a macro in <math.h> since C99), so every conforming C compiler should have it. C++11 also provides std::isnan in <cmath>.

As for safely rounding:
 double round(double d) { return floor(d + 0.5); } 

That should do you fine, right?
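A caveat worth checking before relying on the one-liner above: it does not behave the same as the standard std::round (C++11, <cmath>) for negative halfway cases, and with typical IEEE 754 double arithmetic the addition d + 0.5 can itself round up for inputs just below 0.5. The sketch below uses the hypothetical name naive_round so it does not clash with the round() already declared in <cmath>.

```cpp
#include <cmath>

// The helper from the post under a hypothetical name, so that it does not
// collide with the standard round() declared in <cmath>.
double naive_round(double d) {
    // Halfway cases go toward +infinity: 2.5 -> 3.0, but -2.5 -> -2.0.
    return std::floor(d + 0.5);
}
```

For comparison, std::round(-2.5) returns -3.0, because it rounds halfway cases away from zero. If the target is an integer type, std::lround and std::llround do the rounding and the conversion in one step.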

Precision issues are a result of the math, not of rounding or casting to integers, so unfortunately they're really hard to work around.
If you require low, constant precision on large numbers or high, constant precision on small numbers, the best way is to make a wrapper class around a large signed integer. When you want the float value, just cast to a float and divide by pow(10, precision).
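A minimal sketch of the wrapper idea described above, assuming a 64-bit signed integer as storage and a compile-time number of decimal places. The names DecimalFixed, from_double, and to_double are hypothetical, and the scale factor is built with an integer loop instead of pow(10, precision) so that it is exact.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical wrapper: stores value * 10^Precision in a 64-bit integer.
template <int Precision>
struct DecimalFixed {
    std::int64_t raw;  // e.g. 12345 means 12.345 when Precision == 3

    // Scale factor 10^Precision, computed with integer arithmetic.
    static std::int64_t scale() {
        std::int64_t s = 1;
        for (int i = 0; i < Precision; ++i) s *= 10;
        return s;
    }

    // Rounds to the nearest representable step.
    // Assumes the scaled value fits in int64_t (not checked here).
    static DecimalFixed from_double(double v) {
        const double scaled = v * static_cast<double>(scale());
        return DecimalFixed{static_cast<std::int64_t>(std::llround(scaled))};
    }

    // Converts back by dividing by the scale factor, as the post suggests.
    double to_double() const {
        return static_cast<double>(raw) / static_cast<double>(scale());
    }
};
```

Note that this is decimal fixed point, which is a different format from the binary (2^-32) fraction asked about in the original question.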

Share on other sites

I'm not worried about precision issues as a result of math. What I'm worried about is the same float value might round to different answers depending on the internal details of the FPU and the whims of the compiler.

Share on other sites
Well, I don't know how you'd detect that. I assume you'd need to make separate cases for each compiler/FPU set, which would be a major pain and require tons of research and ultimately would probably be so inefficient that you'd be better off ignoring it.

Or you could use the method I described.

Share on other sites
Would this work?

 #include <cstdint>
 #include <cmath>
 #include <cassert>

 typedef int32_t sint;
 typedef uint32_t uint;

 void DoubleToInts(double arg, sint& whole, uint& frac)
 {
     whole = (sint) floor(arg);
     const double fracbias = 4294967296.0;  // 2^32
     const double r = arg - whole;
     assert(r >= 0.0 && r < 1.0);
     frac = (uint) floor(r * fracbias);
     assert(whole + (frac / fracbias) == arg);
 }

Share on other sites

I'm not worried about precision issues as a result of math. What I'm worried about is the same float value might round to different answers depending on the internal details of the FPU and the whims of the compiler.

That is the very nature of floating point. You are guaranteed a minimum precision for everything. For all operations you can perform the same operation multiple times and get different results, but within the proper precision it will be the same. Go read and learn about floating point precision.

For conversion to/from storage, the precision is at least 6 decimal digits for floats. The exact figure can differ by implementation and is given by FLT_DIG (the C-style constant, in <cfloat>) or numeric_limits<float>::digits10 (the C++ version of the same thing, in <limits>).
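As a quick illustration of the two constants mentioned above, the sketch below reads both and checks that they agree. The variable names float_digits and double_digits are hypothetical. Since the original question is about doubles, DBL_DIG and numeric_limits<double>::digits10 are the relevant pair there.

```cpp
#include <cfloat>
#include <limits>

// Decimal digits guaranteed to survive a text -> floating point -> text
// round trip. The C standard requires at least 6 for float, 10 for double.
constexpr int float_digits  = std::numeric_limits<float>::digits10;
constexpr int double_digits = std::numeric_limits<double>::digits10;

// The C macros and the C++ traits describe the same quantity.
static_assert(float_digits == FLT_DIG, "float: C and C++ constants agree");
static_assert(double_digits == DBL_DIG, "double: C and C++ constants agree");
```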

Anything outside that precision is up to the implementation.

So back to your question about identifying whether a value can be stored directly in your data entry: just look at the text version. If it contains more than the specified number of digits, you can emit your warning. So for a float, 1.2345 is okay; 1.234567 is not. 1234.56 is acceptable; 1234.567 should generate a warning. 0.000123456 is acceptable; 0.0001234567 is warning material.
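The digit-counting rule above could be sketched roughly as follows. Both function names are hypothetical, and the sketch assumes plain decimal text with an optional sign and exponent. It only tests decimal round-trip precision; a short decimal such as 0.1 passes this check even though it has no exact binary representation.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: counts significant decimal digits in text such as
// "0.000123456". Leading zeros are skipped, an exponent part ("e-5") is
// ignored, and trailing zeros are counted as significant.
int count_significant_digits(const std::string& text) {
    int count = 0;
    bool seen_nonzero = false;
    for (char c : text) {
        if (c == 'e' || c == 'E') break;  // stop at the exponent
        if (!std::isdigit(static_cast<unsigned char>(c))) continue;  // sign, '.'
        if (c != '0') seen_nonzero = true;
        if (seen_nonzero) ++count;
    }
    return count;
}

// True when the text carries more digits than the type guarantees to keep.
bool needs_precision_warning(const std::string& text, int digits10) {
    return count_significant_digits(text) > digits10;
}
```

With digits10 set to 6, this reproduces the examples from the post: "1234.56" passes and "1234.567" triggers the warning.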

Share on other sites

I'm not worried about precision issues as a result of math. What I'm worried about is the same float value might round to different answers depending on the internal details of the FPU and the whims of the compiler.

Right back at you.

frob wrote: "That is the very nature of floating point. You are guaranteed a minimum precision for everything. For all operations you can perform the same operation multiple times and get different results, but within the proper precision it will be the same."

As mentioned, the rules are already established. The system operates to a very specific, well-known, and easily discoverable precision. Anything outside that precision is completely outside your control. That applies to rounding (the one you are concerned about) just as much as any other operation.

Share on other sites
C++ conversion from a floating point type to an integral type is well-defined as long as the value in the floating point type fits inside the range of the integral type: it always truncates (discards the fractional part). If the value doesn't fit inside the range of the integral type, the behavior is undefined.
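To make the rule above concrete, here is a minimal sketch of a range check done before the cast, so the undefined case is never reached. The function name checked_trunc_to_int32 is hypothetical, and the sketch assumes double can represent the two bounds exactly, which holds for IEEE 754 doubles.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper: converts only when the truncated value fits in
// int32_t. Returns false and leaves `out` untouched for NaN, infinities,
// and out-of-range input.
bool checked_trunc_to_int32(double value, std::int32_t& out) {
    // Valid inputs lie strictly between -2^31 - 1 and 2^31, because the
    // fractional part is discarded. NaN fails both comparisons.
    if (!(value > -2147483649.0 && value < 2147483648.0)) return false;
    out = static_cast<std::int32_t>(value);  // truncates toward zero
    return true;
}
```

For example, -1.7 converts to -1 (not -2), and 2147483648.0 is rejected instead of invoking undefined behavior.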

Share on other sites
So would it be appropriate to just use assert(fabs(x) < 0x80000000); ?
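On the assert question: fabs(x) < 0x80000000 does reject NaN (the comparison is false) and out-of-range values, but assert is compiled out when NDEBUG is defined, and the check also excludes -2147483648.0, which fits. Below is one possible sketch that uses a runtime check and reports exactness through a return value. The names FixedResult and DoubleToFixed are hypothetical, and it assumes IEEE 754 doubles, where multiplying by 2^32 is exact for values in this range.

```cpp
#include <cmath>
#include <cstdint>

enum class FixedResult { Exact, Inexact, OutOfRange };

// Hypothetical sketch: splits `arg` into whole + frac * 2^-32, rounding
// toward negative infinity when the value is not exactly representable.
FixedResult DoubleToFixed(double arg, std::int32_t& whole, std::uint32_t& frac) {
    // Rejects NaN, infinities, and anything whose floor does not fit in int32_t.
    if (!(arg >= -2147483648.0 && arg < 2147483648.0)) return FixedResult::OutOfRange;

    const double scaled = arg * 4294967296.0;   // exact: power-of-two scaling
    const double floored = std::floor(scaled);  // integer-valued, in [-2^63, 2^63)

    const std::int64_t q = static_cast<std::int64_t>(floored);
    const std::int64_t w = static_cast<std::int64_t>(std::floor(arg));
    whole = static_cast<std::int32_t>(w);
    frac = static_cast<std::uint32_t>(q - w * 4294967296LL);  // in [0, 2^32 - 1]

    // Representable exactly when arg * 2^32 is already an integer.
    return (floored == scaled) ? FixedResult::Exact : FixedResult::Inexact;
}
```

With this sketch, 1.5 yields whole = 1 and frac = 0x80000000 with Exact, while 0.1 yields Inexact because it has no finite binary expansion.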
