C++ unsigned __int64 to/from double conversions

Started by
23 comments, last by SiCrane 12 years, 1 month ago
Mantear
Author
251
January 02, 2011 09:31 AM
I'm implementing some windowing algorithms using templates to accept any sort of input data type. Everything works except for unsigned 64-bit values. I calculate the windowing scale value for the current sample using doubles. When I convert the input uint64 sample to a double, multiply by the scaling value, and store it to another double, the resulting value looks fine. But when I try to convert the double back to a uint64, the value gets clamped to the maximum signed int64. Why in the world would this be the case?

EDIT: Adding some example code.

unsigned long long uint64 = 0;
long long int64 = 0;
double x = 0.0;

// Example 1 - works as expected
int64 = 0x7FFFFFFFFFFFF000;
x = static_cast<double>(int64);
int64 = static_cast<long long>(x);

// Example 2 - the sign changes?  Changes to 0x8000000000000000
// Same applies to an input of 0x7FFFFFFFFFFFFF00
int64 = 0x7FFFFFFFFFFFFFFF;
x = static_cast<double>(int64);
int64 = static_cast<long long>(x);

// Clamped to maximum of 'long long' + 1 instead of 'unsigned long long'
// Changes to 0x8000000000000000
uint64 = 0xFFFFFFFFFFFFFFFF;
x = static_cast<double>(uint64);
uint64 = static_cast<unsigned long long>(x);


[Edited by - Mantear on January 3, 2011 11:30:50 AM]
Erik Rufelt
January 02, 2011 10:53 AM
0x8000000000000000 is not the maximum signed long long, it is the minimum signed long long (most negative value). You are casting to long long in the last example too, perhaps you want to cast to unsigned long long?
Also, doubles at that range can't represent every integer exactly. When I tried casting 18,446,744,073,709,551,615 to double, I got 18,446,744,073,709,552,000, which is a larger number (nearest representable number). Casting this back will overflow.
bubu LV
1,436
January 02, 2011 11:17 AM
double can store only 53 significant bits (52 explicitly stored, plus an implicit leading 1): http://en.wikipedia.org/wiki/Floating_point#IEEE_754:_floating_point_in_modern_computers. [unsigned] long long has 64 bits. So you will lose the lowest 11 bits if the long long value has all 64 bits significant.
Mantear
Author
251
January 02, 2011 03:12 PM
Quote:Original post by bubu LV
double can store only 53 significant bits (52 explicitly stored, plus an implicit leading 1): http://en.wikipedia.org/wiki/Floating_point#IEEE_754:_floating_point_in_modern_computers. [unsigned] long long has 64 bits. So you will lose the lowest 11 bits if the long long value has all 64 bits significant.


It's not losing precision that is the problem; I can understand rounding occurring. But it's clamping away half the range of an unsigned long long.

From what I can find via Google, this doesn't happen under GCC (I'm using Visual Studio 2010).
ApochPiQ
23,137
January 02, 2011 03:32 PM
Um... your code casts everything to long long, not unsigned long long. So why are you surprised that your code returns a long long representable value instead of an unsigned one?

As near as I can tell, your code does precisely what you ask the compiler to do.

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

Mantear
Author
251
January 03, 2011 11:35 AM
Quote:Original post by ApochPiQ
Um... your code casts everything to long long, not unsigned long long. So why are you surprised that your code returns a long long representable value instead of an unsigned one?

As near as I can tell, your code does precisely what you ask the compiler to do.


Sorry, I fat-fingered copying the source code. I updated it to show:
uint64 = 0xFFFFFFFFFFFFFFFF;                 // uint64 is 0xFFFFFFFFFFFFFFFF, or 18446744073709551615
x = static_cast<double>(uint64);             // x is 1.8446744073709552e+019
uint64 = static_cast<unsigned long long>(x); // uint64 is now 0x8000000000000000, or 9223372036854775808
Erik Rufelt
January 03, 2011 12:09 PM
The problem remains the same: the double is larger than the largest value that can fit in an unsigned long long. You get the same behavior if you set the double manually to something 10x larger. I'm pretty sure the behavior is undefined.
How do you want it to act?
Do you want it to clamp?
Wrap around?
The normal behavior when assigning a too-large value to an integer is to wrap around, so if there were a 128-bit integer and you converted to that first, and then down to a 64-bit integer, you would get 0, since the double you're trying to convert is exactly 2^64, one larger than the largest unsigned 64-bit integer.
However, you are not converting to a 128-bit integer in between, so what do you expect to happen?

If you need some arbitrary rules for double to integer conversion, write your own conversion routine that forces those rules.
Rattenhirn
January 03, 2011 12:09 PM
Quote:Original post by Mantear
It's not losing precision that is a problem. I can understand rounding occuring. But it's clamping half the range of an unsigned long long.

From what I can find via Google, this doesn't happen under GCC (I'm using Visual Studio 2010).


Try changing the fp model to a more precise one in VS 2010 and you should be able to replicate the results from GCC.
ApochPiQ
23,137
January 03, 2011 12:21 PM
Interesting.

At least under MSVC2005, this does appear to be a bug in the standard runtime. static_cast<unsigned long long>(double) produces a call to _ftol2, which does not respect unsigned-ness. A couple minutes on Google confirms that this bug still exists in VS2008 and maybe 2010 as well.

There are cases where this works as expected: namely when certain 64-bit or SSE instructions are available. MSVC apparently doesn't assume those are supported everywhere, so it doesn't always use them directly; if your compiler emits a call to _ftol2, you're probably up a creek.


Give me a few minutes and I'll see if I can hack a safe conversion that works on 32-bit CPUs.

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

ApochPiQ
23,137
January 03, 2011 01:05 PM
OK, this can't be blamed on the CRT, but rather on the CPU itself.

unsigned long long mycast(double x)
{
	unsigned long long retval;
	__asm
	{
		fld x
		fld st(0)
		fistp retval
	}
	return retval;
}



This is a minimal reproduction of _ftol2, without special handling for sign-bit extension or NaN values. It gives the exact same behaviour on my Centrino Duo as the OP reports: namely, the truncation occurs and a bogus value is written into retval.

According to the IA-32 Intel Architecture Software Developer's Manual, Volume 2A: Instruction Set Reference A-M, this is expected behavior: when the source value is out of range, fistp stores the "integer indefinite" value 0x8000000000000000.

Why GCC, or VS2010 with a stricter FP model, doesn't do this I'm not sure.

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]
