float vs double
#1 Members - Reputation: 130
Posted 04 September 2005 - 03:19 PM
#2 Members - Reputation: 826
Posted 04 September 2005 - 04:46 PM
#3 Members - Reputation: 960
Posted 04 September 2005 - 04:52 PM
#5 Members - Reputation: 1900
Posted 04 September 2005 - 05:18 PM
typedef float Scalar;
If you want to compare performance or switch to some other type, just replace float with the desired type. A more flexible method is to make the type a template parameter:
template <class Scalar = float>
class Vector3
{
public:
    Scalar x, y, z;
};
typedef Vector3<> Vector3f;
typedef Vector3<double> Vector3d;
Float and double are the most obvious choices for type, but the above method leaves the door open for other options, such as a custom rational number class.
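As a rough sketch of how the templated version might be used: the `dot` helper below is hypothetical (it is not part of the post above) and only illustrates that the same code can be instantiated for either precision.

```cpp
// Same template as in the post; Scalar defaults to float.
template <class Scalar = float>
class Vector3
{
public:
    Scalar x, y, z;
};

typedef Vector3<> Vector3f;
typedef Vector3<double> Vector3d;

// Hypothetical helper (an assumption for illustration): dot product for any Scalar.
template <class Scalar>
Scalar dot(const Vector3<Scalar>& a, const Vector3<Scalar>& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}
```

Calling `dot` on two `Vector3f` values does the arithmetic in float, and on two `Vector3d` values in double, so switching precision for a comparison only means changing the typedef in use.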
#6 Members - Reputation: 110
Posted 04 September 2005 - 11:13 PM
#7 Members - Reputation: 494
Posted 05 September 2005 - 12:03 AM
#8 Members - Reputation: 241
Posted 05 September 2005 - 12:43 AM
But I don't know about the memory bus.
Floats are defined by IEEE 754:
1 sign bit
8 exponent bits
23 mantissa bits
Floats are usually stored normalized, which means
1.mantissa * 2^Exponent
The 8 exponent bits are used as follows:
Exp8Bit - 127 = Exponent, so the binary point can be shifted left or right by using 2^-n or 2^n.
When adding two floats you add the
1.mantissa bits (after aligning the exponents, and optionally forming a two's complement for subtraction).
Now an example:
4096 = 2^12 = 1_0000_0000_0000
so the float representation is
1.0000_0000_0000_XXXX_XXXX_XXX * 2^12
As you can see, the binary point moves 12 bits to the right, which means 12 of the 23 mantissa bits are used up by the value in front of the point.
That leaves 11 bits for the fraction, which gives a minimum step of 0.00048828125f,
calculated as follows:
1/2^11 == 2^-11
You get the mantissa's value by dividing the stored mantissa bits by 2^23.
As you can see, you lose quite a bit of precision with larger values.
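The 4096 example can be checked with a small sketch like the one below. It assumes float is a 32-bit IEEE 754 value (true on common platforms, but not guaranteed by the language), and the names `FloatFields`, `decompose` and `spacingAbove` are made up for illustration.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper struct: the three IEEE 754 single-precision fields.
struct FloatFields
{
    std::uint32_t sign;      // 1 bit
    std::uint32_t exponent;  // 8 bits, stored with a bias of 127
    std::uint32_t mantissa;  // 23 bits, the implicit leading 1 is not stored
};

// Assumption: float is a 32-bit IEEE 754 value.
FloatFields decompose(float value)
{
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);  // copy the raw bit pattern
    FloatFields f;
    f.sign = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.mantissa = bits & 0x7FFFFFu;
    return f;
}

// Gap between value and the next representable float above it.
float spacingAbove(float value)
{
    return std::nextafter(value, INFINITY) - value;
}
```

For 4096.0f the stored exponent field should come out as 12 + 127 = 139 with an all-zero mantissa, and `spacingAbove(4096.0f)` should match the 2^-11 step worked out above.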
4096*4096*4096 = 2^36 means shifting 36 bits to the right. The mantissa has only 23 bits, so in theory you might even lose some precision in front of the binary point.
But I think, as already stated above, the FPU probably uses higher-precision floats internally, which in fact isn't that expensive at all: you only need to add the 24 significant bits of a 32-bit float, the exponent stays the same, and at the end you shift left or right to normalize the float back to 1.mantissa.
Hope that helps
P.S.: A good way to increase precision is to reorder the operations so that you don't get overly large intermediate values. So instead of
4096.0345345^3/2048 you could do (4096.0345345/2048)*4096.0345345^2
although this is usually not possible at runtime.
#9 Members - Reputation: 241
Posted 05 September 2005 - 01:10 AM
So you can stick with simple floats as long as you compile with the latest processor packs for VC++.
For gcc, read the man pages.
For VC++:
/G7 optimizes code for Intel and AMD CPUs
/arch:SSE or /arch:SSE2 makes use of SSE or SSE2. SSE3 might work similarly, but the compilers I am using at the moment don't support SSE3, so I can't say.
#10 Members - Reputation: 134
Posted 05 September 2005 - 01:42 AM
Floats / doubles can't contain every single number possible, i.e. the number they store in memory is actually a calculation for the final number. This leads to certain cases where:
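#include <iostream>

int main()
{
    float a = 2.501f;
    a *= 1.5134f;
    // a is a float and the literal 3.7850134 is a double; neither holds the exact decimal value
    if (a == 3.7850134) std::cout << "Expected value" << std::endl;
    else std::cout << "Unexpected value" << std::endl;
}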
would print "Unexpected value".
#11 Banned - Reputation: 794
Posted 05 September 2005 - 02:02 AM
This does lead to interesting results as a result of compiler optimisations. In debug, the intermediate values in a sequence of computations will most likely be written to memory, thus losing some precision. In release, the intermediate values are stored on the FPU stack so the precision isn't lost and thus you get different results.
It is also possible to reduce the level of precision the FPU works at via its control word, although if memory serves me right this only affects the basic operations (add, subtract, multiply, divide and square root), not the transcendental functions.
Skizz
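A minimal sketch of the effect described above, under stated assumptions: double stands in here for the wider FPU register (the real x87 registers are 80 bits), and the two function names are invented for this example. The volatile float forces the intermediate to be rounded to float, much like a write to memory would.

```cpp
// Intermediate rounded to float, as happens when it is written to a float in memory.
float differenceViaFloat(float big, float small)
{
    volatile float stored = big + small;  // volatile forces the rounding store
    return stored - big;
}

// Same computation with the intermediate held at higher precision
// (double is used as a stand-in for an extended-precision register).
double differenceViaDouble(float big, float small)
{
    double kept = static_cast<double>(big) + static_cast<double>(small);
    return kept - static_cast<double>(big);
}
```

With `big = 1.0e8f` and `small = 1.0f`, the float version loses the added 1 entirely because neighbouring floats near 1e8 are 8 apart, while the wider intermediate keeps it. That is the kind of difference that can show up between a debug and a release build.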
#12 Anonymous Poster_Anonymous Poster_* Guests - Reputation:
Posted 05 September 2005 - 02:11 AM
Mike
#14 Moderators - Reputation: 2483
Posted 05 September 2005 - 04:03 AM
Those are basically the only real reasons for it. x86 does all FPU ops internally at 80 bits, but expansion from 32 bit to 80 bit generally carries no performance hit at all (it's done during the flop; I think this applies to P4 as well but I'm not sure). You do spend more memory, but that's usually not important, and if it is, you will be conscious of it (hopefully).
#15 Members - Reputation: 268
Posted 05 September 2005 - 04:33 AM
Conversely, mixing float and double can slow processing because of the conversions between them.
Use double by default; use floats only if you really need them.
#16 Anonymous Poster_Anonymous Poster_* Guests - Reputation:
Posted 05 September 2005 - 05:38 AM
#17 Members - Reputation: 108
Posted 05 September 2005 - 05:42 AM
Know your target platform and code to it (where speed is critical, of course). In probably 90%+ of situations it just doesn't matter which you use, unless you particularly need greater precision, which in most games is unlikely.
#18 Members - Reputation: 174
Posted 05 September 2005 - 06:07 AM
Quote:
Original post by Skute
Should you not try to implement your own floating point class?
Floats / doubles can't contain every single number possible, i.e. the number they store in memory is actually a calculation for the final number. This leads to certain cases where:
*** Source Snippet Removed ***
would print "Unexpected value".
NEVER EVER use == on a float.
[You don't know how things are rounded, and with different compilers or different platforms the problem gets worse the more calculations you do. Also, not all numbers that make sense in base 10 work in binary: 1/5 = 0.2 in decimal, but is about 0.001100110011001100110011001100110011001100110011001100110011001101... in binary.]
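One common alternative to == is to compare within a tolerance. The sketch below is only one way to do it: `nearlyEqual` and its default tolerances are assumptions picked for illustration, and sensible values depend on the magnitudes and the number of operations involved.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper: true when a and b differ by at most relTol times the
// larger magnitude, or by at most absTol (which covers values near zero).
bool nearlyEqual(float a, float b, float relTol = 1.0e-5f, float absTol = 1.0e-8f)
{
    float diff = std::fabs(a - b);
    float scale = std::max(std::fabs(a), std::fabs(b));
    return diff <= std::max(relTol * scale, absTol);
}
```

With this, the product 2.501f * 1.5134f compares as nearly equal to 3.7850134f, while values that really differ, such as 1.0f and 1.001f, do not.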
No, because how would you represent 1/3? Or pi? Also, it would be much, much slower and give you no reasonable benefit.
Quote:
Original post by BittermanAndy
Floats are not always faster than doubles. For example on the platform I'm working on now, doubles are actually faster, as all floating point operations are native to doubles so floats get converted to doubles and back anyway. The extra memory is also unlikely to be an issue.
The PlayStation 2 doesn't have double-precision support.
#20 Members - Reputation: 806
Posted 06 September 2005 - 01:40 AM
Quote:
Original post by blizzard999
In my opinion on modern processors (like P4 and its 128 bit SIMD floating point arithmetic) using floats gives no speed benefits.
There are 128-bit instructions where those 128 bits hold 4 floats.
There are 128-bit instructions where those 128 bits hold 2 doubles.
Using floats with 128-bit instructions can give a 100% speedup over doubles.
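The arithmetic behind that figure can be sketched without any intrinsics. Everything below is a back-of-the-envelope model, not a benchmark: it assumes 4-byte floats and 8-byte doubles, and the names `kRegisterBytes`, `lanesPerRegister` and `operationsNeeded` are invented for this example.

```cpp
#include <cstddef>

// Width of one 128-bit SIMD register in bytes.
const std::size_t kRegisterBytes = 16;

// Number of values of type T that fit in one 128-bit register.
template <class T>
std::size_t lanesPerRegister()
{
    return kRegisterBytes / sizeof(T);
}

// Hypothetical estimate: register-wide operations needed to process count
// values, rounding up for a partially filled final register.
template <class T>
std::size_t operationsNeeded(std::size_t count)
{
    std::size_t lanes = lanesPerRegister<T>();
    return (count + lanes - 1) / lanes;
}
```

For 1000 values this gives 250 register-wide operations for floats and 500 for doubles, which is where the "up to 100%" comes from. Real speedups also depend on memory bandwidth and on how well the code vectorizes.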






