# float vs double

This topic is 4511 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

Thanks

##### Share on other sites
Quote:
 Original post by Basiror
I just looked something up, SSE2 is supposed to support 128bit registers to perform 2 double precision operations in one step

Yeah, but you can also do 4 single-precision floating point operations in one step.

You get 8 of those registers, I think, which makes matrix * matrix very fast. You can put the first 4 * 4 matrix in the first 4 registers, then the next matrix in the next 4 registers. If you are using doubles it will take at least twice as long.

I haven't put this into practice, but I remember the theory from my book. Maybe there are other things to consider which I have missed.

There is also the shuffle operation, which works on 4 floats. If you are using doubles you'll be messing around with 2 registers and moving stuff manually, which is slow.

Yeah, graphics processors and FPUs are usually (afaik) optimized to use floats.

##### Share on other sites
Quote:
 NEVER EVER use == on a float

Very true, a most useful function for any program dealing with floating values:

    bool compareFloat ( float Value1 , float Value2 , float Tolerance )
    {
        if ( fabs ( Value1 - Value2 ) < Tolerance )
            return true ;
        else
            return false ;
    }

The tolerance can be hard coded if desired.

(Pardon if my formatting is off, I've been programming in so many C variant scripting languages lately I can't keep em straight...)

##### Share on other sites
Quote:
Original post by Riviera Kid
Quote:
 Original post by Basiror
I just looked something up, SSE2 is supposed to support 128bit registers to perform 2 double precision operations in one step

Yeah, but you can also do 4 single-precision floating point operations in one step.

You get 8 of those registers, I think, which makes matrix * matrix very fast. You can put the first 4 * 4 matrix in the first 4 registers, then the next matrix in the next 4 registers. If you are using doubles it will take at least twice as long.

I haven't put this into practice, but I remember the theory from my book. Maybe there are other things to consider which I have missed.

There is also the shuffle operation, which works on 4 floats. If you are using doubles you'll be messing around with 2 registers and moving stuff manually, which is slow.

Yeah, graphics processors and FPUs are usually (afaik) optimized to use floats.

Yes, I know, but in some cases this leads to some little problems concerning precision.
The few matrix operations I have to perform aren't critical anyway, since most of the work will be moved to the GPU, which is optimized for this kind of operation.

##### Share on other sites
Quote:
 NEVER EVER use == on a float
I do something like this occasionally:

    float val = MAXFLOAT;
    ...loop which may change val...
    if (val == MAXFLOAT)
        ...

I haven't had any problems with this but admittedly I'm mostly only running debug builds where this just might be ok. Is it likely to screw up on different platforms or with different compile flags maybe?

##### Share on other sites
Quote:
Original post by zppz
Quote:
 NEVER EVER use == on a float
I do something like this occasionally:
*** Source Snippet Removed ***
I haven't had any problems with this but admittedly I'm mostly only running debug builds where this just might be ok. Is it likely to screw up on different platforms or with different compile flags maybe?

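    const float Y = MAXFLOAT;   // the sentinel from the quoted snippet
    float X = Y;
    // ... code that may or may not assign to X ...
    if (X == Y) { /* X was never modified */ }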

That will work fine to check if the code does not modify X, but if you alter the value of X at all it may not work due to rounding errors in whatever function you used.
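To make the sentinel case concrete, here is a minimal C++ sketch. It assumes `FLT_MAX` from `<cfloat>` as a stand-in for `MAXFLOAT`, and the function names are made up for illustration. The exact comparison is safe only because the sentinel is copied, never computed.

```cpp
#include <cfloat>
#include <vector>

// Returns the smallest value in `distances`, or FLT_MAX if the list is empty.
float smallest(const std::vector<float>& distances)
{
    float best = FLT_MAX;            // sentinel: assigned, never computed
    for (float d : distances)
        if (d < best)
            best = d;                // plain copy of an input, no arithmetic
    return best;
}

// Exact comparison is fine here: `best` either still holds the very same
// value it was initialised with, or it was overwritten by an input value.
bool found_any(const std::vector<float>& distances)
{
    return smallest(distances) != FLT_MAX;
}
```

If the loop did arithmetic on `best` (say, `best = best * 0.5f + d`), the exact test against the sentinel would no longer be trustworthy.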

##### Share on other sites
Quote:
 NEVER EVER use == on a float

Sometimes I've thought to myself that using == or != on a float or double should arguably be a compiler warning. It does not work the way that most people expect it to work :-( At least until they really understand the problem

##### Share on other sites
Quote:

Quote:
 Quote:
Original post by BittermanAndy
Floats are not always faster than doubles. For example, on the platform I'm working on now, doubles are actually faster, as all floating point operations are native to doubles so floats get converted to doubles and back anyway. The extra memory is also unlikely to be an issue.

The PlayStation 2 doesn't have double precision support.

And? I'm not working on PS2.

The point is, the question "are floats or doubles faster?" has only one answer: "depends on your platform".

##### Share on other sites
I made a small app to test the time difference using the performance counter. On 10,000 divisions (10,000 with floats and 10,000 with doubles), I expected the doubles to be at least a little slower, but I found they were the same: sometimes one or the other was faster, but on average there was no difference.
However, I could see how memory bandwidth could be an issue in some games. If you have 10 MB of vertex data with floats and 20 MB with doubles, you'll definitely see a performance hit using doubles.

##### Share on other sites
10,000 divisions is really not a very complicated test for a CPU. It's a bit like asking me if taking one step backwards can be completed in the same time as taking one step forward - the difference is negligible.

On the bog standard x86 FPU, all calcs happen at 80 bits, whether they are double or float. However, the x86 can read only 4 bytes at a time. So (assuming that the data is 4-byte aligned), a double requires 2 reads to load it from memory, while a float requires a single read.

Now for, say, 1 million vertices, you'd need 12,000,000 bytes to store those as floats (approx 12 MB). To store those as doubles, you'd need approx 24 MB. It is a noticeable enough difference to suggest that ideally, if you can handle the lack of precision, floats are preferable. (The read times would start to have an impact on those numbers, but it will not be a *big* hit.)
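The arithmetic above can be checked with a couple of lines of C++. This is only a sketch: it assumes each vertex is just a 3-component position (no normals, colours or texture coordinates), and `position_bytes` is a name invented here.

```cpp
#include <cstddef>

// Bytes needed to store `vertex_count` positions of three components each.
template <typename Real>
constexpr std::size_t position_bytes(std::size_t vertex_count)
{
    return vertex_count * 3 * sizeof(Real);
}
```

On the usual platforms, where `float` is 4 bytes and `double` is 8, `position_bytes<float>(1000000)` gives 12,000,000 and `position_bytes<double>(1000000)` gives 24,000,000.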

As mentioned, SIMD can process 4 floats or 2 doubles in a single instruction, though it's likely that only a few people here are likely to use that (DX maths has built in aligned vector types that do work with SIMD, so maybe....)
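For anyone curious what "4 floats or 2 doubles in a single instruction" looks like in code, here is a rough sketch using the SSE/SSE2 intrinsics from `<emmintrin.h>`. The function names `add4` and `add2` are made up, and the scalar fallback is there only so the code still builds on non-x86 targets.

```cpp
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>   // SSE2 intrinsics (also brings in the SSE ones)
#define HAVE_SSE2 1
#endif

// out[i] = a[i] + b[i] for 4 floats: one 128-bit add when SSE is available.
void add4(const float* a, const float* b, float* out)
{
#ifdef HAVE_SSE2
    _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
#else
    for (int i = 0; i < 4; ++i) out[i] = a[i] + b[i];
#endif
}

// out[i] = a[i] + b[i] for 2 doubles: the same register width holds half
// as many values, so covering 4 doubles would take two of these adds.
void add2(const double* a, const double* b, double* out)
{
#ifdef HAVE_SSE2
    _mm_storeu_pd(out, _mm_add_pd(_mm_loadu_pd(a), _mm_loadu_pd(b)));
#else
    for (int i = 0; i < 2; ++i) out[i] = a[i] + b[i];
#endif
}
```

The unaligned load/store variants are used here to keep the sketch simple; aligned data would normally be preferred in real code.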

The problem with floats, is that rounding errors accumulate much quicker than they do with doubles. An inverse matrix op with floats normally requires you to orthogonalise the matrix afterwards due to rounding errors.
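A quick way to see the accumulation effect is to add the same small step many times in each type and measure the drift. The sketch below is only an illustration (the helper name `accumulation_error` is invented), not a rigorous error analysis.

```cpp
#include <cmath>

// Adds `step` to an accumulator `n` times and returns how far the running
// sum has drifted from n * step (the reference product is done in double).
template <typename Real>
double accumulation_error(Real step, int n)
{
    Real sum = 0;
    for (int i = 0; i < n; ++i)
        sum += step;                 // each addition rounds to Real
    return std::fabs(static_cast<double>(sum) - static_cast<double>(step) * n);
}
```

On a typical build, calling this with `0.1f` and one million iterations drifts by a very visible amount, while the `double` version with `0.1` stays within a tiny fraction of a unit.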

So, try to use floats if possible. Use doubles if that's too inaccurate. Generally though, it's only really the maths calculations that will be affected. For renderable data, floats would be far more sensible.....

##### Share on other sites
Another place it may be noticeable is when storing character positions. If your character starts at a position of 0,0,0, then floats will start losing precision as you move further away from that point.

ie, at coords 100000,100000,100000 you may only have a small precision left to deal with....
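You can measure how much precision is left at a given coordinate by asking for the next representable float. A small sketch using `std::nextafter` from `<cmath>`; the helper name `spacing_above` is made up here.

```cpp
#include <cmath>
#include <limits>

// Gap between `x` and the next representable float above it.
float spacing_above(float x)
{
    return std::nextafter(x, std::numeric_limits<float>::infinity()) - x;
}
```

With IEEE single precision, `spacing_above(1.0f)` is about 0.00000012, while `spacing_above(100000.0f)` is 0.0078125. If your units were metres, positions that far from the origin could only move in steps of roughly 8 mm.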

##### Share on other sites
Thanks for the replies, everyone. I'll try to stick to floats when possible.

Quote:
 For renderable data, floats would be far more sensible.....

What exactly is "renderable data"? Is it things like colors, polygon vertex coordinates, etc.?

Quote:
 another place it may be noticeable is when storing character positions. If your character starts at a position of 0,0,0, then floats will start losing precision as you move further away from that point.
ie, at coords 100000,100000,100000 you may only have a small precision left to deal with....

So I should store object position coordinates as floats? But these values are passed to OpenGL's translate function when I render, which then gets put into a matrix and is multiplied with all the vertices. Does this count as "renderable data"? Should I cast them to floats before calling the translation function? Thanks.

##### Share on other sites
All you need to know is that floats are good enough in most cases. Better to use floats initially (as a typedef though) and only if you find problems, or suspect problems should you switch.
It's far easier than starting with doubles and then switching to floats, as then you could break something due to the slight loss in accuracy.
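The typedef approach might look something like this. The names `real`, `Vec3` and `dot` are just placeholders for whatever your project uses.

```cpp
// Hypothetical project-wide alias: change this one line to switch precision.
typedef float real;

struct Vec3
{
    real x, y, z;
};

// Everything downstream is written in terms of `real`, never float/double.
inline real dot(const Vec3& a, const Vec3& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}
```

Switching the alias to `double` later then only requires a rebuild, plus a check of any places where literals or API calls assume a particular type.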

##### Share on other sites
Quote:
 Original post by iMalc
All you need to know is that floats are good enough in most cases. Better to use floats initially (as a typedef though) and only if you find problems, or suspect problems should you switch.
It's far easier than starting with doubles and then switching to floats, as then you could break something due to the slight loss in accuracy.

Concur. Unless you're doing scientific work, floats are probably the way to go.

##### Share on other sites
If you shouldn't use == or !=, then wouldn't > and < be messed up as well?

##### Share on other sites
Quote:
 Original post by RobTheBloke
on the bog standard x86 FPU, all calcs happen at 80 bits, whether they are double or float. However, the x86 can read only 4 bytes at a time. So (assuming that the data is 4-byte aligned), a double requires 2 reads to load it from memory, while a float requires a single read.

Just to clarify, data is read out of main memory and into L1 cache a whole cache line at a time (typically 32 or 64 bytes on x86, depending on the CPU), not 4 bytes at a time. Values are then loaded into registers as needed. The latter step isn't especially intensive. The thing is, though, that a cache line will only fit half as many doubles as floats.

##### Share on other sites
Quote:
 Original post by Daniel Miller
If you shouldn't use == or !=, then wouldn't > and < be messed up as well?

No, < and > are fine. The problem with == and != is that rounding errors could turn what you think is 1.234567 into 1.234693, and if you try to == it with 1.234567, it'll return false when it actually should be true. On the other hand, 1.234693 is still, say, > 0. You only have a problem if you compare 2 numbers that are very close to each other.
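The classic demonstration uses doubles and the values 0.1, 0.2 and 0.3. This is a small sketch with made-up function names; the inputs are passed as parameters so the sum is treated as a run-time value.

```cpp
// True only if a + b lands on exactly the same bits as `expected`.
bool sum_equals(double a, double b, double expected)
{
    return a + b == expected;
}

// True if a + b is below `limit`: a one-bit rounding error only matters
// when the sum is right next to the limit.
bool sum_less_than(double a, double b, double limit)
{
    return a + b < limit;
}
```

With IEEE doubles, `sum_equals(0.1, 0.2, 0.3)` is false because 0.1 + 0.2 rounds to a value slightly above 0.3, while `sum_less_than(0.1, 0.2, 1.0)` is true as expected. Values that are exactly representable, such as `sum_equals(0.25, 0.5, 0.75)`, compare equal.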

##### Share on other sites
At university they told us that some implementations tend to round up the last bit of the mantissa if you have a large positive exponent.

Although I don't know why they don't implement a hardware epsilon comparison, since the binary comparison is
a) actually quite useless in most cases
b) could be done by integer comparisons or bitwise AND
(float pointer to int pointer, reference & right operand)

##### Share on other sites
Quote:
Original post by MauMan
Quote:
 NEVER EVER use == on a float

Sometimes I've thought to myself that using == or != on a float or double should arguably be a compiler warning.

I have to disagree. It's the best way to test for NaN. For instance,

    f != f   // returns true if f is NaN
    f == f   // returns false if f is NaN
    f == NaN // always returns false
    f != NaN // always returns true
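In C++ the same self-comparison trick works, and `std::isnan` from `<cmath>` does the job more readably. The sketch below shows both; the function names are invented for illustration. One caveat I'm fairly sure of: aggressive options such as GCC/Clang's `-ffast-math` let the compiler assume NaNs never occur, which can break either check.

```cpp
#include <cmath>
#include <limits>

// NaN is the only value that compares unequal to itself.
bool is_nan_by_self_compare(float f)
{
    return f != f;
}

// Standard library equivalent.
bool is_nan_std(float f)
{
    return std::isnan(f);
}
```

A NaN for testing can be obtained from `std::numeric_limits<float>::quiet_NaN()`.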

I usually use a utility method to determine if numbers are sufficiently close, e.g.,

    public static bool EpsilonEquals(double f1, double f2, double epsilon)
    {
        // Returns true if f1 is closer than epsilon to f2.
        return (Math.Abs(f1 - f2) <= epsilon);
    }

or for inequalities,

    public static bool EpsilonGreaterThan(double f1, double f2, double epsilon)
    {
        // Returns true if f1 is greater than f2 within tolerance.
        return ((f1 - epsilon) > f2);
    }

    public static bool EpsilonGreaterThanEqualTo(double f1, double f2, double epsilon)
    {
        // Returns true if f1 is greater than or equal to f2 within tolerance.
        return ((f1 - epsilon) >= f2);
    }

##### Share on other sites
Quote:
 Very true, a most useful function for any program dealing with floating values:

    bool compareFloat ( float Value1 , float Value2 , float Tolerance )
    {
        if ( fabs ( Value1 - Value2 ) < Tolerance )
            return true ;
        else
            return false ;
    }

The tolerance can be hard coded if desired.

Actually, that's an improvement over == but still not great... http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm explains why, and suggests better methods. (Anyone know how to make that a link? I tried [url] but that doesn't work).
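The weakness of a fixed tolerance is that it is too loose for tiny values and too strict for huge ones. One common alternative is to scale the tolerance by the magnitude of the inputs. The sketch below is my own illustration of that idea; the function name and default tolerances are assumptions, not taken from the linked paper.

```cpp
#include <algorithm>
#include <cmath>

// Relative comparison with an absolute floor for values near zero.
// Both tolerances are illustrative defaults and should be tuned per use.
bool nearly_equal(float a, float b,
                  float rel_tol = 1e-5f, float abs_tol = 1e-8f)
{
    const float diff = std::fabs(a - b);
    if (diff <= abs_tol)             // handles a and b both close to zero
        return true;
    const float largest = std::max(std::fabs(a), std::fabs(b));
    return diff <= largest * rel_tol;
}
```

With these defaults, 1000000.0f and 1000000.1f count as equal (they differ by far less than one part in 100,000), while 1.0f and 1.001f do not.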

##### Share on other sites
Quote:
 Original post by BittermanAndy
(Anyone know how to make that a link? I tried [url] but that doesn't work).

Use HTML.

<a href="http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm">http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm</a>    ->http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm

##### Share on other sites
Yet again benchmarking?

Floats are, when used in SSE2 registers, approximately 2 times faster than doubles. That might look great, but it only applies to larger matrices. I would rather take a 2x slowdown than have to test for rounding errors more often and play with error cancellation. If you can use doubles, it's often not worth playing with floats for a slight speed increase. (BTW, IIRC they are equally fast in the ordinary FPU registers.) One important situation where floats might be necessary is transferring prepared data to the GFX card. However, IIRC there might be problems if the exponents are too large.
Actually, there are other situations where floats might be of some importance: div and sqrt. With these 40 - 80 CPU cycle instructions, the halved data size might be important. However, the majority of developers are using

c = 1D/divider
for ... ca[...] = something[...] * c

One operation on a high-precision number will not kill the above code. A low-precision data member might.
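Spelled out in C++, the pattern above might look like the following sketch. The function name and the choice of `std::vector<float>` for the data are my assumptions; the point is that the single division is done once, in double, and the per-element work is a multiplication.

```cpp
#include <cstddef>
#include <vector>

// Divides every element by `divider` using one division and n multiplications.
std::vector<float> scale_by_reciprocal(const std::vector<float>& values,
                                       double divider)
{
    const double inv = 1.0 / divider;        // the one expensive division
    std::vector<float> out(values.size());
    for (std::size_t i = 0; i < values.size(); ++i)
        out[i] = static_cast<float>(values[i] * inv);   // cheap multiply
    return out;
}
```

Note that multiplying by a reciprocal is not always bit-identical to dividing, so this trades a little accuracy for speed.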

As a general rule, if you work with data that will remain in computer memory, use double. If you work with data that needs to be transferred to the GFX card, and it needs to be transferred as floating point, use float (because your GFX card is highly likely unable to use double, and the bandwidth required would kill AGP 3.0 / PCIe data transfer).

Also it looks like you'd need some decoupling of your 3D engine, game engine, and world engine.

BTW if anyone would like to play with benchmarking of floats and doubles, try scimark. It has some ability to measure speed differences.

##### Share on other sites
Dividing by a very small double can be just as disastrous as dividing by a very small float. Doubles represent a greater range than floats, but in general it is best to avoid dividing by numbers close to 0, possibly by rewriting the code in a way that avoids the division or checks for values close to 0.

It is similar to how you never use == to compare two floating point numbers and instead check to see if they are within some epsilon value. If the denominator is very close to 0 it is safer to treat it as if it was 0. If you're dividing by a number that might be 0, don't use == to compare it to 0, use the epsilon test method.
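As a sketch of that advice, a guarded division could look like this. The name `safe_divide` and the out-parameter style are just one way to write it, and a sensible epsilon depends entirely on the scale of your data.

```cpp
#include <cmath>

// Writes numerator / denominator to `result` and returns true, unless the
// denominator is within `epsilon` of zero, in which case it returns false
// and leaves `result` untouched.
bool safe_divide(float numerator, float denominator, float epsilon,
                 float& result)
{
    if (std::fabs(denominator) <= epsilon)
        return false;                // treat as division by zero
    result = numerator / denominator;
    return true;
}
```

The caller then decides what "division by zero" should mean in context: skip the update, clamp the result, or fall back to a default.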

Graphics cards can internally use whatever precision they want, possibly even something different than 32 bit or 64 bit. At certain stages they even tend to use higher precisions to avoid certain artifacts. But for displaying geometry, 32 bit accuracy is plenty.

##### Share on other sites
Quote:
 Original post by Daniel Miller
If you shouldn't use == or !=, then wouldn't > and < be messed up as well?

Yes, but the effect is usually less significant.

e.g.

This may return false:
11.1f / 11.1f == 8.0f / 8.0f

And this could return true:
11.1f / 11.1f < 8.0f / 8.0f

The difference is that there's only 1 value out of 2^32 that makes the first one true, whereas in the second case there are roughly 2^31 values that make it true. So if you're off by 1 bit for ==, it may never return true, whereas if you're off by 1 bit for <, it'll fix itself next iteration. You can have stability problems, though, if the calculation "lands" on the threshold and thus causes it to iterate true/false/true/true/false/true instead of a clean false/false/false/true/true/true.

Compares on floats ought to be "fuzzy", and anything that's within the epsilon should be considered equal.

    #include <cmath>
    #include <limits>

    struct scalar
    {
        typedef float float_t;
        scalar() {}
        explicit scalar(float_t f) : f(f) {}
        float_t f;
        //...
    };

    inline bool operator==(scalar a, scalar b)
    {
        // Fuzzy equality: values closer than epsilon compare equal.
        return std::fabs(b.f - a.f) < std::numeric_limits<scalar::float_t>::epsilon();
    }

    inline bool operator!=(scalar a, scalar b)
    {
        return !(a == b);
    }

    inline bool operator<(scalar a, scalar b)
    {
        // Strictly less, and not within epsilon of each other.
        return (a.f < b.f) && (a != b);
    }

##### Share on other sites
Quote:
 Original post by Anonymous Poster
Graphics cards can internally use whatever precision they want, possibly even something different than 32 bit or 64 bit. At certain stages they even tend to use higher precisions to avoid certain artifacts. But for displaying geometry, 32 bit accuracy is plenty.

ATI used 24 bits of precision for a long time, and they didn't delay switching to 32 bits for nothing. The 8 bits of lower precision gave them some speed advantage over pure FP32 registers on more advanced cards. Actually, the majority of registers are 4xFP32, and FP16 also has its place.
32 bits might be plenty if your environment is smaller than 16E6 units and you never do any multiplication. 32-bit precision might be horrible for scaling and rotation of 3D models. When I used some Bethesda software, I could see that their geometry precision was a few units lower than was necessary to prevent falling through the geometry. ^_^
