# potatoe

## Recommended Posts



Edited by RoundPotato

##### Share on other sites

1. Yes, float usage is more expensive. It may not be obvious in small programs, but using the correct variable type in large ones is critical.

2. You will generally use ints when you're absolutely sure the variable should only be a whole number. Floats do provide a larger range and can store fractions, but at a cost in performance. So always use integers when possible (plus, some operations are easier to do with ints than with floats).

Edited by Penanito

##### Share on other sites

In the 90's, float calculations were performed by software routines, so they were much, much slower than ints.
Then, mid 90's, every desktop CPU started to add actual hardware support for float operations, which made them cost about the same as int operations.

Intel had a floating point co-processor available since 1980 (the 8087 was an FPU co-processor for the 8086). The Intel 80486DX (1989) had a full floating point implementation on board. The SX variety did as well, but it was disabled due to fab issues. The 80487 was actually a full 80486DX with a bit of extra circuitry that required an 80486SX to be present in order to operate. When installed, it would disable the main processor and take over ALL OPERATIONS. The circuitry that detected the presence of the master CPU was known to be somewhat... flaky, and so many people were able to build systems with just 80487 chips in them without the additional cost of an 80486 processor.

Most floating point software would actually detect whether an FPU was present on the hardware and defer operations to it when available. Since this was often provided via source-based libraries, floating point was no more costly than most other reasonably complex mathematical operations (when an FPU was present). However, FPU instructions were still quite slow relative to integer ones, even with an FPU. It took the growing gap between memory fetch times and CPU cycle speeds, along with pipelining and clock subdivision for executing sub-instruction operations, before the cost was reduced enough to make them essentially identical operations.

Sometimes when building my systems I miss seeing those dual sockets both populated by the most powerful silicon available to the general public of the time...

Edited by Washu

##### Share on other sites

Sometimes when building my systems I miss seeing those dual sockets both populated by the most powerful silicon available to the general public of the time...
You can still put a bunch of Xeons with 30 threads each on a single board Washu :)

##### Share on other sites

2. If 32 bit floats can hold data from -3.4 * 10 ^ 38 to 3.4 * 10 ^ 38 without precision loss then why would anyone use ints if they can only store from -2 * 10 ^ 9 to 2 * 10 ^ 9 ?

Actually, in Lua all numbers are FP32. And it's a pain in the ass, because all of a sudden you lose the ability to store a 32-bit hash or a 32-bit Unicode codepoint as a regular number. Indices for an array can not only be negative, but can also be fractions or NaNs.

##### Share on other sites

Actually, in Lua all numbers are FP32. And it's a pain in the ass, because all of a sudden you lose the ability to store a 32-bit hash or a 32-bit Unicode codepoint as a regular number. Indices for an array can not only be negative, but can also be fractions or NaNs.

I don't know the history or what version of Lua you're talking about, but in the version of Lua that I downloaded source for around a year ago, the base lua_Number is a double, and it is configurable.

##### Share on other sites

I guess they just use int types to be conceptually simplified or similar to math

##### Share on other sites


Edited by RoundPotato

##### Share on other sites

I don't know the history or what version of Lua you're talking about, but in the version of Lua that I downloaded source for around a year ago, the base lua_Number is a double, and it is configurable.

You are probably right. I was writing from memory, and somehow thought it was single precision. I did not know that it is configurable though, thanks for pointing that out.

If so, is there an easy way to tell how float operation speed differs from int operation speed?

Rule(s) of thumb: If you are on the CPU, float is slightly slower than int. If you are on the GPU (especially NVidia), float is faster than int. If you have a lot of branches, which nuke your pipeline, it doesn't matter. If you are memory bandwidth bound it doesn't matter. If you have a lot of cache misses it doesn't matter. If you are chasing pointers it doesn't matter. If you have low ILP it probably also doesn't matter.

##### Share on other sites

3. So practically the actual maximum safe range without losing precision is only 2^24 right? Similarly it says that the minimum range without losing precision is range 1.175494351e-38 , which I believe is also false right? If so then what is the minimum safe range?

The two things are not actually similar, beyond the fact that they both involve precision (the number of digits that can be accurately represented).

The maximum safe range is the limit of integral precision - i.e. the point beyond which a float is incapable of representing all integral bits of the number. It's not really about loss of precision on floats, but is instead the point at which integer precision exceeds that of a float, which is to say the point beyond which precision will be lost when converting from int to float. This is because the precision of a float is constant (with one exception, but we'll get into that) and based on the number of bits allocated to the mantissa, while the precision of an int varies depending upon the magnitude of the number. (For example, an int can represent a number between 8388608 and 16777215 with 24 bits of precision, but a number between 64 and 127 with only 7 bits of precision.)

Or to put it another way, (assuming 32-bit floats) a number with a magnitude greater than 2^24 can lose precision when converted from int to float, and conversely any number with a magnitude less than 2^23 will lose precision when converted from float to int.
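The int-to-float half of that statement can be checked with a short sketch. It assumes Python and uses the standard `struct` module to round a value to a 32-bit float; the helper name `to_float32` is made up for this illustration.

```python
import struct

def to_float32(value):
    """Round a number to the nearest 32-bit float, returned as a Python float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

LIMIT = 2 ** 24  # 16777216: past here, 24 significant bits cannot cover every integer

for n in (LIMIT - 1, LIMIT, LIMIT + 1, LIMIT + 2):
    round_trip = int(to_float32(n))
    status = "exact" if round_trip == n else "changed"
    print(f"{n} -> {round_trip} ({status})")
```

Only odd integers are lost just above 2^24, because the spacing between neighbouring floats there is 2. The spacing doubles again at every further power of two.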

The minimum range without loss of precision (which IS 1.175494351e-38 for a standard 32-bit float) is due to the existence of denormalized numbers, and represents an actual loss of precision within the float format itself. As has been mentioned, the mantissa of a float has an implied most significant bit of 1. However, for a denormalized number, the implied most significant bit of the mantissa is instead 0. Denormalized numbers are used only for extremely small magnitudes - they allow numbers closer to zero to be represented with increasing accuracy but reduced precision. Since the implied msb is 0, the precision is determined by the highest set bit in the mantissa (much as with ints).

Note that if there were no such thing as denormalized numbers, there would be no such thing as "minimum range without loss of precision" - floats would have a constant precision.

##### Share on other sites


Edited by RoundPotato

##### Share on other sites

How? A random number 45 is below 2^23...
45.0f -> 45
where '->' is conversion to int. Where is the precision loss?

45.0f has 24 bits of precision, while 45 (as an integer) has only 6. Precision is lost when converting to int because the int has fewer significant figures.

As a float, 45.0 is distinct from 45.000004. As an int, it is not.

To put it another way, 45.00000000 is more precise than 45.0, even though all the extra digits are 0s.
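As a rough illustration of that point, the sketch below (Python, standard `struct` module, with a hypothetical helper `next_float32_up`) finds the 32-bit float immediately above 45.0 and shows that both values collapse to the same int.

```python
import struct

def next_float32_up(value):
    """Return the next representable 32-bit float above a positive finite value."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    # For positive finite floats, adding 1 to the bit pattern gives the next value up.
    return struct.unpack("<f", struct.pack("<I", bits + 1))[0]

x = 45.0
neighbour = next_float32_up(x)

print(neighbour)               # a value slightly above 45, distinct from 45.0 as a float
print(int(x), int(neighbour))  # both truncate to the same integer
```

The gap between the two floats is 2^-18, which is the "45.000004" in the post once rounded for display.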

##### Share on other sites

Edited by RoundPotato

##### Share on other sites

If so does this also apply to what I asked earlier
RoundPotato, on 04 Aug 2014 - 09:34 AM, said:
I think you meant 1.175494351e-38 is stated without having precision loss is because such a number can be defined with the same 'precision' of a float, that is 24 bits(where precision is defined as number of bits in the mantissa) is that what you were getting at?
then?

1.175494351e-38 is the minimum normalized value a 32 bit floating point number can represent (it is not the minimum value an IEEE 32 bit float can hold accurately).

0 00000001 00000000000000000000000 ==> 1.17549435E-38
0 00000000 00000000000000000000001 ==> 1.4E-45 (note, this is a denormalized float as the exponent is 0)
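Those two bit patterns can be reproduced with a small sketch, assuming Python and its standard `struct` module. The helper name `bits_to_float32` is made up for this example. The hex constants are the same sign, exponent and mantissa fields written as 32-bit integers.

```python
import struct

def bits_to_float32(bits):
    """Reinterpret a 32-bit unsigned integer as an IEEE 754 single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# sign 0, exponent 00000001, mantissa all zeros: smallest normalized float
smallest_normal = bits_to_float32(0x00800000)

# sign 0, exponent 00000000, mantissa ...0001: smallest denormalized float
smallest_denormal = bits_to_float32(0x00000001)

print(smallest_normal)
print(smallest_denormal)
```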

Edited by Washu

##### Share on other sites

Edited by RoundPotato

##### Share on other sites

1.175494351e-38 is the minimum normalized value a 32 bit floating point number can represent (it is not the minimum value an IEEE 32 bit float can hold accurately).

So 1.17549435E-38 is the minimum representable number without precision loss because it uses 24 bits(ala 24bit precision) and 1.4E-45 is the very minimum number that can be represented but at the loss of precision(1 bit that is the MSB because it is denormalized now), that it?

With IEEE floats you have an invisible leading 1 whenever the exponent field is neither zero nor 0xFF (the value reserved for infinity and NaN). In other words it's something like (-1) ^ sign * 2 ^ (exponent - 127) * 1.mantissa. This is normalized form, as the most significant bit is represented by the value of the exponent, giving you 1 + 23 bits of precision.

When you use denormalized floats the exponent is 0, and thus there is no leading 1 bit. So you do lose a bit of precision.

Edited by Washu

##### Share on other sites


Edited by RoundPotato

##### Share on other sites

This is normalized, form, as the most significant bit is represented by the value of the exponent, giving you 1 + 24 bits of precision.

IMSB = Implied Most Significant Bit

Ok this got really weird again. MSB is represented by the value of the exponent...

Did you mean by that, that if mantissa is NaN or 0 then Implied Most Significant Bit = 0 (and that bit is lost in terms of precision), otherwise = 1 ?

Why is it 1 + 24 = 25 bit precision now all of a sudden, should it not be 1 + 23 when exponent != 0 || Nan ?

When the exponent is 0xFF and the mantissa is non-zero then the floating point value is "NaN". When the exponent is 0xFF and the mantissa is zero then the value is infinity, with the sign bit determining if it's + or - infinity.

When the exponent is 0x00 then the floating point number is denormalized, and we lose the "leading 1" bit. Now, it's a 1 because we're dealing with binary numbers. Essentially, when you have a non-zero exponent the number is represented in a normalized form where the place of the most significant 1 bit is determined by the exponent, i.e. the same as scientific notation (IMSB). With normalized floating point numbers, since the first bit is always a 1, we have the mantissa as additional bits of precision on top of the leading 1 bit. Thus we actually have 24 bits.

With denormalized floating point numbers we lose that leading 1 bit, and the value of the mantissa becomes our sole source of data, giving us a shrinking number of bits of precision because each additional leading 0 bit in the mantissa becomes a simple place marker. Thus when you get to the smallest floating point value (0x00000001) you have 1 bit of precision, unlike in normalized mode where the implied leading 1 bit plus the 23-bit mantissa gives us 24 bits of precision.

Let's take a few examples:
The floating point numbers will be represented in their binary variant, not decimal.

non-zero exponent (normalized):
(-1) ^ sign * 2 ^ (exponent - 127) * 1.mantissa
0 01111111 00000000000000000000000 = 1.00000000000000000000000b
24 bits of precision: the implied leading 1 plus all 23 mantissa bits.

zero exponent (denormalized):
(-1) ^ sign * 2 ^ -126 * 0.mantissa
0 00000000 00000000000000000100000 = 0.00000000000000000100000b * 2 ^ -126
6 bits of precision: only the bits from the highest set mantissa bit downward (100000) carry information.
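The two formulas, together with the 0xFF special case, can be written as a small decoder. This is only a sketch in Python. The function name `decode_float32` and the "precision bits" it reports (24 for normalized values, highest set mantissa bit for denormalized ones) follow the convention used in this thread and are not taken from any library.

```python
import struct

def decode_float32(bits):
    """Split a 32-bit pattern into fields and rebuild its value by hand.

    Returns (kind, value, precision_bits).
    """
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF

    if exponent == 0xFF:
        # All-ones exponent: infinity if the mantissa is zero, NaN otherwise.
        kind = "nan" if mantissa else "infinity"
        value = float("nan") if mantissa else float("inf")
        precision = 0
    elif exponent == 0:
        # Denormalized: 0.mantissa * 2^-126, no implied leading 1.
        kind = "denormalized"
        value = (mantissa / 2 ** 23) * 2.0 ** -126
        precision = mantissa.bit_length()
    else:
        # Normalized: 1.mantissa * 2^(exponent - 127).
        kind = "normalized"
        value = (1 + mantissa / 2 ** 23) * 2.0 ** (exponent - 127)
        precision = 24

    return kind, (-value if sign else value), precision

print(decode_float32(0x3F800000))  # 0 01111111 000...0
print(decode_float32(0x00000020))  # 0 00000000 000...0100000
```

The hand-built value can be compared against `struct.unpack("<f", struct.pack("<I", bits))[0]` for any finite pattern to confirm that the formulas agree with the hardware interpretation.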

Edited by Washu

##### Share on other sites

Why is it 1 + 24 = 25 bit precision now all of a sudden, should it not be 1 + 23 when exponent != 0 || Nan ?

Because it gets complicated.

(Gentle reminder that this is a For Beginners post.)

There are two easy rules to remember with floating point numbers:

1. Floating point numbers are an approximation. Don't ever count on them being exact. You get about 6 decimal digits of precision.

2. Floating point values will accumulate errors.  Adding a value 10,000 times in a row can get very different results than multiplying by 10,000 and adding it once.

All you really need to know for most modern languages is that it is an approximation that gives you approximately 6 decimal digits of precision. Everything after those six decimal digits should be considered noise.

With Nyprin's useful post, he showed that once his machine hit 16777217 the conversion was off by a digit. That is, for this eight-digit number the first seven decimal digits were correct and the eighth was wrong. Similarly, with larger numbers the last two digits were often wrong.  The exact details of the conversion are machine dependent; sometimes it will be more precise and other times it will be less precise.

Floating point operations specify both an accuracy and a precision. Accuracy is how closely centered your results are on the ideal value; precision is how tightly clustered they are. Each of the floating point operations specifies a relative accuracy and precision. You may or may not have learned this already, so here is one of many visual versions of the difference:

[Image: accuracy versus precision]

Floating point operations don't have to produce the exactly correct results that you would get from symbolic representation; they are allowed to be slightly off.

The chip manufacturers (e.g. Intel, AMD, Motorola) have always been clear that their internal algorithms change frequently. Comparing the exact results of an operation, such as a sin or cos function, can give different results even on similar chips. If you compare an Intel i7 chip that used the "Westmere" internals it could produce different results than an Intel i7 chip that used the "Clarkdale" internals. The results are different, but they are not wrong.

Calculating a sine or cosine on the different chips with the same input number may give slightly different results. They will be within the accuracy and precision requirements, but they can be different.  One chip might give 0.974882341 and another might give 0.974881532, but both numbers are within the specification. Or like the picture above, one processor might be in the upper right corner of the yellow, one processor might be in the lower left corner of the yellow, but the results are still considered correct since they are within the yellow.

Also some additional detail on how minor differences accumulate.  When you start doing math operations those tiny pieces of error start to accumulate. When you add a tiny fraction once or twice it isn't a big deal, but when you add the error together millions or billions of times it can become big enough to make results meaningless. For instance, it is often better to calculate rotation each update rather than to add a tiny difference each time.  Applying a tiny increment by 0.1% a few times will still be close to the right result, but adding 0.1% + 0.1% + 0.1% a thousand times can give a very different result than the value of exactly 1.0 that you were expecting. It is usually best to directly calculate positions along splines rather than to accumulate tiny steps. It is usually best to directly calculate rotations rather than accumulate tiny rotations. It is usually best to directly calculate distances rather than to accumulate tiny distances.
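The "0.1% a thousand times" case can be tried out with a minimal sketch. It assumes Python, which has no native 32-bit float, so the result is rounded back to single precision after every operation using the standard `struct` module. The helper name `to_float32` is made up for this illustration.

```python
import struct

def to_float32(value):
    """Round a number to the nearest 32-bit float, returned as a Python float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

step = to_float32(0.001)  # 0.1%, already an approximation once stored as a float

# Accumulate: add the step a thousand times, rounding after every addition.
accumulated = to_float32(0.0)
for _ in range(1000):
    accumulated = to_float32(accumulated + step)

# Calculate directly: one multiplication, one rounding.
direct = to_float32(step * 1000)

print("accumulated:", accumulated)
print("direct:     ", direct)
print("error of accumulated:", abs(accumulated - 1.0))
print("error of direct:     ", abs(direct - 1.0))
```

Each addition rounds once, so the accumulated version collects up to a thousand small rounding errors, while the direct version rounds only once.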

When numbers are calculated they are first manipulated so they are on an expected order of magnitude, then they are manipulated again to produce a result. They may be manipulated a third time when they are written out to memory. Simply adding a big number to a little number frequently generates an inexact result.

So if you are doing ANYTHING with floating point that needs to know the difference between 24 or 25 bits of precision, expect your results to be wrong. By the time you consider the accuracy range, the precision range, and the accumulation errors, you will not be anywhere close to 25 bits of precision.

As long as you always remember those two things, that floating point math is an approximation, and that the approximation can slowly accumulate errors as the numbers are used, that should be enough to start with.

Edited by frob

##### Share on other sites

Edited by RoundPotato

##### Share on other sites

I guess they just use int types to be conceptually simplified or similar to math

Not quite. Ints are generally used for counting.

How many are 2.03 apples?