
# potatoe

## 32 posts in this topic



Edited by RoundPotato
0

##### Share on other sites

1. Yes, float usage is more expensive. It may not be obvious in small programs, but using the correct variable type in large ones is critical.

2. You will generally use ints when you're absolutely sure the variable should only be a whole number. Floats do provide a larger range and fractional values, but at a cost in performance. So always use integers when possible (plus some operations are easier to do with ints than with floats).

Edited by Penanito
-2

##### Share on other sites

Sometimes when building my systems I miss seeing those dual sockets both populated by the most powerful silicon available to the general public of the time...
You can still put a bunch of Xeons with 30 threads each on a single board Washu :)
2

##### Share on other sites

2. If 32 bit floats can hold data from -3.14 * 10 ^ -38 to 3.14 * 10 ^ 38 without precision loss then why would anyone use Ints if they can only store from -2 * 10 ^ 9 to 2 * 10 ^ 9 ?

Actually, in Lua all numbers are FP32. And it's a pain in the ass, because all of a sudden you lose the ability to store a 32-bit hash or a 32-bit Unicode codepoint as a regular number. Indices for an array can not only be negative, but can also be fractions or NaNs.
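To illustrate the hash problem described here, a minimal Python sketch (the `to_float32` helper and the `0xDEADBEEF` value are made up for illustration, and it assumes single-precision storage as the post describes). A 32-bit float only has 24 significant bits, so the low bits of a 32-bit hash get rounded away, while a double keeps them all:

```python
import struct

def to_float32(x):
    """Round a Python float (a double) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

hash_value = 0xDEADBEEF  # hypothetical 32-bit hash, chosen only for illustration

# Store the hash as a single-precision number, then read it back as an integer.
as_single = int(to_float32(float(hash_value)))

# Store it as a double-precision number instead.
as_double = int(float(hash_value))

print(hex(hash_value), hex(as_single), hex(as_double))
print("single keeps the hash:", as_single == hash_value)
print("double keeps the hash:", as_double == hash_value)
```

With a double as the number type, every 32-bit integer survives the round trip, since a double has 53 significant bits.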
0

##### Share on other sites

I guess they just use int types to be conceptually simplified or similar to math

0

##### Share on other sites


Edited by RoundPotato
0

##### Share on other sites

I don't know the history or what version of Lua you're talking about, but in the version of Lua that I downloaded source for around a year ago, the base lua_Number is a double, and it is configurable.

You are probably right, I was writing from memory, and somehow thought it was single precision. I did not know that it is configurable though, thanks for pointing that out.

If so, is there an easy way to tell how float operation speed differs from int operation speed?

Rule(s) of thumb: If you are on the CPU, float is slightly slower than int. If you are on the GPU (especially NVIDIA), float is faster than int. If you have a lot of branches, which nuke your pipeline, it doesn't matter. If you are memory bandwidth bound it doesn't matter. If you have a lot of cache misses it doesn't matter. If you are chasing pointers it doesn't matter. If you have low ILP it probably also doesn't matter.
0

##### Share on other sites

3. So practically the actual maximum safe range without losing precision is only 2^24 right? Similarly it says that the minimum range without losing precision is range 1.175494351e-38 , which I believe is also false right? If so then what is the minimum safe range?

The two things are not actually similar, beyond the fact that they both involve precision (the number of digits that can be accurately represented).

The maximum safe range is the limit of integral precision - i.e. the point beyond which a float is incapable of representing all integral bits of the number. It's not really about loss of precision on floats, but is instead the point at which integer precision exceeds that of a float, which is to say the point beyond which precision will be lost when converting from int to float. This is because the precision of a float is constant (with one exception, but we'll get into that) and based on the number of bits allocated to the mantissa, while the precision of an int varies depending upon the magnitude of the number. (For example, an int can represent a number between 8388608 and 16777215 with 24 bits of precision, but a number between 64 and 127 with only 7 bits of precision.)

Or to put it another way (assuming 32-bit floats), an integer with a magnitude greater than 2^24 can lose precision when converted from int to float, and conversely any number with a magnitude less than 2^23 will lose precision when converted from float to int.
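A quick way to see the 2^24 limit, sketched in Python (the `to_float32` helper is just an illustrative name; Python's own floats are doubles, so the sketch squeezes values through a 32-bit float with `struct`):

```python
import struct

def to_float32(x):
    """Round a Python float (a double) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

limit = 2 ** 24  # 16777216

# Convert a few integers around the limit to float32 and back to int.
results = {}
for n in (limit - 1, limit, limit + 1, limit + 2):
    stored = int(to_float32(float(n)))
    results[n] = stored
    print(n, "->", stored, "exact" if stored == n else "rounded")
```

Up to 2^24 every integer comes back unchanged. Past it, only every second integer is representable, so 2^24 + 1 has to be rounded to a neighbour.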

The minimum range without loss of precision (which IS 1.175494351e-38 for a standard 32-bit float) is due to the existence of denormalized numbers, and represents an actual loss of precision within the float format itself. As has been mentioned, the mantissa of a float has an implied most significant bit of 1. However, for a denormalized number, the implied most significant bit of the mantissa is instead 0. Denormalized numbers are used only for extremely small magnitudes - they allow numbers closer to zero to be represented with increasing accuracy but reduced precision. Since the implied msb is 0, the precision is determined by the highest set bit in the mantissa (much as with ints).

Note that if there were no such thing as denormalized numbers, there would be no such thing as "minimum range without loss of precision" - floats would have a constant precision.
1

##### Share on other sites


Edited by RoundPotato
-1

##### Share on other sites

How? A random number 45 is below 2^23...
45.0f -> 45
where '->' is conversion to int. Where is the precision loss?

45.0f has 24 bits of precision, while 45 (as an integer) has only 6. Precision is lost when converting to int because the int has fewer significant figures.

As a float, 45.0 is distinct from 45.000004. As an int, it is not.

To put it another way, 45.00000000 is more precise than 45.0, even though all the extra digits are 0s.
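The 45.0 versus 45.000004 point can be checked with a small Python sketch (again `to_float32` is only an illustrative helper that rounds a double to a 32-bit float):

```python
import struct

def to_float32(x):
    """Round a Python float (a double) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

a = to_float32(45.0)
b = to_float32(45.000004)

# As 32-bit floats the two values are different numbers...
print("distinct as float:", a != b)
print("gap between them:", b - a)

# ...but both collapse to the same value when converted to int.
print("same as int:", int(a) == int(b))

# The integer 45 itself needs only 6 significant bits (101101 in binary).
print("significant bits in 45:", (45).bit_length())
```

The gap printed is the spacing between neighbouring floats in the range 32 to 64, which is the fractional detail an int cannot hold.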
2

##### Share on other sites

Edited by RoundPotato
0

##### Share on other sites

If so does this also apply to what I asked earlier
RoundPotato, on 04 Aug 2014 - 09:34 AM, said:
I think you meant 1.175494351e-38 is stated without having precision loss is because such a number can be defined with the same 'precision' of a float, that is 24 bits(where precision is defined as number of bits in the mantissa) is that what you were getting at?
then?

1.175494351e-38 is the minimum normalized value a 32 bit floating point number can represent (it is not the minimum value an IEEE 32 bit float can hold accurately).

0 00000001 00000000000000000000000 ==> 1.17549435E-38
0 00000000 00000000000000000000001 ==> 1.4E-45 (note, this is a denormalized float as the exponent is 0)
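For anyone who wants to reproduce those two values, a minimal Python sketch that reinterprets the raw bit patterns as 32-bit floats (`bits_to_float32` is a made-up helper name; the hex constants are the sign, exponent and mantissa fields above packed into one 32-bit word):

```python
import struct

def bits_to_float32(bits):
    """Reinterpret a 32-bit unsigned integer as an IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# sign 0, exponent field 00000001, mantissa all zeros
smallest_normal = bits_to_float32(0x00800000)

# sign 0, exponent field 00000000, mantissa with only the lowest bit set
smallest_denormal = bits_to_float32(0x00000001)

print(smallest_normal)    # this is exactly 2**-126
print(smallest_denormal)  # this is exactly 2**-149
```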

Edited by Washu
0

##### Share on other sites

Edited by RoundPotato
0

##### Share on other sites

1.175494351e-38 is the minimum normalized value a 32 bit floating point number can represent (it is not the minimum value an IEEE 32 bit float can hold accurately).

So 1.17549435E-38 is the minimum representable number without precision loss because it uses 24 bits(ala 24bit precision) and 1.4E-45 is the very minimum number that can be represented but at the loss of precision(1 bit that is the MSB because it is denormalized now), that it?

With IEEE floats you have an invisible leading 1 whenever the exponent field is neither zero (denormalized) nor 0xFF (infinity or NaN). In other words it's something like (-1) ^ sign * 2 ^ (exponent - 127) * 1.mantissa. This is normalized form: the position of the most significant bit is given by the exponent, giving you 1 + 23 bits of precision.

When you use denormalized floats the exponent is 0, and thus there is no leading 1 bit. So you do lose a bit of precision.

Edited by Washu
0

##### Share on other sites


Edited by RoundPotato
0

##### Share on other sites

This is normalized, form, as the most significant bit is represented by the value of the exponent, giving you 1 + 24 bits of precision.

IMSB = Implied Most Significant Bit

Ok this got really weird again. MSB is represented by the value of the exponent...

Did you mean by that, that if mantissa is NaN or 0 then Implied Most Significant Bit = 0 (and that bit is lost in terms of precision), otherwise = 1 ?

Why is it 1 + 24 = 25 bit precision now all of a sudden, should it not be 1 + 23 when exponent != 0 || Nan ?

When the exponent is 0xFF and the mantissa is non-zero, the floating point value is "NaN" (on most hardware the most significant mantissa bit only distinguishes quiet from signaling NaNs). When the exponent is 0xFF and the mantissa is zero, the value is infinity, with the sign bit determining whether it's + or - infinity.
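As a rough sketch of how the exponent and mantissa fields pick out the special values in an IEEE 754 single (exponent 0xFF with a zero mantissa is infinity, exponent 0xFF with any non-zero mantissa is a NaN), here is a small Python classifier. The names `classify` and `bits_to_float32` are invented for this example:

```python
import math
import struct

def bits_to_float32(bits):
    """Reinterpret a 32-bit unsigned integer as an IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def classify(bits):
    """Name the kind of value encoded by a 32-bit float bit pattern."""
    exponent = (bits >> 23) & 0xFF   # 8-bit exponent field
    mantissa = bits & 0x7FFFFF       # 23-bit mantissa field
    if exponent == 0xFF:
        return "nan" if mantissa else "infinity"
    if exponent == 0x00:
        return "denormal" if mantissa else "zero"
    return "normal"

for pattern in (0x7F800000, 0xFF800000, 0x7FC00000, 0x00000001, 0x3F800000):
    print(hex(pattern), classify(pattern), bits_to_float32(pattern))
```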

When the exponent is 0x00 the floating point number is denormalized, and we lose the "leading 1" bit. It's a 1 because we're dealing with binary numbers. Essentially, when you have a non-zero exponent the number is represented in a normalized form where the place of the most significant 1 bit is determined by the exponent, i.e. the same as scientific notation (this is the IMSB). With normalized floating point numbers, since the first bit is always a 1, the 23 stored mantissa bits are additional bits of precision on top of the leading 1 bit. Thus we actually have 24 bits. With denormalized floating point numbers we lose that leading 1 bit, and the value of the mantissa becomes our sole source of data, giving us a shrinking number of bits of precision because each additional leading 0 bit in the mantissa becomes a simple place marker. Thus when you get to the smallest floating point value (0x0000 0001) you have 1 bit of precision, unlike in normalized form, where the implied 1 bit plus the 23-bit mantissa gives us 24 bits of precision.

Lets take a few examples:
The floating point numbers will be represented in their binary variant, not decimal.

non-zero exponent (normalized):
(-1) ^ sign * 2 ^ (exponent - 127) * 1.mantissa
0 01111111 00000000000000000000000 = 1.00000000000000000000000b
24 bits of precision: the implied leading 1 plus all 23 stored mantissa bits.

zero exponent (denormalized):
(-1) ^ sign * 2 ^ -126 * 0.mantissa
0 00000000 00000000000000000100000 = 0.00000000000000000100000b * 2 ^ -126
6 bits of precision: only the highest set mantissa bit and the 5 bits below it (100000) carry information.
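The two formulas can be turned into a short Python sketch that decodes a bit pattern by hand and reports how many bits of precision it carries. This is only an illustration: `decode_float32` is a made-up name, and counting a denormal's precision as the position of its highest set mantissa bit follows the convention used in this thread.

```python
import struct

def decode_float32(bits):
    """Return (value, bits_of_precision) for a finite 32-bit float pattern."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0xFF:
        raise ValueError("infinity or NaN, not a finite number")
    if exponent == 0:
        # Denormalized: implied leading bit is 0, exponent is fixed at -126.
        value = (mantissa / 2 ** 23) * 2.0 ** -126
        precision = mantissa.bit_length()
    else:
        # Normalized: implied leading bit is 1, exponent has a bias of 127.
        value = (1 + mantissa / 2 ** 23) * 2.0 ** (exponent - 127)
        precision = 24
    return (-value if sign else value), precision

one = decode_float32(0x3F800000)    # 0 01111111 000...0
tiny = decode_float32(0x00000020)   # 0 00000000 000...0100000

print(one)
print(tiny)

# Cross-check the hand decoding against the hardware interpretation.
hardware = struct.unpack("<f", struct.pack("<I", 0x00000020))[0]
print(tiny[0] == hardware)
```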

Edited by Washu
1

##### Share on other sites

Edited by RoundPotato
2

##### Share on other sites

I guess they just use int types to be conceptually simplified or similar to math

Not quite. Ints are generally used for counting.

How many are 2.03 apples?

0
