Int, float, double?

14 comments, last by Super Llama 15 years, 5 months ago
An int is typically the same size as the natural word size of the processor, so on a 32-bit platform (x86) it has 32 bits. On most 64-bit platforms, however, int stays at 32 bits even though the word size is 64. There are even examples where it has 24 bits. The only hard guarantee comes from the C standard: an int must have at least 16 bits.
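If you want to check this on your own machine, a short C program reports it directly; sizeof counts bytes and CHAR_BIT from <limits.h> gives the bits per byte:

#include <stdio.h>
#include <limits.h>

int main(void)
{
    /* sizeof reports bytes; CHAR_BIT is bits per byte (8 almost everywhere) */
    printf("int is %zu bytes (%zu bits)\n",
           sizeof(int), sizeof(int) * CHAR_BIT);
    printf("INT_MIN = %d, INT_MAX = %d\n", INT_MIN, INT_MAX);
    return 0;
}

On a typical x86-64 compiler this prints 4 bytes (32 bits), even though the machine word is 64 bits.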
The problem is not that you can't represent a base-10 fraction in base 2. A number is the same number in every base, so given infinitely many digits you can write any fraction in base 2, 3, 5, 7, 10, or 65535; the base itself makes no difference.

The problem is that a float represents a sum of binary fractions, and you have a limited number of bits to use:
1/2 1/4 1/8 1/16 1/32 ....

with 3 bits, you could only represent:
1/8 = 1/8
2/8 = 1/4
3/8 = 1/4 + 1/8
4/8 = 1/2
5/8 = 1/2 + 1/8
6/8 = 1/2 + 1/4
7/8 = 1/2 + 1/4 + 1/8

so with 24 bits you can only represent the fractions
i / 2**24, where i is a non-negative integer

the exponent then lets you multiply that fraction by
2**j, where j is an integer

As you can see, you can only represent a finite collection of numbers this way, and so it will never match exactly all the real fractions you can make.
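To make the "finite collection" point concrete, here is a small check, assuming the usual IEEE 754 single-precision float with its 24-bit significand: the next representable float after 1.0 sits a full 2**-23 away, and nothing in between exists at all.

#include <stdio.h>
#include <math.h>

int main(void)
{
    /* nextafterf returns the next representable float after 1.0f
       in the direction of 2.0f */
    float one  = 1.0f;
    float next = nextafterf(one, 2.0f);
    printf("next float after 1.0: %.10f\n", next);
    printf("gap: %.10e\n", next - one);  /* 1.1920928955e-07, i.e. 2**-23 */
    return 0;
}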
Quote:Original post by shou4577
I know that there are some oddities, such as fractions that terminate in one base may not terminate in another (is this what you are referring to?).

Yes. For example, the number "one tenth" is
in decimal: 0.1
in binary: 0.00011001100110011001100110011001100...

Since you have to make a cut somewhere (you don't have infinite precision), the decimal number 0.1 cannot be represented exactly as a binary fraction.

The approximations for 0.1 are
0.100000001490116119384765625 for float and
0.1000000000000000055511151231257827021181583404541015625 for double.
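You can reproduce those values yourself by printing with far more digits than the types can meaningfully hold; the exact digits assume IEEE 754 floats and doubles, which virtually all current hardware uses:

#include <stdio.h>

int main(void)
{
    float  f = 0.1f;  /* rounded to the nearest representable float  */
    double d = 0.1;   /* rounded to the nearest representable double */
    printf("float : %.30f\n", f);  /* 0.100000001490116119384765625000 */
    printf("double: %.55f\n", d);
    return 0;
}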
Lol why can't they just be represented the same way as an int but with an extra byte for decimal position? XD that would make more sense but the processor wouldn't understand it as quickly.

Also reading the last post, I understand that:

1/2, 1/4, 1/8, 1/16, 1/32, 1/64, 1/128, 1/256

are all terminating fractions in binary?
like:

1/2 = 0.1
1/4 = 0.01
1/8 = 0.001
1/16 = 0.0001
1/32 = 0.00001
1/64 = 0.000001
1/128 = 0.0000001
1/256 = 0.00000001

That's because binary is base 2: instead of place values of 1/10, 1/100, 1/1000 after the point, it's 1/2, 1/4, 1/8.
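A quick way to see the consequence, assuming IEEE 754 doubles: sums of those terminating fractions compare exactly, while sums of non-terminating ones like 0.1 do not. The == comparisons below are deliberate, precisely to expose the rounding:

#include <stdio.h>

int main(void)
{
    /* 0.5, 0.25 and 0.75 all terminate in binary, so this is exact */
    printf("0.5 + 0.25 == 0.75 ? %s\n",
           (0.5 + 0.25 == 0.75) ? "yes" : "no");  /* yes */
    /* 0.1, 0.2 and 0.3 are all rounded, and the errors don't cancel */
    printf("0.1 + 0.2  == 0.3  ? %s\n",
           (0.1 + 0.2 == 0.3) ? "yes" : "no");    /* no  */
    return 0;
}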

Wow I never noticed that before :D

the inner workings of computers are very interesting to study...

[Edited by - Super Llama on November 13, 2008 10:02:04 AM]
It's a sofa! It's a camel! No! It's Super Llama!
Quote:
Lol why can't they just be represented the same way as an int but with an extra byte for decimal position? XD that would make more sense but the processor wouldn't understand it as quickly.

That would not solve anything, though. It's a valid alternative representation for numbers with non-whole portions (often called "fixed point"), more typically implemented as N bits of "left of the point" integer and M bits of "right of the point" fraction, such that N + M is some natural size (16, 32, etc.).

But it doesn't solve any problems without creating equivalent ones.

Historically, however, it was faster: fixed-point arithmetic is done with integers, some clever shifting, and reliance on simple mathematical identities. Since it is integer-based, it was traditionally faster than floating point, because some chips used to have very slow FPUs or no FPU at all, and some had off-board FPUs for which pushing and popping values onto the FPU stack could be expensive.
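As a rough illustration of that integer-plus-shifting trick, here is a minimal 16.16 fixed-point sketch in C; the fix16 type and helper names are made up for this example and don't refer to any particular library:

#include <stdio.h>
#include <stdint.h>

typedef int32_t fix16;          /* 16 integer bits, 16 fraction bits */
#define FIX_ONE (1 << 16)       /* 1.0 in 16.16 */

static fix16  fix_from_double(double x) { return (fix16)(x * FIX_ONE); }
static double fix_to_double(fix16 x)    { return (double)x / FIX_ONE; }

/* addition is plain integer addition; multiplication needs a widening
   multiply, then a shift to drop the extra 16 fraction bits */
static fix16 fix_mul(fix16 a, fix16 b)
{
    return (fix16)(((int64_t)a * b) >> 16);
}

int main(void)
{
    fix16 a = fix_from_double(1.5);
    fix16 b = fix_from_double(2.25);
    printf("1.5 + 2.25 = %f\n", fix_to_double(a + b));          /* 3.750000 */
    printf("1.5 * 2.25 = %f\n", fix_to_double(fix_mul(a, b)));  /* 3.375000 */
    return 0;
}

The catch, as noted above, is that you've traded one fixed budget for another: 16 fraction bits everywhere, instead of a precision that scales with the exponent.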
Interesting, I never knew that either XD
What problems did it cause?
It's a sofa! It's a camel! No! It's Super Llama!

