# Question about integers and floating point.

This topic is 4732 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

```cpp
#include <iostream>
using namespace std;

int main()
{
    // On typical compilers int is 32 bits: -2147483648 to 2147483647.
    int one = 1;
    int ten = 10;
    int onehundred = 100;
    int onethousand = 1000;
    int tenthousand = 10000;
    int onehundredthousand = 100000;
    int million = 1000000;
    int tenmillion = 10000000;
    int onehundredmillion = 100000000;
    int billion = 1000000000;           // 10^9: still fits in a 32-bit int
    int tenbillion = 10000000000;       // 10^10: too big; in practice only the low 32 bits are kept
    int hundredbillion = 100000000000;  // 10^11: too big as well (expect a compiler warning)

    cout << one << endl;
    cout << ten << endl;
    cout << onehundred << endl;
    cout << onethousand << endl;
    cout << tenthousand << endl;
    cout << onehundredthousand << endl;
    cout << million << endl;
    cout << tenmillion << endl;
    cout << onehundredmillion << endl;
    cout << billion << endl;
    cout << tenbillion << endl;
    cout << hundredbillion << endl;

    // A double holds every one of these values exactly; cout just switches to
    // scientific notation once a value needs more than 6 digits.
    double f_one = 1;
    double f_ten = 10;
    double f_onehundred = 100;
    double f_onethousand = 1000;
    double f_tenthousand = 10000;
    double f_onehundredthousand = 100000;
    double f_million = 1000000;
    double f_tenmillion = 10000000;
    double f_onehundredmillion = 100000000;
    double f_billion = 1000000000;
    double f_tenbillion = 10000000000;
    double f_hundredbillion = 100000000000;

    cout << f_one << endl;
    cout << f_ten << endl;
    cout << f_onehundred << endl;
    cout << f_onethousand << endl;
    cout << f_tenthousand << endl;
    cout << f_onehundredthousand << endl;
    cout << f_million << endl;
    cout << f_tenmillion << endl;
    cout << f_onehundredmillion << endl;
    cout << f_billion << endl;
    cout << f_tenbillion << endl;
    cout << f_hundredbillion << endl;

    return 0;
}
```


What the program displays:
Quote:
```
1
10
100
1000
10000
100000
1000000
10000000
100000000
1000000000
1410065408
1215752192
1
10
100
1000
10000
100000
1e+006
1e+007
1e+008
1e+009
1e+010
1e+011
Press any key to continue
```
I can't figure it out. Why do the last two integers display the wrong numbers, while the doubles still give the correct values (even though they're printed in scientific notation)? Thx. It's an idea I still don't grasp.

##### Share on other sites
the max value for ints is 2147483647
and for unsigned ints it is 4294967295

this is because ints are (usually) 4 bytes
a byte is 8 bits
and a bit is 0 or 1
2^8 = 256 (combinations in one byte)
256^4 = 4294967296 (combinations in 4 bytes)
2147483647 = 4294967296/2-1

edit: fixed more math

[Edited by - mike25025 on December 31, 2004 1:45:44 AM]

##### Share on other sites
Uhhh. It's not limited at 2147483647 or 4294967296.

Unsigned ints?

The last value decreased in value.

1410065408
1215752192

##### Share on other sites
it's decreased because when an int (or other integer type) goes over its max, it wraps back around to its min (0 for unsigned types) and keeps counting from there

unsigned ints are ints that can't be negative

##### Share on other sites
Ok, first: unsigned integers. I've never heard of these.

I thought a byte could hold 0-255? Am I wrong? That's 256 values, since 0 has to be included???

##### Share on other sites
a byte holds

0
1
2
3
...

...
253
254
255

255-0 = 255
255+1 = 256

the 1 is added because 0 is in the list

0
1
2
3
4

4-0 = 4
4+1 = 5

there are five numbers

##### Share on other sites
a byte is 0-255 if it is unsigned; if it is signed, it is -128 to 127.

##### Share on other sites
Signed. OH OH OH. Haha.. Negated. Negative. Gotcha.

An int is usually 4 bytes?

So its

00000000000000000000000000000000

32 1/0's?

Strung together?

I just didn't get why you raised 8^2 = 256??

Then 256^4 to get your values. I would have thought of it as

2^0 =1
2^1 =2
2^2 =4
2^3 =8
2^4 =16
2^5 =32
2^6 =64
2^7 =128

1+2+4+8+16+32+64+128=255

Then 255^4

But that's not working for some reason. Damn.

##### Share on other sites
unsigned 8 bit holds 0 to 255
signed 8 bit holds -128 to 127
unsigned 16 bit holds 0 to 65535
signed 16 bit holds -32768 to 32767

and so on...

whenever the value is larger than the max or smaller than the min, it wraps around.

meaning, in unsigned 8-bit:
255 + 1 = 0
0 - 1 = 255

##### Share on other sites
listing all the combinations using 3 bits

1 000
2 001
3 010
4 011
5 100
6 101
7 110
8 111

there are 8 combinations

2(0 or 1)^3(the number of bits) = 8(the number of combinations)

edit: fixed math

##### Share on other sites
you have it backwards.
3^2 = 9

maybe you mean 2^8 = 256?

##### Share on other sites
Ok, then why is the floating point accurate?

My math logic is wacky. I'm just stuck on that 255 vs 256 thing. Why do you use 256 to get your next number and not 255?

11111111

255

##### Share on other sites
Quote:
> 2^0 =1
> 2^1 =2
> 2^2 =4
> .....8
> .....16
> .....32
> .......64
> ........128
> 1+2+4+8+16+32+64+128=255

this is how binary works

2^0*1 = 1*1
2^1*1 = 2*1
2^2*0 = 4*0
2^3*1 = 8*1
2^4*0 = 16*0

1*1+2*1+4*0+8*1+16*0 = 1+2+0+8+0 = 9

so 11010 = 9

edit: 255 is the max; 256 is the number of possibilities

##### Share on other sites
Yeah I get that. It goes back to

Quote:
> this is because ints are 4 bytes
> a byte is 8 bits
> and a bit is 0 or 1
> 8^2 = 256
> 256^4 = 4294967296
> 2147483647 = 4294967296/2-1

11111111=======255

2^8=256. Ok yeah, that's the number of values, not the largest value held. 255 is the largest number it can hold.

Then you raise 256^4. That's where I'm getting lost.

##### Share on other sites
(combinations per byte) ^ (number of bytes), or equally 2 ^ (number of bits)

##### Share on other sites
8^2 = 64

im sure he meant:
2^8 = 256
2^16 = 65536
2^32 = 4294967296

##### Share on other sites
Got it got it.

Ok 2^8 is 256. Subtract 1 to account for 2^0.

256^4. Subtract 1 to account for 2^0.

However I'm still one off.

I got (256^4)-1 = 4294967295. Are you supposed to account for 2^0 = 1, the very first bit on the right?

However you got 4294967296

Quote:
> this is because ints are 4 bytes
> a byte is 8 bits
> and a bit is 0 or 1
> 2^8 = 256
> 256^4 = 4294967296
> 2147483647 = 4294967296/2-1

How's that?

Isn't that what we are trying to find? Back to my problem: the number at which it starts over?

##### Share on other sites
Quote:
Original post by ncasebee
> Got it got it.
> Ok 2^8 is 256. Subtract 1 to account for 2^0.
> 256^4. Subtract 1 to account for 2^0.
> How's that?

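what you are doing there is finding the max value (the largest number it can hold). 4294967296 is the number of possible values, which is also the first number that no longer fits, so that is where an unsigned int starts over at 0.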

##### Share on other sites
Back!

A byte can hold any value from 00000000 to 11111111.

That is 1 + 2 + 4 + 8 + 16 + 32 + 64 + 128 which is 255 + the 0, and you have 256 different states.

Signed/unsigned integers.

Ints are 16 bit numbers (or 32 bit, depending on platform).

Now in normal binary we can represent positive numbers (learn how to decode binary, it's SIMPLE). But we can't represent negative numbers. What do we do??

What we do, is implement a sign bit, which is the most significant bit (leftmost IIRC).

Now 0000000000000001 is 1, but 1000000000000001 is -1. because the leftmost bit is set. see now?

The interesting thing, is that there is + and - 0. You usually stick to +0.

Unsigned ints are ints that don't use the sign bit, so you double the number you can store in it.

Your particular problem is called overflow.

When you try to shove a kiloton of water into a 1l coke bottle, what happens?
(answer: You get very, very, very wet. you also flood quite some area, so don't try it at home).

This happens with bits also: the value overflows the bits, and because there are no bits left in the int to hold the rest, the extra high-order bits are simply thrown away (they don't spill over into the next variable).

So what you're getting is what's left: the low-order bits that fitted into the int you were looking at.

Floating point numbers are different.

They have a sign bit, some fraction bits (the mantissa), and an exponent.

What it does is take the binary fraction formed by the fraction bits (say .0010101) and multiply that by 2^exponent (so if the exponent is 011, it would be 2^3, or 8).

E.g. for a made-up 13-bit floating point number (not the real one; I don't remember how many bits are in each field):

1 1000101 10000

S F E

S = sign bit
F = fraction
E = exponent

Let's decode this:

The sign bit is 1, so it's a negative number.

The binary fraction is 0.1000101

That is

0 / 2^0 +
1 / 2^1 +
0 / 2^2 +
0 / 2^3 +
0 / 2^4 +
1 / 2^5 +
0 / 2^6 +
1 / 2^7

Which is

0 / 1 +
1 / 2 +
0 / 4 +
0 / 8 +
0 / 16 +
1 / 32 +
0 / 64 +
1 / 128

That is

0 +
0.5 +
0 +
0 +
0 +
0.03125 +
0 +
0.0078125

Which just happens to be 0.5390625

Now the exponent is 10000

That is
1 * 2^4 +
0 * 2^3 +
0 * 2^2 +
0 * 2^1 +
0 * 2^0

Which is 16.

So now the answer is F * 2^E, negated if the sign bit is one.

Now 2^16 = 65536

And 0.5390625 * 65536 = 35328

And the sign bit is 1, so the answer is -35328.

Now the IEEE floating point format works on the same idea (which is what I've been explaining, just with much shorter numbers), plus a couple of extra details: the exponent is stored with a bias so it can also be negative, and the leading 1 of the fraction is implied rather than stored. It's pretty simple once you get to know it.

The downside is that some numbers can't be represented exactly, and although it can produce big numbers with a relatively small number of bits, its accuracy goes down as the exponent goes up (the gap between neighbouring representable numbers grows for bigger numbers).

Nice little tutorial, maybe worth saving somewhere?

From,
Nice coder

##### Share on other sites
Thx for the help guys I think I'm really close.

One last question. Why does the floating point work then?

I'm going to bed. Thx. :)

##### Share on other sites
Quote:
Original post by Nice Coder
> Now in normal binary we can represent positive numbers (learn how to decode binary, its SIMPLE). But we can't use negitive numbers, What do we Do??
> What we do, is implement a sign bit, which is the most significant bit (leftmost IIRC).

This is one way of doing it, but it's not a good way, because we have two ways of representing 0 (+0 and -0: 00000000 and 10000000 in 8 bits), and it's not how most processors do it. The normal way of doing it is using two's complement notation, which you can read about here.

##### Share on other sites
Well, I remember reading a book in a computer class that stated that a float is actually stored in Scientific Notation. Since I don't know much, read this post to find out more: http://www.gamedev.net/community/forums/topic.asp?topic_id=198236
As for the integer thing...
I'm gonna try to make an example:

Say you try and do this:
> BYTE bob = 549;
This is what happens:
> bob = 37
Why? The excess bits of 549 are 'cut off', so to say. Further explanation:
Value: 549 = 0010 0010 0101
As you know, a BYTE is 8 bits, yet 549 takes at least 10 bits to define (written above as 12 so the groups of four line up; there's no 12-bit variable type anyway). So what does it do? It throws away the excess. What do you get?
0010 0010 0101 is now:
---- 0010 0101 (no more 12 bits :( )
So why is bob now 37? Well, take the binary and do all of the multiplication:
(2^5) + (2^2) + (2^0) = 32 + 4 + 1 = 37
Hope that kind of explains it.
(NOTE: Do this equation: 549 % 256. What do you get?)
Tell me if I'm wrong.

[Edited by - deadimp on December 31, 2004 2:05:09 PM]

##### Share on other sites
Quote:
Original post by mike25025
> this is how binary works
> 2^0*1 = 1*1
> 2^1*1 = 2*1
> 2^2*0 = 4*0
> 2^3*1 = 8*1
> 2^4*0 = 16*0
> 1*1+2*1+4*0+8*1+16*0 = 1+2+0+8+0 = 9
> so 11010 = 9

...

Last I checked, 1 + 2 + 8 != 9

Also, that doesn't evaluate to (11010)2, it comes out to (01011)2, which is (11)10, or (B)16.
(1001)2 == (9)10

##### Share on other sites
it was 2:30am when i wrote that

sorry for any confusion

##### Share on other sites
Quote:
Original post by Nice Coder
> Now 0000000000000001 is 1, but 1000000000000001 is -1. because the leftmost bit is set. see now?
> The interesting thing, is that there is + and - 0. You usually stick to +0.

I'm sorry dude but you've goofed Nice Coder...

a) -1 (decimal) in binary is where all of the bits are 1, not just the first and last.
b) Two's complement (which all PCs use) cannot store a positive and a negative zero; there is only one zero.

For integers on a 32-bit targeting compiler, it goes like this:
```
Byte  =  8 bits = 2^8         (256) combinations = 0.........255 (unsigned), or        -128.........127 (signed)
short = 16 bits = 2^16      (65536) combinations = 0.......65535 (unsigned), or      -32768.......32767 (signed)
int   = 32 bits = 2^32 (4294967296) combinations = 0..4294967295 (unsigned), or -2147483648..2147483647 (signed)
```

All you need to know is that if you try to store a number outside of those ranges, it will not store the correct value, because it cannot. What value it does end up with does not matter.

Floating point numbers are stored in a scientific-notation kind of format. So they can store big numbers, because they store the exponent in binary form, but you only have limited accuracy. Although 1e+50 and 1e-50 are easily represented in floating point (for example), adding them together is not going to give you an accurate result, because the significant digits are too far apart.
In floating point, 1 bit is used as the sign bit.
