
Question about integers and floating point.

This topic is 4732 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.


#include <iostream>
using namespace std;

int main()
{
	// Each value is ten times the previous one.
	int one = 1;
	int ten = 10;
	int onehundred = 100;
	int onethousand = 1000;
	int tenthousand = 10000;
	int onehundredthousand = 100000;
	int million = 1000000;
	int tenmillion = 10000000;
	int onehundredmillion = 100000000;
	int trillion = 1000000000;            // note: this is really one billion
	int tentrillion = 10000000000;        // too big for a 32-bit int
	int hundredtrillion = 100000000000;   // too big for a 32-bit int

	cout << one << endl;
	cout << ten << endl;
	cout << onehundred << endl;
	cout << onethousand << endl;
	cout << tenthousand << endl;
	cout << onehundredthousand << endl;
	cout << million << endl;
	cout << tenmillion << endl;
	cout << onehundredmillion << endl;
	cout << trillion << endl;
	cout << tentrillion << endl;
	cout << hundredtrillion << endl;

	// The same values again, stored as doubles.
	double f_one = 1;
	double f_ten = 10;
	double f_onehundred = 100;
	double f_onethousand = 1000;
	double f_tenthousand = 10000;
	double f_onehundredthousand = 100000;
	double f_million = 1000000;
	double f_tenmillion = 10000000;
	double f_onehundredmillion = 100000000;
	double f_trillion = 1000000000;
	double f_tentrillion = 10000000000;
	double f_hundredtrillion = 100000000000;

	cout << f_one << endl;
	cout << f_ten << endl;
	cout << f_onehundred << endl;
	cout << f_onethousand << endl;
	cout << f_tenthousand << endl;
	cout << f_onehundredthousand << endl;
	cout << f_million << endl;
	cout << f_tenmillion << endl;
	cout << f_onehundredmillion << endl;
	cout << f_trillion << endl;
	cout << f_tentrillion << endl;
	cout << f_hundredtrillion << endl;

	return 0;
}



What the program displays:
Quote:
1
10
100
1000
10000
100000
1000000
10000000
100000000
1000000000
1410065408
1215752192
1
10
100
1000
10000
100000
1e+006
1e+007
1e+008
1e+009
1e+010
1e+011
Press any key to continue
I can't figure it out. Why do the last two integers fail to display the correct numbers, while the floating-point values still come out right, even if in scientific notation? Thx. It's an idea I still don't grasp.

Share this post


Link to post
Share on other sites
the max value for ints is 2147483647
and for unsigned ints is 4294967295

this is because ints are 4 bytes
a byte is 8 bits
and a bit is 0 or 1
2^8 = 256
256^4 = 4294967296 (the number of different values)
2147483647 = 4294967296/2-1
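
The arithmetic above can be sketched in a few lines of C++. This is only an illustration, assuming the usual 8-bit byte and 4-byte int (the C++ standard guarantees neither), and the helper names `value_count`, `max_unsigned` and `max_signed` are made up for this example.

```cpp
#include <cstdint>

// Number of distinct bit patterns that fit in `bits` bits: 2^bits.
// Only meaningful for bits < 64, because the result is held in 64 bits.
constexpr std::uint64_t value_count(unsigned bits) {
    return std::uint64_t{1} << bits;
}

// Largest unsigned value: one less than the count, because 0 uses a pattern.
constexpr std::uint64_t max_unsigned(unsigned bits) {
    return value_count(bits) - 1;
}

// Largest signed value on a two's complement machine: half of the patterns
// are negative, and 0 takes one of the remaining half.
constexpr std::int64_t max_signed(unsigned bits) {
    return static_cast<std::int64_t>(value_count(bits) / 2) - 1;
}
```

Here `value_count(32)` is 4294967296, `max_unsigned(32)` is 4294967295 and `max_signed(32)` is 2147483647. In real code, `std::numeric_limits<int>::max()` from `<limits>` reports the actual limit for the platform being compiled for.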

edit: fixed more math

[Edited by - mike25025 on December 31, 2004 1:45:44 AM]

Share this post


Link to post
Share on other sites
Signed. OH OH OH. Haha.. Negated. Negative. Gotcha.


An int is usually 4 bytes?

So it's

00000000000000000000000000000000

32 1s/0s?

Strung together?



I just didn't get why you raised 8^2 = 256?

Then 256^4 to get your values. I would have thought of it as


2^0 = 1
2^1 = 2
2^2 = 4
2^3 = 8
2^4 = 16
2^5 = 32
2^6 = 64
2^7 = 128


1+2+4+8+16+32+64+128=255


Then 255^4

But that's not working for some reason. Damn.

Share this post


Link to post
Share on other sites
unsigned 8 bit holds 0 - 255
signed 8 bit holds -128 - 127
unsigned 16 bit holds 0 - 65535
signed 16 bit holds -32768 - 32767

and so on...

whenever a value goes above or below the range, it wraps around (strictly speaking, C++ only guarantees this for unsigned types).

meaning, in unsigned 8-bit:
255 + 1 = 0
0 - 1 = 255
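
A small sketch of that wraparound, using the fixed-width type `std::uint8_t` from `<cstdint>` so the size does not depend on the platform. The function names `increment` and `decrement` are invented for this example.

```cpp
#include <cstdint>

// Adds 1 to an unsigned 8-bit value. The addition itself is carried out
// in int; converting the result back to 8 bits reduces it modulo 256.
constexpr std::uint8_t increment(std::uint8_t v) {
    return static_cast<std::uint8_t>(v + 1);
}

// Subtracts 1 from an unsigned 8-bit value, again reduced modulo 256.
constexpr std::uint8_t decrement(std::uint8_t v) {
    return static_cast<std::uint8_t>(v - 1);
}
```

So `increment(255)` gives 0 and `decrement(0)` gives 255. One thing to watch when printing: `cout` treats `std::uint8_t` as a character, so cast the result to `int` before streaming it.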

Share this post


Link to post
Share on other sites
listing all the combinations using 3 bits

1 000
2 001
3 010
4 011
5 100
6 101
7 110
8 111

there are 8 combinations

2(0 or 1)^3(the number of bits) = 8(the number of combinations)

edit: fixed math

Share this post


Link to post
Share on other sites
Ok then why is the floating point accurate?

My math logic is wacky. I'm just stuck on that 255 vs 256 thing. Why do you use 256 to get your next number and not 255?



11111111

255






Share this post


Link to post
Share on other sites
Quote:
2^0 = 1
2^1 = 2
2^2 = 4
2^3 = 8
2^4 = 16
2^5 = 32
2^6 = 64
2^7 = 128


1+2+4+8+16+32+64+128=255


this is how binary works

2^0*1 = 1*1
2^1*1 = 2*1
2^2*0 = 4*0
2^3*1 = 8*1
2^4*0 = 16*0

1*1+2*1+4*0+8*1+16*0 = 1+2+0+8+0 = 9

so 11010 = 9

edit: 255 is the max; 256 is the number of possibilities

Share this post


Link to post
Share on other sites
Yeah I get that. It goes back to


Quote:

this is because ints are 4 bytes
a byte is 8 bits
and a bit is 0 or 1
8^2 = 256
256^4 = 4294967296
2147483647 = 4294967296/2-1



11111111 = 255

2^8 = 256. OK, yeah, that's the number of values, not the largest value held. 255 is the largest number that can be held.

Then you raise 256^4. That's where I'm getting lost.

Whoops, yeah, you're right.

Share this post


Link to post
Share on other sites
Got it got it.


Ok 2^8 is 256. Subtract 1 to account for 2^0.


256^4. Subtract 1 to account for 2^0.

However I'm still one off.


I got (256^4) - 1 = 4294967295. Are you supposed to account for 2^0 = 1, your very first bit on the right?

However you got 4294967296


Quote:

this is because ints are 4 bytes
a byte is 8 bits
and a bit is 0 or 1
2^8 = 256
256^4 = 4294967296
2147483647 = 4294967296/2-1



How's that?


Isn't that what we are trying to find? Back to my problem: the number at which it starts over?

Share this post


Link to post
Share on other sites
Quote:
Original post by ncasebee
Got it got it.


Ok 2^8 is 256. Subtract 1 to account for 2^0.


256^4. Subtract 1 to account for 2^0.


How's that?


what you are doing there is finding the max value

Share this post


Link to post
Share on other sites
Back!

A byte can hold any value from 00000000 to 11111111.

The largest is 1 + 2 + 4 + 8 + 16 + 32 + 64 + 128, which is 255. Add the 0, and you have 256 different states.

Signed/unsigned integers.

Ints are 16-bit numbers (or 32-bit, depending on platform).

Now in normal binary we can represent positive numbers (learn how to decode binary, it's SIMPLE). But we can't represent negative numbers. What do we do?

What we do is implement a sign bit, which is the most significant bit (leftmost, IIRC).

Now 0000000000000001 is 1, but 1000000000000001 is -1, because the leftmost bit is set. See now?

The interesting thing is that there is a + and a - 0. You usually stick to +0.

Unsigned ints are ints that don't use the sign bit, so you double the largest number you can store in it.

Your particular problem is called overflow.

When you try to shove a kiloton of water into a 1 l coke bottle, what happens?
(Answer: you get very, very, very wet. You also flood quite some area, so don't try it at home.)

This happens with bits also. The value overflows the bits, and because the int has no bits left to hold the rest, the extra high-order bits are simply dropped.

So what you're getting is what's left: the bits which fitted into the int you were looking at.

Floating point numbers are different.

They have a sign bit, some fraction bits (the mantissa), and an exponent.

What it does is take the binary fraction from the fraction bits (say .0010101) and multiply it by 2^exponent (so if the exponent is 011, that would be 2^3, or 8).

E.g. for some made-up floating point layout (not the real one, I don't remember how many bits are in each field):

1 01000101 10000

S F E

S = sign bit
F = fraction
E = exponent

Let's decode this:

The sign bit is 1, so it's a negative number.

The binary fraction is 0.1000101 (the first fraction bit is the units place).

That is

0 / 2^0 +
1 / 2^1 +
0 / 2^2 +
0 / 2^3 +
0 / 2^4 +
1 / 2^5 +
0 / 2^6 +
1 / 2^7

Which is

0 / 1 +
1 / 2 +
0 / 4 +
0 / 8 +
0 / 16 +
1 / 32 +
0 / 64 +
1 / 128

That is

0 +
0.5 +
0 +
0 +
0 +
0.03125 +
0 +
0.0078125

Which just happens to be 0.5390625.

Now the exponent is 10000.

That is
0 * 2^0 +
0 * 2^1 +
0 * 2^2 +
0 * 2^3 +
1 * 2^4

Which is 16.

So now the answer is F * (2^E), negated if the sign bit is one.

Now 2^16 = 65536

And 0.5390625 * 65536 = 35328

And the sign bit is 1, so the answer is -35328.

Now the IEEE floating point format (which is roughly what I've been explaining, just with much shorter numbers; the real format also biases the exponent and leaves the leading 1 of the fraction implicit) is pretty simple once you get to know it.

The downside is that some numbers can't be represented exactly. And although it can produce big numbers with a relatively small number of bits, its absolute accuracy goes down as the exponent goes up (for bigger numbers).

Nice little tutorial, maybe worth saving somewhere?

From,
Nice coder
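
The "fraction times a power of two" idea from this post can be sketched as a toy decoder. Everything here is an assumption made for illustration: the 8-bit fraction field, the plain unsigned power field (what IEEE calls the exponent) and the name `decode_toy_float` are invented, and the layout is not the real IEEE 754 one.

```cpp
#include <cstdint>

// Decodes a made-up "sign / fraction / power of two" layout (NOT IEEE 754).
// fraction_bits holds 8 bits: the top bit weighs 1/2^0, the next 1/2^1,
// and so on down to 1/2^7 for the lowest bit.
constexpr double decode_toy_float(bool negative,
                                  std::uint8_t fraction_bits,
                                  unsigned exponent) {
    double value = fraction_bits / 128.0;   // e.g. 01000101 -> 69/128
    for (unsigned i = 0; i < exponent; ++i) {
        value *= 2.0;                       // multiply by 2, `exponent` times
    }
    return negative ? -value : value;
}
```

With the fraction bits 01000101 (0x45, which is 0.5390625) and a power of 16, `decode_toy_float(false, 0x45, 16)` gives 35328, and passing `true` for the sign gives -35328.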

Share this post


Link to post
Share on other sites
Thx for the help guys I think I'm really close.


One last question. Why does the floating point work then?

Wow. Didn't read the previous post. Maybe that holds the answer. THX SO MUCH. I'll read it in the morning.
I'm going to bed. Thx. :)

Share this post


Link to post
Share on other sites
Quote:
Original post by Nice Coder
Now in normal binary we can represent positive numbers (learn how to decode binary, it's SIMPLE). But we can't represent negative numbers. What do we do?

What we do, is implement a sign bit, which is the most significant bit (leftmost IIRC).

This is one way of doing it, but it's not a good way, because we have two ways of representing 0 (+0 and -0, 10000000 and 00000000 in 8 bits), and it's not how most processors do it. The normal way of doing it is using two's complement notation, which you can read about here.

Share this post


Link to post
Share on other sites
Well, I remember reading a book in a computer class that stated that a float is actually stored in Scientific Notation. Since I don't know much, read this post to find out more: http://www.gamedev.net/community/forums/topic.asp?topic_id=198236
As for the integer thing...
I'm gonna try to make an example:

Say you try and do this:
> BYTE bob = 549;
This is what happens:
> bob = 37
Why? The excess bits of 549 are 'cut off', so to say. Further explanation:
Value: 549 = 0010 0010 0101
As you know, a BYTE is 8 bits, yet 549 needs at least 10 bits (written above padded to 12). So what does it do? It throws away the excess. What do you get?
0010 0010 0101 is now:
---- 0010 0101 (no more 12 bits :( )
So why is bob now 37? Well, take the binary and do all of the multiplication:
(2^5) + (2^2) + (2^0) = 32 + 4 + 1 = 37
Hope that kind of explains it.
(NOTE: Do this equation: 549 % 256. What do you get?)
Tell me if I'm wrong.
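
The same "throw away the excess bits" step can be sketched in C++. This uses unsigned fixed-width types from `<cstdint>`, because converting an out-of-range value to a signed type was implementation-defined before C++20. The function names are made up for this example.

```cpp
#include <cstdint>

// Keeps only the low 8 bits of a value, like storing it in a BYTE.
constexpr std::uint8_t truncate_to_byte(std::uint64_t value) {
    return static_cast<std::uint8_t>(value);    // same as value % 256
}

// Keeps only the low 32 bits of a value, like storing it in a 32-bit int.
constexpr std::uint32_t truncate_to_32_bits(std::uint64_t value) {
    return static_cast<std::uint32_t>(value);   // same as value % 2^32
}
```

`truncate_to_byte(549)` gives 37, matching the example above. Applied to the numbers from the first post, `truncate_to_32_bits(10000000000)` gives 1410065408 and `truncate_to_32_bits(100000000000)` gives 1215752192, which are the two odd values the program printed.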

[Edited by - deadimp on December 31, 2004 2:05:09 PM]

Share this post


Link to post
Share on other sites
Quote:
Original post by mike25025

this is how binary works

2^0*1 = 1*1
2^1*1 = 2*1
2^2*0 = 4*0
2^3*1 = 8*1
2^4*0 = 16*0

1*1+2*1+4*0+8*1+16*0 = 1+2+0+8+0 = 9

so 11010 = 9

...

Last I checked, 1 + 2 + 8 != 9

Also, that doesn't evaluate to (11010)2, it comes out to (01011)2, which is (11)10, or (B)16.
(1001)2 == (9)10
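
For checking conversions like these, a short helper can do the place-value sum. This is just a sketch; `binary_value` is a name invented here, and it assumes the string contains only '0' and '1' with the most significant bit first.

```cpp
// Value of a string of '0'/'1' characters, most significant bit first.
constexpr unsigned binary_value(const char* digits) {
    unsigned value = 0;
    for (; *digits != '\0'; ++digits) {
        // Shift what we have one place left, then add the new bit.
        value = value * 2 + (*digits == '1' ? 1u : 0u);
    }
    return value;
}
```

`binary_value("01011")` is 11, `binary_value("1001")` is 9, and `binary_value("11010")` is 26.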

Share this post


Link to post
Share on other sites
Quote:
Original post by Nice Coder
Now 0000000000000001 is 1, but 1000000000000001 is -1. because the leftmost bit is set. see now?

The interesting thing, is that there is + and - 0. You usually stick to +0.

I'm sorry dude but you've goofed Nice Coder...

a) -1(decimal) in binary is where all of the bits are 1, not just the first and last.
b) 2's complement (as all PCs use) does not have the ability to store both a positive and a negative zero.
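
Both points can be checked with a short sketch. It assumes 32-bit fixed-width types from `<cstdint>`, and the names `bit_pattern` and `negate_pattern` are made up for illustration.

```cpp
#include <cstdint>

// Bit pattern of a 32-bit signed value, viewed as an unsigned number.
// Signed-to-unsigned conversion is defined as arithmetic modulo 2^32,
// which gives the two's complement pattern.
constexpr std::uint32_t bit_pattern(std::int32_t value) {
    return static_cast<std::uint32_t>(value);
}

// Two's complement negation on a raw pattern: flip every bit, then add one.
constexpr std::uint32_t negate_pattern(std::uint32_t pattern) {
    return ~pattern + 1u;
}
```

`bit_pattern(-1)` is 0xFFFFFFFF, i.e. all 32 bits set, and `negate_pattern(1)` produces the same pattern. `negate_pattern(0)` is 0 again, so there is only one zero.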

For integers on a 32-bit targeting compiler, it goes like this:

Byte = 8 bits = 2^8 (256) combinations = 0.........255 (unsigned), or -128.........127 (signed)
short= 16 bits = 2^16 (65536) combinations = 0.......65535 (unsigned), or -32768.......32767 (signed)
int = 32 bits = 2^32 (4294967296) combinations = 0..4294967295 (unsigned), or -2147483648..2147483647 (signed)

All you need to know is that if you try to store a number outside of those ranges then it will not store the correct value, because it cannot. What value it actually ends up as does not matter.


Floating point numbers are stored in a scientific-notation kind of format. So they can store big numbers because they store the exponent in binary form, but you only have limited accuracy. Although 1e+50 and 1e-50 are each easily represented in floating point (for example), adding them together is not going to give you an accurate result because the significant digits are too far apart.
In floating point, 1 bit is used as the sign bit.
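
A rough sketch of that limited accuracy, assuming `double` is the usual 64-bit IEEE format with 53 significant bits (true on PCs, but not required by the C++ standard). The helper names are invented for this example.

```cpp
// True when adding `tiny` to `big` leaves `big` unchanged,
// i.e. the smaller number is lost completely.
constexpr bool is_absorbed(double big, double tiny) {
    return big + tiny == big;
}

// True when an integer survives a round trip through double unchanged.
constexpr bool fits_exactly_in_double(long long value) {
    return static_cast<long long>(static_cast<double>(value)) == value;
}
```

`is_absorbed(1e50, 1e-50)` is true: the sum is still exactly 1e50. On the other hand `fits_exactly_in_double(100000000000)` is also true, because integers up to 2^53 are stored exactly. So the doubles in the first post hold the right values, and `1e+011` is only how `cout` formats them by default; streaming `std::fixed` first makes it print plain decimal digits instead.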

Share this post


Link to post
Share on other sites
