signed and unsigned numbers :)!

On x86 processors, signed numbers are encoded using 2's complement, correct? What encoding scheme do unsigned numbers use?
Unsigned numbers don't need any particular encoding. They're just there. Unless you're referring to endianness.
I trust exceptions about as far as I can throw them.
Along with Story's answer, positive signed numbers also don't use any particular encoding. It's the negative values that use 2's complement.
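
To make that concrete, here is a minimal C++ sketch (plain standard C++, not from anyone in this thread; it assumes the usual 2's complement target):

[code]
#include <cstdint>
#include <cstdio>

int main() {
    // A non-negative value is plain binary; a negative value is stored
    // as 2's complement: invert the bits, then add one.
    int8_t x = -1;
    uint8_t bits = static_cast<uint8_t>(x);  // same 8 bits, read as unsigned
    std::printf("-1 as uint8_t: 0x%02X\n", static_cast<unsigned>(bits));  // 0xFF

    uint8_t five = 5;
    uint8_t neg5 = static_cast<uint8_t>(~five + 1);  // ~00000101 + 1 = 11111011
    std::printf("-5 as uint8_t: 0x%02X\n", static_cast<unsigned>(neg5));  // 0xFB
    return 0;
}
[/code]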
You should also keep in mind when programming that these representation details are hardware specific.

The compiler does its magic so you don't need to know these things.

If you are programming at a machine-code level, such as writing pure assembly, then these questions are fine... But then again, if you were writing that you wouldn't be posting in "For Beginners".


Your questions are things the compiler deals with. These are not things the programmer normally deals with. The programmer should deal with int and long and short and float and double and char and other types, which the compiler understands. The compiler can transform them to run on big-endian, little-endian, and mixed-endian machines. The compiler can transform the code to run on two's complement, one's complement, or sign-and-magnitude (like floating point) processors. The compiler can make it run with BCD or other encodings.

That's the compiler's job to know the underlying representations and binary layout. The programmer generally should not interfere at a lower level without very good reason.



A properly written C or C++ or C# or Java or Python or other language program will run just fine on any machine without any knowledge of the underlying architecture.

If your code needs to know any of that information, then it is almost certainly a bug. If you use it in code, it is just a fancy way of shooting yourself in the foot, or embedding a time-bomb to cause you grief later.
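
As an illustration of writing code that does not care (a common idiom, not anything specific to this thread): if you need a 32-bit value in a fixed byte order, operate on the value with shifts rather than peeking at its memory layout, and the same source works on any architecture.

[code]
#include <cstdint>
#include <cstdio>

// Write v into out in big-endian order. Shifts work on the VALUE, not on
// the in-memory bytes, so this produces the same output on any host.
void put_u32_be(uint8_t out[4], uint32_t v) {
    out[0] = static_cast<uint8_t>(v >> 24);
    out[1] = static_cast<uint8_t>(v >> 16);
    out[2] = static_cast<uint8_t>(v >> 8);
    out[3] = static_cast<uint8_t>(v);
}

int main() {
    uint8_t buf[4];
    put_u32_be(buf, 0x0A0B0C0Du);
    std::printf("%02X %02X %02X %02X\n", buf[0], buf[1], buf[2], buf[3]);  // 0A 0B 0C 0D everywhere
    return 0;
}
[/code]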




For most processors, including the x86 family, the number itself is simply data. The processor does not know or care whether the value in a register is signed or unsigned; the x86 makes no such distinction. It is simply a value in a register. The compiler decides which operations to run, and generates the CPU instructions that correspond to the signed and unsigned operations.

For example, it is the COMPILER, not the PROCESSOR, that decides whether division should use the DIV instruction (unsigned division) or the IDIV instruction (signed division). The processor is capable of doing either. It is the compiler that makes the distinction, by deciding which instructions to emit.
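
A small sketch of that distinction (hypothetical values; the volatile is only there to keep the compiler from replacing the division with a multiply-by-reciprocal trick, so a typical x86 build actually shows DIV and IDIV in the disassembly):

[code]
#include <cstdint>
#include <cstdio>

int main() {
    uint32_t bits = 0xFFFFFFFEu;              // one 32-bit pattern...
    uint32_t u = bits;                        // ...read as 4294967294
    int32_t  s = static_cast<int32_t>(bits);  // ...or read as -2

    volatile uint32_t divisor = 7;            // defeat constant-division optimization

    // Same bits, different instructions chosen by the compiler:
    std::printf("unsigned: %u\n", u / divisor);                        // DIV  -> 613566756
    std::printf("signed:   %d\n", s / static_cast<int32_t>(divisor));  // IDIV -> 0
    return 0;
}
[/code]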

[quote name='j-locke' timestamp='1311562656' post='4839821']
Along with Story's answer, [s]positive signed numbers also don't use any particular encoding[/s]. It's the negative values that use 2's complement.
[/quote]

One needs to know the signed or unsigned encoding, which includes the length of the number, in order to interpret the number. That means a positive signed number does have a particular encoding.

So yes, it has an encoding. That encoding is still 2's complement of a specific length. You must know the encoding and interpret the bits accordingly. You must know the number of bits involved, and the state of the highest bit in that set.

A register could contain the value 0x000000FF, which could be interpreted just as easily as +255 or as -1. A register holding 0x0000FFFF could just as easily be +65535 or -1. A register holding 0xFFFFFFFF could be interpreted as 4294967295 or as -1. You absolutely need to know whether your two's complement encoding is working in an 8-bit, 16-bit, 32-bit, 64-bit, or other length number system.
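
In code form (plain C++, assuming the usual 2's complement behavior for the narrowing casts):

[code]
#include <cstdint>
#include <cstdio>

int main() {
    // The same all-ones bit pattern at three widths, signed and unsigned:
    std::printf("0xFF       as  int8_t : %d\n", static_cast<int8_t>(0xFF));          // -1
    std::printf("0xFF       as uint8_t : %d\n", static_cast<uint8_t>(0xFF));         // 255
    std::printf("0xFFFF     as  int16_t: %d\n", static_cast<int16_t>(0xFFFF));       // -1
    std::printf("0xFFFF     as uint16_t: %d\n", static_cast<uint16_t>(0xFFFF));      // 65535
    std::printf("0xFFFFFFFF as  int32_t: %d\n", static_cast<int32_t>(0xFFFFFFFF));   // -1
    std::printf("0xFFFFFFFF as uint32_t: %u\n", static_cast<uint32_t>(0xFFFFFFFF));  // 4294967295
    return 0;
}
[/code]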

If you attempt to add two positive numbers together, the result can be either a numeric overflow or a valid value, depending entirely on the encoding, even though the resulting bit patterns are the same.
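
For example (again assuming typical 2's complement wrap-around on the narrowing conversion):

[code]
#include <cstdint>
#include <cstdio>

int main() {
    uint8_t u = 0x7F;   // 127
    int8_t  s = 0x7F;   // +127, the largest positive int8_t

    // Both additions produce the bit pattern 0x80.
    uint8_t usum = static_cast<uint8_t>(u + 1);  // 128: a perfectly valid unsigned result
    int8_t  ssum = static_cast<int8_t>(s + 1);   // -128: a signed overflow

    std::printf("unsigned: %u\n", usum);  // 128
    std::printf("signed:   %d\n", ssum);  // -128
    return 0;
}
[/code]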

[quote]
One needs to know the signed or unsigned encoding, which includes the length of the number, in order to interpret the number. [...] You absolutely need to know whether your two's complement encoding is working in an 8-bit, 16-bit, 32-bit, 64-bit, or other length number system.
[/quote]

Thank you for the explanation, including the consideration of length. When I read the earlier reply, before length was mentioned, I still wasn't seeing where confusion over a value would come from. With no length known, I do see how the values leave open multiple interpretations.
Unsigned values have an encoding...

It's called "binary" :wink:
"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms

[quote]
Unsigned values have an encoding...

It's called "binary" :wink:
[/quote]

There are still encodings necessary at that level. Is it big endian? Little endian? Middle or mixed endian? Bit endian? Is it split among registers, or swapped at word boundaries?


I can immediately think of seven different encodings for the 64-bit value 0x0123456789ABCDEF. There are at least four on the x86 architecture alone: it is not stored in memory as it looks right there, it has different encodings in memory and in a register, and it can be held in one, two, or four registers depending on which way the compiler wants to do it.
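
You can at least inspect the in-memory half of that on your own machine (standard C++, using the example value above):

[code]
#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    uint64_t v = 0x0123456789ABCDEFull;
    uint8_t bytes[sizeof v];
    std::memcpy(bytes, &v, sizeof v);  // copy out the in-memory layout

    // A little-endian x86 prints: EF CD AB 89 67 45 23 01
    // A big-endian machine prints the bytes in the order written above.
    for (uint8_t b : bytes)
        std::printf("%02X ", b);
    std::printf("\n");
    return 0;
}
[/code]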

Just a few weeks ago at work I was tripped up by one of the ARM processor's alternate middle-endian encodings on the game I was working on. At first both I and another senior engineer thought it might be a compiler bug in the recently-updated compiler. It turned out the compiler was using the new C++0x rules, silently promoting values to 128-bit integers and then using an alternate middle-endian format to store the results.

It made for some very confusing disassembled code to watch, as 32-bit values were promoted to 64-bit values, those later promoted to bizarrely-encoded 128-bit values, only to be truncated back down to 32 bits at the end. It ended with a team-wide warning about how the current C++ standard will only silently promote to 32 bits and then generate a warning or error, but C++0x will happily (and without error) promote values to 64-bit, 128-bit, or any other integer type supported by the system.

To share that warning: Do not use enums as magic numbers that are interchangeable with your own non-enum values. They may be silently promoted into very unexpected numbers.
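
A sketch of that kind of silent widening, using the standard C++11 (C++0x) rules rather than the exact ARM case above; the enum and names are made up for illustration:

[code]
#include <cstdint>
#include <cstdio>

// C++11 lets an enumerator demand a 64-bit underlying type.
// C++03 would have rejected HIGH_BIT outright.
enum Flags : uint64_t {
    LOW_BIT  = 0x1,
    HIGH_BIT = 0x8000000000000000ull
};

int main() {
    uint32_t mask = 0xFFFFFFFFu;

    // Mixing the enum with 32-bit arithmetic silently promotes the whole
    // expression to 64 bits; no warning says that "mask" just grew.
    auto widened = mask + LOW_BIT;
    std::printf("result is %zu bytes wide\n", sizeof widened);  // 8 on typical targets
    return 0;
}
[/code]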

This topic is closed to new replies.
