Big Endian and Little Endian

Yes, because endianness doesn't actually change the bits; it just changes the order of the bytes. So my example simply swaps the first and second bytes. When a big-endian machine reads it (assuming it was written little-endian), it will read the bytes into memory the other way around (i.e., swapping them back) and give the intended result. Signedness doesn't matter, as we're not changing the bits, just their location in memory/on disk.
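For a 16-bit value the swap boils down to a one-liner. A minimal sketch (assuming the fixed-width uint16_t type is available):

    #include <cstdint>

    // Swap the two bytes of a 16-bit value. The same swap converts
    // in either direction between big-endian and little-endian.
    std::uint16_t swap16(std::uint16_t v)
    {
        return static_cast<std::uint16_t>((v << 8) | (v >> 8));
    }

    // e.g. swap16(0x1234) == 0x3412, whatever the sign of the data.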
Quote:Original post by Anonymous Poster
Does the right shift treat signed and unsigned differently? I thought it'd treat them the same, especially because they look exactly the same under 2's complement (i.e. there is a bijection between unsigned ints greater than INT_MAX and signed ints less than 0), which is why we use 2's complement in the first place.

There are two different right shifts on a 2's complement machine: arithmetic right shift, which sign-extends [basically copies the MSB], and logical right shift [which fills with zeros]. For positive signed values and unsigned values, the two methods are identical, because the MSB is a zero anyhow. But for negative signed values, arithmetic shifting fills with 1s while logical shifting fills with 0s. The standard guarantees that positive signed values get 0s but, as Promit pointed out, leaves the implementation to decide how to treat negative signed values.
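A quick demonstration of the difference (a sketch; the negative case is exactly the implementation-defined part, so the second line of output can legitimately vary):

    #include <cstdio>

    int main()
    {
        unsigned int u = 0xF0000000u;
        int s = static_cast<int>(0xF0000000u); // negative on a 2's complement machine

        // Unsigned values are guaranteed a logical shift: fill with 0s.
        std::printf("%08X\n", u >> 4);                            // 0F000000

        // Negative signed values: implementation-defined. An arithmetic
        // shift prints FF000000; a logical shift would print 0F000000.
        std::printf("%08X\n", static_cast<unsigned int>(s >> 4));
        return 0;
    }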

CM
Quote:Original post by TEUTON
Quote:Original post by Troll
The right shift will fill the upper byte with all 1's....


It varies from machine to machine.


Very true. And to get completely technical on the topic, the machine doesn't have to use 2's complement. It could have a "hidden" 9th bit for sign. Actually, it could have a 12th bit for sign. There's no requirement that a char be 8 bits. It could be as few as 7, I believe. There is no maximum. The values don't even have to be stored in a binary format. In addition, there are 8-bit-addressable, 32-bit CPUs that are neither big-endian nor little-endian.
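You can at least probe which signed representation a machine uses; the classic trick is to look at the low bits of -1 (a sketch, purely informative):

    // -1 masked with 3 distinguishes the three signed representations
    // the standard allows:
    //   two's complement:   ...11111111 & 3 == 3
    //   ones' complement:   ...11111110 & 3 == 2
    //   sign and magnitude: 1...0000001 & 3 == 1
    const char *signed_representation()
    {
        switch (-1 & 3) {
            case 3:  return "two's complement";
            case 2:  return "ones' complement";
            case 1:  return "sign and magnitude";
            default: return "something more exotic";
        }
    }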

To be thoroughly portable according to the standard can be very, very difficult, precisely because the standard permits so many unusual implementations.
Quote:Original post by Troll
It could have a "hidden" 9th bit for sign. Actually, it could have a 12th bit for sign. There's no requirement that a char be 8 bits. It could be as few as 7, I believe.


The minimum is 8 in C99. I think it was the same in C89 and I'd be very surprised if C++ was different.

Quote:5.2.4.2.1 of the C99 standard
Their implementation-defined values shall be equal or greater in magnitude
(absolute value) to those shown, with the same sign.
— number of bits for smallest object that is not a bit-field (byte)
CHAR_BIT 8


And in what sense would this 9th bit be "hidden"?

Quote:
The values don't even have to be stored in a binary format.


6.2.6 would make this very difficult. I'm not even sure it's possible given:
Quote:6.2.6 of the C99 standard
Values stored in objects of type unsigned char shall be represented using a pure
binary notation.
....
Each bit that is a value bit shall have the same value as
the same bit in the [unsigned char] representation of the corresponding unsigned type.


Quote:
To be thoroughly portable according to the standard can be very, very difficult, precisely because the standard permits so many unusual implementations.


For most problems, I disagree. Unless you count third-party libraries as breaking conformance (I think they do keep a program from being "strictly conforming" by the standard), it's usually not so hard, because those third-party libraries needn't be "strictly conforming" themselves, just as the implementation of the C standard library isn't required to be (you couldn't implement much of stdio.h in strictly conforming code, for example).
Quote:Original post by Anonymous Poster
Quote:Original post by Troll
It could have a "hidden" 9th bit for sign. Actually, it could have a 12th bit for sign. There's no requirement that a char be 8 bits. It could be as few as 7, I believe.


The minimum is 8 in C99. I think it was the same in C89 and I'd be very surprised if C++ was different.


I wouldn't. There were many more architectures in the 80s. E-mail binary attachments are all uuencoded because some early transmission protocols only transmitted 7 bits per byte. That's why ASCII is limited to a 7-bit range. The PDP-7 was addressable in 18-bit chunks.

The C++ standard (as of 2003) doesn't specify the minimum number of bits in a byte, but it does specify that a byte must be big enough to hold the fundamental character of the platform; therefore, for any machine that uses ASCII, 7 bits is the minimum. In theory, if you wrote a C++ compiler to spit out Java Virtual Machine bytecode, your byte should be 16 bits because that is the size of its native character, and 8-bit bytes would be unaddressable as such.
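You can always ask <climits> what you actually got; a trivial check:

    #include <climits>
    #include <cstdio>

    int main()
    {
        // Prints 8 on mainstream desktop hardware; a DSP or the
        // hypothetical JVM target above could report something larger.
        std::printf("bits per byte: %d\n", CHAR_BIT);
        return 0;
    }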


Quote:
And in what sense would this 9th bit be "hidden"?


From the standpoint that it is not used in logical operations: shifts, ANDs, ORs, etc. It only affects the sign of arithmetic operations. Now, it wouldn't be completely hidden, since it would have to be present for loading and saving. But I see nothing in the standard that would require it to be part of logical operations. It could be somewhat separate, like the extra parity bits in systems that have them. Your quote below mentions "value bit", but I haven't seen that term in the C++ standard (although I might have missed it). I don't have a C89 standard, so I can't say if it's in there.

Quote:
The values don't even have to be stored in a binary format.


6.2.6 would make this very difficult. I'm not even sure it's possible given:
Quote:6.2.6 of the C99 standard
Values stored in objects of type unsigned char shall be represented using a pure
binary notation.
....
Each bit that is a value bit shall have the same value as
the same bit in the [unsigned char] representation of the corresponding unsigned type.


The current C++ Standard (2003) doesn't explicitly state that it's a pure binary notation; however, I looked it up, and it defines certain sizes in terms of "bits" which would indicate binary; therefore, it's almost certain I was wrong about the non-binary thing. Be careful about inferring what's standard in C++ from what's in C99. If it's in C99 but not C89, unless it was taken from C++, it's not in a C++ standard yet. The 8-bit byte is a perfect example. There are non-8-bit C++ compilers out there.

Quote:
Quote:
To be thoroughly portable according to the standard can be very, very difficult, precisely because the standard permits so many unusual implementations.


For most problems, I disagree. Unless you count third-party libraries as breaking conformance (I think they do keep a program from being "strictly conforming" by the standard), it's usually not so hard, because those third-party libraries needn't be "strictly conforming" themselves, just as the implementation of the C standard library isn't required to be (you couldn't implement much of stdio.h in strictly conforming code, for example).


No, but there's a big difference here. No C/C++ standard library has to worry about cross-platform portability. Endianness is never an issue. The number of bits in a byte is never an issue. These things only become an issue when you go cross-platform (including same hardware but different compilers). The standards describe an abstract machine and allow significant latitude in the concrete definition of that machine. It's when you try to write something that works with all the different concrete versions that you can run into a real headache. I've talked to people who have written C on 9-bit-byte machines. I have a friend with a commercially-produced 36-bit machine. It happens (well, at least it used to), and the standard tries to remain silent on these issues so that the languages can be used on a maximum number of platforms.
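Which is why, for the original endianness question, the usual defensive move is to test at runtime instead of assuming. A sketch (it assumes an 8-bit byte and won't distinguish middle-endian layouts, which is exactly the latitude being discussed):

    #include <cstdint>
    #include <cstring>

    // Store a known 32-bit pattern and inspect the first byte in memory.
    bool is_little_endian()
    {
        std::uint32_t value = 1;
        unsigned char first;
        std::memcpy(&first, &value, 1);
        return first == 1; // 1 means the low-order byte comes first
    }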
Quote:Original post by Troll
I wouldn't. There were many more architectures in the 80s. E-mail binary attachments are all uuencoded because some early transmission protocols only transmitted 7 bits per byte. That's why ASCII is limited to a 7-bit range. The PDP-7 was addressable in 18-bit chunks.

Of course, C++ wasn't standardized until 1998, so what happened in the 80s doesn't really matter.
Quote:Original post by Troll
The C++ standard (as of 2003) doesn't specify the minimum number of bits in a byte, but it does specify that a byte must be big enough to hold the fundamental character of the platform; therefore, for any machine that uses ASCII, 7 bits is the minimum. In theory, if you wrote a C++ compiler to spit out Java Virtual Machine bytecode, your byte should be 16 bits because that is the size of its native character, and 8-bit bytes would be unaddressable as such.

However, it does specify that the memory model is supposed to be consistent with the C99 standard. Further, it specifies that the contents of <climits> be the same as the contents of "limits.h". Referencing the C standard is a pretty solid move in many cases. I'm neither agreeing nor disagreeing with the statement that CHAR_BIT has to be at least 8, however... I don't have access to the C standard, and I don't see anything in the C++ standard that explicitly states a minimum value. For what it's worth, I have heard that numerous times from people who generally know what they're talking about.
Quote:Original post by Troll
The current C++ Standard (2003) doesn't explicitly state that it's a pure binary notation; however, I looked it up, and it defines certain sizes in terms of "bits" which would indicate binary; therefore, it's almost certain I was wrong about the non-binary thing. Be careful about inferring what's standard in C++ from what's in C99. If it's in C99 but not C89, unless it was taken from C++, it's not in a C++ standard yet. The 8-bit byte is a perfect example. There are non-8-bit C++ compilers out there.

Section 3.9.1.7.

That doesn't matter much, however, as the actual arithmetic is defined in such a way that the storage format is irrelevant. Even hypothetical ternary systems have to behave as if they were binary. Shifts in particular are defined as being equivalent to multiplying/dividing by 2^n [unless you are right-shifting a negative value, in which case the results are implementation-defined].
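So the observable behaviour is pinned down arithmetically; for example:

    #include <cassert>

    int main()
    {
        int x = 5;
        assert((x << 3) == x * 8);    // E1 << E2 equals E1 * 2^E2
        assert((40 >> 2) == 40 / 4);  // E1 >> E2 equals E1 / 2^E2 for non-negative E1
        return 0;
    }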

CM
Quote:Original post by Conner McCloud
However, it does specify that the memory model is supposed to be consistent with the C99 standard. Further, it specifies that the contents of <climits> be the same as the contents of "limits.h". Referencing the C standard is a pretty solid move in many cases. I'm neither agreeing nor disagreeing with the statement that CHAR_BIT has to be at least 8, however... I don't have access to the C standard, and I don't see anything in the C++ standard that explicitly states a minimum value. For what it's worth, I have heard that numerous times from people who generally know what they're talking about.


The original C++ standard is dated 1998 and clearly states that its normative reference is the C89 standard. Sure enough, you are correct that the 2003 edition uses C99 as its normative reference. However, 17.4.1.2p4 states that the cname headers correspond to the similarly named name.h C header files as specified in the original standard. I don't have the 9899:1990 standard (it's surprisingly difficult to find such a reference on the web), so I can't answer whether it makes the requirement that CHAR_BIT be at least 8.

I'm just trying to say that referring to the C99 standard does not give a definitive answer as to what the C89 standard contains. It might also contain that same minimum. I guess in the end I shouldn't be wasting my time, since whether or not it's true is basically irrelevant in the modern world. I doubt there will be a viable game platform in the near future that is going to be 7-bit.
Quote:C89 final draft
2.2.4.2 Numerical limits

A conforming implementation shall document all the limits specified in this section, which shall be specified in the headers <limits.h> and <float.h>.

"Sizes of integral types <limits.h>"

The values given below shall be replaced by constant expressions suitable for use in #if preprocessing directives. Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign.

* maximum number of bits for smallest object that is not a bit-field (byte)
CHAR_BIT 8

No HTML for AP, so link didn't work. Here's the link I tried to make:
http://danpop.home.cern.ch/danpop/ansi.c

