# Big Endian and Little Endian

## Recommended Posts

TEUTON    100
I was trying to swap between big endian and little endian and wrote this code... I guess this is sufficient... but this is for unsigned types... can you suggest some code for signed types??
```cpp
inline void endian_swap(unsigned short& x)
{
    x = (x >> 8) | (x << 8);
}
```


##### Share on other sites
Guest Anonymous Poster
Not entirely positive, but assuming 2's complement, signedness shouldn't make a difference.

##### Share on other sites
Nypyren    12074
For little endian anyway, where the first bytes in memory are less significant, the last byte holds the sign bit (this is true for each integer data type and the IEEE float types, anyway).

I'm guessing that in big endian, the sign bit is still in the "most significant byte", which is simply the first one in that case.

It's probably possible to think of endianness as being "lower level" than signedness.

As for code, you can probably do some clever typecasting to force the compiler to switch signedness. Also, if you're ever manually converting a signed value from a smaller type to a larger one, you might have to sign-extend it.

(Sign extension means that if the sign bit is 1, you fill the 'new' bytes in the larger type with 1s to keep the same value.)

##### Share on other sites
Troll    246
Quote:
 Original post by Anonymous Poster
Not entirely positive, but assuming 2's complement, signedness shouldn't make a difference.

In 2's complement, signedness makes a big difference. The OP's code will not work properly with a signed numeric type: the right shift will fill the upper byte with all 1's, which will result in incorrect values whenever the lower byte isn't 0xFF. To do an endian swap with a signed type, just cast it to unsigned, swap it, and cast it back. The casting can be handled implicitly, but some compilers will spit out warnings if you don't cast explicitly.

##### Share on other sites
TEUTON    100
Quote:
 Original post by Troll
The right shift will fill the upper byte with all 1's....

It varies from machine to machine.

##### Share on other sites
_goat    804
In which case it may be more prudent to do something like this:

```cpp
#include <algorithm> // for std::swap

inline void endian_swap(unsigned short& x)
{
    char* k = reinterpret_cast<char*>(&x);
    std::swap(k[0], k[1]);
}
```

Untested, seems right.

##### Share on other sites
Guest Anonymous Poster
Quote:
Original post by Troll
Quote:
 Original post by Anonymous Poster
Not entirely positive, but assuming 2's complement, signedness shouldn't make a difference.

In 2's complement, signedness makes a big difference. The OP's code will not work properly with a signed numeric type. The right shift will fill the upper byte with all 1's which will result in incorrect values whenever the lower byte isn't 0xFF. To do an endian swap with a signed type just cast it to unsigned, swap it, and cast it back. The casting can be handled implicitly, but some compilers will spit out warnings if you don't explicitly cast.

Does the right shift treat signed and unsigned differently? I thought it'd treat them the same, especially because they look exactly the same under 2's complement (i.e. there is a bijection between unsigned ints greater than INT_MAX and signed ints less than 0), which is why we use 2's complement in the first place.

##### Share on other sites
TEUTON    100
Quote:
 Original post by _goat
In which case it may be more prudent to do something like this:
*** Source Snippet Removed ***
Untested, seems right.

But... you haven't cast it to an unsigned type... or have you??

##### Share on other sites
Promit    13246
I know from experience that different compilers, platforms, etc. will behave differently with right shifts on signed integers. Some will sign-extend, some will shift in zeroes. The standard says the behavior is implementation-defined. In other words, don't rely on it.

##### Share on other sites
TEUTON    100
Is the above solution provided by _goat valid for signed numbers?

##### Share on other sites
_goat    804
Yes, because endianness doesn't actually change the bits; it just changes the order of the bytes. My example simply swaps the first and second bytes. So when a big-endian machine reads data written by a little-endian one, it simply reads the bytes, puts them into memory the other way around (i.e., swapping them back), and gives the intended result. Signedness doesn't matter, as we're not changing the bits, just their location in memory/on disk.

##### Share on other sites
Conner McCloud    1135
Quote:
 Original post by Anonymous Poster
Does the right shift treat signed and unsigned differently? I thought it'd treat them the same, especially because they look exactly the same under 2's complement (i.e. there is a bijection between unsigned int's greater than INT_MAX and signed int's less than 0), which is why we use 2's complement in the first place.

There are two different right shifts on a 2's complement machine: arithmetic right shift, which sign-extends [basically copies the MSB], and logical right shift [which fills with zeros]. For positive signed values and unsigned values, the two methods are identical, because the MSB is a zero anyhow. But for negative signed values, arithmetic shifting fills with 1s while logical shifting fills with 0s. The standard guarantees that positive signed values get 0s but, as Promit pointed out, leaves the implementation to decide how to treat negative signed values.

CM

##### Share on other sites
Troll    246
Quote:
Original post by TEUTON
Quote:
 Original post by Troll
The right shift will fill the upper byte with all 1's....

It varies from machine to machine.

Very true. And to get completely technical on the topic, the machine doesn't have to use 2's complement. It could have a "hidden" 9th bit for sign. Actually, it could have a 12th bit for sign. There's no requirement that a char be 8 bits; it could be as few as 7, I believe, and there is no maximum. The values don't even have to be stored in a binary format. In addition, there are 8-bit-addressable, 32-bit CPUs that are neither big-endian nor little-endian.

To be thoroughly portable according to the standard can be very, very difficult, precisely because the standard permits so much variation between implementations.

##### Share on other sites
Guest Anonymous Poster
Quote:
 Original post by Troll
It could have a "hidden" 9th bit for sign. Actually, it could have a 12th bit for sign. There's no requirement that a char be 8 bits. It could be as few as 7, I believe.

The minimum is 8 in C99. I think it was the same in C89, and I'd be very surprised if C++ were different.

Quote:
 5.2.4.2.1 of the C99 standard
Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign.
— number of bits for smallest object that is not a bit-field (byte)
CHAR_BIT 8

And in what sense would this 9th bit be "hidden"?

Quote:
 The values don't even have to be stored in a binary format.

6.2.6 would make this very difficult. I'm not even sure it's possible given:
Quote:
 6.2.6 of the C99 standard
Values stored in objects of type unsigned char shall be represented using a pure binary notation. ... Each bit that is a value bit shall have the same value as the same bit in the [unsigned char] representation of the corresponding unsigned type.

Quote:
 To be thoroughly portable according to the standard can be very, very difficult. Of course that's because being completely portable is very, very tricky.

For most problems, I disagree. Unless you count third-party libraries as breaking conformance (I think they do keep a program from being "strictly conforming" by the standard), it's usually not so hard. Those third-party libraries needn't be strictly conforming themselves, just as the implementation of the C standard library isn't required to be strictly conforming (e.g. you couldn't implement much of stdio.h if it were).

##### Share on other sites
Troll    246
Quote:
Original post by Anonymous Poster
Quote:
 Original post by Troll
It could have a "hidden" 9th bit for sign. Actually, it could have a 12th bit for sign. There's no requirement that a char be 8 bits. It could be as few as 7, I believe.

The minimum is 8 in C99. I think it was the same in C89 and I'd be very surprised if C++ was different.

I wouldn't. There were many more architectures in the 80s. E-mail binary attachments are all uuencoded because some early transmission protocols only transmitted 7 bits; that's also why ASCII is limited to a 7-bit range. The PDP-7 was addressable in 18-bit chunks.

The C++ standard (as of 2003) doesn't specify the minimum number of bits in a byte, but it does specify that a byte must be big enough to hold the fundamental character of the platform; therefore, for any machine that uses ASCII, 7 bits is the minimum. In theory, if you wrote a C++ compiler that emitted Java virtual machine bytecode, your byte should be 16 bits, because that is the size of its native character, and 8-bit bytes would not be addressable as such.

Quote:
 And in what sense would this 9th bit be "hidden"?

From the standpoint that it is not used in logical operations: shifts, ands, ors, etc. It would only affect the sign in arithmetic operations. Now, it wouldn't be completely hidden, since it would have to be present for loading and saving, but I see nothing in the standard that would require it to be part of logical operations. It could be kept somewhat separate, like the extra parity bits in systems that have them. Your quote below mentions "value bit", but I haven't seen that term in the C++ standard (although I might have missed it). I don't have a C89 standard, so I can't say whether it's in there.

Quote:
 Original post by Anonymous Poster
Quote:
 The values don't even have to be stored in a binary format.

6.2.6 would make this very difficult. I'm not even sure it's possible given:
Quote:
 6.2.6 of the C99 standard
Values stored in objects of type unsigned char shall be represented using a pure binary notation. ... Each bit that is a value bit shall have the same value as the same bit in the [unsigned char] representation of the corresponding unsigned type.

The current C++ Standard (2003) doesn't explicitly state that it's a pure binary notation; however, I looked it up, and it defines certain sizes in terms of "bits", which would indicate binary, so it's almost certain I was wrong about the non-binary thing. Be careful about inferring what's standard in C++ from what's in C99: if something is in C99 but not C89, then unless it was taken from C++, it's not in a C++ standard yet. The 8-bit byte is a perfect example. There are non-8-bit C++ compilers out there.

Quote:

Quote:
 To be thoroughly portable according to the standard can be very, very difficult. Of course that's because being completely portable is very, very tricky.

For most problems, I disagree. Unless you count third party libraries as breaking conformance (I think they do keep a program from being "strictly conforming" by the standard), then it's usually not so hard (because those third party libraries needn't be "strictly conforming", just like the implementation of the C standard library isn't required to be "strictly conforming" (e.g. you couldn't implement much of stdio.h)).

No, but there's a big difference here. An implementation of the C/C++ standard library never has to worry about cross-platform portability: endianness is never an issue, and neither is the number of bits in a byte. These things only become issues when you go cross-platform (including the same hardware with different compilers). The standards describe an abstract machine and allow significant latitude in the concrete realization of that machine; it's when you try to write something that works with all the different concrete versions that you can run into a real headache. I've talked to people who have written C on 9-bit-byte machines, and I have a friend with a commercially produced 36-bit machine. It happens (well, at least it used to), and the standard tries to remain silent on these issues so that the languages can be used on the maximum number of platforms.

##### Share on other sites
Conner McCloud    1135
Quote:
 Original post by Troll
I wouldn't. There were many more architectures in the 80s. E-mail binary attachments are all uuencoded because some early transmission protocols only transmitted in 7-bits. That's why ASCII is limited to a 7-bit range. The PDP-7 was addressable in 18-bit chunks.

Of course, C++ wasn't standardized until the mid 90s, so what happened in the 80s doesn't really matter.
Quote:
 Original post by Troll
The C++ standard (as of 2003) doesn't specify the minimum number of bits in a byte but does specify that it must be enough to hold the fundamental character of the platform; therefore, for any machine that uses ASCII, 7-bits are a minimum. In theory if you wrote a C++ compiler to spit out java virtual machine bytecode, your byte should be 16-bit because that is the size of its native character, and 8-bit bytes would be unaddressable as such.

However, it does specify that the memory model is supposed to be consistent with the C99 standard. Further, it specifies that the contents of <climits> be the same as the contents of "limits.h". Referencing the C standard is a pretty solid move in many cases. I'm neither agreeing nor disagreeing with the statement that CHAR_BIT has to be at least 8, however... I don't have access to the C standard, and I don't see anything in the C++ standard that explicitly states a minimum value. For what it's worth, I have heard that claim numerous times, from people who generally know what they're talking about.
Quote:
 Original post by Troll
The current C++ Standard (2003) doesn't explicitly state that it's a pure binary notation; however, I looked it up, and it defines certain sizes in terms of "bits" which would indicate binary; therefore, it's almost certain I was wrong about the non-binary thing. Be careful about inferring what's standard in C++ from what's in C99. If it's in C99 but not C89, unless it was taken from C++, it's not in a C++ standard yet. The 8-bit byte is a perfect example. There are non-8-bit C++ compilers out there.

Section 3.9.1.7.

That doesn't matter, however, as the actual arithmetic is defined in such a way that the representation doesn't matter. Even hypothetical ternary systems have to behave as if they were binary. Shifts in particular are defined as being equivalent to multiplying/dividing by 2^n [unless you are right-shifting a negative value, in which case the result is implementation-defined].

CM

##### Share on other sites
Troll    246
Quote:
 Original post by Conner McCloud
However, it does specify that the memory model is supposed to be consistent with the C99 standard. Further, it specifies that the contents of <climits> be the same as the contents of "limits.h". Referencing the C standard is a pretty solid move in many cases. I'm neither agreeing nor disagreeing with the statement that CHAR_BIT has to be at least 8, however... I don't have access to the C standard, and I don't see anything in the C++ standard that explicitly states a minimum value. For what it's worth, I have heard that numerous times, however, by people that generally know what they're talking about.

The original C++ standard is dated 1998, and it clearly states that its normative reference is the C89 standard. Sure enough, you are correct that the 2003 edition uses C99 as its normative reference. However, 17.4.1.2p4 states that the cname headers correspond to the similarly named name.h C header files as specified in the original standard. I don't have the 9899:1990 standard (it's surprisingly difficult to find such a reference on the web), so I can't say whether it requires CHAR_BIT to be at least 8.

I'm just trying to say that referring to the C99 standard does not give a definitive answer as to what the C89 standard contains. It might also contain that same minimum. I guess in the end I shouldn't be wasting my time, since whether or not it's true is basically irrelevant in the modern world. I doubt there will be a viable game platform in the near future that is 7-bit.

##### Share on other sites
Guest Anonymous Poster
Quote:
 C89 final draft, 2.2.4.2 Numerical limits
A conforming implementation shall document all the limits specified in this section, which shall be specified in the headers <limits.h> and <float.h>.
"Sizes of integral types <limits.h>"
The values given below shall be replaced by constant expressions suitable for use in #if preprocessing directives. Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign.
* maximum number of bits for smallest object that is not a bit-field (byte)
CHAR_BIT 8

##### Share on other sites
Guest Anonymous Poster
No HTML for AP, so link didn't work. Here's the link I tried to make:
http://danpop.home.cern.ch/danpop/ansi.c