TEUTON

Big Endian and Little Endian


I was trying to swap between big-endian and little-endian and wrote this code. I guess this is sufficient, but it's for unsigned types. Can you suggest some code for signed types?
inline void endian_swap(unsigned short& x)
{
    x = (x>>8) | (x<<8);
}

Guest Anonymous Poster
Not entirely positive, but assuming 2's complement, signedness shouldn't make a difference.

In little endian anyway, where the earlier bytes in memory are less significant, the last byte has the sign bit (this is true for each integer data type and the IEEE float types, anyway).

I'm guessing that in big endian, the sign bit is still in the "most significant byte" (which is the first one in that case).

It's probably possible to think of endianness as being "lower level" than signedness.


As for code, you can probably do some clever typecasting to force the compiler to switch signedness. Also, if you're ever manually converting a signed value from a smaller to a larger type, you might have to sign-extend it.

(Sign extension means that if the sign bit is 1, fill the 'new' bytes in the larger type with 1s to keep the same value)

Quote:
Original post by Anonymous Poster
Not entirely positive, but assuming 2's complement, signedness shouldn't make a difference.


In 2's complement signedness makes a big difference. The OP's code will not work properly with a signed numeric type. The right shift will fill the upper byte with all 1's, which will result in incorrect values whenever the lower byte isn't 0xFF. To do an endian swap with a signed type, just cast it to unsigned, swap it, and cast it back. The casting can be handled implicitly, but some compilers will spit out warnings if you don't cast explicitly.

In which case it may be more prudent to do something like this:


#include <algorithm> // for std::swap (moved to <utility> in C++11)

inline void endian_swap(unsigned short& x)
{
    char* k = reinterpret_cast<char*>(&x);
    std::swap(k[0], k[1]);
}



Untested, seems right.

Guest Anonymous Poster
Quote:
Original post by Troll
Quote:
Original post by Anonymous Poster
Not entirely positive, but assuming 2's complement, signedness shouldn't make a difference.


In 2's complement signedness makes a big difference. The OP's code will not work properly with a signed numeric type. The right shift will fill the upper byte with all 1's, which will result in incorrect values whenever the lower byte isn't 0xFF. To do an endian swap with a signed type, just cast it to unsigned, swap it, and cast it back. The casting can be handled implicitly, but some compilers will spit out warnings if you don't cast explicitly.


Does the right shift treat signed and unsigned differently? I thought it'd treat them the same, especially because they look exactly the same under 2's complement (i.e. there is a bijection between unsigned ints greater than INT_MAX and signed ints less than 0), which is why we use 2's complement in the first place.

Quote:
Original post by _goat
In which case it may be more prudent to do something like this:

*** Source Snippet Removed ***

Untested, seems right.


But... you haven't cast it to an unsigned type. Or have you?

I know from experience that different compilers, platforms, etc. will behave differently with right shifts on signed integers. Some will sign-extend, some will shift in zeroes. The standard says the behavior is implementation-defined. In other words, don't rely on it.

Yes, because endianness doesn't actually change the bits, it just changes the order of the bytes. So my example simply swaps the first and second bytes. So when a big-endian machine reads it (assuming it came from little endian), it'll simply read the bytes and put them into memory the other way around (i.e., swapping them back) and give the intended result. Signedness doesn't matter, as we're not changing the bits, just their location in memory/on disk.

Quote:
Original post by Anonymous Poster
Does the right shift treat signed and unsigned differently? I thought it'd treat them the same, especially because they look exactly the same under 2's complement (i.e. there is a bijection between unsigned ints greater than INT_MAX and signed ints less than 0), which is why we use 2's complement in the first place.

There are two different right shifts on a 2's complement machine: arithmetic right shift, which sign extends [basically copies the MSB], and logical right shift [which fills with zeros]. For positive signed values and unsigned values, the two methods are identical, because the MSB is a zero anyhow. But for negative signed values, arithmetic shifting fills with 1s while logical shifting fills with 0s. The standard guarantees that positive signed values get 0s but, as Promit pointed out, leaves the implementation to decide how to treat negative signed values.

CM

Quote:
Original post by TEUTON
Quote:
Original post by Troll
The right shift will fill the upper byte with all 1's....


It varies from machine to machine.


Very true. And for completeness, to get completely technical on the topic, the machine doesn't have to use 2's complement. It could have a "hidden" 9th bit for sign. Actually, it could have a 12th bit for sign. There's no requirement that a char be 8 bits. It could be as few as 7, I believe. There is no maximum. The values don't even have to be stored in a binary format. In addition, there are 8-bit-addressable, 32-bit CPUs that are neither big-endian nor little-endian.

To be thoroughly portable according to the standard can be very, very difficult. Of course that's because being completely portable is very, very tricky.

Guest Anonymous Poster
Quote:
Original post by Troll
It could have a "hidden" 9th bit for sign. Actually, it could have a 12th bit for sign. There's no requirement that a char be 8 bits. It could be as few as 7, I believe.


The minimum is 8 in C99. I think it was the same in C89 and I'd be very surprised if C++ was different.

Quote:
5.2.4.2.1 of the C99 standard
Their implementation-defined values shall be equal or greater in magnitude
(absolute value) to those shown, with the same sign.
— number of bits for smallest object that is not a bit-field (byte)
CHAR_BIT 8


And in what sense would this 9th bit be "hidden"?

Quote:

The values don't even have to be stored in a binary format.


6.2.6 would make this very difficult. I'm not even sure it's possible given:
Quote:
6.2.6 of the C99 standard
Values stored in objects of type unsigned char shall be represented using a pure
binary notation.
....
Each bit that is a value bit shall have the same value as
the same bit in the [unsigned char] representation of the corresponding unsigned type.


Quote:

To be thoroughly portable according to the standard can be very, very difficult. Of course that's because being completely portable is very, very tricky.


For most problems, I disagree. Unless you count third-party libraries as breaking conformance (I think they do keep a program from being "strictly conforming" by the standard), it's usually not so hard, because those third-party libraries needn't be "strictly conforming" themselves, just like the implementation of the C standard library isn't required to be "strictly conforming" (e.g. you couldn't implement much of stdio.h).

Quote:
Original post by Anonymous Poster
Quote:
Original post by Troll
It could have a "hidden" 9th bit for sign. Actually, it could have a 12th bit for sign. There's no requirement that a char be 8 bits. It could be as few as 7, I believe.


The minimum is 8 in C99. I think it was the same in C89 and I'd be very surprised if C++ was different.


I wouldn't. There were many more architectures in the 80s. E-mail binary attachments are all uuencoded because some early transmission protocols only transmitted 7 bits. That's why ASCII is limited to a 7-bit range. The PDP-7 was addressable in 18-bit chunks.

The C++ standard (as of 2003) doesn't specify the minimum number of bits in a byte, but it does specify that a byte must be big enough to hold the fundamental character of the platform; therefore, for any machine that uses ASCII, 7 bits is the minimum. In theory, if you wrote a C++ compiler to spit out Java virtual machine bytecode, your byte should be 16 bits, because that is the size of its native character and 8-bit bytes would be unaddressable as such.


Quote:

And in what sense would this 9th bit be "hidden"?


From the standpoint that it is not used in logical operations: shifts, ands, ors, etc. It only affects the sign in arithmetic operations. Now, it wouldn't be completely hidden, since it would have to be present for loading and saving. But I see nothing in the standard that would require it to be part of logical operations. It could be somewhat separate, like the extra parity bits in systems that have them. Your quote below mentions "value bit", but I haven't seen it in the C++ standard (although I might have missed it). I don't have a C89 standard, so I can't say if it's in there.

Quote:

The values don't even have to be stored in a binary format.


6.2.6 would make this very difficult. I'm not even sure it's possible given:
Quote:
6.2.6 of the C99 standard
Values stored in objects of type unsigned char shall be represented using a pure
binary notation.
....
Each bit that is a value bit shall have the same value as
the same bit in the [unsigned char] representation of the corresponding unsigned type.


The current C++ Standard (2003) doesn't explicitly state that it's a pure binary notation; however, I looked it up, and it defines certain sizes in terms of "bits" which would indicate binary; therefore, it's almost certain I was wrong about the non-binary thing. Be careful about inferring what's standard in C++ from what's in C99. If it's in C99 but not C89, unless it was taken from C++, it's not in a C++ standard yet. The 8-bit byte is a perfect example. There are non-8-bit C++ compilers out there.

Quote:

Quote:

To be thoroughly portable according to the standard can be very, very difficult. Of course that's because being completely portable is very, very tricky.


For most problems, I disagree. Unless you count third-party libraries as breaking conformance (I think they do keep a program from being "strictly conforming" by the standard), it's usually not so hard, because those third-party libraries needn't be "strictly conforming" themselves, just like the implementation of the C standard library isn't required to be "strictly conforming" (e.g. you couldn't implement much of stdio.h).


No, but there's a big difference here. No C/C++ standard library has to worry about cross-platform portability. Endianness is never an issue. The number of bits in a byte is never an issue. These things only become an issue when you go cross-platform (including same hardware but different compilers). The standards describe an abstract machine and allow significant latitude in the concrete definition of that machine. It's when you deal with trying to write something that works with all the different concrete versions that you can run into a real headache. I've talked to people who have written C on 9-bit-byte machines. I have a friend with a commercially produced 36-bit machine. It happens (well, at least it used to), and the standard tries to remain silent on these issues so that the languages can be used on a maximum number of platforms.

Quote:
Original post by Troll
I wouldn't. There were many more architectures in the 80s. E-mail binary attachments are all uuencoded because some early transmission protocols only transmitted 7 bits. That's why ASCII is limited to a 7-bit range. The PDP-7 was addressable in 18-bit chunks.

Of course, C++ wasn't standardized until the mid 90s, so what happened in the 80s doesn't really matter.
Quote:
Original post by Troll
The C++ standard (as of 2003) doesn't specify the minimum number of bits in a byte but does specify that it must be enough to hold the fundamental character of the platform; therefore, for any machine that uses ASCII, 7-bits are a minimum. In theory if you wrote a C++ compiler to spit out java virtual machine bytecode, your byte should be 16-bit because that is the size of its native character, and 8-bit bytes would be unaddressable as such.

However, it does specify that the memory model is supposed to be consistent with the C99 standard. Further, it specifies that the contents of <climits> be the same as the contents of "limits.h". Referencing the C standard is a pretty solid move in many cases. I'm neither agreeing nor disagreeing with the statement that CHAR_BIT has to be at least 8, however... I don't have access to the C standard, and I don't see anything in the C++ standard that explicitly states a minimum value. For what it's worth, I have heard that numerous times, however, from people that generally know what they're talking about.
Quote:
Original post by Troll
The current C++ Standard (2003) doesn't explicitly state that it's a pure binary notation; however, I looked it up, and it defines certain sizes in terms of "bits" which would indicate binary; therefore, it's almost certain I was wrong about the non-binary thing. Be careful about inferring what's standard in C++ from what's in C99. If it's in C99 but not C89, unless it was taken from C++, it's not in a C++ standard yet. The 8-bit byte is a perfect example. There are non-8-bit C++ compilers out there.

Section 3.9.1.7.

That doesn't matter, however, as the actual arithmetic is defined in such a way that the format doesn't matter. Even hypothetical ternary systems have to behave as if they were binary. Shifts in particular are defined as being equivalent to multiplying/dividing by 2^n [unless you are right-shifting a negative value, in which case the results are implementation defined].

CM

Quote:
Original post by Conner McCloud
However, it does specify that the memory model is supposed to be consistent with the C99 standard. Further, it specifies that the contents of <climits> be the same as the contents of "limits.h". Referencing the C standard is a pretty solid move in many cases. I'm neither agreeing nor disagreeing with the statement that CHAR_BIT has to be at least 8, however... I don't have access to the C standard, and I don't see anything in the C++ standard that explicitly states a minimum value. For what it's worth, I have heard that numerous times, however, from people that generally know what they're talking about.


The original C++ standard is dated 1998, and it clearly states that its normative reference is the C89 standard. Sure enough, you are correct that the 2003 edition uses C99 as its normative reference. However, 17.4.1.2p4 states that the cname headers correspond to the similarly named name.h C header files as specified in the original standard. I don't have the 9899:1990 standard (it's surprisingly difficult to find such a reference on the web), so I can't answer whether it makes the requirement that CHAR_BIT be at least 8.

I'm just trying to say that referring to the C99 standard does not give a definitive answer as to what the C89 standard contains. It might also contain that same minimum. I guess in the end I shouldn't be wasting my time, since whether or not it's true is basically irrelevant in the modern world. I doubt there will be a viable game platform in the near future that is 7-bit.

Guest Anonymous Poster
Quote:
C89 final draft
2.2.4.2 Numerical limits

A conforming implementation shall document all the limits specified
in this section, which shall be specified in the headers
<limits.h> and <float.h>.

"Sizes of integral types <limits.h>"

The values given below shall be replaced by constant expressions
suitable for use in #if preprocessing directives. Their
implementation-defined values shall be equal or greater in magnitude
(absolute value) to those shown, with the same sign.

* maximum number of bits for smallest object that is not a bit-field (byte)
CHAR_BIT 8

Guest Anonymous Poster
No HTML for AP, so link didn't work. Here's the link I tried to make:
http://danpop.home.cern.ch/danpop/ansi.c

