• # Writing Efficient Endian-Independent Code in C++

General and Gameplay Programming

Once upon a time, there was an article published on gamedev.net [Roy2013], which described a way (as he says, mostly taken from Quake2 engine) to deal with Little-Endian/Big-Endian issues in games. While this approach is mostly sound ("mostly" because of unaligned-read issues which will be discussed below), it is not the most efficient one. Better (simpler, faster, and more general) approaches do exist, and they will be discussed below.

## What is Endianness

Endianness itself has been described in many different works, including [[Roy2013] and [WikipediaEndianness]. Basically, it is a way that CPU stores multi-byte data in memory; little-endian systems store least significant byte first, and big-endian ones store most-significant byte first. So, if you have  uint16_t x = 1234;  then x will look as {0xD2, 0x04} on a little-endian system, and as {0x04, 0xD2} on a big-endian system. As a result, code such as  send(socket,&x,2);  will send different data over the wire depending on system endianness (little-endian systems will send {0xD2,0x04}, and big-endian ones will send {0x04,0xD2}). It is important to note that endianness effects cannot be observed unless we have some kind of cast between pointers on data of different sizes (in the example above, there was an implicit cast from &x, which is uint16_t*, to void*, which is actually treated as byte pointer by the send() function). In other words, as long as we keep away from casts and stay within arithmetical and bitwise operations without pointer casts, the result is always the same regardless of endianness. 2+3 is always 5, and (((uint16_t)0xAA)[lessthan][lessthan]3)^0xCCCC is always 0xC99C, regardless of the system where our code is running. Let's name such calculations endianness-agnostic.

## Scope

First of all, where do we need to deal with little-endian/big-endian issues? In fact, there are only two scenarios of which I know, where it is important. First one is reading files (which might have been written on a different machine), and another one is network communication. From our perspective, both of these cases are essentially the same: we're transferring data from one machine to another one.

## Serialization/Marshalling

One thing which should be noted for both these data-transfer-between-machines scenarios, is that you should never transfer data as a C structure; instead, you should serialize/marshal it. Putting C structure in a file (which may be read on another machine) or over the network, is a Really Bad Idea for several reasons. First of all, when writing/reading C structure to external storage, you're becoming a hostage of implicit alignment rules of the compiler you're using. In general, when you have a structure such as  struct X { uint8_t a; uint32_t b; };  then sizeof(X) won't be 5 bytes as some might expect; in many cases sizeof(X) will be 8 bytes (1 byte of a, then 3 unused bytes of padding just to make b aligned on a 4-byte boundary, and then 4 bytes of b), but this is not guaranteed at all. To make things worse, the amount of alignment is not specified by standards, so when you're switching from one compiler to another one, it may change (not to mention switching between CPUs); to make things even worse, it can be affected by compiler switches and on a struct-by-struct basis by things such as #pragma pack. If you are using types such as int or long (rather than guaranteed-size-types such as uint8_t and uint32_t), things worsen even further (yes, this is possible) - due to different sizes of these types on different platforms. Oh, and don't forget that variable-length strings and C++ containers are clearly off-limits. There are other (rather minor) reasons for avoiding writing C structures directly: you'll write more data then necessary, the data written will include garbage (which will affect the ability to compress it), and so on. However, the most important issue is (lack of) inter-platform and inter-compiler compatibility mentioned above. These issues are so important, that in the networking world sending C structures over the network is universally considered a Big No-No. So, what you should do when you need to send a C structure over the network (or to save it to the file)? You should serialize it first (in network world term "marshal" is generally preferred, though it is essentially the same thing).

## Implementing Serialization

The idea behind serialization is simple: for the struct X above you write one byte of a, and 4 bytes of b, avoiding alignment issues. In fact, you can go further and use, for example, VLQ [WikipediaVLQ] variable-length encoding, or put null-terminated strings into your serialized data. One way of serializing data (the one I prefer), is to have serialize/deserialize functions such as  void serialize_uint16(DataBlock&, uint16_t);//DataBlock should grow as the data is serialized uint16_t deserialize_uint16(Parser&);//there is constructor Parser(DataBlock&)  When we have these functions implemented, then serializing our struct X will look as  DataBlock data; serialize_uint8(data,x.a); serialize_uint32(data,x.b);  (and deserializing will look similar). So far so good, now let's see how we can implement our serialize_uint16() function. If we will implement it according to [Roy2013], it would look like:  void serialize_uint16(DataBlock& data,uint16_t u16) { void* ptr = data.grow(2);//add 2 bytes to the end of data, and return pointer to these 2 bytes u16 = LittleShort(u16);//calling ShortSwap on big-endian systems, and ShortNoSwap on little-endian systems *(uint16_t*)ptr = u16; //(*) }  This would work fine on x86 and x86-64, but on the other platforms the line marked as (*) may run into problems. The problem is that our ptr might be either even or odd; and if it is odd - some CPUs will refuse to read 2-byte data from it (also they will usually refuse to read 4-byte data unless its address is a multiple of 4, and so on). This never happens on x86/x86-64, but happens on SPARC, and may or may not happen on ARM (unless we specify __packed qualifier for uint16_t*, but it is not universally available).

## Another Popular Alternative

Another popular alternative (thanks to Servant of the Lord for reminding me of it) is based on LITTLE_ENDIAN and BIG_ENDIAN macros. In some sense, it can be seen as the same serialize_uint16() as above, but using different implementation for BigShort() etc.:  //code courtesy of Servant of the Lord #ifdef LITTLE_ENDIAN #define BigShort(x) ShortSwap(x) #define LittleShort(x) (x) //Do nothing, just 'return' the same variable. #define BigLong(x) LongSwap(x) #define LittleLong(x) (x) #define BigFloat(x) FloatSwap(x) #define LittleFloat(x) (x) #elif defined(BIG_ENDIAN) #define BigShort(x) (x) #define LittleShort(x) ShortSwap(x) #define BigLong(x) (x) #define LittleLong(x) LongSwap(x) #define BigFloat(x) FloatSwap(x) #define LittleFloat(x) (x) #else #error No idea about endianness #endif  While it is faster and less bulky than a previous one (see "Performance Analysis" section below), it has the same problem with unaligned read/writes on non-x86 platforms :-(. In other words, for serialization purposes it won't work, for example, on SPARC (and it working for ARM is not guaranteed).

## What Is to be Done?

What is to be done? -- name of the novel by Nikolay Chernyshevsky, 1863 --
The answer is quite simple, actually. Instead of writing two bytes as one chunk, we can always write it byte-by-byte:  void serialize_uint16(DataBlock& data,uint16_t u16) { uint8_t* ptr = data.grow(2); *ptr++ = (uint8_t)u16; *ptr = (uint8_t)(u16 >> 8); }//deserializing is very similar  This technique is well-known (see, for example, [Pike2012], but the idea is known for years before); a Really Good Thing about it is that we don't need to care about endianness at all(!). This stands because the code above doesn't perform any casts, and calculates all the bytes in a completely endianness-agnostic manner (see "What is Endianness" section above); and all writes are exactly 1 byte in size, so there is no chance for endianness to manifest itself. While [Pike2012] argues that all the other marshalling methods represent a fallacy, I'm not so sure about it, and will describe an improvement over byte-by-byte marshalling in a moment.

## Further Optimization

When I really care about performance (which I usually do, as server-side handling of billions of network messages per day is quite a significant load), I often add a special handling for those platforms of the most interest, for example (taken from [NoBugs2015]):  void serialize_uint16(DataBlock& data, uint16_t u16) { //assuming little-endian order on the wire uint8_t* ptr = data.grow(2); #if defined(__i386) || defined(__x86_64__) || defined(_M_IX86) || defined(_M_X64) *(uint16_t*)ptr = u16; // safe and fast as x86/x64 are ok with unaligned writes #else *ptr++ = (uint8_t)u16; *ptr = (uint8_t)(u16 >> 8); #endif }  With this approach, we have the best of both worlds: (a) universal version (the one under #else) which works everywhere, and (b) optimized version for specific platforms which we know for sure will work efficiently there.

## Implementation Notes

One thing to note when implementing marshaling, is that in most cases it is simpler to do it using unsigned integers rather than signed ones; while using signed types isn't formally a bad thing, in practice it tends to cause trouble. Not that it isn't possible to implement marshalling with signed ints - it is just simpler to implement it with unsigned ints, with one less thing to worry about. For a list of troubles which can be caused by using signed stuff for marshalling - see a comment by SICrane below; however, you don't really need to care about them - just use unsigned and you'll be fine :-). Another thing to keep in mind is to use those guaranteed-size types such as uint32_t and uint16_t (rather than int and short). You never know where your code will be compiled in 5 years from now, and just recently I've seen a guy who needed to fix his code because when compiling for AVR8, sizeof(int) is 2 (but sizeof(uint32_t) is always 4, regardless of the platform).

## Summary

Properties of the three ways to handle endianness can be summarized in the following table: [table] [tr] [th][/th] [th]Applicability[/th] [th]Clocks on x86, write uint16_t[/th] [th]Clocks on x86, read uint16_t[/th] [th]Clocks on x86, write uint32_t[/th] [th]Clocks on x86, write uint32_t Clocks on x86, read uint32_t[/th] [/tr] [tr] [td][Roy2013][/td] [td]Only C-structure based[/td] [td]16-21[/td] [td]17-23[/td] [td]16-21[/td] [td]17-23[/td] [/tr] [tr] [td]LITTLE_/BIG_ENDIAN[/td] [td]Only C-structure based[/td] [td]1[/td] [td]2-3[/td] [td]1[/td] [td]2-3[/td] [/tr] [tr] [td]Endianness-agnostic[/td] [td]Both C-structure based and serialization[/td] [td]4[/td] [td]6-8[/td] [td]8[/td] [td]12-16[/td] [/tr] [tr] [td][NoBugs2015][/td] [td]Both C-structure based and serialization[/td] [td]1[/td] [td]2-3[/td] [td]1[/td] [td]2-3[/td] [/tr] [/table] Of course, on non-x86 platforms the picture won't be as good for [Nobugs2015] as it is written, but it will still perform exactly as the endianness-agnostic one, and there will be an option to optimize it for a specific platform (in the manner similar to x86 optimization) if necessary and possible.

## References

[Roy2013] Promit Roy, Writing Endian Independent Code in C++, 2013, http://www.gamedev.net/page/resources/_/technical/general-programming/writing-endian-independent-code-in-c-r3301 [WikipediaEndianness] https://en.wikipedia.org/wiki/Endianness [WikipediaVLQ] https://en.wikipedia.org/wiki/Variable-length_quantity [Pike2012] Rob Pike, The Byte Order Fallacy, 2012, http://commandcenter.blogspot.com/2012/04/byte-order-fallacy.html [NoBugs2015] 'No Bugs' Hare, 64 Network DO's and DON'Ts for Game Engine Developers, Part IIa, Protocols and APIs, 2015 http://ithare.com/64-network-dos-and-donts-for-game-engine-developers-part-iia-protocols-and-apis/

## Article Update Log

Jun 5, 2015: Minor change to explanation "why signed is bad" in Implementation Detail section Jun 4, 2015: Added LITTLE_/BIG_ENDIAN approach to those being analyzed Jun 4, 2015: Added 'Implementation Notes' section Jun 2, 2015: Initial release

Report Article

## User Feedback

You need to be a member in order to leave a review

## Create an account

Register a new account

There are no reviews to display.

• 0
• 25
• 0
• 1
• 0

• 13
• 26
• 10
• 11
• 9
×