\$30

### Image of the Day Submit

IOTD | Top Screenshots

## Serializing floats to bytes, and byte ordering issues


3 replies to this topic

### #1 Servant of the Lord (Members)

Posted 22 July 2013 - 05:14 PM

Is this code, barring byte order issues, sufficient to write floats in a cross-platform and cross-hardware way?

void ConvertFloatToBytes(float myFloat, char *buffer)
{
    //Requires <limits> for numeric_limits and <cstring> for memcpy.
    //Asserts that floats are in IEC 559 (aka IEEE 754) format, which is the most common format.
    //Note: is_iec559 is a static constant, not a function, so no parentheses.
    static_assert(std::numeric_limits<float>::is_iec559, "The code requires we use the IEEE 754 floating point format for binary serialization of floats and doubles.");

    //Convert to bytes.
    const unsigned char *bytes = reinterpret_cast<const unsigned char*>(&myFloat);

    //We're assuming 'buffer' has enough space.
    std::memcpy(buffer, bytes, sizeof(float));
}


Also, when and how do I need to handle byte order?
If I have an array of bytes, do I need to swap every four bytes in that array, regardless of what I put in the array, and regardless of whether the code is running on 64 bit or 32 bit hardware?

So, if I write the string, "0123456789", and I'm on a LittleEndian machine, and I want it converted to Network Byte Order (which I want to use for game file formats, for cross-platform use), the resulting order should be: "3210765498"?

Or do I just need to worry about integers and floats that are larger than one byte?

So using my byte-ordering code:

//Swaps between Big and Little endian types. Returns the result.
#define ChangeEndian16(value)    ((((value) & 0xFF00) >> 8) | \
                                  (((value) & 0x00FF) << 8))

#define ChangeEndian32(value)    ((((value) & 0xFF000000ul) >> 24) | \
                                  (((value) & 0x00FF0000ul) >>  8) | \
                                  (((value) & 0x0000FF00ul) <<  8) | \
                                  (((value) & 0x000000FFul) << 24))

#define ChangeEndian64(value)    ((((value) & 0xFF00000000000000ull) >> 56) | \
                                  (((value) & 0x00FF000000000000ull) >> 40) | \
                                  (((value) & 0x0000FF0000000000ull) >> 24) | \
                                  (((value) & 0x000000FF00000000ull) >>  8) | \
                                  (((value) & 0x00000000FF000000ull) <<  8) | \
                                  (((value) & 0x0000000000FF0000ull) << 24) | \
                                  (((value) & 0x000000000000FF00ull) << 40) | \
                                  (((value) & 0x00000000000000FFull) << 56))

inline int16_t  LocalToBigEndian(int16_t value)  {   return (BigEndianOrder ? value : ChangeEndian16(value));   }
inline uint16_t LocalToBigEndian(uint16_t value) {   return (BigEndianOrder ? value : ChangeEndian16(value));   }

inline int32_t  LocalToBigEndian(int32_t value)  {   return (BigEndianOrder ? value : ChangeEndian32(value));   }
inline uint32_t LocalToBigEndian(uint32_t value) {   return (BigEndianOrder ? value : ChangeEndian32(value));   }

inline int64_t  LocalToBigEndian(int64_t value)  {   return (BigEndianOrder ? value : ChangeEndian64(value));   }
inline uint64_t LocalToBigEndian(uint64_t value) {   return (BigEndianOrder ? value : ChangeEndian64(value));   }

#define LocalToNetworkOrder(value)    LocalToBigEndian(value)
#define NetworkOrderToLocal(value)    BigEndianToLocal(value)



Am I guaranteed this code will work on all Little and Big Endian architectures where IEEE 754 is used?

void ConvertFloatToBytes(float myFloat, char *buffer)
{
//Asserts that they are in IEC 559 (aka IEEE 754) format, which is the most common format.
static_assert(std::numeric_limits<float>::is_iec559, "The code requires we use the IEEE 754 floating point format for binary serialization of floats and doubles.");

//Convert to network byte order.
uint32_t networkOrdered = LocalToNetworkOrder(reinterpret_cast<uint32_t>(myFloat));

//Convert to bytes.
unsigned char *bytes = reinterpret_cast<unsigned char*>(&networkOrdered);

//We're assuming 'buffer' has enough space.
std::memcpy(buffer, bytes, sizeof(float));
}


Edited by Servant of the Lord, 22 July 2013 - 05:15 PM.

It's perfectly fine to abbreviate my username to 'Servant' or 'SotL' rather than copy+pasting it all the time.
All glory be to the Man at the right hand... On David's throne the King will reign, and the Government will rest upon His shoulders. All the earth will see the salvation of God.
Of Stranger Flames -

### #2 SiCrane (Moderators)

Posted 22 July 2013 - 06:06 PM

Sadly, while IEEE 754 specifies bit order it doesn't specify anything about byte order. It's possible to find hardware that is otherwise little endian where floats appear like you would expect on a big endian machine and vice versa. If you want to support multiple hardware architectures you're going to have to be prepared to special case your floating point conversions.
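(A startup probe can make that special-casing explicit. The following sketch is my illustration, not from the thread: it checks whether this machine lays out float bytes the same way as integer bytes, using the known IEEE 754 bit pattern of 1.0f.)

```cpp
#include <cstdint>
#include <cstring>

// Check whether float bytes match integer byte order on this machine.
// 1.0f has the IEEE 754 bit pattern 0x3F800000, so comparing the raw bytes
// of the float against the raw bytes of that integer exposes any mismatch.
bool FloatsMatchIntegerByteOrder()
{
    const float f = 1.0f;
    const std::uint32_t i = 0x3F800000u;
    return std::memcmp(&f, &i, sizeof f) == 0;
}
```

When this returns true (as on x86, x86-64, and common ARM configurations), float serialization can reuse the integer byte-swapping path; when it returns false, the float path needs its own byte shuffle.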

For strings, you generally don't re-arrange byte orders for pretty much the same reason you don't do any conversion on text files between machines.

Also reinterpret_cast<uint32_t>(myFloat) probably won't do what you expect. You probably wanted to reinterpret the address rather than the value.
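(To illustrate that point with my own sketch, not SiCrane's code: reinterpret_cast<uint32_t>(myFloat) tries to reinterpret the value, which is ill-formed for a float; the intent is to reinterpret the storage, and memcpy is the safest way to do that.)

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret the float's storage rather than its value. memcpy also
// sidesteps the strict-aliasing pitfalls of *reinterpret_cast<uint32_t*>(&myFloat).
std::uint32_t FloatBits(float myFloat)
{
    std::uint32_t bits;
    static_assert(sizeof bits == sizeof myFloat, "float must be 32 bits here");
    std::memcpy(&bits, &myFloat, sizeof bits);
    return bits;
}
```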

### #3 Servant of the Lord (Members)

Posted 22 July 2013 - 06:32 PM

Sadly, while IEEE 754 specifies bit order it doesn't specify anything about byte order. It's possible to find hardware that is otherwise little endian where floats appear like you would expect on a big endian machine and vice versa. If you want to support multiple hardware architectures you're going to have to be prepared to special case your floating point conversions.

All I care about supporting is iOS, Android, Mac, Linux, and Windows. I know some of these have different byte orders.
So do I do anything different to floats than what I do to integers?

For strings, you generally don't re-arrange byte orders for pretty much the same reason you don't do any conversion on text files between machines.

So:
A) I only need to handle basic types larger than 1 char, such as floats, uint16_t, uint32_t, doubles, etc...?
B) I handle floats and doubles the exact same way I handle uint32_t and uint64_t?
C) I handle uint64_t by entirely mirroring the order of the bytes? So bytes [01234567] becomes [76543210]?

### #4 BGB (Members)

Posted 23 July 2013 - 12:42 AM

Sadly, while IEEE 754 specifies bit order it doesn't specify anything about byte order. It's possible to find hardware that is otherwise little endian where floats appear like you would expect on a big endian machine and vice versa. If you want to support multiple hardware architectures you're going to have to be prepared to special case your floating point conversions.

All I care about supporting is iOS, Android, Mac, Linux, and Windows. I know some of these have different byte orders.
So do I do anything different to floats than what I do to integers?

errm.

actually, it has more to do with the hardware and CPU architecture than with the OS.

x86 and x86-64 targets (PCs, laptops, etc.) use little endian pretty much exclusively (regardless of Windows vs Linux vs ...).

iOS and Android generally run on ARM targets, where ARM also defaults to little-endian.
(could require further verification though, so can't say conclusively that they are LE...).

OTOH: other architectures, such as PowerPC, tend to default to big-endian (IOW: XBox360, PS3, Wii).

I generally prefer writing endianness-independent code over explicit conditional swapping. by "endianness independent" I mean code written in such a way that the bytes are read/written in the intended order regardless of the CPU's native endianness.

in some cases, I have used typedefs to represent endianness specific values, typically represented as a struct:
typedef struct { byte v[4]; } u32le_t; //unsigned int 32-bit little-endian
typedef struct { byte v[8]; } s64be_t; //signed int 64-bit big-endian
typedef struct { byte v[8]; } f64le_t; //little-endian double
...

typically, these are handled with some amount of wrapper logic, and being structs more or less prevents accidental mixups (they also help avoid target-specific padding and access-alignment issues).
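(As a concrete sketch of that wrapper logic, my own guess at the shape rather than BGB's actual code, a u32le_t might be accessed like this:)

```cpp
#include <cstdint>

typedef unsigned char byte;
typedef struct { byte v[4]; } u32le_t;   // as declared above

// Wrapper logic: go through the byte array with shifts, so the value is
// stored little-endian regardless of the host CPU's native byte order.
static inline void u32le_set(u32le_t *p, std::uint32_t x)
{
    p->v[0] = (byte)(x      );
    p->v[1] = (byte)(x >>  8);
    p->v[2] = (byte)(x >> 16);
    p->v[3] = (byte)(x >> 24);
}

static inline std::uint32_t u32le_get(const u32le_t *p)
{
    return  (std::uint32_t)p->v[0]
         | ((std::uint32_t)p->v[1] <<  8)
         | ((std::uint32_t)p->v[2] << 16)
         | ((std::uint32_t)p->v[3] << 24);
}
```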

some target-specific "optimizations" may also be used (say, on x86, directly getting/setting the values for little-endian values rather than messing around with bytes and shifts).

note that these types are generally more used for storage, and not for working with data values (values are typically converted to/from their native forms).

while it is true that not all hardware gives floating-point and integer types the same endianness, relatively few architectures like this are still in current use AFAIK.

one option FWIW, is to basically detect various targets and when possible use a fast direct-conversion path, with a fallback case resorting to the use of arithmetic to perform the conversions (where the arithmetic strategy will still work regardless of the actual internal representation).

note that, in general though, endianness is handled explicitly per-value or per-type, rather than by some sort of generalized byte-swapping.

for many of my custom file-formats, I actually prefer the use of variable-width integer and floating-point values (typically because they are on-average more compact, with each number effectively encoding its own length).

typically a floating-point value will be encoded as a pair of signed variable-length integers, (base, exponent), where value = base * 2.0^exp, with base=0 as a special case for encoding 0/Inf/NaN/etc. (this also works well for things like encoding floating-point numbers and vectors into an entropy-coded bitstream).
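(The base/exponent split can be sketched with std::frexp; this is my illustration, not BGB's encoder. The variable-length integer coding itself is omitted, and a real encoder would also strip trailing zero bits from base to keep it small.)

```cpp
#include <cmath>
#include <cstdint>

// Decompose value so that value == base * 2^exp exactly, with base an
// integer, using base == 0 as the special case for zero.
void SplitDouble(double value, std::int64_t &base, int &exp)
{
    if (value == 0.0) { base = 0; exp = 0; return; }
    int e;
    double m = std::frexp(value, &e);                     // value == m * 2^e, 0.5 <= |m| < 1
    base = static_cast<std::int64_t>(std::ldexp(m, 53));  // 53-bit mantissa as an integer
    exp  = e - 53;
}
```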

but, this is its own topic (there are many options and tradeoffs for variable-length numbers, and even more so when entropy-coding is thrown into the mix...).

otherwise, when designing formats, I tend to prefer little-endian, but will use big-endian if it is already in use in the context (such as when designing extension features for an existing file-format).

common reasons to prefer little-endian are mostly that this is what the most common CPU architectures at this point tend to use.

common reasons to prefer big-endian is that it is the established "network" byte order.

For strings, you generally don't re-arrange byte orders for pretty much the same reason you don't do any conversion on text files between machines.

So:
A) I only need to handle basic types larger than 1 char, such as floats, uint16_t, uint32_t, doubles, etc...?
B) I handle floats and doubles the exact same way I handle uint32_t and uint64_t?
C) I handle uint64_t by entirely mirroring the order of the bytes? So bytes [01234567] becomes [76543210]?

I think more because ASCII text is generally byte-order agnostic by its basic nature.

if we see something like:
"foo: value=1234, ..."
it is pretty well settled how the digits are organized (otherwise people are likely to start rage-facing).

similarly, it would just be weird if one machine would print its digits in one order, but another machine uses another.

generally, for binary file-formats, it is preferable if they "choose one". most file-formats do so, stating their endianness explicitly as part of the format spec.

some file-formats leave this issue up in the air though (leaving the endianness as a per-file, or worse, per-structure-type, matter...). similarly annoying are formats which use file-specific field sizes (so it is necessary, say, to determine whether the file is using 16 or 32 bits for its 'int' or 'word' type, ...). luckily, these sorts of things are relatively uncommon.

it is worth noting that there is also a fair bit of a "grey area", namely binary formats which are stream-based and are byte-order agnostic, for similar reasons to ASCII text.

this is sort of also true of bitstreams, despite them introducing a new notion:
the relevance of how bits are packed into bytes.

interestingly, word endianness naturally arises as a result of this packing (start packing from the LSB using the low-order bit, and you get little-endian, or from the MSB using the high-bit, and you get big-endian). granted, it is technically possible to mix these, effectively getting bit-transposed formats, but these cases tend to be harder to encode/decode efficiently (it tends to involve either reading/writing a bit at a time, or using a transpose-table, *1).

*1: Deflate is partly an example of this: it uses LSB-first packing for the most part, but Huffman codes are packed starting at the high bit of the code, resulting in the use of a transpose table when setting up the Huffman tables (but not during the actual main encoding/decoding process).
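(The LSB-first case can be sketched as follows; this is my example, not from the post. Pushing a 16-bit value into the stream a byte at a time falls out as little-endian automatically, with no explicit swap anywhere.)

```cpp
#include <cstdint>

// Minimal LSB-first bit writer: new bits are queued above the bits already
// in the accumulator, and whole bytes are emitted from the bottom. Packing
// a 16-bit value this way yields its bytes in little-endian order.
// (Sketch only: assumes nbits + n stays within the 32-bit accumulator.)
struct LsbFirstWriter
{
    unsigned char *out;
    std::uint32_t acc = 0;
    int nbits = 0;

    void put(std::uint32_t v, int n)
    {
        acc |= v << nbits;
        nbits += n;
        while (nbits >= 8) { *out++ = (unsigned char)acc; acc >>= 8; nbits -= 8; }
    }
};
```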

granted, in bitstream formats, it isn't really uncommon to find a wide range of various forms of funkiness.

Edited by cr88192, 23 July 2013 - 01:19 AM.
