cr88192

Posted 23 July 2013 - 01:19 AM

Sadly, while IEEE 754 specifies bit order it doesn't specify anything about byte order. It's possible to find hardware that is otherwise little endian where floats appear like you would expect on a big endian machine and vice versa. If you want to support multiple hardware architectures you're going to have to be prepared to special case your floating point conversions.


All I care about supporting is iOS, Android, Mac, Linux, and Windows. I know some of these have different byte orders.
So do I do anything different to floats than what I do to integers?

 
errm.
 
actually, it has more to do with the hardware and CPU architecture than with the OS.
 
on x86 and x86-64 targets (PCs, laptops, etc...), little-endian is used pretty much exclusively (regardless of Windows vs Linux vs ...).
 
iOS and Android generally run on ARM targets, where ARM is technically bi-endian but defaults to little-endian, and these platforms run it in little-endian mode.
 
OTOH: other architectures, such as PowerPC, tend to default to big-endian (IOW: XBox360, PS3, Wii).
 
 
I generally prefer writing endianness-independent code over explicit conditional swapping; that is, code written in such a way that the bytes are read/written in the intended order regardless of the CPU's native endianness.
 
in some cases, I have used typedefs to represent endianness specific values, typically represented as a struct:
typedef struct { byte v[4]; } u32le_t; //unsigned int 32-bit little-endian
typedef struct { byte v[8]; } s64be_t; //signed int 64-bit big-endian
typedef struct { byte v[8]; } f64le_t; //little-endian double
...

typically, these are handled with some amount of wrapper logic, and being structs more or less prevents accidental mixups (they also help avoid target-specific padding and access-alignment issues).
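
as a rough sketch of what such wrapper logic might look like (getu32le/setu32le are just illustrative names, not from any particular codebase; the typedef is repeated so the snippet stands alone):

#include <stdint.h>

typedef unsigned char byte;
typedef struct { byte v[4]; } u32le_t; //32-bit unsigned, stored little-endian

//read the stored bytes in little-endian order, regardless of host endianness
static uint32_t getu32le(const u32le_t *p)
{
    return  (uint32_t)p->v[0]        |
           ((uint32_t)p->v[1] <<  8) |
           ((uint32_t)p->v[2] << 16) |
           ((uint32_t)p->v[3] << 24);
}

//store a native value as little-endian bytes
static void setu32le(u32le_t *p, uint32_t x)
{
    p->v[0] = (byte)(x      );
    p->v[1] = (byte)(x >>  8);
    p->v[2] = (byte)(x >> 16);
    p->v[3] = (byte)(x >> 24);
}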

some target-specific "optimizations" may also be used (say, on x86, directly getting/setting the values for little-endian values rather than messing around with bytes and shifts).

note that these types are generally more used for storage, and not for working with data values (values are typically converted to/from their native forms).
 
 
while it is true that on some hardware the floating-point types and the integer types do not share the same endianness, relatively few such architectures are still in current use AFAIK.
 
one option, FWIW, is to detect the various targets and, when possible, use a fast direct-conversion path, with a fallback case that uses arithmetic to perform the conversions (the arithmetic strategy still works regardless of the host's actual internal representation).
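
a rough sketch of that idea, assuming an x86/x86-64 fast path and a frexp()-based arithmetic fallback (f64_to_bits is a hypothetical helper; this sketch only handles finite, normalized values):

#include <math.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(_M_X64) || defined(__i386__) || defined(_M_IX86)
//fast path: the target is known to store doubles as little-endian IEEE-754,
//so the bit pattern can be copied out directly
static uint64_t f64_to_bits(double f)
{
    uint64_t bits;
    memcpy(&bits, &f, sizeof(bits));
    return bits;
}
#else
//fallback: build the IEEE-754 bit pattern arithmetically via frexp(),
//which works regardless of the host's internal float representation
static uint64_t f64_to_bits(double f)
{
    uint64_t sign = 0;
    int exp;
    double frac;

    if (f == 0.0)
        return 0;
    if (f < 0) { sign = 1; f = -f; }

    frac = frexp(f, &exp);  //f == frac * 2^exp, with 0.5 <= frac < 1
    uint64_t mant = (uint64_t)((frac * 2.0 - 1.0) * 4503599627370496.0); //(1.m - 1) * 2^52
    uint64_t bexp = (uint64_t)(exp - 1 + 1023);                          //biased exponent

    return (sign << 63) | (bexp << 52) | mant;
}
#endif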


note that, in general though, endianness is handled explicitly per-value or per-type, rather than by some sort of generalized byte-swapping.
 

for many of my custom file-formats, I actually prefer the use of variable-width integer and floating-point values (typically because they are on-average more compact, with each number effectively encoding its own length).

typically a floating-point value will be encoded as a pair of signed variable-length integers, i.e. (base, exponent) with value = base * 2.0^exp, and base = 0 reserved as a special case for encoding 0/Inf/NaN/etc. (this also works well for things like encoding floating-point numbers and vectors into an entropy-coded bitstream).
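
a small sketch of that base/exponent split, using frexp() (emit_varint is a stand-in for a real signed variable-length integer writer, not part of any actual format):

#include <math.h>
#include <stdio.h>

//stand-in for a real signed variable-length integer writer (just prints here)
static void emit_varint(long long v)
{
    printf("varint: %lld\n", v);
}

//encode a finite double as (base, exp) with value == base * 2.0^exp,
//base == 0 reserved as the special case for 0.0 (and, in a fuller version, Inf/NaN)
static void emit_f64_as_varints(double value)
{
    if (value == 0.0) { emit_varint(0); emit_varint(0); return; }

    int exp, neg = (value < 0);
    double frac = frexp(neg ? -value : value, &exp); //|value| == frac * 2^exp, 0.5 <= frac < 1
    long long base = (long long)(frac * 9007199254740992.0); //frac * 2^53, exact for doubles

    //strip trailing zero bits so values with short mantissas encode compactly
    while ((base & 1) == 0) { base >>= 1; exp++; }

    emit_varint(neg ? -base : base);
    emit_varint(exp - 53); //value == base * 2.0^(exp - 53)
}

(the part glossed over here is the varint writer itself, which would emit each integer with continuation bits or a length prefix.)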
 
but, this is its own topic (there are many options and tradeoffs for variable-length numbers, and even more so when entropy-coding is thrown into the mix...).


otherwise, when designing formats, I tend to prefer little-endian, but will use big-endian if it is already in use in the context (such as when designing extension features for an existing file-format).

the most common reason to prefer little-endian is that it is what the most common CPU architectures at this point use.

the most common reason to prefer big-endian is that it is the established "network" byte order.
 
 

For strings, you generally don't re-arrange byte orders for pretty much the same reason you don't do any conversion on text files between machines.


So:
A) I only need to handle basic types larger than 1 char, such as floats, uint16_t, uint32_t, doubles, etc...?
B) I handle floats and doubles the exact same way I handle uint32_t and uint64_t?
C) I handle uint64_t by entirely mirroring the order of the bytes? So bytes [01234567] becomes [76543210]?


I think it is more because ASCII text is generally byte-order agnostic by its basic nature.

if we see something like:
"foo: value=1234, ..."
it is pretty well settled how the digits are organized (otherwise people are likely to start rage-facing).

similarly, it would just be weird if one machine would print its digits in one order, but another machine uses another.


generally, for binary file-formats, it is preferable if they "choose one". most file-formats do so, stating their endianness explicitly as part of the format spec.

some file-formats leave this issue up in the air though (leaving the endianness as a per-file, or worse, per-structure-type, matter...). similarly annoying are formats which use file-specific field sizes (so it is necessary, say, to determine whether the file is using 16 or 32 bits for its 'int' or 'word' type, ...). luckily, these sorts of things are relatively uncommon.


it is worth noting that there is also a fair bit of a "grey area", namely binary formats which are stream-based and are byte-order agnostic, for similar reasons to ASCII text.


this is sort of also true of bitstreams, despite them introducing a new notion:
the relevance of how bits are packed into bytes.

interestingly, word endianness naturally arises as a result of this packing (start packing from the LSB using the low-order bit, and you get little-endian, or from the MSB using the high-bit, and you get big-endian). granted, it is technically possible to mix these, effectively getting bit-transposed formats, but these cases tend to be harder to encode/decode efficiently (it tends to involve either reading/writing a bit at a time, or using a transpose-table, *1).

*1: Deflate is partly an example of this: it mostly uses LSB-first (little-endian) bit packing, but Huffman codes are packed starting at the high bit of the code, resulting in the use of a transpose table when setting up the Huffman tables (but not during the actual main encoding/decoding process).
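
to make the packing-order point concrete, here is a tiny LSB-first bit writer sketch (names are illustrative); packing a 16-bit word LSB-first lands the low byte first in the buffer, i.e. little-endian:

#include <stdint.h>
#include <stdio.h>

//minimal LSB-first bit writer: bits go into each byte starting at the low-order bit
typedef struct { uint8_t buf[16]; int pos; } bitwriter_t;

static void put_bits(bitwriter_t *bw, uint32_t value, int count)
{
    for (int i = 0; i < count; i++)
    {
        if (value & (1u << i))
            bw->buf[bw->pos >> 3] |= (uint8_t)(1u << (bw->pos & 7));
        bw->pos++;
    }
}

int main(void)
{
    bitwriter_t bw = {0};
    put_bits(&bw, 0x1234, 16);                   //pack a 16-bit word LSB-first
    printf("%02X %02X\n", bw.buf[0], bw.buf[1]); //prints "34 12": little-endian byte order
    return 0;
}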

granted, in bitstream formats, it isn't really uncommon to find a wide range of various forms of funkiness.
