Endian independent binary files?
#1 Members - Reputation: 537
Posted 02 December 2012 - 05:26 PM
#2 Members - Reputation: 1842
Posted 02 December 2012 - 05:57 PM
You are probably looking for ::htons and its ilk (::htonl, ::ntohs, ::ntohl): http://msdn.microsoft.com/en-gb/library/windows/desktop/ms738557(v=vs.85).aspx
#3 Members - Reputation: 3714
Posted 02 December 2012 - 06:07 PM
I'm trying to write binary data on Android and read it from Windows (and also the other way around). I tried just using the std::fstream in binary mode alone. The thing is, when I read the data from Windows that I wrote in Android I get lots of corrupted values. I read that ARM is a bi-endian processor so it could be little or big endian depending on the device? How can binary file formats be written to be endian independent?
Just use big-endian, and write/read the data on Android using the DataOutputStream and DataInputStream classes. They read and write data as big-endian regardless of platform.
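On the Windows/C++ side, a file written that way can be decoded byte by byte, which gives the same result whatever the host's byte order. The following is only a minimal sketch: `decode_u32_be` and `read_u32_be` are hypothetical helper names, and it assumes the value was written as a 4-byte integer, most significant byte first (which is what DataOutputStream.writeInt produces).

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>  // std::istringstream, handy for trying this without a file
#include <string>

// Hypothetical helper: assemble four bytes as a big-endian 32-bit value.
// Shifts operate on values, not memory, so host endianness does not matter.
std::uint32_t decode_u32_be(const unsigned char* bytes) {
    assert(bytes != nullptr);
    return (static_cast<std::uint32_t>(bytes[0]) << 24) |
           (static_cast<std::uint32_t>(bytes[1]) << 16) |
           (static_cast<std::uint32_t>(bytes[2]) << 8) |
            static_cast<std::uint32_t>(bytes[3]);
}

// Hypothetical helper: read one big-endian 32-bit value from a binary stream.
// Returns false if fewer than four bytes were available.
bool read_u32_be(std::istream& in, std::uint32_t& value) {
    unsigned char bytes[4];
    if (!in.read(reinterpret_cast<char*>(bytes), sizeof bytes)) {
        return false;
    }
    value = decode_u32_be(bytes);
    return true;
}
```

With a real file you would pass a std::ifstream opened with std::ios::binary instead of a string stream.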
#5 Members - Reputation: 1842
Posted 02 December 2012 - 06:15 PM
#6 Crossbones+ - Reputation: 399
Posted 02 December 2012 - 10:44 PM
#7 Members - Reputation: 1965
Posted 03 December 2012 - 04:48 AM
One solution is to always enforce one particular endianness. Using ntohl as suggested above is a portable and relatively easy way of doing that for single integers.
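As a rough illustration of the "enforce one endianness" option, here is a small sketch using htonl/ntohl to store a 32-bit integer in network (big-endian) order. The helper names `store_u32_network` and `load_u32_network` are made up for this example, and the header choice assumes either Winsock or a POSIX system.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#ifdef _WIN32
#include <winsock2.h>   // htonl/ntohl on Windows (link against Ws2_32)
#else
#include <arpa/inet.h>  // htonl/ntohl on POSIX systems
#endif

// Hypothetical helper: place a value into a 4-byte buffer in network
// (big-endian) order, ready to be written to a file.
void store_u32_network(std::uint32_t value, unsigned char* out) {
    assert(out != nullptr);
    const std::uint32_t wire = htonl(value);
    std::memcpy(out, &wire, sizeof wire);
}

// Hypothetical helper: the inverse, for data just read from a file.
std::uint32_t load_u32_network(const unsigned char* in) {
    assert(in != nullptr);
    std::uint32_t wire;
    std::memcpy(&wire, in, sizeof wire);
    return ntohl(wire);
}
```

On a big-endian host both calls are effectively no-ops; on a little-endian host they swap the bytes, so the file contents are the same either way.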
Another approach is knowing the endianness of a file before reading it (and writing whatever is native). To know which endianness was used writing data to a file, you need to start your file with a well-known magic value. Since you know that value, you know what it must look like, and what it looks like when endianness is incorrect.
If you read the first word and it comes out correctly, just read the rest of your file. If it comes out with wrong byte order, you need to flip all values. If it comes out as something else, it's not a file of the type you expected. Microsoft's infamous byte order mark is nothing else but exactly that.
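The magic-value check described here could look roughly like the sketch below. The constant `kMagic` and the helper names are invented for illustration; any value whose byte-swapped form differs from itself would do.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical magic value; it must not read the same when byte-swapped.
constexpr std::uint32_t kMagic = 0x1A2B3C4Du;

// Reverse the byte order of a 32-bit value.
std::uint32_t byteswap32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

enum class ByteOrder { Native, Swapped, Unknown };

// Classify a file by its first word, read raw in the host's byte order.
ByteOrder classify_magic(std::uint32_t first_word) {
    if (first_word == kMagic) return ByteOrder::Native;
    if (first_word == byteswap32(kMagic)) return ByteOrder::Swapped;
    return ByteOrder::Unknown;  // not a file of the expected type
}

// Convert a raw word from the file into a usable host value.
std::uint32_t fix_u32(std::uint32_t raw, ByteOrder order) {
    assert(order != ByteOrder::Unknown);  // caller should have rejected the file
    return order == ByteOrder::Swapped ? byteswap32(raw) : raw;
}
```

The writer simply dumps `kMagic` first in its native order; the reader calls `classify_magic` once and then passes every subsequent word through `fix_u32`.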
#8 Members - Reputation: 2042
Posted 03 December 2012 - 06:34 AM
Are you talking about the Unicode BOM? I don't think that has anything to do with Microsoft (other than indirectly via their involvement in the working group). Though if you know better, I'd be interested to hear more about it.
#9 Members - Reputation: 1406
Posted 03 December 2012 - 09:28 AM
(source) Every TIFF begins with a 2-byte indicator of byte order: "II" for little-endian (aka "intel byte ordering", circa 1980) and "MM" for big-endian (aka "motorola byte ordering", circa 1980) byte ordering. The third byte represents the number 42 which happens to be the ASCII character "*", also represented by hexadecimal 2A, selected because this is the binary pattern 101010 and "for its deep philosophical significance". The 4th byte is represented by a 0, an ASCII "NULL". All words, double words, etc., in the TIFF file are assumed to be in the indicated byte order. The TIFF 6.0 specification says that compliant TIFF readers must support both byte orders (II and MM); writers may use either.
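A reader following that scheme might check the header along these lines. This is only a sketch with a made-up function name. One detail to watch: the 42 is a 16-bit value stored in the indicated byte order, so it appears as 2A 00 after "II" but as 00 2A after "MM".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class TiffOrder { Little, Big, Invalid };

// Hypothetical helper: inspect the first four bytes of a TIFF file and
// report which byte order the rest of the file uses.
TiffOrder tiff_byte_order(const unsigned char* header, std::size_t size) {
    assert(header != nullptr);
    if (size < 4) return TiffOrder::Invalid;

    if (header[0] == 'I' && header[1] == 'I') {
        // Little-endian: the 16-bit value 42 is stored low byte first.
        return (header[2] == 0x2A && header[3] == 0x00) ? TiffOrder::Little
                                                        : TiffOrder::Invalid;
    }
    if (header[0] == 'M' && header[1] == 'M') {
        // Big-endian: the 16-bit value 42 is stored high byte first.
        return (header[2] == 0x00 && header[3] == 0x2A) ? TiffOrder::Big
                                                        : TiffOrder::Invalid;
    }
    return TiffOrder::Invalid;
}
```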
#10 Members - Reputation: 1408
Posted 04 December 2012 - 06:58 AM
#11 Members - Reputation: 1965
Posted 04 December 2012 - 07:31 AM
Yes, the Unicode BOM. I named Microsoft both because of their involvement in the working group and because most (all?) of their products are well known for using BOMs even where not appropriate, such as in UTF-8 encoded files (it's not strictly forbidden there, but it is nonsense and causes trouble with some software).
Also, it is noteworthy that Windows is "The One Big Architecture" on the planet which is little-endian, and the single biggest platform that consistently uses BOMs. For the most part, this is a good thing, but in return, it means that virtually all existing documents in the world are invalid by design, if you are pedantic (which is quite a funny thing!).
To explain: The byte order mark is U+FEFF (zero width no-break space), which a little-endian writer stores as the bytes FF FE, so a reader assuming the other byte order sees U+FFFE. U+FFFE is not a legitimate character, so if one wants to look at it that way, every little-endian document having a BOM is actually automatically invalid (to amend this, it was later agreed that a BOM, necessary or not, illegal or not, is to be ignored).
Either way, it's a good example of "something that works" for the endianness problem. Whether one likes it or not, it works, and it works well.
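For what it's worth, detecting the byte order from a UTF-16 BOM takes only a couple of comparisons. The sketch below uses invented names and assumes the input is already known to be UTF-16 (the bytes FF FE followed by 00 00 would also match a UTF-32 little-endian BOM, which is not handled here).

```cpp
#include <cassert>
#include <cstddef>

enum class Utf16Order { BigEndian, LittleEndian, NoBom };

// Hypothetical helper: look at the first two bytes of a UTF-16 document.
// U+FEFF is stored as FE FF in big-endian and as FF FE in little-endian.
Utf16Order detect_utf16_bom(const unsigned char* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    if (size >= 2) {
        if (data[0] == 0xFE && data[1] == 0xFF) return Utf16Order::BigEndian;
        if (data[0] == 0xFF && data[1] == 0xFE) return Utf16Order::LittleEndian;
    }
    return Utf16Order::NoBom;  // caller has to fall back to some default
}
```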






