
Endian independent binary files?


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

10 replies to this topic

#1 3DModelerMan   Members   -  Reputation: 994


Posted 02 December 2012 - 05:26 PM

I'm trying to write binary data on Android and read it from Windows (and also the other way around). I tried just using the std::fstream in binary mode alone. The thing is, when I read the data from Windows that I wrote in Android I get lots of corrupted values. I read that ARM is a bi-endian processor so it could be little or big endian depending on the device? How can binary file formats be written to be endian independent?

#2 Paradigm Shifter   Crossbones+   -  Reputation: 5252


Posted 02 December 2012 - 05:57 PM

Endianness is dictated by the OS ABI, so even if the processor is bi-endian it doesn't matter in practice.

You are probably looking for ::htons and their ilk http://msdn.microsoft.com/en-gb/library/windows/desktop/ms738557(v=vs.85).aspx
"Most people think, great God will come from the sky, take away everything, and make everybody feel high" - Bob Marley
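The htons/htonl round trip suggested above can be sketched as follows; the platform-specific includes are an assumption about targeting both Windows and the Android NDK:

```cpp
#include <cstdint>
#include <cassert>
#ifdef _WIN32
#include <winsock2.h>   // htonl/ntohl on Windows (link against ws2_32)
#else
#include <arpa/inet.h>  // htonl/ntohl on POSIX, including the Android NDK
#endif

// Convert to network (big-endian) order before writing, and back after
// reading; on a big-endian host both calls are no-ops.
uint32_t to_wire(uint32_t host_value)   { return htonl(host_value); }
uint32_t from_wire(uint32_t wire_value) { return ntohl(wire_value); }
```

Writing `to_wire(v)` on either platform and reading the result back with `from_wire` yields the original value no matter which side wrote the file.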

#3 SimonForsman   Crossbones+   -  Reputation: 6036


Posted 02 December 2012 - 06:07 PM

I'm trying to write binary data on Android and read it from Windows (and also the other way around). I tried just using the std::fstream in binary mode alone. The thing is, when I read the data from Windows that I wrote in Android I get lots of corrupted values. I read that ARM is a bi-endian processor so it could be little or big endian depending on the device? How can binary file formats be written to be endian independent?


Just use big-endian and write/read data on Android using the DataOutputStream and DataInputStream classes (They read/write data as big endian regardless of platform)
I don't suffer from insanity, I'm enjoying every minute of it.
The voices in my head may not be real, but they have some good ideas!
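On the NDK side, data written big-endian by DataOutputStream can be read back without any platform-dependent tricks by assembling the bytes explicitly; a minimal sketch (the helper name `read_u32_be` is hypothetical):

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <cassert>

// Read a 32-bit big-endian integer (the order DataOutputStream.writeInt
// uses) from any std::istream, independent of the host's endianness.
uint32_t read_u32_be(std::istream& in) {
    unsigned char b[4];
    in.read(reinterpret_cast<char*>(b), 4);
    return (uint32_t(b[0]) << 24) | (uint32_t(b[1]) << 16)
         | (uint32_t(b[2]) << 8)  |  uint32_t(b[3]);
}
```

The same shifts in reverse give the matching writer, so both sides agree on the format rather than on the hardware.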

#4 3DModelerMan   Members   -  Reputation: 994


Posted 02 December 2012 - 06:12 PM

I'm writing using the NDK. Is there any way to tell if a variable is big endian already? ntohl and htonl just flip the byte order and don't check anything right?

#5 Paradigm Shifter   Crossbones+   -  Reputation: 5252


Posted 02 December 2012 - 06:15 PM

No, ntohl doesn't do anything if the platform is already big-endian, but there is no runtime check; the right version is chosen at compile time. So you do need to recompile if you ship binaries to platforms of different endianness.
"Most people think, great God will come from the sky, take away everything, and make everybody feel high" - Bob Marley

#6 Zipster   Crossbones+   -  Reputation: 597


Posted 02 December 2012 - 10:44 PM

A given value doesn't have any implicit endianness. You need to know the endianness of whatever wrote the data. The most robust approach for then reading the data would be to store the endianness as header information in the file, so regardless of what platform reads the file, it knows how to handle the data.
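A sketch of that header-flag idea in C++ (the one-byte flag encoding and the helper names are assumptions, not any standard format): the writer records its own byte order in the header, and the reader swaps only on a mismatch.

```cpp
#include <cstdint>
#include <cstring>
#include <cassert>

// Detect the host's byte order at runtime using a value whose
// contents are already known.
bool host_is_little_endian() {
    uint32_t probe = 1;
    unsigned char first_byte;
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

// Reverse the four bytes of a 32-bit value.
uint32_t byteswap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0xFF00u)
         | ((v << 8) & 0xFF0000u) | (v << 24);
}

// file_is_little comes from a one-byte flag stored in the file header.
uint32_t decode_u32(uint32_t raw, bool file_is_little) {
    return (file_is_little == host_is_little_endian()) ? raw
                                                       : byteswap32(raw);
}
```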

#7 samoth   Crossbones+   -  Reputation: 4683


Posted 03 December 2012 - 04:48 AM

You cannot tell from looking at a variable whether it's little endian or big endian, except if you know the contents of the variable already (this is the basis of many runtime endianness detection tricks).

One solution is to always enforce one particular endianness. Using ntohl as suggested above is a portable and relatively easy way of doing that for single integers.

Another approach is knowing the endianness of a file before reading it (and writing whatever is native). To know which endianness was used when writing a file, start the file with a well-known magic value. Since you know that value, you know what it must look like, and what it looks like when the endianness is wrong.
If you read the first word and it comes out correctly, just read the rest of your file. If it comes out with wrong byte order, you need to flip all values. If it comes out as something else, it's not a file of the type you expected. Microsoft's infamous byte order mark is nothing else but exactly that.
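That magic-value check might look like the sketch below; the magic number 0x0A0B0C0D is made up for the example (deliberately not palindromic, so its byte-swapped form is distinguishable):

```cpp
#include <cstdint>
#include <cassert>

constexpr uint32_t kMagic = 0x0A0B0C0Du;  // hypothetical format magic

enum class ByteOrderResult { Native, Swapped, NotOurFormat };

// Reverse the four bytes of a 32-bit value.
uint32_t byteswap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0xFF00u)
         | ((v << 8) & 0xFF0000u) | (v << 24);
}

// Classify a file from its first word, as described above: correct
// magic means read as-is, swapped magic means flip every value,
// anything else means it's not our format.
ByteOrderResult classify(uint32_t first_word) {
    if (first_word == kMagic)             return ByteOrderResult::Native;
    if (byteswap32(first_word) == kMagic) return ByteOrderResult::Swapped;
    return ByteOrderResult::NotOurFormat;
}
```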

#8 edd   Members   -  Reputation: 2105


Posted 03 December 2012 - 06:34 AM

If you read the first word and it comes out correctly, just read the rest of your file. If it comes out with wrong byte order, you need to flip all values. If it comes out as something else, it's not a file of the type you expected. Microsoft's infamous byte order mark is nothing else but exactly that.

Are you talking about the Unicode BOM? I don't think that has anything to do with Microsoft (other than indirectly via their involvement in the working group). Though if you know better, I'd be interested to hear more about it.

#9 BitMaster   Crossbones+   -  Reputation: 3883


Posted 03 December 2012 - 09:28 AM

One concrete example of a file format that does something like this would be TIFF:

Every TIFF begins with a 2-byte indicator of byte order: "II" for little-endian (aka "intel byte ordering", circa 1980) and "MM" for big-endian (aka "motorola byte ordering", circa 1980) byte ordering. The third byte represents the number 42 which happens to be the ASCII character "*", also represented by hexidecimal 2A, selected because this is the binary pattern 101010 and "for its deep philosophical significance". The 4th byte is represented by a 0, an ASCII "NULL". All words, double words, etc., in the TIFF file are assumed to be in the indicated byte order. The TIFF 6.0 specification says that compliant TIFF readers must support both byte orders (II and MM); writers may use either.

(source)
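A small sketch of checking that header per the quoted description (note the 42 is a 16-bit value stored in the indicated order: bytes 2A 00 in "II" files, 00 2A in "MM" files):

```cpp
#include <cstdint>
#include <cassert>

// Returns 'I' for a little-endian TIFF, 'M' for a big-endian one,
// or 0 if the first four header bytes don't match the TIFF layout.
char tiff_byte_order(const unsigned char* h) {
    bool little = (h[0] == 'I' && h[1] == 'I');
    bool big    = (h[0] == 'M' && h[1] == 'M');
    if (!little && !big) return 0;
    // The 16-bit magic 42 is stored in the order the first two
    // bytes indicated.
    uint16_t magic = little ? uint16_t(h[2] | (h[3] << 8))
                            : uint16_t((h[2] << 8) | h[3]);
    return magic == 42 ? char(h[0]) : 0;
}
```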

#10 larspensjo   Members   -  Reputation: 1526


Posted 04 December 2012 - 06:58 AM

It is common to see confusion about how to handle endianness (though most got it right above). The trick is to care about the defined format of the data, and not to use any mechanism that depends on the current architecture. See the excellent blog post The Byte Order Fallacy by Rob Pike.
Current project: Ephenation.
Sharing OpenGL experiences: http://ephenationopengl.blogspot.com/
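Pike's point, sketched in C++: define the file format's byte order (little-endian here, as an example) and encode/decode byte by byte; the host's endianness never appears in the code.

```cpp
#include <cstdint>
#include <cassert>

// Decode a 32-bit little-endian value from a byte buffer.
uint32_t decode_le32(const unsigned char* p) {
    return  uint32_t(p[0])        | (uint32_t(p[1]) << 8)
         | (uint32_t(p[2]) << 16) | (uint32_t(p[3]) << 24);
}

// Encode a 32-bit value into little-endian bytes.
void encode_le32(uint32_t v, unsigned char* p) {
    p[0] = v & 0xFFu;         p[1] = (v >> 8) & 0xFFu;
    p[2] = (v >> 16) & 0xFFu; p[3] = (v >> 24) & 0xFFu;
}
```

The same pair of functions works unchanged on any host, which is exactly the property the blog post argues for.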

#11 samoth   Crossbones+   -  Reputation: 4683


Posted 04 December 2012 - 07:31 AM

Are you talking about [...]?

Yes, the Unicode BOM. I named Microsoft both because of their involvement in the working group and because most (all?) of their products are well known to use BOMs even where not appropriate, such as in UTF-8 encoded files (not strictly forbidden, but nonsense, and it causes trouble with some software).
Also, it is noteworthy that Windows is "The One Big Architecture" on the planet which is little-endian, and the single biggest platform that consistently uses BOMs. For the most part this is a good thing, but in return it means that, if you are pedantic, virtually all existing documents in the world are invalid by design (which is quite funny!).
To explain: the byte order mark is U+FEFF (zero width no-break space), whose bytes read back as U+FFFE when interpreted with the wrong endianness. U+FFFE is not a legitimate character (it is a permanent noncharacter), so if one wants to look at it that way, every little-endian document carrying a BOM is automatically invalid when read naively (to amend this, it was later agreed that a BOM, necessary or not, is simply to be ignored).

Either way, it's a good example of "something that works" for the endianness problem. Whether one likes it or not, it works, and it works well.



