3DModelerMan

Endian independent binary files?


Recommended Posts

I'm trying to write binary data on Android and read it from Windows (and also the other way around). I tried just using the std::fstream in binary mode alone. The thing is, when I read the data from Windows that I wrote in Android I get lots of corrupted values. I read that ARM is a bi-endian processor so it could be little or big endian depending on the device? How can binary file formats be written to be endian independent?

Endianness is dictated by the OS and ABI, so even though the processor itself supports both modes, it's fixed on any given platform.

You are probably looking for htons and its ilk: http://msdn.microsoft.com/en-gb/library/windows/desktop/ms738557(v=vs.85).aspx
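A minimal sketch of that approach: serialize every 32-bit value in network (big-endian) byte order with htonl on write and convert back with ntohl on read. The helper names write_u32/read_u32 and the header selection are my own assumptions, not anything from the thread:

```cpp
#include <cstdint>
#include <fstream>
#ifdef _WIN32
#include <winsock2.h>   // htonl/ntohl live here on Windows (link ws2_32)
#else
#include <arpa/inet.h>  // htonl/ntohl on POSIX/Android
#endif

// Write a 32-bit value in network (big-endian) byte order.
void write_u32(std::ofstream& out, uint32_t value) {
    uint32_t be = htonl(value);          // host order -> big-endian
    out.write(reinterpret_cast<const char*>(&be), sizeof(be));
}

// Read it back, converting to whatever order the host uses.
uint32_t read_u32(std::ifstream& in) {
    uint32_t be = 0;
    in.read(reinterpret_cast<char*>(&be), sizeof(be));
    return ntohl(be);                    // big-endian -> host order
}
```

Because both sides agree the file is big-endian, a file written on Android reads back correctly on Windows and vice versa, regardless of what either CPU does natively.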


I'm trying to write binary data on Android and read it from Windows (and also the other way around). I tried just using the std::fstream in binary mode alone. The thing is, when I read the data from Windows that I wrote in Android I get lots of corrupted values. I read that ARM is a bi-endian processor so it could be little or big endian depending on the device? How can binary file formats be written to be endian independent?


Just use big-endian, and write/read the data on Android using the DataOutputStream and DataInputStream classes (they read and write data as big-endian regardless of platform).

I'm writing using the NDK. Is there any way to tell if a variable is already big-endian? ntohl and htonl just flip the byte order and don't check anything, right?

No: ntohl doesn't do anything if the platform is already big-endian, and swaps the bytes otherwise. The choice is made at compile time, so there is no run-time check. You do need to recompile if you ship binaries on platforms of different endianness, though.

A given value doesn't have any implicit endianness. You need to know the endianness of whatever wrote the data. The most robust approach for then reading the data would be to store the endianness as header information in the file, so regardless of what platform reads the file, it knows how to handle the data.

You cannot tell from looking at a variable whether it's little-endian or big-endian, unless you already know what the variable's contents should be (this is the basis of many runtime endianness detection tricks).
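One such trick, as a sketch: store a value whose byte layout you know in advance, then inspect which byte lands at the lowest address. The function name is my own:

```cpp
#include <cstdint>
#include <cstring>

// Classic runtime check: write a known multi-byte value and look at
// which byte ends up first in memory.
bool is_little_endian() {
    uint32_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);  // inspect the lowest-addressed byte
    return first == 1;               // the 1 comes first only on little-endian
}
```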

One solution is to always enforce one particular endianness. Using ntohl as suggested above is a portable and relatively easy way of doing that for single integers.

Another approach is knowing the endianness of a file before reading it (and writing whatever is native). To know which endianness was used when writing data to a file, you start your file with a well-known magic value. Since you know that value, you know what it must look like, and what it looks like when the endianness is wrong.
If you read the first word and it comes out correctly, just read the rest of your file. If it comes out with wrong byte order, you need to flip all values. If it comes out as something else, it's not a file of the type you expected. Microsoft's infamous byte order mark is nothing else but exactly that.
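That decision can be sketched like this; the magic value kMagic and the names are made up for illustration:

```cpp
#include <cstdint>

constexpr uint32_t kMagic = 0x1A2B3C4D;  // hypothetical magic for our format

// Flip the byte order of a 32-bit word.
uint32_t byteswap32(uint32_t v) {
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

enum class FileOrder { Native, Swapped, NotOurFormat };

// Decide what to do based on how the magic word reads back.
FileOrder classify(uint32_t first_word) {
    if (first_word == kMagic)             return FileOrder::Native;
    if (first_word == byteswap32(kMagic)) return FileOrder::Swapped;
    return FileOrder::NotOurFormat;
}
```

If classify returns Swapped, every multi-byte value read from the rest of the file gets passed through byteswap32.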


If you read the first word and it comes out correctly, just read the rest of your file. If it comes out with wrong byte order, you need to flip all values. If it comes out as something else, it's not a file of the type you expected. Microsoft's infamous byte order mark is nothing else but exactly that.

Are you talking about the Unicode BOM? I don't think that has anything to do with Microsoft (other than indirectly via their involvement in the working group). Though if you know better, I'd be interested to hear more about it.

One concrete example of a file format that does something like this is TIFF:
Every TIFF begins with a 2-byte indicator of byte order: "II" for little-endian (aka "intel byte ordering", circa 1980) and "MM" for big-endian (aka "motorola byte ordering", circa 1980) byte ordering. The third byte represents the number 42 which happens to be the ASCII character "*", also represented by hexadecimal 2A, selected because this is the binary pattern 101010 and "for its deep philosophical significance". The 4th byte is represented by a 0, an ASCII "NULL". All words, double words, etc., in the TIFF file are assumed to be in the indicated byte order. The TIFF 6.0 specification says that compliant TIFF readers must support both byte orders (II and MM); writers may use either. (source)
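A sketch of checking that header, following the layout quoted above (names and enum are mine):

```cpp
// Inspect the first four bytes of a TIFF header:
// "II" = little-endian, "MM" = big-endian, followed by the
// number 42 stored in that same byte order.
enum class TiffOrder { Little, Big, Invalid };

TiffOrder tiff_byte_order(const unsigned char header[4]) {
    if (header[0] == 'I' && header[1] == 'I' &&
        header[2] == 42  && header[3] == 0)
        return TiffOrder::Little;   // 42 stored as 0x2A 0x00
    if (header[0] == 'M' && header[1] == 'M' &&
        header[2] == 0   && header[3] == 42)
        return TiffOrder::Big;      // 42 stored as 0x00 0x2A
    return TiffOrder::Invalid;
}
```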

It is common to see confusion about how to handle endianness (though most got it right above). The trick is to care about the defined byte order of the data format, and not to use any mechanism that depends on the current architecture. See the excellent blog post The byte order fallacy by Rob Pike.
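In the style Pike advocates, you decode the stream's defined byte order arithmetically, byte by byte, and the host's endianness never enters the picture at all. A sketch, with hypothetical names, for a format defined as little-endian or big-endian on disk:

```cpp
#include <cstdint>

// Decode a 32-bit value from a little-endian byte stream.
// Works identically on any host; no swapping, no #ifdefs.
uint32_t read_u32_le(const unsigned char* p) {
    return  static_cast<uint32_t>(p[0])
         | (static_cast<uint32_t>(p[1]) << 8)
         | (static_cast<uint32_t>(p[2]) << 16)
         | (static_cast<uint32_t>(p[3]) << 24);
}

// Same idea for a big-endian byte stream.
uint32_t read_u32_be(const unsigned char* p) {
    return (static_cast<uint32_t>(p[0]) << 24)
         | (static_cast<uint32_t>(p[1]) << 16)
         | (static_cast<uint32_t>(p[2]) << 8)
         |  static_cast<uint32_t>(p[3]);
}
```

The shifts express the format's byte order directly, so this compiles to correct code everywhere without ever asking what the CPU's native order is.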
