Endian independent binary files?

7 comments, last by larspensjo 11 years, 4 months ago
Are you talking about [...]?
Yes, the Unicode BOM. I named Microsoft both because of their involvement in the working group and because most (all?) of their products are well known for writing BOMs even where they are not appropriate, such as in UTF-8 encoded files. (A BOM in UTF-8 is not strictly forbidden, but it is pointless, since UTF-8 has no byte-order ambiguity, and it causes trouble with some software.)
Also, it is noteworthy that Windows is "The One Big Architecture" on the planet that is little-endian, and the single biggest platform that consistently uses BOMs. For the most part this is a good thing, but in return it means that virtually all existing documents in the world are invalid by design, if you are pedantic (which is quite a funny thing!).
To explain: the byte order mark is U+FEFF (zero width no-break space). In little-endian UTF-16 it is stored as the bytes FF FE, which read as U+FFFE if taken in big-endian order. U+FFFE is not a legitimate character, so if one wants to look at it that way, every little-endian document with a BOM is automatically invalid (to amend this, it was later agreed that a BOM, necessary or not, illegal or not, is to be ignored).
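
For anyone who wants to see the byte swap concretely, here is a minimal Python sketch. The function name `detect_utf16_byte_order` is made up for illustration; only the `codecs` BOM constants come from the standard library. It only looks at UTF-16 BOMs and does not try to tell UTF-16 apart from UTF-32.

```python
import codecs

def detect_utf16_byte_order(data: bytes) -> str:
    """Return 'big', 'little', or 'unknown' based on a leading UTF-16 BOM."""
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "big"
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "little"
    return "unknown"

# U+FEFF encoded as little-endian UTF-16 is the byte pair FF FE.
bom_le = "\ufeff".encode("utf-16-le")

# Reading those same two bytes in big-endian order yields 0xFFFE,
# which is the noncharacter the post refers to.
misread = int.from_bytes(bom_le, "big")
```

Because U+FFFE can never be a real character, a reader that sees it first knows it has the byte order backwards and can swap.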

Either way, it's a good example of "something that works" for the endianness problem. Whether one likes it or not, it works, and it works well.
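
The same trick carries over to a binary file format, roughly like the sketch below. This is only an illustration under assumptions of my own: the record layout (a 16-bit marker followed by 32-bit unsigned integers), the reuse of 0xFEFF as the marker value, and the names `pack_record` and `unpack_record` are all hypothetical and not from any existing format.

```python
import struct

# Hypothetical 16-bit marker, borrowed from the Unicode BOM value.
MARKER = 0xFEFF

def pack_record(values, byte_order="<"):
    """Pack the marker followed by 32-bit unsigned ints.

    byte_order is a struct prefix: "<" for little-endian, ">" for big-endian.
    """
    return struct.pack(f"{byte_order}H{len(values)}I", MARKER, *values)

def unpack_record(blob):
    """Detect the byte order from the marker, then decode the values."""
    for byte_order in ("<", ">"):
        (marker,) = struct.unpack_from(f"{byte_order}H", blob, 0)
        if marker == MARKER:
            count = (len(blob) - 2) // 4
            return list(struct.unpack_from(f"{byte_order}{count}I", blob, 2))
    raise ValueError("missing or corrupt byte-order marker")
```

The writer uses whatever byte order is native or convenient, and the reader works out which one it was from the first two bytes, so neither side has to agree on an order in advance.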
