Byte-order, alignment, LSB, etc.

10 comments, last by null_pointer 24 years ago
I know different OSes use different memory schemes, store their numbers differently, etc. Are there just two basic ways to store numbers (LSB and MSB), or is it pretty much up to the OS? Could someone please explain how different OSes accomplish this, and how to convert between them (in English, please)? I'm using C++ but C code would be fine, too. Or am I not understanding this topic? All I need to know is how to get information (structs, ints, chars, whatever!) from one OS to another, programmatically. Thanks in advance!

- null_pointer
Sabre Multimedia
How data is stored in memory isn't really an OS issue; it's more a matter of the hardware architecture. The OS simply uses whatever byte order the underlying hardware uses.
For example, Intel x86 processors are little-endian (least significant byte first), and Motorola 68k processors are big-endian (most significant byte first). For an overview of both, and a discussion of their (dis)advantages, see http://www.whatis.com/bigendia.htm.
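
If you ever need to check which kind of machine your code is running on, a small run-time probe is one way to do it. This is only a sketch; the function name host_is_little_endian is made up for illustration:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 on a little-endian host, 0 on a big-endian one. */
static int host_is_little_endian(void)
{
    const uint32_t probe = 1;      /* 0x00000001 */
    unsigned char first;

    /* Copy the lowest-addressed byte of the integer. */
    memcpy(&first, &probe, 1);

    /* Little-endian stores the least significant byte (the 1) first. */
    return first == 1;
}
```

The probe only distinguishes the two common byte orders, which is all the thread is concerned with.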

To read in data that's stored in big endian, you can use the following code (for a 32-bit number, using a buffer of unsigned bytes):

unsigned int num = buf[3] | (buf[2] << 8) | (buf[1] << 16) | ((unsigned int)buf[0] << 24);  /* buf is unsigned char[4] */
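
Wrapped up as functions, the same shift-and-combine idea might look like the sketch below. The names read_be32 and read_le32 are just illustrative, and the buffer is assumed to hold at least four unsigned bytes. Because it works on individual bytes, it gives the same answer on any host, whatever its native byte order:

```c
#include <stdint.h>

/* Assemble a 32-bit value from four bytes stored most significant first. */
static uint32_t read_be32(const unsigned char *buf)
{
    return ((uint32_t)buf[0] << 24) |
           ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |
            (uint32_t)buf[3];
}

/* Same idea for data stored least significant byte first. */
static uint32_t read_le32(const unsigned char *buf)
{
    return  (uint32_t)buf[0]        |
           ((uint32_t)buf[1] << 8)  |
           ((uint32_t)buf[2] << 16) |
           ((uint32_t)buf[3] << 24);
}
```

Using unsigned char matters here: with plain (possibly signed) char, a byte like 0xFF would sign-extend and corrupt the upper bits.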

Erik
mov eax, buf    ; load the 32-bit value stored at buf
bswap eax       ; reverse the order of its four bytes (486 and later)

voila!
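
For anyone whose asm is rusty, roughly the same byte swap can be sketched in portable C. The names swap32 and swap16 are made up here; this is an illustration of what bswap does, not a claim about what any compiler generates:

```c
#include <stdint.h>

/* Reverse the four bytes of a 32-bit value (what bswap does to eax). */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24)                 |
           ((v >> 8) & 0x0000FF00u)  |
           ((v << 8) & 0x00FF0000u)  |
           (v << 24);
}

/* Reverse the two bytes of a 16-bit value. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}
```

Swapping twice gives back the original value, so the same function converts in both directions.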
foofightr: I guess that's a 16-bit number? My asm is a bit rusty (actually, very rusty!).

Everyone: Thanks! So, from logic, it makes sense that there would be only two ways to store it in memory? Okay, it was easier than I thought.

Another question in the topic was byte alignment, or how data is stored on disk. Is this a question about the file system, or the OS, or the hardware? I'm guessing the file system, and file systems are pretty much tied to OSes, so, how would I convert an int (or a short or anything else) stored on disk to an int stored on disk in another OS? I'm using the standard C library stuff (fopen()), so should I just "flip" the number around in memory after I read it? Are there any problems with how data is aligned? How might I get around those problems?


- null_pointer
Sabre Multimedia
Alignment is an issue of hardware, OS, and compiler alike, I guess. Some hardware architectures perform better if data is aligned on a certain boundary. The compiler has to keep this in mind when generating code. The OS has to keep it in mind when loading executable images from disk.

If you want to load ints (for example) from a file written on a different OS, you have to agree on a format to use in the file, and then convert on each platform whose native format differs from it.
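
As a rough sketch of "agreeing on a format", suppose the file format is defined as 32-bit values stored least significant byte first (that choice, and the names write_u32_le and read_u32_le, are assumptions for illustration). Writing and reading byte by byte then works the same on every platform:

```c
#include <stdio.h>
#include <stdint.h>

/* Write v as four bytes, least significant first. Returns 1 on success. */
static int write_u32_le(FILE *f, uint32_t v)
{
    unsigned char b[4];
    b[0] = (unsigned char)(v & 0xFF);
    b[1] = (unsigned char)((v >> 8) & 0xFF);
    b[2] = (unsigned char)((v >> 16) & 0xFF);
    b[3] = (unsigned char)((v >> 24) & 0xFF);
    return fwrite(b, 1, 4, f) == 4;
}

/* Read four bytes written by write_u32_le. Returns 1 on success. */
static int read_u32_le(FILE *f, uint32_t *out)
{
    unsigned char b[4];
    if (fread(b, 1, 4, f) != 4)
        return 0;
    *out =  (uint32_t)b[0]        |
           ((uint32_t)b[1] << 8)  |
           ((uint32_t)b[2] << 16) |
           ((uint32_t)b[3] << 24);
    return 1;
}
```

Remember to open the file in binary mode ("rb" / "wb") so the C library does not translate any bytes on the way through.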

Data alignment shouldn't be an issue in files, as you want to keep things as compact as possible. If you read in the different parts of a struct (for example) separately, the compiler makes sure that each member is properly aligned in memory (this is decided at compile time).

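Example:
struct {
    int a;
    int b;
} s;
/* f is a FILE * opened for binary reading */
fread(&s.a, sizeof(int), 1, f);   /* read each member separately... */
fread(&s.b, sizeof(int), 1, f);   /* ...so struct padding never matters */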

You may have to convert a and b from little endian to big endian (or vice versa), depending on the platform your app runs on. Usually this is determined at compile time (with various #defines).
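
A compile-time switch along those lines could be sketched as follows. It assumes the file stores values little-endian, and it relies on the __BYTE_ORDER__ macro that GCC and Clang predefine; other compilers would need their own check. If the macro is missing, this sketch simply assumes a little-endian host. The names HOST_IS_BIG_ENDIAN, byteswap_u32, and load_le32 are made up for illustration:

```c
#include <stdint.h>
#include <string.h>

/* Assumption: no __BYTE_ORDER__ macro means we treat the host as little-endian. */
#if defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && \
    (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#define HOST_IS_BIG_ENDIAN 1
#else
#define HOST_IS_BIG_ENDIAN 0
#endif

/* Reverse the four bytes of a 32-bit value. */
static uint32_t byteswap_u32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Interpret four raw file bytes (stored little-endian) as a host integer. */
static uint32_t load_le32(const unsigned char *bytes)
{
    uint32_t raw;
    memcpy(&raw, bytes, sizeof raw);   /* native-order load of the raw bytes */
    return HOST_IS_BIG_ENDIAN ? byteswap_u32(raw) : raw;
}
```

Since HOST_IS_BIG_ENDIAN is a compile-time constant, the compiler can drop the branch that does not apply.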

Erik
Usually data alignment/padding is a big issue when interfacing with a file. It's certainly best to keep the structures compact, and easiest to just do a binary read/write of the whole structure from/to disk. The catch is that you must make sure the structure in memory matches the one on disk, which is usually done by wrapping the structure definition in a #pragma pack. This tells the compiler not to pad the structure, which would otherwise shift the members away from their positions in the file.

#pragma pack(1) // no padding
struct tag1
{
    char a;
    int b;
    short c;
};
#pragma pack() // back to normal

Now you can just do something like

struct tag1 var;
fread(&var, sizeof(var), 1, fp);

and read directly into the structure. For any member variables bigger than 1 byte you still need to worry about endianness, but the file interfacing is trivial.
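
To see what the pragma actually changes, here is a small sketch comparing the same members with and without packing. The struct names and the padding_removed helper are made up for illustration, and #pragma pack itself is a compiler extension (supported by MSVC, GCC, and Clang), not standard C:

```c
#include <stddef.h>

/* Default layout: the compiler may insert padding after a and after c. */
struct padded_tag
{
    char a;
    int b;
    short c;
};

#pragma pack(1)  /* no padding */
struct packed_tag
{
    char a;
    int b;
    short c;
};
#pragma pack()   /* back to normal */

/* Number of bytes of padding the pragma removed on this compiler. */
static size_t padding_removed(void)
{
    return sizeof(struct padded_tag) - sizeof(struct packed_tag);
}
```

On common compilers with a 4-byte int, the padded version is typically 12 bytes and the packed one 7, but the exact numbers are up to the compiler, which is exactly why the on-disk layout has to be pinned down.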

Rock
Thanks, but I can't use pragmas as I can't determine the structs at compile time -- this program is meant to write data to disk on other file systems in a format that is consistent across platforms. The program will also read data into memory in the correct native format. I can determine what will be written, as in the data types (int, char, etc.), but I now know that they will not be written as structs. So, does this eliminate the padding problem?

Also, this program will transfer data across networks. Is it better to convert all the data to one format (either big-endian or little-endian) when sending it, or when receiving it? Does it matter performance-wise?



- null_pointer
Sabre Multimedia
Hate to correct you, null_pointer, but foofightr posted the correct code for a 32-bit number (eax is a 32-bit register; ax is 16-bit; ah/al are 8-bit)...

so long,
Ice.
If you transfer data over the network, and you're using sockets (most likely), you have to convert to the network format using the host-to-network functions (htonl() and htons(), with ntohl() and ntohs() for the way back). See the docs on socket programming for info on this.
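
A minimal sketch of that convention: the sender converts to network (big-endian) order before sending, and the receiver converts back to its own order after receiving. The helper names put_net32 and get_net32 are made up, and the header shown is the POSIX one; on Windows the same functions come from <winsock2.h>:

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htonl, ntohl (POSIX) */

/* Sender side: encode v into 4 bytes in network byte order. */
static void put_net32(unsigned char *out, uint32_t v)
{
    uint32_t n = htonl(v);          /* host order -> network order */
    memcpy(out, &n, sizeof n);
}

/* Receiver side: decode 4 bytes in network byte order. */
static uint32_t get_net32(const unsigned char *in)
{
    uint32_t n;
    memcpy(&n, in, sizeof n);
    return ntohl(n);                /* network order -> host order */
}
```

On a big-endian host both conversions do nothing, so each side only pays for a swap when its own byte order differs from the wire format.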

Erik
OK, thanks! That's all my questions answered!


- null_pointer
Sabre Multimedia
