C++ | Fixed Byte Size

43 comments, last by Bacterius 8 years, 7 months ago

Hi,

I am currently working on the "Core" module of my game engine. It requires accessing an archive.

My problem currently is the following:



typedef unsigned long DWORD;
typedef unsigned short WORD;

struct ZIPHeader {
    DWORD Signature;
    WORD  Version;
    WORD  GeneralPurpose;
    WORD  CompressionMethod;
    WORD  LastModTime;
    WORD  LastModDate;
    DWORD CRC32;
    DWORD CompSize;
    DWORD UnCompSize;
    WORD  NameLength;
    WORD  ExtraSize;
};


How do I ensure that WORD is always 2 bytes and DWORD is always 4 bytes across different hosts?

I've been taught that different CPU architectures define different sizes for the data types.

So, how can I ensure that each variable has the right size?

It's not a shame to make mistakes. As a programmer I am doing mistakes on a daily basis. - JonathanKlein


There's <stdint.h>, which defines exact-width types (int8_t, int16_t, int32_t, etc.) that have the same size on every platform that provides them.

See https://en.wikibooks.org/wiki/C_Programming/C_Reference/stdint.h#Exact-width_integer_types

To be fair, he asked about C++, not C, so #include <cstdint> to use int32_t etc. See a good C++ reference.

Stephen M. Webb
Professional Free Software Developer

If you have a suitably up-to-date compiler, and feel extra paranoid, you can also use static_assert to verify byte sizes at compile time.
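A minimal sketch of that check, assuming a C++11 compiler. The WORD and DWORD typedefs here are hypothetical replacements for the ones in the question, built on the <cstdint> types instead of unsigned short and unsigned long:

```cpp
#include <climits>   // CHAR_BIT
#include <cstdint>   // std::uint16_t, std::uint32_t

// Hypothetical replacements for the typedefs in the question.
typedef std::uint16_t WORD;
typedef std::uint32_t DWORD;

// Each check is evaluated at compile time. If one fails, the build stops
// and the compiler prints the message.
static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");
static_assert(sizeof(WORD) == 2, "WORD must be exactly 2 bytes");
static_assert(sizeof(DWORD) == 4, "DWORD must be exactly 4 bytes");
```

With the exact-width types the asserts are mostly documentation, but they would catch the problem immediately if someone switched the typedefs back to plain short/long.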

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

Another thing to consider is how the struct member packing is set up, if you're serializing that struct by copying the whole thing at once (memcpy, fread, etc.).

How do I ensure that WORD is always 2 bytes and DWORD is always 4 bytes across different hosts?

I've been taught that different CPU architectures define different sizes for the data types.

So, how can I ensure that each variable has the right size?

Just to be clear, they're not talking about different CPU architectures like going from a Windows machine with an Intel processor to running the same program on an AMD processor. They mean going from an x86 family processor (like your PC) over to an ARM family processor like your cell phone. Or potentially they're talking about moving from a 32-bit compiler to a 64-bit compiler. You have to actually change architectures, which means a whole lot of things change.

If you are running the same executable on all the different machines the sizes will be the same.

The other thing to consider is padding inside the structure. Just because a field is 32 bits or 16 bits does not mean the fields are laid out the same way everywhere. Padding inside structures is an implementation detail in C++, but a file format dictates an exact layout. The fields could all be aligned on byte boundaries, or 2-byte boundaries, or 4-byte boundaries, or 16-byte boundaries, or whatever else the compiler feels like doing. There are compiler-specific directives to adjust that.
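To see what padding does to the header from the question, here is a rough sketch that compares the in-memory layout with the 30 bytes the fields add up to. The struct and constant names are made up for the example, and the exact numbers the compiler picks are platform-dependent:

```cpp
#include <cstddef>   // offsetof, std::size_t
#include <cstdint>   // std::uint16_t, std::uint32_t

// The header from the question with exact-width types and the compiler's
// default alignment. The struct name is hypothetical.
struct ZIPHeaderNatural {
    std::uint32_t Signature;
    std::uint16_t Version;
    std::uint16_t GeneralPurpose;
    std::uint16_t CompressionMethod;
    std::uint16_t LastModTime;
    std::uint16_t LastModDate;
    std::uint32_t CRC32;       // follows 14 bytes of fields
    std::uint32_t CompSize;
    std::uint32_t UnCompSize;
    std::uint16_t NameLength;
    std::uint16_t ExtraSize;
};

// Sum of the field sizes: one 4-byte field, five 2-byte fields,
// three 4-byte fields, two 2-byte fields.
constexpr std::size_t kZipHeaderFileSize = 4 + 5 * 2 + 3 * 4 + 2 * 2;  // 30

// Where CRC32 would sit if the fields were laid out back to back,
// and where the compiler actually placed it.
constexpr std::size_t kCrcOffsetInFile = 14;
constexpr std::size_t kCrcOffsetInMemory = offsetof(ZIPHeaderNatural, CRC32);

// True when the compiler added padding somewhere in the struct.
constexpr bool kHasPadding = sizeof(ZIPHeaderNatural) != kZipHeaderFileSize;
```

On compilers that align a 4-byte integer to 4 bytes, CRC32 cannot start at offset 14, so padding is inserted before it and a single fread of sizeof(ZIPHeaderNatural) bytes would not line up with the file.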

Endianness will be another concern if you are going cross-platform. While the values are the same as far as your code is concerned (the value is 12345678), the actual encoding of the byte pattern can be different. On one platform a 16-bit value may be encoded as AB, on another it is BA. For a 32-bit value it may be ABCD or DCBA, or on some middle-endian hardware potentially BADC.
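To make the AB/BA and ABCD/DCBA orderings concrete, here is a small sketch of byte-swap helpers. The function names are made up for this example; they convert a value between the two common byte orders without looking at memory at all:

```cpp
#include <cstdint>   // std::uint16_t, std::uint32_t

// AB -> BA: exchange the two bytes of a 16-bit value.
constexpr std::uint16_t byte_swap16(std::uint16_t v) {
    return static_cast<std::uint16_t>((v << 8) | (v >> 8));
}

// ABCD -> DCBA: reverse the four bytes of a 32-bit value.
constexpr std::uint32_t byte_swap32(std::uint32_t v) {
    return ((v & 0x000000FFU) << 24) |
           ((v & 0x0000FF00U) << 8)  |
           ((v & 0x00FF0000U) >> 8)  |
           ((v & 0xFF000000U) >> 24);
}
```

For example, byte_swap32(0x0A0B0C0D) gives 0x0D0C0B0A. In practice you only swap when the file's byte order differs from the host's, which is why decoding the bytes one at a time is often the simpler route.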

These concerns don't really apply if you are talking about staying on a single architecture, such as building a program that only runs on 32-bit Windows, or only runs on 64-bit Windows.

@frob:
I didn't know that C++ compilers pad their structs. My mind was already blown when I encountered this with HLSL.
How can I ensure that it is packed as one unit without padding? I am planning to support Linux and Windows.


The problem with "cstdint" is that it only offers datatypes with 8 bytes or more (correct me if this is wrong).

PKWare defined their ZIP-Headers using 4 and 2 Bytes.

I really don't want to use bitsets or "high/low orders".

It's not a shame to make mistakes. As a programmer I am doing mistakes on a daily basis. - JonathanKlein

Just to be clear, they're not talking about different CPU architectures like going from a Windows machine with an Intel processor to running the same program on an AMD processor. They mean going from an x86 family processor (like your PC) over to an ARM family processor like your cell phone. Or potentially they're talking about moving from a 32-bit compiler to a 64-bit compiler. You have to actually change architectures, which means a whole lot of things change.

The size of these standard types also depends on the platform ABI. For example, the standard C type "long" is only 4 bytes on 64-bit Windows but 8 bytes on most other 64-bit platforms.
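This is why "typedef unsigned long DWORD" is fragile. A quick sketch (bits_of is a made-up helper, and this assumes a C++11 compiler) that spells out what is and isn't guaranteed:

```cpp
#include <climits>   // CHAR_BIT
#include <cstddef>   // std::size_t
#include <cstdint>   // std::uint32_t

// Hypothetical helper: storage size of T in bits.
template <typename T>
constexpr std::size_t bits_of() {
    return sizeof(T) * CHAR_BIT;
}

// An exact-width type has the same width wherever it exists.
static_assert(bits_of<std::uint32_t>() == 32, "uint32_t is exactly 32 bits");

// For unsigned long the standard only promises a minimum. The actual value
// depends on the ABI: 32 on 64-bit Windows, 64 on most other 64-bit platforms.
static_assert(bits_of<unsigned long>() >= 32, "unsigned long is at least 32 bits");
```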

I didn't know that C++ compilers pad their structs. My mind was already blown when I encountered this with HLSL.
How can I ensure that it is packed as one unit without padding? I am planning to support Linux and Windows.

There are compiler switches and pragmas you can use, but then you suddenly have to write code for different compilers and make sure that the size of the struct matches the ZIP header if you ever plan on reading or writing the entire struct at once. Reading or writing one field at a time using the integer types from the stdint header is guaranteed to work on platforms where those integer types exist, so that's the option I would go for (don't forget to account for endianness, though). If you want maximum portability, you're in for a real headache, since some platforms might not even have integer types of the same length as those in the ZIP header. That can be solved by messing around with bit masks, but it's not something I would bother with unless I knew I was targeting esoteric platforms.
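Here is a sketch of the field-at-a-time approach, assuming the 30 header bytes have already been read into a buffer and relying on ZIP storing its integers in little-endian order. The helper names (read_le16, read_le32, parse_zip_header) are made up for this example, and the struct is the one from the question with <cstdint> types:

```cpp
#include <cassert>   // assert
#include <cstdint>   // std::uint16_t, std::uint32_t

struct ZIPHeader {
    std::uint32_t Signature;
    std::uint16_t Version;
    std::uint16_t GeneralPurpose;
    std::uint16_t CompressionMethod;
    std::uint16_t LastModTime;
    std::uint16_t LastModDate;
    std::uint32_t CRC32;
    std::uint32_t CompSize;
    std::uint32_t UnCompSize;
    std::uint16_t NameLength;
    std::uint16_t ExtraSize;
};

// Build a value from individual bytes, least significant byte first.
// This gives the same result on little- and big-endian hosts.
inline std::uint16_t read_le16(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

inline std::uint32_t read_le32(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// Decode a header from 30 raw bytes. The caller must guarantee that
// at least 30 bytes are readable at p.
inline ZIPHeader parse_zip_header(const unsigned char* p) {
    assert(p != nullptr);
    ZIPHeader h;
    h.Signature         = read_le32(p + 0);
    h.Version           = read_le16(p + 4);
    h.GeneralPurpose    = read_le16(p + 6);
    h.CompressionMethod = read_le16(p + 8);
    h.LastModTime       = read_le16(p + 10);
    h.LastModDate       = read_le16(p + 12);
    h.CRC32             = read_le32(p + 14);
    h.CompSize          = read_le32(p + 18);
    h.UnCompSize        = read_le32(p + 22);
    h.NameLength        = read_le16(p + 26);
    h.ExtraSize         = read_le16(p + 28);
    return h;
}
```

Since the offsets come from the file format and not from the struct, padding in ZIPHeader no longer matters, and no compiler-specific packing directives are needed.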

@frob:
I didn't know that C++ compilers pad their structs. My mind was already blown when I encountered this with HLSL.
How can I ensure that it is packed as one unit without padding? I am planning to support Linux and Windows.


The problem with "cstdint" is that it only offers datatypes with 8 bytes or more (correct me if this is wrong).

PKWare defined their ZIP-Headers using 4 and 2 Bytes.

I really don't want to use bitsets or "high/low orders".

The number at the end (uint8_t, int16_t, etc.) is the number of BITS, not the number of bytes. So on an x86/x64 machine BYTE = uint8_t = 1 byte, WORD = uint16_t = 2 bytes, DWORD = uint32_t = 4 bytes, QWORD = uint64_t = 8 bytes.

About packing:

The compiler inserts padding to speed up data access, or to make all forms of addressing possible (some CPUs cannot load a 32-bit number from an odd address, for example).
So in general, it's better to keep the padding than to force the compiler to remove it.

The general strategy to minimize padding is to sort your data members from largest to smallest.
I normally don't bother about a few bytes, and keep related data next to each other, as it speeds up development. You can always shuffle data fields into a better order at the end of development, after you checked memory usage.


However, if you insist, most compilers have options to influence packing, like http://gcc.gnu.org/onlinedocs/gcc/Structure-Packing-Pragmas.html
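For completeness, a sketch of what that looks like with #pragma pack(push, 1), a form that GCC, Clang and MSVC all accept. It is a compiler extension, not standard C++, and the struct name is made up for the example:

```cpp
#include <cstdint>   // std::uint16_t, std::uint32_t

#pragma pack(push, 1)   // compiler extension: pack members on 1-byte boundaries
struct PackedZIPHeader {
    std::uint32_t Signature;
    std::uint16_t Version;
    std::uint16_t GeneralPurpose;
    std::uint16_t CompressionMethod;
    std::uint16_t LastModTime;
    std::uint16_t LastModDate;
    std::uint32_t CRC32;       // now at offset 14, which is not 4-byte aligned
    std::uint32_t CompSize;
    std::uint32_t UnCompSize;
    std::uint16_t NameLength;
    std::uint16_t ExtraSize;
};
#pragma pack(pop)       // restore the previous packing setting

// Fails the build if the compiler did not honor the pragma.
static_assert(sizeof(PackedZIPHeader) == 30,
              "packed header must match the 30 bytes of its fields");
```

Packing only removes the padding. The field values still come out in the host's byte order, so endianness has to be handled separately, and the misaligned 32-bit members can be slower to access (or need special handling) on some CPUs.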

