## sizeof() not working ?!?

20 replies to this topic

Posted 29 January 2013 - 04:37 PM

Hi guys, I am having a hard time understanding something right now:

    struct FormatChunk
    {
        BYTE subchunk1ID[4];
        DWORD subchunk1Size;
        short audioFormat;
    };

    DWORD dwSize = sizeof(FormatChunk); // is 12 !!!!
    dwSize = sizeof(short);             // is 2, like it should be


What the hell ?

### #2ApochPiQ  Moderators

Posted 29 January 2013 - 04:40 PM

You have 4 BYTEs, a DWORD (4 more bytes) and a short (2 bytes). Including structure padding, this gives 12 bytes.
Wielder of the Sacred Wands

### #3SiCrane  Moderators

Posted 29 January 2013 - 04:40 PM

What you're seeing is data alignment at work. Basically your DWORD needs to be allocated on a multiple of four bytes, so the size of the struct needs to be a multiple of four bytes.
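
To see where the two extra bytes come from, here is a minimal sketch that inspects the layout. It assumes fixed-width types from `<cstdint>` as stand-ins for the Windows `BYTE`/`DWORD` typedefs; `kPayloadSize` and `CheckFormatChunkLayout` are made-up names for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for the thread's FormatChunk, using fixed-width types
// instead of the Windows BYTE/DWORD typedefs (an assumption).
struct FormatChunk
{
    std::uint8_t  subchunk1ID[4];  // offset 0, 4 bytes
    std::uint32_t subchunk1Size;   // offset 4, 4 bytes
    std::int16_t  audioFormat;     // offset 8, 2 bytes
    // the compiler typically adds 2 bytes of tail padding here
};

// Sum of the member sizes, ignoring any padding.
constexpr std::size_t kPayloadSize =
    sizeof(FormatChunk::subchunk1ID) +
    sizeof(FormatChunk::subchunk1Size) +
    sizeof(FormatChunk::audioFormat);

static_assert(kPayloadSize == 10, "4 + 4 + 2 bytes of actual data");

// The struct size is always a multiple of its alignment, so that every
// element of a FormatChunk array keeps subchunk1Size aligned.
static_assert(sizeof(FormatChunk) % alignof(FormatChunk) == 0,
              "size is a multiple of alignment");

// Runtime version of the layout checks, handy to call from a unit test.
inline void CheckFormatChunkLayout()
{
    assert(offsetof(FormatChunk, subchunk1Size) % alignof(std::uint32_t) == 0);
    assert(offsetof(FormatChunk, audioFormat) % alignof(std::int16_t) == 0);
    assert(sizeof(FormatChunk) >= kPayloadSize);
}
```

On mainstream desktop compilers `sizeof(FormatChunk)` comes out as 12, which matches what the original poster saw. The exact figure is up to the implementation, which is why the static checks above only assert the parts that must hold everywhere.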

Posted 29 January 2013 - 04:42 PM

Structs are usually padded so that their size is a multiple of their most strictly aligned member (4 bytes here, because of the DWORD). If you are using MSVC, you can control the packing; check out #pragma pack. If you can afford the memory, go with the default packing, though (it's usually more efficient to read DWORDs off 4-byte boundaries). Packing is important if you are worried about data file size or sending the data across a network, however.
"Most people think, great God will come from the sky, take away everything, and make everybody feel high" - Bob Marley

Posted 29 January 2013 - 04:57 PM

Ahh, ok guys, I thought I was going crazy.

I'm going to read up on it.

In general though, if I have to read 10 bytes into that struct I can't rely on sizeof() and should explicitly set it to read 10, is that right ?

Thanks.

### #6L. Spiro  Members

Posted 29 January 2013 - 05:07 PM

If your structure is meant to be used to map in-memory file data you should enforce finer control over the alignment/padding of the structure with #pragma pack where available.

If not available, you should approach the problem in a different way, period.

If your byte array was 3 bytes instead of 4 bytes in length, subchunk1Size would still be aligned to 4 bytes, with an extra byte added before it for padding.

Padding does not occur just at the end of structures, but also between members inside the structure, so if you can’t strictly control how it is padded, don’t even try to use that approach.
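
A minimal sketch of the 3-byte case described above. `OddChunk` and `InteriorPadding` are hypothetical names, and fixed-width types stand in for `BYTE`/`DWORD`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical variant of FormatChunk with a 3-byte ID.
struct OddChunk
{
    std::uint8_t  id[3];   // offsets 0..2
    // typically 1 byte of padding here so that 'size' starts at offset 4
    std::uint32_t size;
    std::int16_t  format;
};

// Bytes between the end of 'id' and the start of 'size'.
inline std::size_t InteriorPadding()
{
    const std::size_t endOfId = offsetof(OddChunk, id) + sizeof(OddChunk::id);
    assert(offsetof(OddChunk, size) >= endOfId);
    return offsetof(OddChunk, size) - endOfId;
}
```

With a 4-byte-aligned `std::uint32_t`, `InteriorPadding()` is 1, so a raw read of the file bytes into this struct would put every field after `id` in the wrong place.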

So no, don’t hard-code 10 anywhere.  You should use sizeof(), and if that is not reliable then don’t use this method at all.

L. Spiro

Edited by L. Spiro, 29 January 2013 - 06:51 PM.

Posted 29 January 2013 - 05:07 PM

Nope, if you save it out use sizeof(FormatChunk) to get how many bytes to write out (and read in). Never manually try and work out the size of a struct to write to disk.

Posted 29 January 2013 - 06:30 PM

sizeof(subchunk1ID) + sizeof(subchunk1Size) +  sizeof(audioFormat) will always be 10 regardless of any structure padding.

Posted 29 January 2013 - 06:33 PM

Yikes, that's a maintenance nightmare if you add a field to the struct or change the order of them.

### #10swiftcoder  Senior Moderators

Posted 29 January 2013 - 06:44 PM

The best option is typically to use #pragma pack (with push/pop). This allows you to eliminate the padding entirely (using a pack of 1), and at that point the representation in memory should be identical to that on disk (barring endian issues, which you are far less likely to run into since Macs switched to Intel).

However, be aware that there is a performance penalty to read/write unaligned memory - you should only pack structures directly involved in I/O.
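
A minimal sketch of the push/pop pattern. `#pragma pack` is a compiler extension (MSVC, GCC and Clang all accept this form); `PackedFormatChunk` and `ReadChunk` are made-up names, and the buffer is assumed to hold data in the host's byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

#pragma pack(push, 1)      // no padding between or after members
struct PackedFormatChunk
{
    std::uint8_t  subchunk1ID[4];
    std::uint32_t subchunk1Size;
    std::int16_t  audioFormat;
};
#pragma pack(pop)          // restore the previous packing for later declarations

static_assert(sizeof(PackedFormatChunk) == 10,
              "packed layout must match the 10-byte on-disk layout");

// Copies one chunk out of a raw byte buffer (for example, bytes read from a file).
// Assumes the buffer uses the same byte order as the host.
inline PackedFormatChunk ReadChunk(const unsigned char* bytes, std::size_t length)
{
    assert(length >= sizeof(PackedFormatChunk));
    PackedFormatChunk chunk;
    std::memcpy(&chunk, bytes, sizeof(chunk));
    return chunk;
}
```

The `static_assert` is the useful part: if someone adds a field or the pragma is not honoured, the build fails instead of the file reader silently going out of step.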

Tristam MacDonald - Software Engineer @ Amazon - [swiftcoding] [GitHub]

### #11Servant of the Lord  Members

Posted 29 January 2013 - 07:40 PM

> Nope, if you save it out use sizeof(FormatChunk) to get how many bytes to write out (and read in). Never manually try and work out the size of a struct to write to disk.

Though the padding may vary from computer to computer (for example, between a game server and the game clients it sends packets to), in which case you'd read corrupted data.

It'd be better to just stream the data using something that is cross-platform and safe. More work, yes, but not compiler, compiler-version, compiler-settings, operating system, and hardware specific. That's just asking for unexpected and hard-to-track problems to appear when you least expect it.

I write files using a class like this:

    //Note: All values, except in the case of things like strings, are stored
    //internally in BigEndian format (for consistent reading and writing across platforms).
    class BytePacker
    {
    public:
        BytePacker();

        //----------------------------------------------------------
        //Writing:
        //Each additional 'Write' call appends the value to the end of the data being held.
        //----------------------------------------------------------
        void WriteInt8(int8_t value);
        void WriteInt16(int16_t value);
        void WriteInt32(int32_t value);
        void WriteInt64(int64_t value);

        void WriteUint8(uint8_t value);
        void WriteUint16(uint16_t value);
        void WriteUint32(uint32_t value);
        void WriteUint64(uint64_t value);

        void WriteBool(bool value);
        void WriteFloat(float value);
        void WriteDouble(double value);
        void WriteString(const std::string &str);

    };
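
The post only shows the declarations, so here is a rough sketch of how the writing side of such a class might be implemented. `MiniBytePacker` is a hypothetical, cut-down stand-in and not the poster's actual code, and the length-prefixed string format is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, cut-down packer: appends values to a byte vector in big-endian order.
class MiniBytePacker
{
public:
    void WriteUint8(std::uint8_t value)   { bytes.push_back(value); }
    void WriteUint16(std::uint16_t value) { WriteBigEndian(value, 2); }
    void WriteUint32(std::uint32_t value) { WriteBigEndian(value, 4); }
    void WriteUint64(std::uint64_t value) { WriteBigEndian(value, 8); }

    // Assumed string format: 32-bit length, then the raw characters.
    void WriteString(const std::string &str)
    {
        assert(str.size() <= 0xFFFFFFFFu);
        WriteUint32(static_cast<std::uint32_t>(str.size()));
        bytes.insert(bytes.end(), str.begin(), str.end());
    }

    const std::vector<std::uint8_t> &Data() const { return bytes; }

private:
    // Emits the low 'count' bytes of 'value', most significant byte first.
    void WriteBigEndian(std::uint64_t value, int count)
    {
        for (int shift = (count - 1) * 8; shift >= 0; shift -= 8)
            bytes.push_back(static_cast<std::uint8_t>((value >> shift) & 0xFF));
    }

    std::vector<std::uint8_t> bytes;
};
```

Because every value is emitted one byte at a time with shifts, the output is the same regardless of the host's endianness or struct padding.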


Then I do something like this:

    //Writes the data into the byte packer.
    void Tile::WriteTo(BytePacker &bytePacker)
    {
        //TileImage:
        if(this->tileDisplay)
        {
            bytePacker.WriteUint32(this->tileDisplay->GetKey());
            bytePacker.WriteUint32(this->offset.ToUint32());
            bytePacker.WriteUint64(this->EightReservedBytes);

            bytePacker.WriteInt8(this->startFrame);

            //Store the ImageData of the tile in this Area's ImageDataCache file.
        }
        else
        {
            bytePacker.WriteUint32(0);
        }

        //TileInfo:
        this->tileInfo.WriteTo(bytePacker);
    }

Some people's serializers even make the "Write" and "Read" code completely identical, and just have a single 'Serialize' function that reads if the serializer is in one mode, and writes if the serializer is in a different mode (here's an example).

It's perfectly fine to abbreviate my username to 'Servant' or 'SotL' rather than copy+pasting it all the time.
All glory be to the Man at the right hand... On David's throne the King will reign, and the Government will rest upon His shoulders. All the earth will see the salvation of God.
Of Stranger Flames -

Posted 29 January 2013 - 07:45 PM

Yeah, if the packing differs between platforms, that's exactly why you need #pragma pack: it makes the layout agree across platforms.

And PS3 and XBox 360 need an endian swap as well.
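
For reference, an endian swap of this kind can be sketched as follows. The helper names (`SwapBytes32`, `SwapBytes16`, `HostIsLittleEndian`, `LittleToHost32`) are made up for illustration, and the file data is assumed to be little-endian.

```cpp
#include <cassert>
#include <cstdint>

// Reverses the byte order of a 32-bit value.
inline std::uint32_t SwapBytes32(std::uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

// Reverses the byte order of a 16-bit value.
inline std::uint16_t SwapBytes16(std::uint16_t v)
{
    return static_cast<std::uint16_t>((v << 8) | (v >> 8));
}

// Returns true when the host stores the least significant byte first.
inline bool HostIsLittleEndian()
{
    const std::uint16_t probe = 1;
    return *reinterpret_cast<const unsigned char*>(&probe) == 1;
}

// Converts a little-endian file value to host order (a no-op on x86).
inline std::uint32_t LittleToHost32(std::uint32_t v)
{
    return HostIsLittleEndian() ? v : SwapBytes32(v);
}
```

Swapping a whole packed struct then means calling the matching helper on each multi-byte field. Byte arrays such as `subchunk1ID` are left alone.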

You're still better off preparing a packed struct and endianing it than writing out individual fields (since adding one/moving them around needs a change in the serialise function as well), just endian the packed struct before serialisation. Even better, use metadata to automatically generate the format and the serialise function (sadly we don't do that where I work).

### #13SiCrane  Moderators

Posted 29 January 2013 - 08:09 PM

> However, be aware that there is a performance penalty to read/write unaligned memory - you should only pack structures directly involved in I/O.

It goes beyond performance penalty on some platforms. Unaligned memory access can generate a bus error that will crash your program on some processors like some ARM chips you might find in a cell phone.
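
One common way to stay safe on such platforms is to copy the bytes out with `std::memcpy` instead of dereferencing a cast pointer. A minimal sketch, with `LoadUint32` as a made-up helper name:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reads a 32-bit value starting at any byte offset, aligned or not.
// memcpy lets the compiler pick instructions that are legal on the target,
// unlike *reinterpret_cast<const std::uint32_t*>(p) on a misaligned pointer.
inline std::uint32_t LoadUint32(const unsigned char* buffer,
                                std::size_t bufferSize,
                                std::size_t offset)
{
    assert(offset + sizeof(std::uint32_t) <= bufferSize);
    std::uint32_t value;
    std::memcpy(&value, buffer + offset, sizeof(value));
    return value;  // still in the byte order of whoever produced the buffer
}
```

On x86 compilers usually turn this into a single load, so the portable version tends to cost nothing there.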

### #14Servant of the Lord  Members

Posted 29 January 2013 - 08:32 PM

> And PS3 and XBox 360 need an endian swap as well.

Handled by the BytePacker class.

> You're still better off preparing a packed struct and endianing it than writing out individual fields (since adding one/moving them around needs a change in the serialise function as well), just endian the packed struct before serialisation.

Moving the variables around doesn't affect a class like the one I described above, but adding one definitely requires a change.

> Even better, use metadata to automatically generate the format and the serialise function (sadly we don't do that where I work).

That's a great idea.

Edited by Servant of the Lord, 29 January 2013 - 08:33 PM.


### #15Hodgman  Moderators

Posted 29 January 2013 - 08:43 PM

Regarding data packing, serialization, platform specifics, etc, this is a good video: http://www.itshouldjustworktm.com/?p=652

### #16UnshavenBastard  Members

Posted 30 January 2013 - 03:53 AM

Wouldn't a bunch of function overloads for the different primitive types be nicer here? I.e. you'd then just write packer.Write(...)  each time, and if you change the data type of a field, it's dealt with by the compiler in the serialization code.

Flash ™ - The lightning bolt that hits *your* smooth user experience, too!

-----Bel Canto Society
save old not-yet-restored Opera recordings from rotting

### #17UnshavenBastard  Members

Posted 30 January 2013 - 03:54 AM

> Even better, use metadata to automatically generate the format and the serialise function (sadly we don't do that where I work).

Could you give an example of that? (what kind of meta data, where does it come from?)


### #18Hodgman  Moderators

Posted 30 January 2013 - 04:22 AM

> Even better, use metadata to automatically generate the format and the serialise function (sadly we don't do that where I work).

> Could you give an example of that? (what kind of meta data, where does it come from?)

At one job, we parsed any '.h' files whose parent directory fit a naming convention. These headers just contained C struct declarations, which the parser would convert into a table of meta-data, such as field names, types, offsets, etc. You could use specially formatted comments to specify default values, valid ranges, descriptions, desired UI elements (e.g. Color picker), etc.
From that meta-data, we could then automatically generate text-to-binary serialization functions, so that all game data could be stored in a common, simple text format, but also compiled into a runtime-efficient format without effort.
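
To make the idea concrete, here is a small sketch of a field table driving a generic writer. The table is typed by hand here; in the setup described above it would be generated by the header parser. `Header`, `FieldInfo`, `kHeaderFields` and `Serialize` are all hypothetical names, and the sketch only covers the binary side, not the text format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct Header            // hypothetical game-data struct
{
    std::uint32_t id;
    std::uint16_t width;
    std::uint16_t height;
};

// One row of metadata per field; a real tool would generate this table
// by parsing the header file rather than having it typed by hand.
struct FieldInfo
{
    const char* name;
    std::size_t offset;
    std::size_t size;
};

const FieldInfo kHeaderFields[] = {
    { "id",     offsetof(Header, id),     sizeof(std::uint32_t) },
    { "width",  offsetof(Header, width),  sizeof(std::uint16_t) },
    { "height", offsetof(Header, height), sizeof(std::uint16_t) },
};

// Generic writer: walks the table and emits each field without padding.
// Bytes are copied in host order; a fuller version would also byte-swap.
inline std::vector<unsigned char> Serialize(const void* object,
                                            const FieldInfo* fields,
                                            std::size_t fieldCount)
{
    assert(object != nullptr && fields != nullptr);
    std::vector<unsigned char> out;
    const unsigned char* base = static_cast<const unsigned char*>(object);
    for (std::size_t i = 0; i < fieldCount; ++i)
        out.insert(out.end(),
                   base + fields[i].offset,
                   base + fields[i].offset + fields[i].size);
    return out;
}
```

Adding or reordering a field then only changes the generated table, not the serialization code.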

I've also heard of other companies doing similar things, but via a custom language, and they also produce a C '.h' file as output for the engine to use.

Posted 30 January 2013 - 11:07 AM

Yeah, something similar to (but probably less complicated than) Microsoft's IDL (interface definition language) is worth taking a look at.

### #20Servant of the Lord  Members

Posted 30 January 2013 - 01:19 PM

> Wouldn't a bunch of function overloads for the different primitive types be nicer here? I.e. you'd then just write packer.Write(...)  each time, and if you change the data type of a field, it's dealt with by the compiler in the serialization code.

I've thought of that, but that may cause unexpected errors with implicit type conversion. Which overload is chosen for a time_t? A uint32_t or an int32_t? Oh, sorry, implementation-defined! It might even be 64 bits on a 64-bit computer.

Plus, just because I have something as an 'unsigned int' in my structure or class doesn't mean I always want it to take 32 bits in a network packet or a data file. Being explicit about the storage in this situation is, I think, actually a plus, though I should note that I store a 'bool' in a single byte instead of one bit.
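
A small sketch of the difference, using made-up free functions rather than the BytePacker class: with overloads, the number of bytes written silently follows the argument's type, while an explicitly named function fixes the stored width at the call site.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef std::vector<unsigned char> Bytes;

// Appends the low 'count' bytes of 'value', most significant byte first.
inline void AppendBigEndian(Bytes &out, std::uint64_t value, int count)
{
    assert(count >= 1 && count <= 8);
    for (int shift = (count - 1) * 8; shift >= 0; shift -= 8)
        out.push_back(static_cast<unsigned char>((value >> shift) & 0xFF));
}

// Overload style: the number of bytes written follows the argument's type.
inline void Write(Bytes &out, std::uint32_t v) { AppendBigEndian(out, v, 4); }
inline void Write(Bytes &out, std::uint64_t v) { AppendBigEndian(out, v, 8); }

// Explicit style: the call site states the stored width.
inline void WriteUint16(Bytes &out, std::uint16_t v) { AppendBigEndian(out, v, 2); }
```

If a field changes from a 32-bit to a 64-bit type, `Write(out, field)` still compiles but the file format changes underneath existing data, whereas `WriteUint16(out, ...)` keeps the stored width visible and stable.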

I'd like to add overloads for vectors of common types also, in the same way I handle std::string (write the size, then write that number of elements).

I *would* like to make reading and writing functions identical. Something like this:

    BytePacker bytePacker(Mode::Read); //or Mode::Write

    try
    {
        //All of myStruct's members are passed in by reference, so if the Mode is Read, the data is read from and the struct's members are written to,
        //and if the Mode is Write, the data is written to and the struct's members are read from.
        bytePacker.SetInt8(myStruct.myInt8);
        bytePacker.SetUint64(myStruct.key);

        bytePacker.BeginEncryption(myStruct.key);

        bytePacker.SetString(myStruct.text);
        bytePacker.SetPoint(myStruct.position);
        bytePacker.SetColor(myStruct.textColor);

        myStruct.child.Serialize(bytePacker);

        bytePacker.EndEncryption();

        bytePacker.SetUint32(myStruct.data);
    }
    catch(...)
    {
        //Handle read/write failures here.
    }

But that's not implemented yet.
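
For what it's worth, the core of that idea can be sketched in a few dozen lines. `TwoWayPacker`, `MyStruct` and `Serialize` are hypothetical names, only two field types are covered, and the encryption calls from the outline above are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Mode { Read, Write };

// Hypothetical two-way packer: the same call writes a field in Write mode
// and overwrites it in Read mode.
class TwoWayPacker
{
public:
    explicit TwoWayPacker(Mode m) : mode(m), cursor(0) {}
    TwoWayPacker(Mode m, const std::vector<std::uint8_t> &data)
        : mode(m), bytes(data), cursor(0) {}

    void SetUint8(std::uint8_t &value)
    {
        if (mode == Mode::Write) { bytes.push_back(value); return; }
        assert(cursor < bytes.size());
        value = bytes[cursor++];
    }

    void SetUint32(std::uint32_t &value)
    {
        if (mode == Mode::Write)
        {
            for (int shift = 24; shift >= 0; shift -= 8)   // big-endian
                bytes.push_back(static_cast<std::uint8_t>(value >> shift));
            return;
        }
        assert(cursor + 4 <= bytes.size());
        value = 0;
        for (int i = 0; i < 4; ++i)
            value = (value << 8) | bytes[cursor++];
    }

    const std::vector<std::uint8_t> &Data() const { return bytes; }

private:
    Mode mode;
    std::vector<std::uint8_t> bytes;
    std::size_t cursor;
};

struct MyStruct { std::uint8_t flags; std::uint32_t data; };

// One function serves both directions.
inline void Serialize(TwoWayPacker &packer, MyStruct &s)
{
    packer.SetUint8(s.flags);
    packer.SetUint32(s.data);
}
```

Since reading and writing share one function, the field order can never drift apart between the two directions.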