
sizeof() not working ?!?


20 replies to this topic

#1 Endemoniada   Members   -  Reputation: 312

Posted 29 January 2013 - 04:37 PM

Hi guys, I am having a hard time understanding something right now:

 

struct FormatChunk
{
 BYTE subchunk1ID[4];
 DWORD subchunk1Size;
 short audioFormat;
};
 
DWORD dwSize=sizeof(FormatChunk); // is 12  !!!!
dwSize=sizeof(short); // is 2 like it should be

 

What the hell ?

 




#2 ApochPiQ   Moderators   -  Reputation: 16078

Posted 29 January 2013 - 04:40 PM

You have 4 BYTEs (4 bytes), a DWORD (4 more bytes) and a short (2 bytes), which is 10 bytes of data. Including 2 bytes of structure padding at the end, this gives 12 bytes.

#3 SiCrane   Moderators   -  Reputation: 9628

Posted 29 January 2013 - 04:40 PM

What you're seeing is data alignment at work. Basically your DWORD needs to be allocated on a multiple of four bytes, so the size of the struct needs to be a multiple of four bytes.
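
To see where the two extra bytes go, a small sketch like the one below may help. It assumes a typical compiler where a 32-bit integer has 4-byte alignment, and it uses the fixed-width types uint8_t, uint32_t and int16_t as stand-ins for BYTE, DWORD and short so that it compiles without windows.h. The constant names are made up for illustration.

```cpp
#include <cstddef>   // offsetof, std::size_t
#include <cstdint>   // fixed-width integer types

// Stand-ins for the Windows typedefs (assumption: BYTE = uint8_t,
// DWORD = uint32_t, short = int16_t).
struct FormatChunk
{
    std::uint8_t  subchunk1ID[4];  // offset 0, 4 bytes
    std::uint32_t subchunk1Size;   // offset 4, 4 bytes
    std::int16_t  audioFormat;     // offset 8, 2 bytes
    // typically 2 bytes of tail padding follow here
};

// Sum of the member sizes, independent of any padding.
constexpr std::size_t kPayloadBytes = sizeof(FormatChunk::subchunk1ID)
                                    + sizeof(FormatChunk::subchunk1Size)
                                    + sizeof(FormatChunk::audioFormat);

// Tail padding: what sizeof reports beyond the end of the last member.
constexpr std::size_t kTailPadding = sizeof(FormatChunk)
                                   - offsetof(FormatChunk, audioFormat)
                                   - sizeof(FormatChunk::audioFormat);
```

The tail padding exists so that, in an array of FormatChunk, every element's subchunk1Size still starts on a 4-byte boundary.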

#4 Paradigm Shifter   Crossbones+   -  Reputation: 5410

Posted 29 January 2013 - 04:42 PM

Structs are usually padded so their size is a multiple of the alignment of their most strictly aligned member (4 bytes here). If you are using MSVC, you can control the packing; check out #pragma pack. If you can afford the memory, go with the usual packing though (it's usually more efficient to read DWORDs off 4-byte boundaries). Packing is important if you are worried about data file size or sending it across a network, however.
"Most people think, great God will come from the sky, take away everything, and make everybody feel high" - Bob Marley

#5 Endemoniada   Members   -  Reputation: 312

Posted 29 January 2013 - 04:57 PM

Ahh, ok guys, I thought I was going crazy.

 

I'm going to read up on it.

 

In general though, if I have to read 10 bytes into that struct I can't rely on sizeof() and should explicitly set it to read 10, is that right ?

 

Thanks.

 



#6 L. Spiro   Crossbones+   -  Reputation: 14026

Posted 29 January 2013 - 05:07 PM

If your structure is meant to be used to map in-memory file data you should enforce finer control over the alignment/padding of the structure with #pragma pack where available.

If not available, you should approach the problem in a different way, period.

If your byte array was 3 bytes instead of 4 bytes in length, subchunk1Size would still be aligned to 4 bytes, with an extra byte added before it for padding.

Padding does not occur just at the end of structures, but also between members inside the structure, so if you can’t strictly control how it is padded, don’t even try to use that approach.
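
The 3-byte case described above can be checked with a sketch like this one. OddChunk and kInteriorPadding are hypothetical names, and the numbers in the comments assume a compiler that aligns 32-bit integers to 4 bytes.

```cpp
#include <cstddef>   // offsetof, std::size_t
#include <cstdint>   // fixed-width integer types

// Hypothetical variant of the struct with a 3-byte ID array.
struct OddChunk
{
    std::uint8_t  id[3];   // offsets 0..2
    // typically 1 byte of padding here, so 'size' starts at offset 4
    std::uint32_t size;    // offset 4
    std::int16_t  format;  // offset 8, followed by tail padding
};

// Bytes the compiler inserted between 'id' and 'size'.
constexpr std::size_t kInteriorPadding =
    offsetof(OddChunk, size) - sizeof(OddChunk::id);
```

Reading 9 bytes from a file straight into this struct would put the first byte of the size field into the padding slot, which is why the layout has to be controlled or the approach avoided.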

 

So no, don’t hard-code 10 anywhere.  You should use sizeof(), and if that is not reliable then don’t use this method at all.

 

 

L. Spiro


Edited by L. Spiro, 29 January 2013 - 06:51 PM.

It is amazing how often people try to be unique, and yet they are always trying to make others be like them. - L. Spiro 2011
I spent most of my life learning the courage it takes to go out and get what I want. Now that I have it, I am not sure exactly what it is that I want. - L. Spiro 2013
I went to my local Subway once to find some guy yelling at the staff. When someone finally came to take my order and asked, “May I help you?”, I replied, “Yeah, I’ll have one asshole to go.”
L. Spiro Engine: http://lspiroengine.com
L. Spiro Engine Forums: http://lspiroengine.com/forums

#7 Paradigm Shifter   Crossbones+   -  Reputation: 5410

Posted 29 January 2013 - 05:07 PM

Nope, if you save it out use sizeof(FormatChunk) to get how many bytes to write out (and read in). Never manually try and work out the size of a struct to write to disk.

#8 Adam_42   Crossbones+   -  Reputation: 2573

Posted 29 January 2013 - 06:30 PM

Your safest option is to read and write each element individually.

 

sizeof(subchunk1ID) + sizeof(subchunk1Size) +  sizeof(audioFormat) will always be 10 regardless of any structure padding.
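
Reading each element individually might look roughly like the sketch below. ReadFormatChunk is a hypothetical helper, the fixed-width types stand in for BYTE, DWORD and short, and the code assumes the source buffer uses the same byte order as the host.

```cpp
#include <cstdint>   // fixed-width integer types
#include <cstring>   // std::memcpy

struct FormatChunk
{
    std::uint8_t  subchunk1ID[4];
    std::uint32_t subchunk1Size;
    std::int16_t  audioFormat;
};

// Copies the 10 on-disk bytes into the struct one member at a time,
// so padding inside or after the struct never matters.
// Assumption: 'src' points at 10 or more readable bytes in host byte order.
inline void ReadFormatChunk(const unsigned char* src, FormatChunk& out)
{
    std::memcpy(out.subchunk1ID, src, sizeof(out.subchunk1ID));
    src += sizeof(out.subchunk1ID);

    std::memcpy(&out.subchunk1Size, src, sizeof(out.subchunk1Size));
    src += sizeof(out.subchunk1Size);

    std::memcpy(&out.audioFormat, src, sizeof(out.audioFormat));
}
```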



#9 Paradigm Shifter   Crossbones+   -  Reputation: 5410

Posted 29 January 2013 - 06:33 PM

Yikes, that's a maintenance nightmare if you add a field to the struct or change the order of them.

#10 swiftcoder   Senior Moderators   -  Reputation: 10242

Posted 29 January 2013 - 06:44 PM

The best option is typically to use #pragma pack (with push/pop). This allows you to eliminate the padding entirely (using a pack of 1), and at that point the representation in memory should be identical to that on disk (barring endian issues, which you are far less likely to run into since Macs switched to Intel).

 

However, be aware that there is a performance penalty to read/write unaligned memory - you should only pack structures directly involved in I/O.
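
A minimal sketch of the push/pop form, assuming a compiler that supports #pragma pack (MSVC, GCC and Clang all do). PackedFormatChunk is a name chosen for this example, and the fixed-width types stand in for BYTE, DWORD and short.

```cpp
#include <cstddef>   // offsetof
#include <cstdint>   // fixed-width integer types

#pragma pack(push, 1)   // save the current packing, then pack on 1-byte boundaries
struct PackedFormatChunk
{
    std::uint8_t  subchunk1ID[4];
    std::uint32_t subchunk1Size;
    std::int16_t  audioFormat;
};
#pragma pack(pop)       // restore the previous packing for everything that follows

// With a pack of 1 there is no padding, so sizeof matches the file layout.
static_assert(sizeof(PackedFormatChunk) == 10,
              "packed layout should match the 10-byte on-disk layout");
```

Keeping the push/pop pair tight around the I/O struct limits the unaligned-access cost to the structures that actually need the exact layout.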


Tristam MacDonald - Software Engineer @Amazon - [swiftcoding]


#11 Servant of the Lord   Crossbones+   -  Reputation: 20364

Posted 29 January 2013 - 07:40 PM

Nope, if you save it out use sizeof(FormatChunk) to get how many bytes to write out (and read in). Never manually try and work out the size of a struct to write to disk.

Though the padding may vary from computer to computer - for example, when sending packets from a game server to the game clients - and you'd read corrupted data.

It'd be better to just stream the data using something that is cross-platform and safe. More work, yes, but not compiler, compiler-version, compiler-settings, operating system, and hardware specific. That's just asking for unexpected and hard-to-track problems to appear when you least expect it.

I write files using a class like this:

#include <cstdint> //int8_t, uint32_t, etc.
#include <string>  //std::string

//Note: All values read and written, except in the case of things like strings, are stored
//internally in BigEndian format (for consistent reading and writing across platforms).
class BytePacker
{
public:
    BytePacker();

    //----------------------------------------------------------
    //Writing:
    //Each additional 'Write' call appends the value to the end of the data being held.
    //----------------------------------------------------------
    void WriteInt8(int8_t value);
    void WriteInt16(int16_t value);
    void WriteInt32(int32_t value);
    void WriteInt64(int64_t value);

    void WriteUint8(uint8_t value);
    void WriteUint16(uint16_t value);
    void WriteUint32(uint32_t value);
    void WriteUint64(uint64_t value);

    void WriteBool(bool value);
    void WriteFloat(float value);
    void WriteDouble(double value);
    void WriteString(const std::string &str);

    //More stuff... including functions for reading, like std::string ReadString() and etc...

};

Then I do something like this:

//Writes the data into the byte packer.
void Tile::WriteTo(BytePacker &bytePacker)
{
    //TileImage:
    if(this->tileDisplay)
    {
        bytePacker.WriteUint32(this->tileDisplay->GetKey());
        bytePacker.WriteUint32(this->offset.ToUint32());
        bytePacker.WriteUint64(this->EightReservedBytes);

        bytePacker.WriteInt8(this->startFrame);
        
        //Store the ImageData of the tile in this Area's ImageDataCache file.
        GlobalImageDataCache.Add(this->GetImageData());
    }
    else
    {
        bytePacker.WriteUint32(0);
    }

    //TileInfo:
    this->tileInfo.WriteTo(bytePacker);
}
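
The bodies of the BytePacker write functions are not shown in the post. As a rough sketch of how big-endian writes of that kind are commonly done, here is a hypothetical MiniBytePacker (not the poster's class) that appends bytes to a std::vector using shifts, so the output is the same regardless of the host's byte order.

```cpp
#include <cstdint>   // fixed-width integer types
#include <vector>    // std::vector

// Hypothetical minimal packer for illustration only.
class MiniBytePacker
{
public:
    void WriteUint8(std::uint8_t value) { bytes.push_back(value); }

    void WriteUint16(std::uint16_t value)
    {
        // High byte first, then low byte.
        bytes.push_back(static_cast<std::uint8_t>(value >> 8));
        bytes.push_back(static_cast<std::uint8_t>(value & 0xFF));
    }

    void WriteUint32(std::uint32_t value)
    {
        // Most significant byte first.
        for (int shift = 24; shift >= 0; shift -= 8)
            bytes.push_back(static_cast<std::uint8_t>((value >> shift) & 0xFF));
    }

    const std::vector<std::uint8_t>& Data() const { return bytes; }

private:
    std::vector<std::uint8_t> bytes;
};
```

Because the bytes are produced by shifting rather than by copying memory, no explicit endian swap is needed on either little-endian or big-endian machines.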

Some people's serializers even make the "Write" and "Read" code completely identical, and just have a single 'Serialize' function that reads if the serializer is in one mode, and writes if the serializer is in a different mode (here's an example).


It's perfectly fine to abbreviate my username to 'Servant' rather than copy+pasting it all the time.
All glory be to the Man at the right hand... On David's throne the King will reign, and the Government will rest upon His shoulders. All the earth will see the salvation of God.
Of Stranger Flames - [indie turn-based rpg set in a para-historical French colony] | Indie RPG development journal

[Fly with me on Twitter] [Google+] [My broken website]

[Need web hosting? I personally like A Small Orange]


#12 Paradigm Shifter   Crossbones+   -  Reputation: 5410

Posted 29 January 2013 - 07:45 PM

Yeah, if the packing is different that's why you need to use #pragma pack to make it agree cross platform.

And PS3 and XBox 360 need an endian swap as well.

You're still better off preparing a packed struct and endianing it than writing out individual fields (since adding one/moving them around needs a change in the serialise function as well), just endian the packed struct before serialisation. Even better, use metadata to automatically generate the format and the serialise function (sadly we don't do that where I work).
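
For the endian swap itself, a sketch along these lines is common. SwapEndian16 and SwapEndian32 are hypothetical helper names; a real project might prefer compiler intrinsics or a library routine instead.

```cpp
#include <cstdint>   // fixed-width integer types

// Reverses the byte order of a 16-bit value.
constexpr std::uint16_t SwapEndian16(std::uint16_t v)
{
    return static_cast<std::uint16_t>((v << 8) | (v >> 8));
}

// Reverses the byte order of a 32-bit value.
constexpr std::uint32_t SwapEndian32(std::uint32_t v)
{
    return ((v & 0x000000FFu) << 24)
         | ((v & 0x0000FF00u) << 8)
         | ((v & 0x00FF0000u) >> 8)
         | ((v & 0xFF000000u) >> 24);
}
```

With a packed struct, each multi-byte member would be passed through the matching swap once before writing or after reading, while byte arrays such as the chunk ID are left untouched.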

#13 SiCrane   Moderators   -  Reputation: 9628

Posted 29 January 2013 - 08:09 PM

However, be aware that there is a performance penalty to read/write unaligned memory - you should only pack structures directly involved in I/O.

It goes beyond performance penalty on some platforms. Unaligned memory access can generate a bus error that will crash your program on some processors like some ARM chips you might find in a cell phone.
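
One common way to avoid unaligned loads when pulling values out of a raw byte buffer is to copy them with std::memcpy instead of casting the pointer. LoadUint32 below is a hypothetical helper sketching that idea; it returns the value in host byte order.

```cpp
#include <cstdint>   // fixed-width integer types
#include <cstring>   // std::memcpy

// Reads a 32-bit value from any address, aligned or not.
// Casting 'p' to uint32_t* and dereferencing it could fault on
// processors that require aligned access; memcpy has no such requirement.
inline std::uint32_t LoadUint32(const unsigned char* p)
{
    std::uint32_t value;
    std::memcpy(&value, p, sizeof(value));
    return value;
}
```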

#14 Servant of the Lord   Crossbones+   -  Reputation: 20364

Posted 29 January 2013 - 08:32 PM

And PS3 and XBox 360 need an endian swap as well.

Handled by the BytePacker class. ;)
 

You're still better off preparing a packed struct and endianing it than writing out individual fields (since adding one/moving them around needs a change in the serialise function as well), just endian the packed struct before serialisation.

Moving the variables around doesn't affect a class like the one I described above, but adding one definitely requires a change.

Even better, use metadata to automatically generate the format and the serialise function (sadly we don't do that where I work).

That's a great idea.

Edited by Servant of the Lord, 29 January 2013 - 08:33 PM.



#15 Hodgman   Moderators   -  Reputation: 31112

Posted 29 January 2013 - 08:43 PM

Regarding data packing, serialization, platform specifics, etc, this is a good video: http://www.itshouldjustworktm.com/?p=652



#16 UnshavenBastard   Members   -  Reputation: 331

Posted 30 January 2013 - 03:53 AM

Wouldn't a bunch of function overloads for the different primitive types be nicer here? I.e. you'd then just write packer.Write(...)  each time, and if you change the data type of a field, it's dealt with by the compiler in the serialization code.
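
The overload idea could be sketched as follows. OverloadedPacker is a hypothetical class written for this example, with one Write per unsigned width, each storing big-endian bytes.

```cpp
#include <cstddef>   // std::size_t
#include <cstdint>   // fixed-width integer types
#include <vector>    // std::vector

// Hypothetical packer where the argument type selects the stored width.
class OverloadedPacker
{
public:
    void Write(std::uint8_t v)  { Append(v, 1); }
    void Write(std::uint16_t v) { Append(v, 2); }
    void Write(std::uint32_t v) { Append(v, 4); }
    void Write(std::uint64_t v) { Append(v, 8); }

    std::size_t Size() const { return bytes.size(); }

private:
    // Appends the low 'count' bytes of 'v', most significant first.
    void Append(std::uint64_t v, int count)
    {
        for (int i = count - 1; i >= 0; --i)
            bytes.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
    }

    std::vector<std::uint8_t> bytes;
};
```

Changing a field from uint16_t to uint32_t then changes the stored width without touching the call site, which is the convenience being asked about. It also means the file format changes silently along with the field type.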

 

 



Flash ™ - The lightning bolt that hits *your* smooth user experience, too!

 

-----Bel Canto Society
save old not-yet-restored Opera recordings from rotting


#17 UnshavenBastard   Members   -  Reputation: 331

Posted 30 January 2013 - 03:54 AM

Even better, use metadata to automatically generate the format and the serialise function (sadly we don't do that where I work).



Could you give an example of that? (what kind of meta data, where does it come from?)



#18 Hodgman   Moderators   -  Reputation: 31112

Posted 30 January 2013 - 04:22 AM


Even better, use metadata to automatically generate the format and the serialise function (sadly we don't do that where I work).


Could you give an example of that? (what kind of meta data, where does it come from?)
At one job, we parsed any '.h' files whose parent directory fit a naming convention. These headers just contained C struct declarations, which the parser would convert into a table of meta-data, such as field names, types, offsets, etc. You could use specially formatted comments to specify default values, valid ranges, descriptions, desired UI elements (e.g. Color picker), etc.
From that meta-data, we could then automatically generate text-to-binary serialization functions, so that all game data could be stored in a common, simple text format, but also compiled into a runtime-efficient format without effort.

I've also heard of other companies doing similar things, but via a custom language, and they also produce a C '.h' file as output for the engine to use.
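
As a very rough illustration of what such a meta-data table might look like once generated, the sketch below hand-writes one for the FormatChunk struct from the first post and drives a generic serializer from it. FieldInfo, kFormatChunkFields and Serialize are hypothetical names, the real tools described above are not shown in the thread, and the output here is in host byte order.

```cpp
#include <cstddef>   // offsetof, std::size_t
#include <cstdint>   // fixed-width integer types
#include <vector>    // std::vector

struct FormatChunk
{
    std::uint8_t  subchunk1ID[4];
    std::uint32_t subchunk1Size;
    std::int16_t  audioFormat;
};

// One row of meta-data per field; a generator would emit this from the header.
struct FieldInfo
{
    const char* name;
    std::size_t offset;  // where the member lives in memory
    std::size_t size;    // how many bytes it occupies
};

const FieldInfo kFormatChunkFields[] = {
    { "subchunk1ID",   offsetof(FormatChunk, subchunk1ID),   sizeof(FormatChunk::subchunk1ID)   },
    { "subchunk1Size", offsetof(FormatChunk, subchunk1Size), sizeof(FormatChunk::subchunk1Size) },
    { "audioFormat",   offsetof(FormatChunk, audioFormat),   sizeof(FormatChunk::audioFormat)   },
};

// Writes the described fields back to back, skipping any padding.
inline std::vector<unsigned char> Serialize(const FormatChunk& chunk)
{
    std::vector<unsigned char> out;
    const unsigned char* base = reinterpret_cast<const unsigned char*>(&chunk);
    for (const FieldInfo& field : kFormatChunkFields)
        out.insert(out.end(), base + field.offset, base + field.offset + field.size);
    return out;
}
```

Because the table is generated, adding or reordering a field updates the serialized format without anyone editing the serializer by hand.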

#19 Paradigm Shifter   Crossbones+   -  Reputation: 5410

Posted 30 January 2013 - 11:07 AM

Yeah, something similar to (but probably less complicated than) Microsoft's IDL (interface definition language) is worth taking a look at.

#20 Servant of the Lord   Crossbones+   -  Reputation: 20364

Posted 30 January 2013 - 01:19 PM

Wouldn't a bunch of function overloads for the different primitive types be nicer here? I.e. you'd then just write packer.Write(...)  each time, and if you change the data type of a field, it's dealt with by the compiler in the serialization code.

 
I've thought of that, but that may cause unexpected errors with implicit type conversion. Which overload is chosen for a time_t? A uint32_t or an int32_t? Oh, sorry, implementation defined! It might even be 64 bits on a 64-bit computer.

Plus, just because I have something as an 'unsigned int' in my structure or class doesn't mean I always want it to take 32 bits in a network packet or a data file. Being explicit about the storage in this situation is, I think, actually a plus, though I should note that I store a 'bool' in a single byte instead of a single bit.

I'd like to add overloads for vectors of common types also, in the same way I handle std::string. (Write the size, then write that number of elements.)

I *would* like to make reading and writing functions identical. Something like this:
BytePacker bytePacker(Mode::Write); //or Mode::Read

try
{
	//All of myStruct's members are passed in by reference, so if the Mode is Read, the data is read from and the struct's members are written to,
	//and if the Mode is Write, the struct's members are read from and the data is written to.
	bytePacker.SetInt8(myStruct.myInt8);
	bytePacker.SetUint64(myStruct.key);

	bytePacker.BeginEncryption(myStruct.key);
	
	bytePacker.SetString(myStruct.text);
	bytePacker.SetPoint(myStruct.position);
	bytePacker.SetColor(myStruct.textColor);
	
	myStruct.child.Serialize(bytePacker);
	
	bytePacker.EndEncryption();
	
	bytePacker.SetUint32(myStruct.data);

}
catch(...)
{
	//Handle read/write errors here.
}
But that's not implemented yet.
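
A small working sketch of the same read-or-write idea, under several assumptions: TwoWayPacker, Mode, Header and Serialize are hypothetical names made up here, only two unsigned widths are supported, values are stored big-endian, and encryption is left out entirely.

```cpp
#include <cstddef>    // std::size_t
#include <cstdint>    // fixed-width integer types
#include <stdexcept>  // std::out_of_range
#include <utility>    // std::move
#include <vector>     // std::vector

enum class Mode { Read, Write };

// Hypothetical packer whose Set functions write or read depending on the mode.
class TwoWayPacker
{
public:
    explicit TwoWayPacker(Mode m, std::vector<std::uint8_t> data = {})
        : mode(m), bytes(std::move(data)) {}

    void SetUint8(std::uint8_t& value)   { Transfer(value); }
    void SetUint32(std::uint32_t& value) { Transfer(value); }

    const std::vector<std::uint8_t>& Data() const { return bytes; }

private:
    template <typename T>
    void Transfer(T& value)
    {
        if (mode == Mode::Write)
        {
            // Append the value, most significant byte first.
            for (std::size_t i = sizeof(T); i-- > 0;)
                bytes.push_back(static_cast<std::uint8_t>(value >> (8 * i)));
        }
        else
        {
            if (cursor + sizeof(T) > bytes.size())
                throw std::out_of_range("TwoWayPacker: read past end of data");
            T result = 0;
            for (std::size_t i = 0; i < sizeof(T); ++i)
                result = static_cast<T>((result << 8) | bytes[cursor++]);
            value = result;
        }
    }

    Mode mode;
    std::vector<std::uint8_t> bytes;
    std::size_t cursor = 0;  // next byte to read in Read mode
};

struct Header
{
    std::uint8_t  version;
    std::uint32_t length;
};

// One function serves both directions, so the field order cannot drift apart.
inline void Serialize(TwoWayPacker& packer, Header& header)
{
    packer.SetUint8(header.version);
    packer.SetUint32(header.length);
}
```

Writing a Header with a packer in Mode::Write and then passing its Data() to a second packer in Mode::Read should round-trip the values through the same Serialize function.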




