Endemoniada

sizeof() not working ?!?


Hi guys, I am having a hard time understanding something right now:

 

#include <windows.h> // for BYTE and DWORD

struct FormatChunk
{
    BYTE subchunk1ID[4];
    DWORD subchunk1Size;
    short audioFormat;
};

DWORD dwSize = sizeof(FormatChunk); // is 12 !!!!
dwSize = sizeof(short); // is 2 like it should be

 

What the hell?

 

Structs are usually padded to a 4-byte boundary. If you are using MSVC, you can control the packing; check out #pragma pack. If you can afford the memory, go with the usual packing though (it's usually more efficient to read DWORDs off 4-byte boundaries). Packing is important if you are worried about data file size or sending it across a network, however.
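
To see where the 12 bytes come from, here's a quick sketch (assuming the usual default alignment, where a DWORD aligns to a 4-byte boundary) that prints where each member lands:

#include <cstdio>
#include <cstddef>   // offsetof
#include <windows.h> // BYTE, DWORD

struct FormatChunk
{
    BYTE  subchunk1ID[4]; // offset 0, 4 bytes
    DWORD subchunk1Size;  // offset 4, 4 bytes (aligned to 4)
    short audioFormat;    // offset 8, 2 bytes
    // 2 bytes of tail padding so the total size is a multiple of the largest alignment (4)
};

int main()
{
    printf("subchunk1Size at %u, audioFormat at %u, sizeof = %u\n",
           (unsigned)offsetof(FormatChunk, subchunk1Size),  // 4
           (unsigned)offsetof(FormatChunk, audioFormat),    // 8
           (unsigned)sizeof(FormatChunk));                  // 12 = 4 + 4 + 2 + 2 padding
    return 0;
}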


The best option is typically to use #pragma pack (with push/pop). This allows you to eliminate the padding entirely (using a pack of 1), and at that point the representation in memory should be identical to that on disk (barring endian issues, which you are far less likely to run into since Macs switched to Intel).

 

However, be aware that there is a performance penalty for reading/writing unaligned memory - you should only pack structures directly involved in I/O.
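
For reference, the push/pop form looks roughly like this (FormatChunkPacked is just an illustrative name; the pragma is supported by MSVC and by GCC/Clang):

#include <windows.h> // BYTE, DWORD

#pragma pack(push, 1) // no padding inside this region
struct FormatChunkPacked
{
    BYTE  subchunk1ID[4];
    DWORD subchunk1Size;
    short audioFormat;
};
#pragma pack(pop)     // restore the previous packing setting

// sizeof(FormatChunkPacked) is now 10, matching the on-disk chunk byte for byte,
// so it can be read/written directly.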


Nope, if you save it out, use sizeof(FormatChunk) to get how many bytes to write out (and read in). Never manually try to work out the size of a struct to write to disk.

Note that the padding may vary from computer to computer - for example, when sending packets from a game server to the game clients - in which case you'd read corrupted data.

It'd be better to just stream the data using something that is cross-platform and safe. More work, yes, but not compiler-, compiler-version-, compiler-settings-, operating-system-, or hardware-specific. That's just asking for unexpected and hard-to-track problems to appear when you least expect them.

I write files using a class like this:

#include <cstdint> //int8_t, uint32_t, etc.
#include <string>

//Note: Except for things like strings, all values are stored internally
//in big-endian format (for consistent reading and writing across platforms).
class BytePacker
{
public:
    BytePacker();

    //----------------------------------------------------------
    //Writing:
    //Each additional 'Write' call appends the value to the end of the data being held.
    //----------------------------------------------------------
    void WriteInt8(int8_t value);
    void WriteInt16(int16_t value);
    void WriteInt32(int32_t value);
    void WriteInt64(int64_t value);

    void WriteUint8(uint8_t value);
    void WriteUint16(uint16_t value);
    void WriteUint32(uint32_t value);
    void WriteUint64(uint64_t value);

    void WriteBool(bool value);
    void WriteFloat(float value);
    void WriteDouble(double value);
    void WriteString(const std::string &str);

    //More stuff... including functions for reading, like std::string ReadString(), etc.

};

Then I do something like this:

//Writes the data into the byte packer.
void Tile::WriteTo(BytePacker &bytePacker)
{
    //TileImage:
    if(this->tileDisplay)
    {
        bytePacker.WriteUint32(this->tileDisplay->GetKey());
        bytePacker.WriteUint32(this->offset.ToUint32());
        bytePacker.WriteUint64(this->EightReservedBytes);

        bytePacker.WriteInt8(this->startFrame);
        
        //Store the ImageData of the tile in this Area's ImageDataCache file.
        GlobalImageDataCache.Add(this->GetImageData());
    }
    else
    {
        bytePacker.WriteUint32(0);
    }

    //TileInfo:
    this->tileInfo.WriteTo(bytePacker);
}

Some people's serializers even make the "Write" and "Read" code completely identical, and just have a single 'Serialize' function that reads if the serializer is in one mode, and writes if it's in a different mode.

Yeah, if the packing is different, that's why you need to use #pragma pack to make it agree across platforms.

And PS3 and XBox 360 need an endian swap as well.

You're still better off preparing a packed struct and endianing it than writing out individual fields (since adding one/moving them around needs a change in the serialise function as well); just endian the packed struct before serialisation. Even better, use metadata to automatically generate the format and the serialise function (sadly we don't do that where I work).
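
Roughly along these lines (the swap helpers and WriteChunk are only illustrative; real code would typically use the platform's byteswap intrinsics):

#include <cstdint>
#include <cstdio>

//Illustrative swap helpers - real code would usually call _byteswap_ulong / __builtin_bswap32 etc.
inline uint16_t SwapU16(uint16_t v) { return (uint16_t)((v << 8) | (v >> 8)); }
inline uint32_t SwapU32(uint32_t v)
{
    return (v << 24) | ((v << 8) & 0x00FF0000u) | ((v >> 8) & 0x0000FF00u) | (v >> 24);
}

#pragma pack(push, 1)
struct FormatChunkPacked
{
    uint8_t  subchunk1ID[4];
    uint32_t subchunk1Size;
    uint16_t audioFormat;
};
#pragma pack(pop)

//Swap every multi-byte field in place, then write the whole packed struct in one go.
//Taken by value so the swap doesn't clobber the caller's copy.
void WriteChunk(FILE* file, FormatChunkPacked chunk)
{
    chunk.subchunk1Size = SwapU32(chunk.subchunk1Size);
    chunk.audioFormat   = SwapU16(chunk.audioFormat);
    fwrite(&chunk, sizeof(chunk), 1, file);
}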


However, be aware that there is a performance penalty for reading/writing unaligned memory - you should only pack structures directly involved in I/O.

It goes beyond a performance penalty on some platforms. Unaligned memory access can generate a bus error that will crash your program on some processors, like some ARM chips you might find in a cell phone.
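
A common way to stay safe on those chips (just a sketch, not tied to any of the code above) is to never dereference a possibly-unaligned pointer directly, and memcpy into an aligned local instead:

#include <cstdint>
#include <cstring>

//Reads a 32-bit value from a buffer position that may not be 4-byte aligned.
//memcpy lets the compiler emit whatever access pattern is legal on the target CPU.
uint32_t ReadUnalignedU32(const unsigned char* ptr)
{
    uint32_t value;
    memcpy(&value, ptr, sizeof(value));
    return value;
}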


And PS3 and XBox 360 need an endian swap as well.

Handled by the BytePacker class. ;)
 

You're still better off preparing a packed struct and endianing it than writing out individual fields (since adding one/moving them around needs a change in the serialise function as well); just endian the packed struct before serialisation.

Moving the variables around doesn't affect a class like the one I described above, but adding one definitely requires a change.

Even better, use metadata to automatically generate the format and the serialise function (sadly we don't do that where I work).

That's a great idea.


Wouldn't a bunch of function overloads for the different primitive types be nicer here? I.e. you'd then just write packer.Write(...)  each time, and if you change the data type of a field, it's dealt with by the compiler in the serialization code.
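
Roughly like this, declarations only, just to illustrate the idea:

#include <cstdint>
#include <string>

class BytePacker
{
public:
    //One overload per primitive type - the compiler picks the right one,
    //so changing a field's type doesn't require touching the serialization calls.
    void Write(int8_t value);
    void Write(int16_t value);
    void Write(int32_t value);
    void Write(int64_t value);

    void Write(uint8_t value);
    void Write(uint16_t value);
    void Write(uint32_t value);
    void Write(uint64_t value);

    void Write(bool value);
    void Write(float value);
    void Write(double value);
    void Write(const std::string &str);
};

//Usage would then just be:
//  packer.Write(this->startFrame);
//  packer.Write(this->offset.ToUint32());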

 

 



Even better, use metadata to automatically generate the format and the serialise function (sadly we don't do that where I work).


Could you give an example of that? (What kind of metadata, and where does it come from?)
At one job, we parsed any '.h' files whose parent directory fit a naming convention. These headers just contained C struct declarations, which the parser would convert into a table of metadata, such as field names, types, offsets, etc. You could use specially formatted comments to specify default values, valid ranges, descriptions, desired UI elements (e.g. a color picker), etc.
From that metadata, we could then automatically generate text-to-binary serialization functions, so that all game data could be stored in a common, simple text format, but also compiled into a runtime-efficient format without effort.

I've also heard of other companies doing similar things, but via a custom language, and they also produce a C '.h' file as output for the engine to use.
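
Not that exact system, but a rough sketch of the kind of table such a parser might generate (the //@ comment format, FieldInfo, and gTileRecordFields are all made up for illustration):

#include <cstddef> // offsetof

//A hand-written C struct with specially formatted comments...
struct TileRecord
{
    int   startFrame; //@ default=0   min=0 max=255 desc="First frame of the animation"
    float offsetX;    //@ default=0.0
    float offsetY;    //@ default=0.0
};

//...from which the parser generates a metadata table like this:
enum FieldType { Field_Int32, Field_Float };

struct FieldInfo
{
    const char* name;
    FieldType   type;
    std::size_t offset;
    const char* defaultValue;
};

static const FieldInfo gTileRecordFields[] =
{
    { "startFrame", Field_Int32, offsetof(TileRecord, startFrame), "0"   },
    { "offsetX",    Field_Float, offsetof(TileRecord, offsetX),    "0.0" },
    { "offsetY",    Field_Float, offsetof(TileRecord, offsetY),    "0.0" },
};

//A single generic text-to-binary converter can then walk gTileRecordFields
//instead of needing a hand-written serialize function per struct.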


Wouldn't a bunch of function overloads for the different primitive types be nicer here? I.e. you'd then just write packer.Write(...)  each time, and if you change the data type of a field, it's dealt with by the compiler in the serialization code.

 
I've thought of that, but it may cause unexpected errors with implicit type conversion. Which overload is chosen for a time_t? A uint32_t or an int32_t? Oh, sorry, implementation-defined! It might even be 64 bits on a 64-bit computer.

Plus, just because I have something as an 'unsigned int' in my structure or class doesn't mean I always want it to take 32 bits in a network packet or a data file. Being explicit about the storage size in this situation is actually a plus, I think, though I should note that I store a 'bool' in a single byte instead of a single bit.

I'd like to add overloads for vectors of common types also, in the same way I handle std::string (write the size, then write each element; reading does the reverse).
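
Probably something along these lines (WriteUint32Vector is just a placeholder name, written as a free helper that uses the WriteUint32 member from the class above, purely for illustration):

#include <cstddef>
#include <cstdint>
#include <vector>

//Mirrors how a string is stored: element count first, then the elements themselves.
void WriteUint32Vector(BytePacker &packer, const std::vector<uint32_t> &values)
{
    packer.WriteUint32(static_cast<uint32_t>(values.size()));
    for(std::size_t i = 0; i < values.size(); ++i)
    {
        packer.WriteUint32(values[i]);
    }
}

//Reading does the reverse: read the count, then read that many elements into the vector.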

I *would* like to make reading and writing functions identical. Something like this:
BytePacker bytePacker(Mode::Read); // or Mode::Write

try
{
    //All of myStruct's members are passed in by reference, so if Mode is Read, the data is read and the struct is written to,
    //and if the Mode is Write, then the data is written to and the struct's members are read from.
    bytePacker.SetInt8(myStruct.myInt8);
    bytePacker.SetUint64(myStruct.key);

    bytePacker.BeginEncryption(myStruct.key);

    bytePacker.SetString(myStruct.text);
    bytePacker.SetPoint(myStruct.position);
    bytePacker.SetColor(myStruct.textColor);

    myStruct.child.Serialize(bytePacker);

    bytePacker.EndEncryption();

    bytePacker.SetUint32(myStruct.data);
}
catch(...)
{
    //Handle a failed read or write here.
}
But that's not implemented yet.


Last time I needed to serialize data to send and receive messages, I ended up building a stream-like container and forwarding one operator (&, as inspired by boost) to either << or >> depending on the stream type. So each message had one function that would stuff all members into the container or get them out. There were a few overloads for C-strings, arrays, etc., but no handling of endianness (at least it was never used; there was a very inefficient template to reverse byte order).

 

It wasn't bulletproof, and because of all the templates the user could still put in entire structs (so it was their job to worry about padding). The core of each "stream" was essentially something like

 

//Assuming 'buffer' is a std::vector<char> member and 'pos' tracks the write offset.
template <typename T>
OutStream& operator<<(const T& data)
{
    buffer.resize( buffer.size() + sizeof(T) );
    memcpy(&buffer[pos], &data, sizeof(T));
    pos += sizeof(T);
    return *this;
}

//some overloads of <<

//same for InStream and >>

template<typename T>
inline InStream& operator&(InStream& stream, T& data) { return stream >> data; }

template<typename T>
inline OutStream& operator&(OutStream& stream, T& data) { return stream << data; }

 

For each message there was a function like

 

template<class StreamType>
bool packUnpack(StreamType& stream)
{
     stream & member1 & member2 & member3 & member4;
     return !stream.fail();
}

 

Adding a member still required changing that function, but reading/writing was consistent, sizes were determined automatically (so using types with a fixed size is important), and it worked well enough for what it was used for.
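
For completeness, usage looked roughly like this (OutStream's data()/size() accessors and InStream's constructor are assumed here, not shown above):

#include <cstdint>

struct PlayerStateMessage
{
    uint32_t playerId;
    float    x, y;
    uint16_t health;

    template<class StreamType>
    bool packUnpack(StreamType& stream)
    {
        stream & playerId & x & y & health;
        return !stream.fail();
    }
};

//Sending:
//  OutStream out;
//  message.packUnpack(out);                     //fills out's internal buffer
//  send(socket, out.data(), out.size(), 0);
//
//Receiving:
//  InStream in(receivedBytes, receivedByteCount);
//  message.packUnpack(in);                      //fills the message from the buffer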

