Is dumping classes to a file completely unsafe?

Started by
4 comments, last by silverphyre673 16 years, 10 months ago
I have a few structures with several elements that I want to save to and load from a binary file using C++ and the standard ifstream and ofstream objects. Right now, I'm saving and loading each individual member of the classes in question, which works, but it makes new save/load code tedious to write and difficult to read. None of the classes I'm talking about have any dynamically resizable arrays. They've just got a lot of integer values. I'll give you a trivial example of the sort of thing I am currently doing, and what I'd like to do:

What I'm doing now:

struct Foo
{
    int arg1;
    char arg2;
    float arg3;
};

void save(Foo & foo)
{
    ofstream fout("foo.dat", ios::binary);

    fout.write((char*)&foo.arg1, sizeof(foo.arg1));
    fout.write((char*)&foo.arg2, sizeof(foo.arg2));
    fout.write((char*)&foo.arg3, sizeof(foo.arg3));
    fout.close();
}

void load(Foo & foo)
{
    ifstream fin("foo.dat", ios::binary);

    fin.read((char*)&foo.arg1, sizeof(foo.arg1));
    fin.read((char*)&foo.arg2, sizeof(foo.arg2));
    fin.read((char*)&foo.arg3, sizeof(foo.arg3));
    fin.close();
}

What I'd like to do:

struct Foo
{
    int arg1;
    char arg2;
    float arg3;
};

void save(Foo & foo)
{
    ofstream fout("foo.dat", ios::binary);

    fout.write((char*)&foo, sizeof(Foo));
    fout.close();
}

void load(Foo & foo)
{
    ifstream fin("foo.dat", ios::binary);

    fin.read((char*)&foo, sizeof(Foo));
    fin.close();
}

my siteGenius is 1% inspiration and 99% perspiration
Writing the entire structure directly as you're suggesting is dangerous. The compiler can (and usually will) insert padding between structure fields for efficiency purposes. You will be writing this padding out. Since the nature and amount of padding is compiler specific, if you recompile your code with a different compiler, or even a different version or different settings of the same compiler, you might end up with incompatible data. Furthermore, if the structure in question has a vtable or other compiler-specific embedded data (which is legal), you will write and read that too -- which will destroy the object, since the vtable pointer read back from the file will almost certainly not be valid. You also don't have control over endianness issues, which can lead to further data incompatibility.
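To make the padding point concrete, here is a small sketch that compares the size of the Foo struct from the question with the sum of its member sizes. The constant names (kMemberBytes and so on) are made up for illustration, and the actual numbers are implementation-defined, so treat the figures in the comments as typical rather than guaranteed.

```cpp
#include <cstddef>   // offsetof, std::size_t

struct Foo
{
    int   arg1;
    char  arg2;
    float arg3;
};

// Bytes occupied by the three members themselves.
const std::size_t kMemberBytes =
    sizeof(int) + sizeof(char) + sizeof(float);

// Extra bytes the compiler added as padding (implementation-defined;
// on many mainstream platforms this comes out as 3).
const std::size_t kPaddingBytes = sizeof(Foo) - kMemberBytes;

// Where arg3 actually starts inside the struct. With padding after
// arg2 this is larger than sizeof(int) + sizeof(char).
const std::size_t kArg3Offset = offsetof(Foo, arg3);
```

Writing sizeof(Foo) bytes therefore writes kPaddingBytes bytes of unspecified content, and a build that lays the struct out differently will read the members from the wrong offsets.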

While tedious, it is usually better to write the data member-by-member, or use a serialization library or other tool to generate the serialization code for you (I used to use a Python script to pre-generate serialization methods for objects by examining the header files, which contained commands embedded in comments, for example -- before I switched to C# for primary development).

In your case you may be able to instruct the compiler to pack the object tightly and avoid padding, and you generally don't need to worry about endianness issues until you are writing a cross-platform application. Nonetheless, I'd encourage you to investigate a more robust solution (perhaps Boost's serialization library?).
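As a rough illustration of the packing option: #pragma pack is a compiler extension rather than standard C++, though MSVC, GCC and Clang all accept the push/pop form used here. PackedFoo is a name invented for this sketch, and std::int32_t is swapped in for int so the field width is fixed. The static_assert needs C++11.

```cpp
#include <cstdint>

// Ask the compiler to lay the members out with 1-byte alignment,
// i.e. with no padding between them.
#pragma pack(push, 1)
struct PackedFoo
{
    std::int32_t arg1;
    char         arg2;
    float        arg3;
};
#pragma pack(pop)

// Fails to compile if the compiler ignored the pragma.
static_assert(sizeof(PackedFoo) ==
                  sizeof(std::int32_t) + sizeof(char) + sizeof(float),
              "PackedFoo still contains padding");
```

Packing removes the padding but nothing else: byte order is still the machine's own, and pointers or references to the misaligned members can be a problem on some architectures.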
For POD (Plain old data) types the second way will work*.

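POD types have no user-defined constructors or destructor, no virtual functions, and no reference members. Pointer members are technically allowed in a POD, but an address written to a file is meaningless when it is read back, so for this purpose all members should be primitive non-pointer types or POD types themselves.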

* Subject to jpetrie's caveats about changing compilers or compiler versions.
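The POD requirement above can be checked at compile time, assuming a C++11 compiler. The sketch below uses the standard <type_traits> checks; WithVirtual is a made-up counterexample. Note that these traits do not reject pointer members, so that part still has to be checked by eye.

```cpp
#include <type_traits>

struct Foo
{
    int   arg1;
    char  arg2;
    float arg3;
};

// Hypothetical type that must never be dumped as raw bytes:
// it carries a hidden vtable pointer.
struct WithVirtual
{
    virtual ~WithVirtual() {}
    int value;
};

// Foo can safely be copied byte by byte within one build of the program.
static_assert(std::is_trivially_copyable<Foo>::value,
              "Foo must be trivially copyable to be written as raw bytes");
static_assert(std::is_standard_layout<Foo>::value,
              "Foo must have standard layout");

// The virtual destructor disqualifies WithVirtual.
static_assert(!std::is_trivially_copyable<WithVirtual>::value,
              "WithVirtual is not expected to be trivially copyable");
```

Placing such an assertion next to the raw read/write code makes the build fail if someone later adds a virtual function or a non-trivial member to the struct.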
I don't think it's completely unsafe. That said, I probably wouldn't do it. I'd expect it to work as long as the contents are POD and if you use the same stream implementation and build with the same compiler and run both on the same machine. Any variation from that and I'd expect it to sometimes work and sometimes fail catastrophically (or always fail catastrophically depending on the non-POD you use).

[edit: beaten!]
Your problem comes down to a "wrong" choice of serialization parameters.

You'll want inversion of control for this. Pass the serializer as a parameter to the object you want to serialize, or to the serialization functions if those are standalone (they then need friend access for classes with private members).

template < class Serializer >
void save( Serializer &s, int x )
{
  s << x;
}

template < class Serializer >
void save( Serializer &s, char x )
{
  s << x;
}

// and so on for elementary types

template < class Serializer >
void save( Serializer &s, const Foo &foo )
{
  save( s, foo.arg1 );
  save( s, foo.arg2 );
  save( s, foo.arg3 );
}


Now consider adding a more complex type which contains foo:

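class Bar
{
  Foo f;
  int x;

  // save() reads private members, so it needs friend access
  template < class Serializer >
  friend void save( Serializer &s, const Bar &bar );
};

template < class Serializer >
void save( Serializer &s, const Bar &bar )
{
  save( s, bar.f );
  save( s, bar.x );
}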


You call this whole thing like this:
Bar bar;
FileSerializer s("some_file");
save( s, bar );
s.close();


Templating the serializer isn't necessary. It's here just to demonstrate complete decoupling from serialization mechanics, where implementation of serialization doesn't matter, the only thing you do need to provide are the << and >> operators for common types.
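The post does not show FileSerializer, so here is one guess at what a minimal version could look like. Everything in the class body is an assumption made for illustration: it wraps an ofstream, offers operator<< only for the three member types of Foo, and writes native-endian bytes. The save overloads are repeated (plus a float one) so the sketch compiles on its own.

```cpp
#include <fstream>
#include <string>

struct Foo
{
    int   arg1;
    char  arg2;
    float arg3;
};

// Hypothetical serializer: only the << operators matter to save().
class FileSerializer
{
public:
    explicit FileSerializer(const std::string& path)
        : out_(path.c_str(), std::ios::binary) {}

    FileSerializer& operator<<(int value)   { return put(value); }
    FileSerializer& operator<<(char value)  { return put(value); }
    FileSerializer& operator<<(float value) { return put(value); }

    bool good() const { return out_.good(); }
    void close()      { out_.close(); }

private:
    // Raw byte write, reachable only through the overloads above so
    // that non-primitive types cannot be dumped by accident.
    template <class T>
    FileSerializer& put(const T& value)
    {
        out_.write(reinterpret_cast<const char*>(&value), sizeof(T));
        return *this;
    }

    std::ofstream out_;
};

template <class Serializer>
void save(Serializer& s, int x)   { s << x; }

template <class Serializer>
void save(Serializer& s, char x)  { s << x; }

template <class Serializer>
void save(Serializer& s, float x) { s << x; }

template <class Serializer>
void save(Serializer& s, const Foo& foo)
{
    save(s, foo.arg1);
    save(s, foo.arg2);
    save(s, foo.arg3);
}
```

Because save() only relies on operator<<, a different serializer (text output, a network buffer, a byte-swapping writer) can be dropped in without touching the per-class save functions.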


This way you can serialize arbitrarily composed classes. Using operator overloading, it also becomes possible to reduce the syntax further (see Boost serialization for more examples).

Ultimately, it's possible to reduce entire serialization declaration to:
class Foo {
  ...
  template < class Archive >
  void serialize( Archive &archive )
  {
    archive & arg1 & arg2 & arg3;
  }
  ...
};


But it does take some effort to get all the tiny details right, so looking into Boost might be a good idea for robust and flexible serialization.
Thanks a bunch! I'm using boost extensively for this project anyway, but I didn't know about the serializer, never having had this issue before. I'll just use that.

