"Canonical Forms" for Save/Load/Save As

Started by
4 comments, last by Antheus 15 years, 2 months ago
Hello, I've been wondering if there's an accepted "design pattern" for implementing save and load mechanics in a windows based data centered application (I'm working with Windows Forms and C# but the language is irrelevant here).I'll post the way I usually do it either way, but any feedback is appreciated. Let's say we have a Document class that encapsulates all the data that needs to be saved or loaded. Several questions come to my mind when trying to implement this. I'll enumerate what I'm able to recall at the moment, but I'd be very thankful if someone could post or link to a complete article on the matter. On the Document class: - Should the Load function be a static factory method that returns a new document, or a member method that overwrites the current state of the object with the values on disk? - Should the Save function take a string argument defining where to save, or should the Document object itself store a string value stating the path, and make the Save function use this value? - I usually keep a boolean flag dictating wether the document has been modified or not since the last save. I check this flag when closing the application form for instance. This flag is set to false everytime Save() is successfully completed, and to true everytime a non-const method is called on the document. Is this the correct line of thought? On the Application class: - I keep a reference (or a collection of references in case of an application that allows more than one document at the same time) to the current open document, which is obtained either by normal object construction when the New option is chosen, or returned by the Load static factory method of Document, when Open is called. - Save first checks wether the document has a path associated with it. If it doesn't have, it delegates it's work to Save As. Otherwise it relies on the document member Save function. In sum, in some sort of pseudo code:
class Document 
{
    string path;
    bool modified;

    static Document Load(string path);
    void Save();

   // Other data
};

class Application
{
   Document currentDocument;

   void Save()
   {
       if(currentDocument.path != "")
           SaveAs();
       else
           currentDocument.Save();
   }

   void SaveAs()
   {
        currentDocument.path = FileDialog.Result();
        currentDocument.Save();
   }

   void New()
   {
       If(currentDocument != null) {
           // Do some sort of confirmation dialog
           if(cancel) return;
           currentDocument.Close();
       }
       
       currentDocument = new Document();
   }

   void Load()
   {
       If(currentDocument != null) {
           // Do some sort of confirmation dialog
           if(cancel) return;
           currentDocument.Close();
       }
       
       string path = FileDialog.Result();
       currentDocument = Document.Load(path);
   }
}

Advertisement
The parts you mention are generally non problematic.

Saving however is subject to several other pitfalls.

Quote:- Should the Load function be a static factory method that returns a new document, or a member method that overwrites the current state of the object with the values on disk?


My guideline for static is that it needs to operate over immutable (readonly or constants) state.

In general, RAII is preferred, so if you open a new document, create a whole new document from scratch. This propagates well to asynchronous/multi-threaded design. For example, if you find loading takes too long, you start loading in different thread, or via callback. When file is completely loaded, you end up with valid document.

If you were to modify existing document, you end up with potentially invalid or at least non-deterministic state while loading is in progress. It makes error checking a nightmare. Still, sometimes a lot of data can be re-used, so completely discarding old state isn't optimal.

Quote:- Should the Save function take a string argument defining where to save, or should the Document object itself store a string value stating the path, and make the Save function use this value?


Save function needs to know how to write current document. If you're opening a document for writing, then you might as well open it in some exclusive mode, since you don't want others to change it concurrently. So you might as well store the handle or whatever method you're using to access the file.

Quote:- I usually keep a boolean flag dictating wether the document has been modified or not since the last save. I check this flag when closing the application form for instance. This flag is set to false everytime Save() is successfully completed, and to true everytime a non-const method is called on the document. Is this the correct line of thought?


This depends on how you represent the document internally. Often, documents are represented via change log. In that case, you would check the log to see it it contains any state-mutating changes. This is then coupled with undo mechanism, so the whole process becomes somewhat more elaborate. Whether it's worth it depends.

void Save()void SaveAs


I would design document loading/saving differently. I'd define it via streams only. Something like:
class Document {  Document();  Document(StreamReader reader);  void save(StreamWriter writer);}
This gives you default empty document, a way to open an existing document, and a way to store this document.

Then move all the actual file details elsewhere. This has less coupling to actual IO. A document doesn't care if it's loaded from a new file, from network, from some archive or from memory.

This also changes the way factories are used. For example:
StreamReader getDocumentSource(string s);Document createDocument(StringReader r);
Thanks for the input Antheus.
While analysing your post and seeing how it can be applied to my application, I stumbled upon a problem:

Regarding the following approach of having Load() made through the document constructor.

Quote:
class Document {  Document();  Document(StreamReader / FilePath);}



I was initially going with this design, but I'm using an XmlSerializer to perform the save and load for the document, and the Deserialize method on the XmlSerializer is a factory that returns an already fully built object. I naively tried assigning the returned object to "this", but as I should have known better, that doesn't work. If I want to keep using XmlSerializer, do I have to keep with the static Load method, or is there a way to move that logic to the class constructor?

This particular application doesn't need the flexibility to be able to read from different sources, so I think I'll go with your suggestion to make the document keep an handle to the file. I'll also take those pitfalls you linked to into account in my design.

Quote:
In general, RAII is preferred, so if you open a new document, create a whole new document from scratch. This propagates well to asynchronous/multi-threaded design. For example, if you find loading takes too long, you start loading in different thread, or via callback. When file is completely loaded, you end up with valid document.

If you were to modify existing document, you end up with potentially invalid or at least non-deterministic state while loading is in progress. It makes error checking a nightmare. Still, sometimes a lot of data can be re-used, so completely discarding old state isn't optimal.


Most of my experience is with C++, so I'd like the input from someone familiar with C# on this one. I just started working with C#, and now that you mention RAII, I'm wondering how I should I design my document class so that it properly deals with resource release. I was thinking of adding a Close() method to my Document class and calling it explicitally before any New or Load call on my Application, because (or so I believe to the extent of my current knowledge) destructors can't be called explicitally in C#. But that's obviously potentially unsafe, and there should be a better way to deal with it... Ideally what I would like is to have the following code in C#:

Document currentDocument;currentDocument = new Document();// other codecurrentDocument = new Document("file.xml");


And have the second object construction to automatically make the first one dispose of itself... but with GC and all, is that possible?
Quote:Original post by OrpheusTheBard

now that you mention RAII, I'm wondering how I should I design my document class so that it properly deals with resource release.


Managed languages take care of release - that's the biggest benefit. The point of RAII in this context is to ensure new instances end up in fully initialized state, or you end up with an exception. But due to managed nature, you only need to provide the creation part.

Quote:And have the second object construction to automatically make the first one dispose of itself... but with GC and all, is that possible?


All instances are disposed automatically due to nature of managed language. So by replacing one instance with another you mark it for release.


For explicit release, there's using keyword, but that is typically used for other purposes.

Quote:If I want to keep using XmlSerializer, do I have to keep with the static Load method, or is there a way to move that logic to the class constructor?


The constructor there was for symmetry. If you use XmlSerializer then it takes the role of Reader constructor which isn't needed anymore.

You would not call constructor manually anyway, it would be used by document factory. So as far as usage is concerned, all that changes is that internally that method now calls XmlSerializer to create new instance of an object.
This "safe" saving protocol
Quote:1) Open with CreateAlways (so it's truncated) a file named "MyFile.tmp"
2) Write data to MyFile.tmp, and close/flush
3) Remove the original "MyFile.dat" file
4) Rename "MyFile.tmp" to "MyFile.dat"

is not really safe in that it doesn't provide rollback semantics.
Assume 4 fails, for example. You didn't fully save the data, and you lost the previous state.

To provide the strong guarantee, you need an atomic rename-and-replace function.
Which is actually the semantics of POSIX rename... On Win32, I don't know though.
Quote:Original post by loufoque
is not really safe in that it doesn't provide rollback semantics.
Assume 4 fails, for example. You didn't fully save the data, and you lost the previous state.


You don't lose anything, the state is fully saved, just mis-labeled.

This is quite important, since rename, even if not atomic replace, doesn't change that new file is in deterministic state. This prevents situations such as writing over existing file and running out of disk space half-way through.

If writing to new file fails, there is still the intact old copy, and the error condition is detectable, so user gets the chance to write to different location.

Quote:To provide the strong guarantee, you need an atomic rename-and-replace function.
Which is actually the semantics of POSIX rename... On Win32, I don't know though.


Win32 offers some form of atomic replace, but doesn't guarantee it, at least according to documentation.

But the most important part of this process is that data is not lost. It's much easier to deal with customer when you ask them to look for .tmp file, rather than telling them that their only copy is corrupted.

This topic is closed to new replies.

Advertisement