storing data

SiCrane


One of my projects is the personal wiki program. Now, I'm just thinking "out loud" here, stream of consciousness style since I was a bum and haven't worked this out in my design yet. So I was thinking about storage mechanisms.

From what I've seen of other wiki-like software, there are two common choices: use a database, or use a flat file system with one file per page.

A flat file system seems to have the advantage that you can mess with the files with an external editor or reader.

On the other hand with a database back-end it seems easier to store metadata/relational information.

Since the infodump project that I plan to use the personal wiki with will potentially run to thousands of pages, a flat file system might be awkward to work with. I'm not worried about name collisions, since each page would have a unique identifier (otherwise they wouldn't fit in a wiki framework), but some things, like properly escaping page names into file names, could be a hassle. (Of course, in the amount of time I spent thinking about this I could probably have banged out the code to escape the names properly. But that would be more code to maintain and debug later.)
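For a sense of how much code the escaping would take, here is a minimal sketch. It is in Python rather than the C# the project uses, purely for brevity, and it assumes percent-encoding is an acceptable scheme; the function names and the `.wiki` extension are made up for illustration.

```python
from urllib.parse import quote, unquote

def page_to_filename(title: str, ext: str = ".wiki") -> str:
    """Map a page title to a file name (hypothetical helper)."""
    # safe="" percent-encodes every reserved character, including "/".
    # quote() leaves "." alone, so escape it by hand to keep titles such
    # as "." or ".." from turning into special directory names.
    escaped = quote(title, safe="").replace(".", "%2E")
    return escaped + ext

def filename_to_page(filename: str, ext: str = ".wiki") -> str:
    """Recover the page title from a name made by page_to_filename."""
    if not filename.endswith(ext):
        raise ValueError(f"not a page file: {filename!r}")
    return unquote(filename[: len(filename) - len(ext)])
```

The mapping round-trips, but it does nothing about case-insensitive file systems, where "Foo" and "foo" would land on the same file, or about file name length limits. That is the kind of detail that makes this more code to maintain than it first looks.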

Actually, I'm jumping ahead here. Basically, since there may be thousands of pages, storing all the page data and metadata in memory at once would be somewhat silly. So the rest can be written to secondary storage (with possible caching of individual pages).
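The "possible caching of individual pages" part could be as small as a least-recently-used cache sitting in front of whatever storage is chosen. A rough sketch in Python (not the project's C#); `PageCache`, its `load` callback, and the default capacity are all hypothetical names and values, not part of any design so far.

```python
from collections import OrderedDict
from typing import Callable

class PageCache:
    """Keep at most `capacity` pages in memory, evicting the least recently used."""

    def __init__(self, load: Callable[[str], str], capacity: int = 64) -> None:
        self._load = load            # reads one page from secondary storage
        self._capacity = capacity
        self._pages: "OrderedDict[str, str]" = OrderedDict()

    def get(self, title: str) -> str:
        if title in self._pages:
            # Cache hit: mark the page as most recently used.
            self._pages.move_to_end(title)
            return self._pages[title]
        text = self._load(title)
        self._pages[title] = text
        if len(self._pages) > self._capacity:
            # Drop the entry that has gone unused the longest.
            self._pages.popitem(last=False)
        return text

    def invalidate(self, title: str) -> None:
        """Forget a page, for example after it has been edited and saved."""
        self._pages.pop(title, None)
```

The cache does not care whether `load` reads a file or runs a query, so it leaves the database versus flat file question open.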

The only thing that needs to be stored per page would be the raw page data. I think at a minimum this includes the original data as entered by the user. (As compared with storing a transformed, viewable version that gets translated back into the original markup. It pisses me off when trying to edit an entry doesn't reveal the original code I used to create the entry, like on some message boards.) Additionally, this might include storage of relational data (forward and backward links) and the transformed viewable information.
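To make the per-page data concrete, here is a sketch of how raw markup plus forward and backward links might look in a database. It uses Python's built-in `sqlite3` module instead of C#, and the table names, the `[[Target]]` link syntax, and the helper functions are all assumptions for illustration, since the actual markup hasn't been designed yet.

```python
import re
import sqlite3

# Hypothetical schema: raw markup per page, one row per forward link.
PAGE_SCHEMA = """
CREATE TABLE pages (
    title TEXT PRIMARY KEY,
    raw   TEXT NOT NULL          -- markup exactly as the user typed it
);
CREATE TABLE links (
    source TEXT NOT NULL,
    target TEXT NOT NULL,
    PRIMARY KEY (source, target)
);
CREATE INDEX links_by_target ON links(target);
"""

# Assumed link syntax: [[Target]] or [[Target|label]].
LINK = re.compile(r"\[\[([^\]|]+)")

def save_page(db: sqlite3.Connection, title: str, raw: str) -> None:
    """Store the raw text and rebuild this page's forward links."""
    targets = {t.strip() for t in LINK.findall(raw)} - {""}
    with db:  # one transaction: commit on success, roll back on error
        db.execute("INSERT OR REPLACE INTO pages (title, raw) VALUES (?, ?)",
                   (title, raw))
        db.execute("DELETE FROM links WHERE source = ?", (title,))
        db.executemany("INSERT INTO links (source, target) VALUES (?, ?)",
                       [(title, t) for t in sorted(targets)])

def backlinks(db: sqlite3.Connection, title: str) -> list:
    """Backward links are just the same table read from the other side."""
    rows = db.execute(
        "SELECT source FROM links WHERE target = ? ORDER BY source", (title,))
    return [row[0] for row in rows]
```

Only the raw text is authoritative here. The links table is derived from it on every save, so the original markup is always what comes back when editing.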

Storing the raw page data in a flat file system would be feasible, but most techniques I can think of for getting this to work would require either a heck of a lot of reparsing at run time (potentially quadratic or worse when doing data export) or an annotated version of each page, which might require a convoluted file format.

Actually, at this point I'm still trying to figure out why some wiki projects regard a flat file system as a feature. Maybe it will come to me later.

Another option is just to work with the data all in memory and rely on the .NET serialization facilities to read and write the whole mess all at once. But this also seems bad given that the project may have thousands of pages.

Eh. But don't listen to me, I don't know what I'm talking about.


2 Comments


This system sounds remarkably like how I've coded up my articles pages on the evolutional website (link). I have a raw custom markup format for the content (very wiki-like) which is then transformed into XHTML for viewing. Currently, I'm using a combination of the two methods for storage (XML and a SQL database) and I must say, each has benefits over the other. The flat files allow you to alter the content by hand, and make backups and restores easy because you can simply copy the files across. As the files are already XML, transforming them to a new form via XSLT is simple. However, flat files suffer from multi-user problems, as you will have to build mutexes and the like into the software yourself. SQL, on the other hand, has all this built in, in the form of transactions (provided you have an ACID-compliant RDBMS). With a database it'll be a lot easier to manage generations of updates too: you can insert your new update as a table row and alter the current page definition row to reflect which version is currently active. I've found that flat files work well for static or seldom-changed data (menus, for example) and that the SQL database works well for dynamic and generational content. I don't know if this is in any way useful, but I thought I'd throw it in anyway. [smile]
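The "generations of updates" idea described in this comment might look roughly like the following. This is a sketch only, using Python's built-in `sqlite3` module rather than whatever RDBMS the commenter runs; the table and function names are invented for illustration.

```python
import sqlite3

# Hypothetical schema: every save adds a revision row, and a separate
# "current page" row points at whichever revision is active.
REVISION_SCHEMA = """
CREATE TABLE revisions (
    id    INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT NOT NULL,
    raw   TEXT NOT NULL
);
CREATE TABLE current_page (
    title       TEXT PRIMARY KEY,
    revision_id INTEGER NOT NULL REFERENCES revisions(id)
);
"""

def save_revision(db: sqlite3.Connection, title: str, raw: str) -> int:
    """Insert a new revision and make it the active one, atomically."""
    with db:  # both statements commit together or not at all
        cur = db.execute(
            "INSERT INTO revisions (title, raw) VALUES (?, ?)", (title, raw))
        rev_id = cur.lastrowid
        db.execute(
            "INSERT OR REPLACE INTO current_page (title, revision_id) "
            "VALUES (?, ?)", (title, rev_id))
    return rev_id

def current_text(db: sqlite3.Connection, title: str):
    """Return the active revision's text, or None for an unknown page."""
    row = db.execute(
        "SELECT r.raw FROM current_page c "
        "JOIN revisions r ON r.id = c.revision_id WHERE c.title = ?",
        (title,)).fetchone()
    return row[0] if row else None

def revert(db: sqlite3.Connection, title: str, rev_id: int) -> None:
    """Point the page back at an older revision of the same page."""
    with db:
        db.execute(
            "UPDATE current_page SET revision_id = ? WHERE title = ? "
            "AND EXISTS (SELECT 1 FROM revisions WHERE id = ? AND title = ?)",
            (rev_id, title, rev_id, title))
```

Old revisions are never overwritten, so reverting is a one-row update instead of a file copy.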

Well, to put things in context, it's a personal information storage/editing system done in a wiki style, so there should only be one user at a time. This makes concurrency issues a non-issue (yay! no mutexes). The internal format is completely flexible; what the application uses as its secondary storage doesn't need to interact with anything. However, I plan for an export option to either HTML or XML (probably directly to HTML).

As such, there isn't much in the way of static information that needs to be stored by the program itself. Template information, such as menus and other decoration, is not attached per page; it only exists in the outputted HTML.

Basically, at this point I see no real reason not to use a full-fledged database system as the backend. Of course, this could just be the higher-level object mindset: when I'm programming in C# I want to deal with as few details as possible. And manually managing individual files seems like unnecessary detail.
