Markup-Language Data Storage

9 comments, last by Mithrandir 21 years, 1 month ago
I've been experimenting with the idea of an extensible, markup-language-style data storage format (like XML). I really like the idea, but I keep running into a problem: how do you avoid completely rewriting an XML file on disk every time you modify it? The only way around it that I can see is to store the frequently modified data in small files, while the larger static data stays in one huge XML database. Anyone have any insights they want to share?
This is my signature. There are many like it, but this one is mine. My signature is my best friend. It is my life. I must master it as I must master my life. My signature, without me, is useless. Without my signature, I am useless.
My only insight is this: XML is not really a good 'in-memory' format, it's a transfer format. It's fairly large and slow to use in memory. So if you really need to keep your in-memory representation in sync with the on-disk version, it is probably not the best choice. All elements and attributes have the same level of importance, whereas in most real applications you might provide special accessors for certain values to make your life easier. As a result of all this, I only use XML for saving and loading. It's certainly not the magic bullet that IT managers and book publishers are trying to make it out to be.

But if you really want to use XML, there's certainly no reason why you can't have more than one file of it, and migrate the frequently modified parts out to separate files, or even into a non-XML format.

[ MSVC Fixes | STL | SDL | Game AI | Sockets | C++ Faq Lite | Boost | Asking Questions | Organising code files | My stuff ]
yes, in my opinion plain XML/HTML text is a pure waste of space. but i encourage you to check out DOM (Document Object Model, i think). it's a way of storing XML elements and their attributes in a graph-/tree-like structure so you can search/insert/modify/delete elements pretty quickly and easily. i don't know of any free libraries, but if you have a working xml parser it should be easy to write your own DOM implementation (there's documentation somewhere on the web that includes all the interfaces you'll need to implement)

hope this helps
Right, I never even considered keeping XML data in memory; that's just wasteful.

My main concern here is

A) a human readable format, so editors can be made easily
B) an easily extensible format


My decision to use this format hasn't been finalized yet; I'm still exploring my options. By and large, my biggest problem with it is the need to overwrite the entire file upon modification.

And I was thinking of using a tree of std::maps for the XML structures while they are briefly in memory.
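
A tree of maps along these lines could be sketched as follows. This is only an illustration under assumptions: the `Node` type and the `addChild`/`findChild` helpers are hypothetical names, not part of any XML library, and children are held through `std::unique_ptr` so the node type can refer to itself.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical in-memory element: names and layout are illustrative only.
struct Node {
    std::string text;                                            // character data inside the element
    std::map<std::string, std::string> attributes;               // attribute name -> value
    std::multimap<std::string, std::unique_ptr<Node>> children;  // tag name -> child element
};

// Adds an empty child element under `parent` and returns a reference to it.
// Children sharing a tag keep their insertion order (guaranteed since C++11).
Node& addChild(Node& parent, const std::string& tag) {
    assert(!tag.empty());
    auto it = parent.children.emplace(tag, std::make_unique<Node>());
    return *it->second;
}

// Returns the first child with the given tag, or nullptr if there is none.
const Node* findChild(const Node& parent, const std::string& tag) {
    auto range = parent.children.equal_range(tag);
    return range.first == range.second ? nullptr : range.first->second.get();
}
```

Using a multimap rather than a map lets several sibling elements share one tag name, at the cost of losing the original ordering between differently named siblings.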
Well, using the DOM with XML is what you have to do, really. There's no choice about it: XML yields a tree-like structure. Searching through a DOM tree that is stored as plain text is pretty slow compared to using C++ pointers and structures. But to be honest that's not what puts me off: it's the slightly awkward interface you tend to have. I've not found an XML parser for C++ whose access is anything near as clean as a std::map or whatever. (Although I hope to write an extension to TinyXML soon to help with this... nested std::multimaps should work well in some respects.)

How often do you expect to write the changes out? Every second? And how much data do you have? Writing the file quite often might not be as much of a problem as you think.

Mith,

The big problem you are going to encounter is the rewriting of data. What it boils down to is that you will need either a fixed-width structure that is easily replaced with seek/write calls, or an index-based format (or a variation).
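
The fixed-width option might look roughly like this. It is a minimal sketch, assuming a hypothetical `Record` layout and helper names; dumping a raw struct like this is not portable across compilers or byte orders, so a real format would serialise the fields explicitly.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <string>

// Hypothetical fixed-width record: every record occupies sizeof(Record) bytes.
struct Record {
    char name[16];
    std::int32_t health;
    std::int32_t gold;
};

Record makeRecord(const char* name, std::int32_t health, std::int32_t gold) {
    assert(name != nullptr);
    Record r{};                                     // zero-fill, including name
    std::strncpy(r.name, name, sizeof r.name - 1);  // last byte stays '\0'
    r.health = health;
    r.gold = gold;
    return r;
}

// Creates (or truncates) a file holding `count` zeroed records.
bool createFile(const std::string& path, std::size_t count) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    const Record blank{};
    for (std::size_t i = 0; i < count; ++i)
        out.write(reinterpret_cast<const char*>(&blank), sizeof blank);
    out.flush();
    return static_cast<bool>(out);
}

// Overwrites record `index` in place; the rest of the file is untouched.
bool writeRecord(const std::string& path, std::size_t index, const Record& r) {
    std::fstream f(path, std::ios::binary | std::ios::in | std::ios::out);
    if (!f) return false;
    f.seekp(static_cast<std::streamoff>(index * sizeof(Record)));
    f.write(reinterpret_cast<const char*>(&r), sizeof r);
    f.flush();
    return static_cast<bool>(f);
}

// Reads record `index`; fails if the file is too short.
bool readRecord(const std::string& path, std::size_t index, Record& r) {
    std::ifstream f(path, std::ios::binary);
    if (!f) return false;
    f.seekg(static_cast<std::streamoff>(index * sizeof(Record)));
    f.read(reinterpret_cast<char*>(&r), sizeof r);
    return static_cast<bool>(f);
}
```

Because every record has the same size, the byte offset of record `index` is simply `index * sizeof(Record)`, which is what makes the in-place overwrite possible.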

The index format could be a set of pointers at the beginning of the file, mapping each structure ID to a location in the file. When it comes time to update a structure, if its new length is <= its current length, overwrite it in place; otherwise store it at the end and update the pointers. You can also keep a list of pointers to free space and try to fit structures into it, which will help keep the file compact.
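
The update rule described above can be sketched like this. The `RecordStore` class and its members are hypothetical names, and as a simplifying assumption the index and free list live only in memory; a real file would also serialise them (for example in a header block at the start of the file).

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <map>
#include <string>
#include <vector>

// Hypothetical record store: overwrite in place when the data fits,
// otherwise reuse a free slot or append at the end of the file.
class RecordStore {
public:
    explicit RecordStore(const std::string& path)
        : file_(path, std::ios::binary | std::ios::in | std::ios::out | std::ios::trunc) {}

    bool put(std::uint32_t id, const std::string& data) {
        auto it = index_.find(id);
        if (it != index_.end()) {
            if (data.size() <= it->second.capacity) {      // fits: overwrite in place
                it->second.length = data.size();
                return writeAt(it->second.offset, data);
            }
            free_.push_back({it->second.offset, it->second.capacity, 0});  // release old slot
            index_.erase(it);
        }
        Slot slot = allocate(data.size());
        slot.length = data.size();
        index_[id] = slot;
        return writeAt(slot.offset, data);
    }

    bool get(std::uint32_t id, std::string& out) {
        auto it = index_.find(id);
        if (it == index_.end()) return false;
        out.assign(it->second.length, '\0');
        file_.clear();
        file_.seekg(static_cast<std::streamoff>(it->second.offset));
        file_.read(&out[0], static_cast<std::streamsize>(out.size()));
        return static_cast<bool>(file_);
    }

    std::uint64_t offsetOf(std::uint32_t id) const { return index_.at(id).offset; }
    std::uint64_t fileSize() const { return end_; }

private:
    struct Slot { std::uint64_t offset; std::uint64_t capacity; std::uint64_t length; };

    // First-fit search of the free list; the whole block is reused and any
    // leftover space is not split off. Falls back to appending at the end.
    Slot allocate(std::uint64_t size) {
        for (auto it = free_.begin(); it != free_.end(); ++it) {
            if (it->capacity >= size) {
                Slot s = *it;
                free_.erase(it);
                return s;
            }
        }
        Slot s{end_, size, 0};
        end_ += size;
        return s;
    }

    bool writeAt(std::uint64_t offset, const std::string& data) {
        file_.clear();
        file_.seekp(static_cast<std::streamoff>(offset));
        file_.write(data.data(), static_cast<std::streamsize>(data.size()));
        file_.flush();
        return static_cast<bool>(file_);
    }

    std::fstream file_;
    std::map<std::uint32_t, Slot> index_;  // structure id -> location in file
    std::vector<Slot> free_;               // released slots available for reuse
    std::uint64_t end_ = 0;                // current end of the data area
};
```

First-fit without splitting or merging free blocks is the simplest possible policy; it wastes space over time, which is where the database and memory-allocator techniques come in.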

In either case, you are basically going to be reinventing databases and memory allocation algorithms. It may be fruitful to do some background research on techniques used in those fields for ideas on how to implement it.

I'm wondering if it would be a good idea for someone to come up with a general-purpose library that can be included in an app to do basic file manipulation like this. I know I could use something like that in the game I am working on: objects can have so many dynamic attributes that I can't stuff them into a static structure, so a db-oriented save system like this would be worthwhile.
To be quite honest, I won't be using straight XML, but rather a bastardization of it.

My main interest in this area is the ability to store data in a platform-independent ASCII format that is easily accessible.

As for writing out data, I'm not so sure that's going to be a problem anymore, now that I've done some rough estimates; the only files that are modified often can safely be split into many separate files.



As for the whole database idea, that's pretty good, but not quite what I'm looking for.
Did you look into TinyXML? I used it for a while for human-readable files. Easy to edit.

[edited by - alchemar on March 17, 2003 9:41:42 PM]
Mithrandir, it's probably best that you don't bastardize XML; not because I am a standards zealot or anything, but because if you keep it standard, you give yourself some advantages, including being able to use existing libraries to read and write your files, use online XML validators to help test your I/O routines, view your data files in a browser (nicely formatted so a human can easily see what data is there), etc.

To add to what Alchemar already said, I work with (and submit patches for) TinyXML, and although it doesn't implement all of the XML standard, it should be enough to get you going quickly, with an easy-to-use interface in C++. Loading an XML file takes 2 lines of code, and then you just use FirstChildElement() and NextSiblingElement() to iterate through your tags. So the loop might look like:
    for (TiXmlElement* element = document.FirstChildElement("mytagname");
         element;
         element = element->NextSiblingElement("mytagname"))
    {
        // Process the "mytagname" element here, add it
        // to a list, or whatever
    }


PS: Maybe look into XLinks and XPointers as they can address your need to link different files while still sticking with the standard.


[edited by - Kylotan on March 18, 2003 6:21:26 AM]

[edited by - Kylotan on March 18, 2003 6:21:47 AM]
quote:Original post by Mithrandir
My main concern here is
A) a human readable format, so editors can be made easily
B) an easily extensible format


I'm just curious, but whether you go with XML or create your own INI-style text file (or any other option, for that matter), you're going to end up writing routines to read, write, edit, etc., correct?

If you do all that, the only thing you're missing to have an editor is display, which you do or will likely have already.

I don't know that XML is going to net you much more than the ability to let IE or any other XML-reading app display your file, and at that point, do you really need much of an editor?

On the other hand, if you're going to be doing a lot of Editor <-> XML <-> Database Server, then XML might be handy, I guess, if you want an easy path from one end to the other.

Anyway, I was just curious about that first point: you're going to be writing all the code that would make up an editor just to use/import the data. Whether you aggregate it _into_ an editor or not, you'll still have written all the code to do what an editor would do.

This topic is closed to new replies.
