Writing my own XML parser?

You should decide whether you want SAX or DOM (or both) APIs for parsing and writing. Both have their advantages, but they are orthogonal concepts, meaning you'd have to implement each of them independently, and this can double your implementation and testing time.
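Roughly, the two API shapes look something like this (these types are made up purely to illustrate the difference, not taken from any particular library):

#include <string>
#include <vector>

/* SAX-style: the parser pushes events at you as it scans the document. */
struct SaxHandler {
    virtual void onStartElement(const char* name) = 0;
    virtual void onText(const char* text) = 0;
    virtual void onEndElement(const char* name) = 0;
    virtual ~SaxHandler() {}
};

/* DOM-style: the parser builds a whole tree up front that you walk afterwards. */
struct DomNode {
    std::string name;
    std::string text;
    std::vector<DomNode*> children;
};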


Unless you have a good reason, using an existing parser will save you development time, giving you more time for implementing the "fun things".

Cheers
Chris
Quote:Original post by petewood
Quote:Original post by benutne
Anything else constructive to say?


I'm not sure you would like it.



Don't see why I wouldn't. I wasn't being an ass or anything. I'm open to suggestions.
I think I may end up using TinyXML. Who knows. I also thought about the SAX route. I know next to nothing about it though.
I'm using TinyXML, although I made some modifications to it. It's not XML-compliant anymore; I made it case-insensitive because I want my XML-ish files to be editable and easy to understand for anyone.

The real disadvantages of TinyXML are, AFAIK, that it's feature-limited and that it parses the entire XML file before you can manipulate it. For little configuration files that's not a problem, but for big files parsing could take a while and eat lots of memory.
Quote:Original post by Kurioes
For little configuration files that's not a problem but for big files parsing could take a while and eat lots of memory.

That's what I'm worried about. The minimum system specs here are pretty low: 64MB of RAM, to be exact. Does anyone know anything about serializing a binary file, as was suggested earlier, or where to look to learn?
Quote:Original post by benutne
Does anyone know anything about serializing a binary file as was suggested earlier or where to look to learn?


Well, it's really very simple to serialize data. It simply means taking the object and writing out its data to some stream or other (e.g. a file stream / output stream): basically just sending off the contents of an object in such a way that it can be retrieved later.

Deserializing from a file is then just reading the file back in and restoring the objects. So, for instance, you could organize your master data file like so:

int numObjects
object1
object2
etc...

Then, to read it in, you would read numObjects and create an array of that size, or just set a counter variable that will control the termination of a loop. Then loop through, reading in one object-sized chunk at a time from the file. If you are doing some kind of database program in which the size of the total data is larger than the amount of available memory, you can just reconstruct one object at a time -> parse it -> see if it's the one you need -> if not, discard the object and read in the next one. That way you only ever have one object resident in memory at any time.
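If it helps, here's a minimal sketch of that count-then-objects layout. It assumes a fixed-size, plain-old-data object type (GameObject here is just a made-up example) and skips most error checking:

#include <cstdio>
#include <vector>

struct GameObject {       /* plain-old-data, fixed size */
    int   id;
    float x, y, z;
};

bool saveObjects(const char* path, const std::vector<GameObject>& objs)
{
    FILE* f = fopen(path, "wb");
    if (!f) return false;
    int numObjects = (int)objs.size();
    fwrite(&numObjects, sizeof(numObjects), 1, f);            /* the count first */
    if (numObjects > 0)
        fwrite(&objs[0], sizeof(GameObject), objs.size(), f); /* then the objects, back to back */
    fclose(f);
    return true;
}

bool loadObjects(const char* path, std::vector<GameObject>& objs)
{
    FILE* f = fopen(path, "rb");
    if (!f) return false;
    int numObjects = 0;
    fread(&numObjects, sizeof(numObjects), 1, f);
    objs.resize(numObjects);
    if (numObjects > 0)
        fread(&objs[0], sizeof(GameObject), numObjects, f);   /* read them back in one go */
    fclose(f);
    return true;
}

For anything containing pointers, strings, or variable-sized members you'd have to write each field out explicitly rather than dumping the whole struct.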

Does this help at all, or have I totally misunderstood your question?

I'm not going to comment on the usefulness of XML other than to say that its original intent was to let humans read, understand, and edit data that can be used directly by a machine. If you have no need for humans to manually edit the data directly, I honestly don't see the use of XML. For me, it's always just been more efficient to create a little app through which people can edit the data (which is saved off as a binary file). XML is sometimes a nice way to get up and running quickly (i.e. you can always replace the data with a binary format later, so you can skip writing the data-entry app in the short term). XML is just a bit of a performance hog: with binary data you go from data to object in a very direct process, while with XML you have to parse the data first. Text parsing is evil. In the end it just means slightly longer load times (depending on what data you are storing, this can range from completely insignificant to a horrible app killer).

-me
Thanks Palidine. That's EXACTLY what I was looking for. There will be an app that someone can enter information into, and since I know XML, I can manually construct the data for the first few objects to see how they work.

That's pretty much the only reason I looked at XML. And yes, what you said did confuse the hell out of me, but don't let it worry you. A lot of the more abstract concepts do the first time I see them.

So the data file will have like a "header" to tell me how many objects it contains and how large each object is, and then the objects themselves. Right?
Basic approach (edit to suit your needs)...
struct mybinaryheader {
    unsigned long headerSize; /* How many bytes is this header? */
    unsigned long nodeFirst;  /* Seek to here to find first node header. */
};

struct mybinarynodeheader {
    unsigned long nodeSize;   /* Number of bytes in this node */
    unsigned long nodeNext;   /* Seek to here to find next node header */
};


Note the nice and regular format. Read 8 bytes from the file, and you have the header. This tells you how large the element is and where to find the next one. The actual element data can follow its header in the file. I usually follow the convention that if "nodeNext" is zero, then we are looking at the last node in the file.

I use this format for structures that have to be somewhat forwards-compatible; no matter how large the data structure of each node/element is, they always have the same first 8 bytes, which are these header fields. No matter how large the element is, we can reliably find the next element.
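To make that concrete, here's a rough sketch of walking such a file. It assumes the structs above are written to disk exactly as laid out in memory (4-byte unsigned longs, no padding) and leaves out error checking:

#include <cstdio>

struct mybinarynodeheader {
    unsigned long nodeSize; /* Number of bytes in this node */
    unsigned long nodeNext; /* Seek to here to find next node header */
};

void walkNodes(FILE* f, unsigned long firstNodeOffset)
{
    unsigned long offset = firstNodeOffset;
    while (offset != 0) {                   /* zero nodeNext marks the last node */
        mybinarynodeheader node;
        fseek(f, (long)offset, SEEK_SET);
        fread(&node, sizeof(node), 1, f);

        /* The node's content (nodeSize bytes) follows the header right here;
           read it, or skip it if it isn't the node you're after. */

        offset = node.nodeNext;
    }
}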
Quote:Original post by Wyrframe
Basic approach (edit to suit your needs)...
*** Source Snippet Removed ***

Note the nice and regular format. Read 8 bytes from the file, and you have the header. This tells you how large the element is and where to find the next one. The actual element data can follow its header in the file. I usually follow the convention that if "nodeNext" is zero, then we are looking at the last node in the file.

I use this format for structures that have to be somewhat forwards-compatible; no matter how large the data structure of each node/element is, they always have the same first 8 bytes, which are these header fields. No matter how large the element is, we can reliably find the next element.



I think I see now. Still a little confusing, but I guess I just need to start coding something to get the hang of it.
Instead of having "nodeNext" encapsulate the recursion, you could go the RIFF way and store a type with each node. Then nodes of type LIST have a "size" that's the sum of all their children, and the content of the chunk is a sub-type plus the children. Other node types are not recursive, so they just contain their content. This is how WAV and AVI files are defined.
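For reference, a RIFF-ish chunk header is just a four-character type code plus a size. This is only a sketch of the idea, not the exact RIFF structures (it assumes a 4-byte unsigned long and no struct padding, and real RIFF also pads odd-sized chunks to an even boundary):

#include <cstdio>

struct ChunkHeader {
    char          type[4];  /* four-character code, e.g. "LIST" or "data" */
    unsigned long size;     /* bytes of content that follow this header */
};

/* Skims over the chunks in a file one after another. A real reader would
   recurse into "LIST" chunks (whose content is a sub-type plus child chunks)
   instead of skipping them. */
void skimChunks(FILE* f)
{
    ChunkHeader hdr;
    while (fread(&hdr, sizeof(hdr), 1, f) == 1) {
        fseek(f, (long)hdr.size, SEEK_CUR);     /* jump over this chunk's content */
    }
}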

Anyway, when it comes to small, dinky, FAST and EASY XML parsers, I put mine on the web just the other day (touched up the README today): XMLSCAN.

You can use it in a forward-scanning way, or in a DOM-building way, and it's fast both ways. It's NOT a full XML parser; in fact, the goal was to make something that's simpler and smaller than TinyXML -- it works, it's fast, and it's simple, so mission accomplished. It took me one evening (!) to write and another evening to test, debug, and use for my particle system.
enum Bool { True, False, FileNotFound };

