Best XML library performance-wise?

9 comments, last by igni ferroque 18 years, 10 months ago
I will be doing a lot of work in our engine using C++ / XML and I'm looking for opinions on some libraries. I have been testing out Xerces and TinyXML but haven't got a chance to try others so far. I really have to watch memory usage as some of the XML files are huge so I'm not sure if DOM is even the best method but maybe something else. What would you use for parsing large XML documents in an environment where performance is needed? Just looking for opinions and techniques.
Well... it doesn't matter what you use; whatever is easiest for you, I guess. You shouldn't be parsing them at run time anyway: parse them beforehand and store the information in a fast-loading format such as binary. I use MSXML because it's stupid easy and I have full control over what I do. In the end, what you use to read it doesn't matter; it's whatever you know or can learn.
We have youth, how about a fountain of smart.e4 e5 f4 d5
I wrote my own XML parser for the heck of it, maybe you should give writing your own a try? :D
When parsing large documents, use an event-driven parser, not a DOM.

DOMs often use three times as much memory as the size of the file, sometimes much more. Use an event-driven parser such as the SAX interface of Xerces, or expat.

Microsoft's .NET XML classes also provide a streaming parser (XmlReader), which is pull-based rather than SAX-like, and which they claim is simpler.

In any case, using an event-driven parser will give your app constant memory usage independent of the file size.
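
To make the memory argument concrete, here is a minimal sketch of the event-driven style. It is not Xerces or expat: `TagHandler`, `scanTags`, `DepthCounter` and `countTags` are made-up names, and the scanner is a toy that only understands plain start, end and self-closing tags (no CDATA, no comments containing `>`, no entity handling). The point it illustrates is that the handler only sees one tag at a time, so the working set is bounded by the longest tag rather than by the file size.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical callback interface, loosely modelled on SAX-style handlers.
struct TagHandler {
    virtual ~TagHandler() = default;
    virtual void onStart(const std::string& name) = 0;
    virtual void onEnd(const std::string& name) = 0;
};

// Toy scanner (assumption: input uses only simple tags). It reads the stream
// one character at a time and never keeps more than the current tag in memory.
void scanTags(std::istream& in, TagHandler& handler) {
    char c;
    while (in.get(c)) {
        if (c != '<') continue;                // text content is skipped, not stored
        std::string tag;                       // buffer for the current tag only
        while (in.get(c) && c != '>') tag.push_back(c);
        if (tag.empty() || tag[0] == '?' || tag[0] == '!') continue;

        const bool closing = tag[0] == '/';
        const bool selfClosing = !closing && tag.back() == '/';
        const std::size_t begin = closing ? 1 : 0;
        const std::size_t end = tag.find_first_of(" \t\r\n/", begin);
        const std::string name = tag.substr(
            begin, end == std::string::npos ? std::string::npos : end - begin);

        if (closing) {
            handler.onEnd(name);
        } else {
            handler.onStart(name);
            if (selfClosing) handler.onEnd(name);
        }
    }
}

// Example handler: counts elements and tracks nesting depth, nothing else.
struct DepthCounter : TagHandler {
    std::size_t elements = 0;
    std::size_t depth = 0;
    std::size_t maxDepth = 0;

    void onStart(const std::string&) override {
        ++elements;
        ++depth;
        if (depth > maxDepth) maxDepth = depth;
    }
    void onEnd(const std::string&) override {
        if (depth > 0) --depth;
    }
};

// Convenience wrapper: run the scanner over a stream and return the counts.
DepthCounter countTags(std::istream& in) {
    DepthCounter counter;
    scanTags(in, counter);
    return counter;
}
```

Passing a `std::ifstream` opened on a large file works the same way as passing a `std::istringstream`; with a real library you would register equivalent callbacks with the parser instead of calling a hand-rolled scanner.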

Mark
I wrote my own as well, it parses the xml and stores it in a node tree, which provides functions for querying the tree.

Raymond Jacobs, Owner - Ethereal Darkness Interactive
www.EDIGames.com - EDIGamesCompany - @EDIGames

Quote:Original post by markr
When parsing large documents, use an event-driven parser, not a DOM.

DOMs often use three times as much memory as the size of the file, sometimes much more. Use an event-driven parser such as the SAX interface of Xerces, or expat.

Microsoft's .NET XML classes also provide a streaming parser (XmlReader), which is pull-based rather than SAX-like, and which they claim is simpler.

In any case, using an event-driven parser will give your app constant memory usage independent of the file size.

Mark



The big difference between event-driven parsers and object-model parsers is that, with the former, data order matters.

Take this example as a basis:

<object>
  <use_action>cool</use_action>
  <actions>
    <action name="cool" etc="whatever"></action>
  </actions>
</object>



Let us assume that the use_action tag specifies which action to use, and that the actions and action tags define the actions that can be used.

In this example we'll say that actions must be loaded before you can use one.

In an object-model parser you query the data, so you can load the actions before you set your action. That is to say, the order of the XML does not matter, only its nesting.

In an event-driven parser, however, the handler for the use_action tag would be invoked before the one for the actions tag. This would be a problem, since no actions have been loaded yet.

While this isn't much of an issue, it is nice not to have to remember what logical order your tags must be in if they are dependent on one another.
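
One common way around the ordering problem with an event-driven parser is to remember the forward reference and resolve it only when the enclosing element closes. The sketch below assumes a SAX-like handler with start, text and end callbacks; `ObjectHandler` and `replaySample` are hypothetical names, and the events are replayed by hand in the order a parser would deliver them for the example above, rather than coming from a real library.

```cpp
#include <set>
#include <string>

// Hypothetical handler for the <object> example. It defers the use_action
// lookup until </object>, by which point every <action> has been seen.
class ObjectHandler {
public:
    // nameAttr stands in for the "name" attribute a real parser would pass.
    void startElement(const std::string& name, const std::string& nameAttr = "") {
        current_ = name;
        if (name == "action") actions_.insert(nameAttr);
    }

    void characters(const std::string& text) {
        // Only the text inside <use_action> is of interest here.
        if (current_ == "use_action") pendingUse_ = text;
    }

    void endElement(const std::string& name) {
        current_.clear();
        if (name == "object") {
            // All child elements are done, so the reference can be checked now.
            resolved_ = actions_.count(pendingUse_) > 0;
        }
    }

    bool resolved() const { return resolved_; }
    const std::string& requestedAction() const { return pendingUse_; }

private:
    std::string current_;            // element whose text is being read
    std::string pendingUse_;         // forward reference, resolved later
    std::set<std::string> actions_;  // action names seen so far
    bool resolved_ = false;
};

// Replays the events for the sample document in document order:
// use_action arrives before any action has been defined.
bool replaySample() {
    ObjectHandler h;
    h.startElement("object");
    h.startElement("use_action");
    h.characters("cool");
    h.endElement("use_action");
    h.startElement("actions");
    h.startElement("action", "cool");
    h.endElement("action");
    h.endElement("actions");
    h.endElement("object");
    return h.resolved();
}
```

The cost is a little extra state per open element, which is still far less than a full tree; the tag order in the file no longer matters as long as both pieces live inside the same parent.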

Just thought I would bring that up =D

Raymond Jacobs, Owner - Ethereal Darkness Interactive
www.EDIGames.com - EDIGamesCompany - @EDIGames

Quote:I will be doing a lot of work in our engine using C++ / XML and I'm looking for opinions on some libraries.
What kind of work?
Free Mac Mini (I know, I'm a tool)
Feel free to provide more details to get a better answer.

Personally, I've had the best experience with Microsoft's MSXML. For instance, I've been involved in a site with really high load and XML posts of roughly 50 kB (running under IIS). I used the SAX model in that case. SAX is quite boring to use, but gives the best performance. Unless you have complex documents I would consider SAX, as it gives the best performance and the lowest memory load. The DOM model of MSXML is still a good choice, although not as fast as SAX. On the other hand, it's so much easier to use.

Today I would definitely go for the .NET XML parser, but maybe that's not an option for you? Its performance won't beat MSXML SAX, but it's still quite good.

IMHO, writing your own XML parser indicates lack of engineering skills.
Don't write your own parser. There are so many XML libraries out there already.

I like to use TinyXML, but I'm not sure how it performs compared to other parsers.
MSXML is fairly fast, but then you gotta deal with COM, and COM is "teh suck" ;)
Especially if you're not used to it.

My 2 cents.
Quote:Original post by igni ferroque
What kind of work?

This will basically be for loading scenes, entities (which store the model file, material info, etc), etc. I will be loading them with XML but some scenes are so large that not everything is kept in memory as we have a concurrent thread loading entities as they come into proximity.

This is used in the editor; the actual engine loads binary-serialized XML and works from that.
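
For readers wondering what the XML-in-the-editor, binary-in-the-engine split can look like, here is a rough sketch. The `Entity` record, its fields and the file layout (a count followed by length-prefixed records) are assumptions made up for illustration, not the format this engine uses. It writes raw native-endian values, so the file is only readable on a platform with the same byte order and float representation.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record: what an entity parsed from XML might boil down to.
struct Entity {
    std::string model;   // e.g. path of the model file
    float position[3];
};

// Offline step: after parsing the XML once, dump the records in binary form.
void writeEntities(std::ostream& out, const std::vector<Entity>& entities) {
    const std::uint32_t count = static_cast<std::uint32_t>(entities.size());
    out.write(reinterpret_cast<const char*>(&count), sizeof count);
    for (const Entity& e : entities) {
        const std::uint32_t len = static_cast<std::uint32_t>(e.model.size());
        out.write(reinterpret_cast<const char*>(&len), sizeof len);
        out.write(e.model.data(), len);
        out.write(reinterpret_cast<const char*>(e.position), sizeof e.position);
    }
}

// Run-time step: no XML parsing at all, just sequential reads.
std::vector<Entity> readEntities(std::istream& in) {
    std::uint32_t count = 0;
    in.read(reinterpret_cast<char*>(&count), sizeof count);

    std::vector<Entity> entities;
    for (std::uint32_t i = 0; i < count && in; ++i) {
        Entity e{};
        std::uint32_t len = 0;
        in.read(reinterpret_cast<char*>(&len), sizeof len);
        e.model.resize(len);
        in.read(&e.model[0], len);
        in.read(reinterpret_cast<char*>(e.position), sizeof e.position);
        if (in) entities.push_back(e);   // drop a record that was cut short
    }
    return entities;
}
```

With files, open both streams with `std::ios::binary`. Because records are read sequentially, a background loading thread can pull in one entity at a time instead of the whole scene.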

The prototype was built up using DOM (which was not meant for our production solution but just a quick way to get it in) and I am looking to do this event driven.

Thanks very much for the suggestions, I will look into MSXML as I haven't looked at it yet, and it seems as if either that or Xerces will be the best solution.

Quote:
Today I would definitely go for the .NET XML parser, but maybe that's not an option for you? Its performance won't beat MSXML SAX, but it's still quite good.

Unfortunately not. This project is completely native code so I am unable to use the .NET framework libraries. If I were able to use managed code I would be using those in a second :)

Quote:Original post by JavaCoolDude
I wrote my own XML parser for the heck of it, maybe you should give writing your own a try? :D

Quite honestly, I am not interested in building an XML parser at all. I have had the best experience building projects on existing libraries for many of the lower-level features.

This topic is closed to new replies.
