Nice fast XML parsers with validation?

4 comments, last by Beosar 9 years, 2 months ago

I have this little cross platform UI framework I'm working on.

It uses XML as layout definitions.

Currently I'm using TinyXML, since that is what we have always used. It's simple and fast, and I didn't need any advanced XML features.

I'm reaching its limits now though, so I'm looking for nice alternatives, and hoping the community can give some tips!

Must have features:

- C or C++ API (prefer fairly modern C++)

- Schema validation through DTD or XSD (does anyone have anything to say about which of the two to prefer?)

- Good error handling, so I can output nice and readable error messages when people mess up their XML

- License that allows use in commercial closed-source projects

Preferable:

- Cross platform/Easy to port

- Fast

- DOM

Wild wish:

- parsing/compiling binary XML (for speed and space savings)

If it's easy to port and fast, I might be able to use it for parsing the XML on device (iOS, Android, WhateverOS). If not, it would be OK to just run it in a tool on the desktop dev computer.

A quick search has turned up two alternatives:

Xerces-C++ (seems dead though? The last update on the homepage is from 2010...)

libxml2 (from Gnome)

Does anyone have tips for other alternatives, or something to say about either of the two above?


I recently went down the same road on my hobby project, as it required a data binding compiler and a runtime XML parser (basically something similar to JAXB in Java).

I have found that the problem with data binding and XML parsing in C++ is that most libraries offer fast and efficient XML parsing, but as soon as they must include validation and data binding mechanisms, the libraries get a lot bigger.

I chose Code Synthesis XSD, which covers your technical requirements (a compiler to create classes from an XSD file, runtime XML validation and parsing ...) and is released under the GPLv2 license, so check their licensing terms if you need it for a closed-source project.

It has an indirect third-party dependency on Xerces-C++ in order to compile, though.

As for speed, I cannot answer yet, as I have only been using it for a few days.

The nice thing about it is that it integrates nicely with VS, as it can regenerate your data binding classes on each build with different options (wchar length, C++11 compiler ....)

File size on my setup (VS2013, x86, MDd runtime): 3MB lib and 4MB dll for the debug build, 3MB lib and 2MB dll for the release build.

To provide a small sample of usage (the classes have already been generated by the xsd compiler from the XSD schema):


// Parse the XML.
std::unique_ptr<bridge_configuration> bridgeConfiguration = bridge_configuration_(CONFIGURATION_FILE, xml_schema::flags::dont_validate);

// This is a bit clunky, maybe I don't know how to use it yet....... :(
for (bridge_sockets::bridge_socket_iterator it(bridgeConfiguration->bridge_sockets().bridge_socket().begin());
     it != bridgeConfiguration->bridge_sockets().bridge_socket().end();
     ++it) {
    if (bridge_socket::name_optional("frontend") == it->type()) {
        frontEndSocketAddress = it->protocol()->data();
        frontEndSocketAddress.append(it->name()->data());
...

Do check their website though, they provide quite a nice description of the features.


- Schema validation through DTD or XSD (does anyone have anything to say about which of the two to prefer?)

DTD is more or less obsolete, stick to XSD.

Why do you not use some kind of resource packaging tool/pipeline? This way you could validate all the XML files during the packaging process (e.g. use a simple Java tool to validate the XML against the XSD; Java has out-of-the-box XML support), and you could use some kind of compression (zip) to avoid having to support binary XML.


I chose Code Synthesis XSD

Thanks for the tip, it looks interesting. It is probably a bit overkill for my needs right now, but I will definitely remember it for if/when I need it.


Why do you not use some kind of resource packaging tool/pipeline?

This might be the way to go. I was looking into C++ APIs since I like C++ and wanted to build it into a bigger system, and also for the possible option of using it on device too.

But maybe the easiest is to just have a small XML verification tool in Java and pass any layout XML files through it before packaging them.

That also has the advantage of lots of documentation and examples...

Zip is probably good enough compression, even though some binarification of the file first should make it even smaller. That is not a critical need though, just a wild wish :)

Thanks for the answers.

Just out of curiosity, has anyone used either Xerces or libxml2 and formed an opinion on how horrible/nice they are to use? :)

I'm also happy to receive any more XML validation tips if anyone cares to share.

My main need right now is validation with good error messages that pinpoint the actual problem in malformed layout XML files.

I think that "validation" and either "nice" or "fast" are mutually exclusive.

Do you really need validation in that part, though? Since you are in control of generating the XML (UI layout editor), you should be able to ensure that the document is created in a well-formed and, well... valid (according to your DTD/XSD) way.

If you are a little paranoid (that is, you don't trust yourself to write a correct generator), you can use any not-so-pretty-not-so-fast parser inside the UI layout editor to read in the document immediately after writing it out.

If that passes, then your nice-and-fast non-validating parser that you use everywhere and for which you've already written tons of code (code that works and has been tested) will be mighty fine to read it (after all, the document is valid and that doesn't change, so there's no need to check that over and over again).

If it doesn't pass, you need to fix the layout editor until valid documents come out, but there's no point validating the known-to-be-broken document on the user end anyway.


I think that "validation" and either "nice" or "fast" are mutually exclusive.

That is probably true :)

Easy to set up and use is more important than fast. It seems my thought of running it on device too isn't very practical.

TinyXML should be fine there.


Do you really need validation in that part, though?

For the time being there will be no editor; the files will be edited by hand by programmers.

It is brief and compact XML, designed to be easily human readable.

So some syntax checking in the tool pipeline would make life much easier when you forget a slash or quote somewhere, or misspell a tag or attribute.
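As a rough illustration of the kind of pipeline check meant here, the following C++ sketch flags unknown tags, unbalanced quotes, and unclosed elements, each with a line number. It is a toy and not a real XML parser: `checkLayout`, `LayoutError`, and the tag names are hypothetical, and comments, CDATA, XML declarations, and `>` inside attribute values are not handled. A real pipeline would more likely call a validating parser instead.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <string>
#include <vector>

// One problem found in a layout file, with the line it was found on.
struct LayoutError {
    int line;
    std::string message;
};

// Toy checker (an assumption for illustration, not a library API).
// It only understands plain <tag ...>, <tag .../> and </tag>.
std::vector<LayoutError> checkLayout(const std::string& text,
                                     const std::set<std::string>& knownTags) {
    struct Open { std::string name; int line; };
    std::vector<LayoutError> errors;
    std::vector<Open> open;  // stack of currently open elements
    int line = 1;

    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\n') { ++line; continue; }
        if (text[i] != '<') continue;

        const int tagLine = line;
        const std::size_t end = text.find('>', i);
        if (end == std::string::npos) {
            errors.push_back({tagLine, "tag is never closed with '>'"});
            break;
        }
        const std::string body = text.substr(i + 1, end - i - 1);
        line += static_cast<int>(std::count(body.begin(), body.end(), '\n'));
        i = end;  // continue scanning after this tag

        const bool closing = !body.empty() && body.front() == '/';
        const bool selfClosing = !closing && !body.empty() && body.back() == '/';

        // Element name: everything after an optional '/' up to whitespace or '/'.
        std::size_t p = closing ? 1 : 0;
        std::string name;
        while (p < body.size() && body[p] != '/' &&
               !std::isspace(static_cast<unsigned char>(body[p]))) {
            name += body[p++];
        }
        if (name.empty()) {
            errors.push_back({tagLine, "tag without a name"});
            continue;
        }

        // An odd number of double quotes means a quote was forgotten.
        if (std::count(body.begin(), body.end(), '"') % 2 != 0) {
            errors.push_back({tagLine, "unbalanced quote in <" + name + ">"});
        }

        if (closing) {
            // Find the nearest open element with the same name (1-based index).
            std::size_t match = open.size();
            while (match > 0 && open[match - 1].name != name) --match;
            if (match == 0) {
                errors.push_back({tagLine, "</" + name + "> has no matching opening tag"});
            } else {
                // Everything opened after the match was left unclosed.
                while (open.size() > match) {
                    errors.push_back({open.back().line,
                                      "<" + open.back().name + "> is never closed"});
                    open.pop_back();
                }
                open.pop_back();
            }
        } else {
            if (knownTags.count(name) == 0) {
                errors.push_back({tagLine, "unknown tag <" + name + ">"});
            }
            if (!selfClosing) open.push_back({name, tagLine});
        }
    }

    // Anything still on the stack at the end was never closed.
    for (const Open& o : open) {
        errors.push_back({o.line, "<" + o.name + "> is never closed"});
    }
    return errors;
}
```

Given a layout where a `<button text="OK">` on the second line is missing its slash, and `layout` and `button` as the known tags, this reports a single error on the line of the `<button>` tag rather than on the closing `</layout>` where the mismatch is first noticed.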

Here is an XML validator: http://www.w3schools.com/xml/xml_validator.asp

It's hosted by W3Schools, which is not affiliated with the W3C. (The W3C itself has a website validator, maybe useful if you are about to make a website.)

