
[C++] XML parser and std::wstring

Hi. I'm looking for a good XML parser written in C++ that works with std::wstring (UTF-16) without additional hassle. For example, I could use libxml2, but that would force me to convert the returned data from UTF-8 to UTF-16 (with iconv), and I don't want to do that. So, any suggestions?

It's not hard to convert between different UTF formats yourself. You can read about the format at, or just use their example code. I've been looking into Unicode just recently, and I would advise against using wstring everywhere because GCC's wchar_t is actually UCS-4 (32-bit). Great for compatibility, but a poor use of memory :)
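
To give an idea of how little code the conversion takes, here is a rough sketch of a UTF-8 to UTF-16 converter. The function name utf8_to_utf16 is made up for this example and is not from any library, and it assumes a C++11 compiler (std::u16string) so that the result is 16-bit regardless of how wide wchar_t is. It rejects truncated sequences, bad continuation bytes, surrogates and values above U+10FFFF, but it does not reject overlong encodings, so treat it as a starting point rather than a hardened implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from any library): decodes UTF-8 bytes into
// code points and re-encodes each one as UTF-16 code units.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (i + extra >= in.size())
            throw std::runtime_error("truncated UTF-8 sequence");

        // Each continuation byte contributes its low six bits.
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            throw std::runtime_error("code point out of range");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

On a platform where wchar_t is 16 bits wide, the resulting code units could be copied into a std::wstring one by one; where wchar_t is 32 bits wide that copy would not give you valid UCS-4 for surrogate pairs.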

That conversion code is a bit ugly, but thanks anyway. =) Actually, memory is the thing I care about least; what I need is stability and ease of use. It seems that I have to write my own pseudo-XML parser. =)


I wrote one recently using lex & yacc, and it was nearly complete. But I have lost patience with lex & yacc (flex & bison, actually), so I have been writing my own parser and am in the process of rewriting my various parsers so that they don't use lex & yacc.

If you want to use mine when it has been revised then be my guest, it should only be a few days (because I'm wrangling with INTERNAL COMPILER ERRORs).


Spirit would be a good solution. Does anyone have a 'Spirit hello world'? I hate reading long documentation; learning from an example would be preferable. =)


The Spirit documentation is actually very well written and makes good reading. I do find that I get long compile times with boost::spirit, however.
Boost.Spirit user guide.

Here's a program that may parse very simple XML. It probably doesn't work, but you get the idea.

    #include <boost/spirit.hpp>
    //On newer Boost releases, Classic Spirit lives in
    //<boost/spirit/include/classic.hpp>, namespace boost::spirit::classic.
    #include <iostream>
    #include <fstream>
    #include <iomanip>
    #include <iterator>
    #include <string>
    #include <algorithm>
    using namespace boost::spirit;
    using namespace std;

    //In practice, you'd probably want to parse XML
    //into an abstract syntax tree.

    //Define a custom actor that prints out its arguments.
    struct print_actor{

        string name;

        print_actor(string const& name):
            name(name){}

        //Spirit calls this with the range of input that was matched.
        template<class IteratorT>
        void operator()(IteratorT begin, IteratorT end) const{
            cout << this->name << ": " << string(begin, end) << endl;
        }
    };

    print_actor print_a(string const& name){
        return print_actor(name);
    }

    //Grammars are used to allow rules to operate on
    //different types of scanners.
    struct xml_grammar: public grammar<xml_grammar>{

        template<class ScannerT>
        struct definition{
            typedef rule<ScannerT> rule_type;
            definition(xml_grammar const& self){

                element = opening_tag >> middle >> closing_tag;
                opening_tag = ('<' >> tag_name >> !attributes >> '>')[print_a("opening_tag")];
                closing_tag = ("</" >> tag_name >> '>')[print_a("closing_tag")];
                //middle was left undefined originally; nested elements
                //and plain text are assumed here.
                middle = *(element | text);
                text = (+(anychar_p - '<'))[print_a("text")];
                //lexeme_d stops the skipper from joining "foo bar" into one name.
                name = lexeme_d[+alnum_p]; //Probably not adhering to xml grammar here,
                                           //but as I said, it's a simple parser
                tag_name = name;
                attribute_name = name;
                attributes = +attribute;
                //*~ch_p('"') matches everything up to the closing quote.
                attribute = attribute_name[print_a("attribute")] >> '=' >> lexeme_d['"' >> (*~ch_p('"'))[print_a("value")] >> '"'];
                //I think confix_p would work here too.
            }

            rule_type element, opening_tag, middle, closing_tag, text, name, tag_name, attribute_name, attributes, attribute;

            rule_type const& start() const{
                return element;
            }
        };
    };

    int main(int argc, char** argv){

        string s;
        ifstream ifs;
        istream* stream;

        //Read from stdin, or from the file named on the command line.
        if(argc == 1)
            stream = &cin;
        else{
            ifs.open(argv[1]);
            stream = &ifs;
        }

        string line;
        while(getline(*stream, line))
            s += line + '\n';

        xml_grammar g;
        //end_p lets the skipper consume trailing whitespace before
        //checking that all of the input was used.
        if(parse(s.begin(), s.end(), g >> end_p, space_p).full){
            cerr << "Yayz. It worked." << endl;
            return 0;
        }
        cerr << "Unlucky meight, your XML sucks." << endl;
        return 1;
    }


[Edited by - MrEvil on June 24, 2005 6:32:32 AM]
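
The comment in the program above suggests parsing into an abstract syntax tree instead of just printing matches. As a rough illustration of that idea, here is a hand-written sketch that works directly on std::wstring and uses only the standard library (C++11), not Spirit. The names Node and TinyXmlParser are invented for this example. It handles roughly the same "very simple XML" as the grammar above: alphanumeric names, double-quoted attributes, nested elements and text. Self-closing tags, comments, entities, processing instructions and namespaces are all out of scope.

```cpp
#include <cstddef>
#include <cwctype>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical tree node for illustration; not taken from any library.
struct Node {
    std::wstring name;
    std::map<std::wstring, std::wstring> attributes;
    std::wstring text;  // character data directly inside this element
    std::vector<std::unique_ptr<Node>> children;
};

// Hypothetical recursive-descent parser for a tiny XML subset.
class TinyXmlParser {
public:
    explicit TinyXmlParser(const std::wstring& src) : s_(src), pos_(0) {}

    std::unique_ptr<Node> parse() {
        std::unique_ptr<Node> root = element();
        skip_space();
        if (pos_ != s_.size()) fail("trailing input after root element");
        return root;
    }

private:
    std::wstring s_;
    std::size_t pos_;

    [[noreturn]] void fail(const char* msg) const { throw std::runtime_error(msg); }
    bool at(wchar_t c) const { return pos_ < s_.size() && s_[pos_] == c; }
    void expect(wchar_t c) { if (!at(c)) fail("unexpected character"); ++pos_; }
    void skip_space() { while (pos_ < s_.size() && std::iswspace(s_[pos_])) ++pos_; }

    std::wstring name() {
        std::size_t start = pos_;
        while (pos_ < s_.size() && std::iswalnum(s_[pos_])) ++pos_;
        if (pos_ == start) fail("expected a name");
        return s_.substr(start, pos_ - start);
    }

    std::unique_ptr<Node> element() {
        skip_space();
        expect(L'<');
        std::unique_ptr<Node> node(new Node);
        node->name = name();
        // Attributes: name="value" pairs up to the closing '>'.
        for (skip_space(); !at(L'>'); skip_space()) {
            std::wstring key = name();
            skip_space(); expect(L'='); skip_space();
            expect(L'"');
            std::size_t start = pos_;
            while (pos_ < s_.size() && s_[pos_] != L'"') ++pos_;
            std::wstring value = s_.substr(start, pos_ - start);
            expect(L'"');
            node->attributes[key] = value;
        }
        expect(L'>');
        // Content: text and child elements until the closing tag starts.
        for (;;) {
            if (pos_ >= s_.size()) fail("unterminated element");
            if (s_[pos_] != L'<') { node->text += s_[pos_++]; continue; }
            if (pos_ + 1 < s_.size() && s_[pos_ + 1] == L'/') break;
            node->children.push_back(element());
        }
        pos_ += 2;  // consume "</"
        if (name() != node->name) fail("mismatched closing tag");
        skip_space();
        expect(L'>');
        return node;
    }
};
```

Usage would be along the lines of constructing TinyXmlParser with a std::wstring and calling parse(), which returns the root Node or throws std::runtime_error on malformed input. The parser keeps its own copy of the input, which costs memory but avoids dangling references when a temporary string is passed in.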

Well, I think this might help. It's an XML parser / system I wrote a while back. It does a little bit with a custom string object that's easily replaced by std::string or whatever you want (the STL wasn't available for the project I wrote this for, so a little tweaking here and there will clear it up).

Even if you don't use it directly, the load and loadXML functions of DOMDocument might help you out. (It's a DOM-style parser, sort of.)

Anyways, hope this helps somehow!
