Parsing XML with Xerces

Started by
16 comments, last by stefu 21 years, 4 months ago
Xerces validates the XML for you, but you have to tell it how to validate your custom format. To do that you need a valid DTD.

I have whitespace in my files and it still works.

These are the settings I'm using:


    Parser = new DOMParser;
    Parser->setValidationScheme(DOMParser::Val_Auto);
    Parser->setDoNamespaces(false);
    Parser->setDoSchema(false);


Then you should do some error checking. It will help you find any typos.


    // NOTE: pErrHdl is null here; point it at your registered error handler
    // before dereferencing it below.
    DOMErrorHandler*    pErrHdl = 0;
    try
    {
        m_pParser->parse("AFileName.xml");
        if (m_pParser->getErrorCount())
        {
            ads_printf("Parsing Error: %s\n", pErrHdl->getErrorStr());
            ...
        }
    }
    catch (const XMLException& e)
    {
        ads_printf("An XML Exception occurred during parsing\n   Message: %s\n", DOMString(e.getMessage()));
        ...
    }
    catch (const DOM_DOMException& e)
    {
        ads_printf("A DOM Exception occurred during parsing\n   DOMException code: %d\n", e.code);
        ...
    }
    catch (...)
    {
        ads_printf("An unknown error occurred during parsing\n");
        ...
    }


"Imagination is more important than knowledge."
- Albert Einstein
henrym, that DTD Design page was very good. I'll see tomorrow what I do, but it looks like a very good idea to at least try it.

Philippo, what is that DOMParser everyone is talking about? I can't find it in the Xerces API documentation here: http://xml.apache.org/xerces-c/apiDocs/hierarchy.html

I've been using XercesDOMParser instead and I tried your options, but it didn't help. Somehow, if I have whitespace in the XML file I get almost nothing, or now it just throws out thousands of text entities that have nothing inside (I had to Ctrl-C to stop).

But thanks for now, I'll try using a DTD tomorrow.
I'm not totally sure, but I think they renamed DOMParser to XercesDOMParser (probably for marketing reasons). It looks like it does exactly the same things DOMParser did.

I've been using an old version of the same library for a long time now. I think it's time for me to upgrade to the new one.

"Imagination is more important than knowledge."
- Albert Einstein
Yes, I saw DOMParser in deprecated code.

Before I go to sleep, could anyone test this piece of code with both test.xml and test1.xml? They have the same content, but one has the whitespace removed, and I get different (and somewhat weird) results.


xml_test.cpp:

    #include <xercesc/parsers/XercesDOMParser.hpp>
    #include <xercesc/dom/DOM.hpp>
    #include <xercesc/sax/HandlerBase.hpp>
    #include <xercesc/util/XMLString.hpp>
    #include <xercesc/util/XMLUni.hpp>
    #include <xercesc/util/PlatformUtils.hpp>
    #include <xercesc/util/TransService.hpp>
    #include <iostream>

    using namespace std;

    const char* xmlFile = "test.xml";

    // ---------------------------------------------------------------------------
    //  This is a simple class that lets us do easy (though not terribly efficient)
    //  transcoding of char* data to XMLCh data.
    // ---------------------------------------------------------------------------
    class XStr
    {
    public :
        // -----------------------------------------------------------------------
        //  Constructors and Destructor
        // -----------------------------------------------------------------------
        XStr(const char* const toTranscode)
        {
            // Call the private transcoding method
            fUnicodeForm = XMLString::transcode(toTranscode);
        }

        ~XStr()
        {
            delete [] fUnicodeForm;
        }

        // -----------------------------------------------------------------------
        //  Getter methods
        // -----------------------------------------------------------------------
        const XMLCh* unicodeForm() const
        {
            return fUnicodeForm;
        }

    private :
        // -----------------------------------------------------------------------
        //  Private data members
        //
        //  fUnicodeForm
        //      This is the Unicode XMLCh format of the string.
        // -----------------------------------------------------------------------
        XMLCh*   fUnicodeForm;
    };

    #define X(str) XStr(str).unicodeForm()

    void RecursiveParser(DOMNode* pNodeElement)
    {
        DOMNode *SubNode;

        cout << "NodeType=" << pNodeElement->getNodeType() << endl;
        char* nodename = XMLString::transcode(pNodeElement->getNodeName());
        cout << "NodeName=" << nodename << endl;
        delete [] nodename;

        SubNode = pNodeElement->getFirstChild();
        while(SubNode)
        {
            RecursiveParser(SubNode);
            SubNode = pNodeElement->getNextSibling();
        }
    }

    class MyErrorHandler : public ErrorHandler
    {
    public:
        MyErrorHandler() {};
        ~MyErrorHandler() {};

        void S(const SAXParseException &exception) {
            char* message = XMLString::transcode(exception.getMessage());
            cout << "Line: " << exception.getLineNumber() << ", "
                 << "Col: " << exception.getColumnNumber() << ", "
                 << "Message: " << message << endl;
            delete [] message;
        }
        void warning (const SAXParseException &exception) {
            cout << "Warning: ";
            S(exception);
        }
        void error (const SAXParseException &exception) {
            cout << "Error: ";
            S(exception);
        }
        void fatalError (const SAXParseException &exception) {
            cout << "Fatal: ";
            S(exception);
        }
        void resetErrors () {
        }
    };

    int main()
    {
        try {
            XMLPlatformUtils::Initialize();
        }
        catch (const XMLException& toCatch) {
            char* message = XMLString::transcode(toCatch.getMessage());
            cout << "Error during initialization! :\n"
                 << message << "\n";
            delete [] message;
            return 1;
        }

        //Create a parser
        XercesDOMParser* parser = new XercesDOMParser();
        parser->setValidationScheme(XercesDOMParser::Val_Always);
        parser->setDoNamespaces(true);
        ErrorHandler* errHandler = (ErrorHandler*) new MyErrorHandler(); //HandlerBase();
        parser->setErrorHandler(errHandler);

        try {
            parser->parse(xmlFile);
        }
        catch (const XMLException& toCatch) {
            char* message = XMLString::transcode(toCatch.getMessage());
            cout << "Exception message is: \n"
                 << message << "\n";
            delete [] message;
            return 1;
        }
        catch (const DOMException& toCatch) {
            char* message = XMLString::transcode(toCatch.msg);
            cout << "Exception message is: \n"
                 << message << "\n";
            delete [] message;
            return 1;
        }
        catch (...) {
            cout << "Unexpected Exception \n" ;
            return 1;
        }

        //Get the first child node
        DOMDocument *doc = parser->getDocument();
        DOMNode *docRootNode = doc->getDocumentElement();

        RecursiveParser(docRootNode); // doc or docRootNode?

        delete parser;
        delete errHandler;

        XMLPlatformUtils::Terminate();

        return 0;
    }


Makefile

    CC   = g++
    BIN  = xml_test
    OBJ  = xml_test.o

    LIBS = -L"/home/stefan/Ohjelmointi/STLport/lib" \
           -L"/home/stefan/Ohjelmointi/lib" \
           -L"/home/stefan/Ohjelmointi/xerces-c-src2_1_0/lib" \
           -lxerces-c -lSTLport -lpthread

    INCS = -I"/home/stefan/Ohjelmointi/STLport/stlport" \
           -I"/home/stefan/Ohjelmointi/Gtreme/include" \
           -I"/home/stefan/Ohjelmointi/xerces-c-src2_1_0/include" \
           -I"/home/stefan/Ohjelmointi"

    CFLAGS = $(INCS)  -s

    default: $(BIN)

    all: clean $(BIN)

    clean:
    	rm -f $(OBJ) $(BIN)

    $(BIN): $(OBJ)
    	$(CC) $(OBJ) -o $(BIN) $(LIBS) $(CFLAGS)

    xml_test.o: xml_test.cpp
    	$(CC) -c xml_test.cpp -o xml_test.o $(CFLAGS)

test.xml:

    <?xml version="1.0"?>
    <!DOCTYPE track [
        <!ELEMENT track (name, terrain)>
        <!ELEMENT name (#PCDATA)>
        <!ELEMENT terrain (width, height, heightmap, texture, detail)>
        <!ELEMENT width (#PCDATA)>
        <!ELEMENT height (#PCDATA)>
        <!ELEMENT heightmap (#PCDATA)>
        <!ELEMENT texture (#PCDATA)>
        <!ELEMENT detail (#PCDATA)>
    ]>
    <track><name>Default Track</name><terrain><width>300</width><height>30</height><heightmap>hmap.tga</heightmap><texture>tex.tga></texture><detail>detail.tga</detail></terrain></track>


test1.xml:

    <?xml version="1.0"?>
    <!DOCTYPE track [
        <!ELEMENT track (name, terrain)>
        <!ELEMENT name (#PCDATA)>
        <!ELEMENT terrain (width, height, heightmap, texture, detail)>
        <!ELEMENT width (#PCDATA)>
        <!ELEMENT height (#PCDATA)>
        <!ELEMENT heightmap (#PCDATA)>
        <!ELEMENT texture (#PCDATA)>
        <!ELEMENT detail (#PCDATA)>
    ]>
    <track>
        <name>Default Track</name>
        <terrain>
            <width>300</width>
            <height>30</height>
            <heightmap>hmap.tga</heightmap>
            <texture>tex.tga></texture>
            <detail>detail.tga</detail>
        </terrain>
    </track>



[edited by - stefu on November 20, 2002 5:35:27 PM]
By default Xerces adds text nodes where your whitespace is. There should be (and is) a setting to skip ignorable whitespace.
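
To make "text nodes where your whitespace is" concrete: the line break and tab between <track> and <name> in an indented file show up as a text node containing only "\n\t". Here is a minimal sketch, independent of Xerces, of a check for such whitespace-only text. isAllWhitespace is a hypothetical helper written for this illustration, not a Xerces function, and it works on std::string rather than XMLCh data.

```cpp
#include <cassert>  // only needed for the checks in the usage note
#include <cctype>
#include <string>

// Hypothetical helper (not part of Xerces): returns true when the given
// text consists only of whitespace such as spaces, tabs, and line breaks.
bool isAllWhitespace(const std::string& text)
{
    for (unsigned char ch : text) {
        if (!std::isspace(ch)) {
            return false;
        }
    }
    return true;  // note: an empty string also counts as whitespace-only
}
```

With this, isAllWhitespace("\n\t") is true and isAllWhitespace("Default Track") is false, so a hand-written tree walk could skip the former and keep the latter.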


50% of people are below average.
Little bug in your RecursiveParser function. Change

SubNode = pNodeElement->getNextSibling();

to

SubNode = SubNode->getNextSibling();

Otherwise it will give funny results (as you've seen).
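
The reason the change matters: asking the parent for its next sibling leaves the list of children and moves to a different level of the tree. A minimal sketch of the corrected loop follows, using a hypothetical Node struct as a stand-in for DOMNode (firstChild and nextSibling play the roles of getFirstChild() and getNextSibling()). Node and childNames are assumptions made for this illustration, not Xerces types.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a DOM node; only the links the loop needs.
struct Node {
    std::string name;
    Node* firstChild = nullptr;
    Node* nextSibling = nullptr;
};

// Collects the names of the direct children of `parent`, in document order.
std::vector<std::string> childNames(const Node* parent)
{
    assert(parent != nullptr);
    std::vector<std::string> names;
    // Advance from the current child, never from the parent: the parent's
    // next sibling is not one of the parent's children.
    for (const Node* child = parent->firstChild; child != nullptr;
         child = child->nextSibling) {
        names.push_back(child->name);
    }
    return names;
}
```

For a track node whose children are name and terrain, childNames returns those two names and then stops, because the step is taken from the child that was just visited.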


50% of people are below average.
The line

SubNode = pNodeElement->getFirstChild();

should fail if pNodeElement is not an Element node. Try bracketing the recursive part of the method with an if check on the node type, i.e.:

if( pNodeElement->getNodeType() == DOMNode::ELEMENT_NODE )
{
// recursive code
}
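
As a rough sketch of what that guarded recursion could look like, here is a self-contained version that does not depend on Xerces. NodeKind, Node, and collectElementNames are hypothetical names made up for this illustration; in the real code the check would be the getNodeType() comparison shown above.

```cpp
#include <cassert>  // only needed for the checks in the usage note
#include <string>
#include <vector>

// Hypothetical, simplified node model: just elements and text.
enum class NodeKind { Element, Text };

struct Node {
    NodeKind kind = NodeKind::Element;
    std::string name;
    Node* firstChild = nullptr;
    Node* nextSibling = nullptr;
};

// Appends element names in depth-first document order. Nodes that are not
// elements (for example whitespace text nodes) are neither recorded nor
// descended into.
void collectElementNames(const Node* node, std::vector<std::string>& out)
{
    if (node == nullptr || node->kind != NodeKind::Element) {
        return;
    }
    out.push_back(node->name);
    for (const Node* child = node->firstChild; child != nullptr;
         child = child->nextSibling) {
        collectElementNames(child, out);
    }
}
```

Given a track element whose children are a whitespace text node, name, another whitespace text node, and terrain, the output vector ends up holding track, name, terrain.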


Hehehe, I'm so ashamed.
I should have just gone through the code rather than trying to find the reason elsewhere.

And with parser->setIncludeIgnorableWhitespace(false) it now seems to work perfectly.

Thanks!

This topic is closed to new replies.
