Parsing XML with Xerces


Hello. I'm trying to change my track file format to use XML. This is track.xml:
    <?xml version="1.0" encoding="iso-8859-1"?>

<track>
  <name>Default Track</name>
  <made_by>Stefan Brockmann</made_by>
  
  <terrain>
    <size width="300" height="30"/>
    <heightmap file="hmap.tga"/>
	<texture file="tex.tga"/>
	<detail file="detail.tga"/>
  </terrain>
  
  <grid>
    <pos id="0" x="142" y="30" dir="160"/>
    <pos id="1" x="152" y="20" dir="160"/>
    <pos id="2" x="162" y="28" dir="170"/>
    <pos id="3" x="172" y="18" dir="170"/>
    <pos id="4" x="182" y="27" dir="180"/>
    <pos id="5" x="192" y="18" dir="180"/>
    <pos id="6" x="202" y="27" dir="180"/>
    <pos id="7" x="212" y="18" dir="180"/>
    <pos id="8" x="222" y="27" dir="180"/>
    <pos id="9" x="232" y="18" dir="180"/>
    <texture file="gridpos.tga"/>
  </grid>
  
  <checkpoints>
    <pos id="0" x1="105" y1="30" x2="140" y2="50"/>
    <pos id="1" x1="45" y1="270" x2="90" y2="265"/>
    <pos id="2" x1="248" y1="295" x2="248" y2="260"/>
    <texture file="checkpoint.tga"/>
  </checkpoints>
  
  <objects>
    <obj mesh="stone.mesh" x="110" y="50" dir="90" h="0"/>
    <obj mesh="stone.mesh" x="120" y="50" dir="140" h="0"/>
    <obj mesh="stone.mesh" x="130" y="50" dir="45" h="0"/>
  </objects>
  
  
</track>
  
The problem is that I don't know how to get all elements using Xerces. Here's one example of what I have tried:
  
XMLPlatformUtils::Initialize();

      
DOMImplementationLS *impl = DOMImplementationRegistry::getDOMImplementation(X("XML 1.0 Traversal"));
DOMBuilder *parser = ((DOMImplementationLS*)impl)->createDOMBuilder(DOMImplementationLS::MODE_SYNCHRONOUS,0);
DOMDocument *doc = parser->parseURI(X("track.xml"));
	
DOMNodeList *l = doc->getElementsByTagName(X("pos"));

for(XMLSize_t i=0; i<l->getLength(); i++)
{
	DOMNode *node = l->item(i);
	// getNodeName() returns XMLCh* (UTF-16), so transcode it before printing
	char *name = XMLString::transcode(node->getNodeName());
	cout << "NodeName: " << name << endl;
	delete [] name;
	cout << "NodeType: " << node->getNodeType() << endl;
}
		
parser->release();
	
XMLPlatformUtils::Terminate();
    
This is one of the things I tried, but I can't get anything out of it. And doc->getDocumentElement() returns null, so what's the problem here? I get no exceptions or anything.

[edited by - stefu on November 19, 2002 4:31:14 PM]

//XML data is stored as a tree, where each node may contain 0-n sub-nodes.
//To parse this type of data you can use a recursive method that
//processes one node and then calls itself on each of that node's sub-nodes.
//
//This is a rough method skeleton for that:

void RecursiveParser(DOM_Node node)
{
    //Process this node; DOMString::transcode() returns a new[]'d char* we must free
    char* name = node.getNodeName().transcode();
    printf("Node name: %s\n", name);
    delete [] name;
    //getNodeType() returns a number, so print it with %d, not %s
    printf("Node type: %d\n", node.getNodeType());

    //Get the first sub-node
    DOM_Node SubNode = node.getFirstChild();

    //For each sub-node
    while (!SubNode.isNull())
    {
        //Call the same method on this node
        RecursiveParser(SubNode);

        //Move to the next sibling of the sub-node (not of the parent)
        SubNode = SubNode.getNextSibling();
    }
}

//To use this method just call it by passing the base node of your XML doc
//(This code was stripped down for readability)

XMLPlatformUtils::Initialize();
{
    //Create a parser
    DOMParser* Parser = new DOMParser;
    Parser->parse("YourFileName.xml");

    //Get the first child node
    DOM_Document DomDoc = Parser->getDocument();
    DOM_Node FirstNode = DomDoc.getFirstChild();

    //Call the method
    RecursiveParser(FirstNode);

    delete Parser;
} //Let the DOM_ handles go out of scope before Terminate()
XMLPlatformUtils::Terminate();


//Hope this can help
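The first-child/next-sibling walk in this skeleton can be tried out without Xerces at all. The sketch below uses a hypothetical `Node` struct that only mimics the `getFirstChild()`/`getNextSibling()` links of a DOM node, and it collects node names instead of printing them so the result is easy to check. The detail that matters is that the loop advances from the child it just visited, not from the parent.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a DOM node: just a name and the two links
// a first-child/next-sibling traversal needs (not the Xerces class).
struct Node {
    std::string name;
    Node* firstChild = nullptr;
    Node* nextSibling = nullptr;
};

// Appends `child` as the last child of `parent`.
void appendChild(Node& parent, Node& child) {
    if (!parent.firstChild) { parent.firstChild = &child; return; }
    Node* last = parent.firstChild;
    while (last->nextSibling) last = last->nextSibling;
    last->nextSibling = &child;
}

// Records this node, then visits each child recursively (document order).
void recursiveParser(const Node* node, std::vector<std::string>& out) {
    out.push_back(node->name);
    for (const Node* sub = node->firstChild; sub; sub = sub->nextSibling)
        recursiveParser(sub, out);   // advance from the child, not the parent
}
```

Called on a `track` node with children `name` and `terrain` (where `terrain` contains `size`), it yields track, name, terrain, size, i.e. document order.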

"Imagination is more important than knowledge."
- Albert Einstein

Thanks. Some progress:

I get a segmentation fault on this line:

//printf("Node Type: %s\n", pNodeElement->getNodeType());


I commented it out and it ran, but all it printed was:
Node name: t
Node name: #

I use XercesDOMParser.

What might be wrong? Is my XML file OK? I ran it through the DOMPrint sample application and it said:
Message: The XML or Text declaration must start at line/column 1/1
So I fixed it, and after that DOMPrint ran through it fine.
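That DOMPrint message refers to the first line of track.xml: an XML declaration, when present, has to be the very first thing in the file, so even the indentation in front of `<?xml` is an error (only a byte-order mark may precede it). A pre-flight check like the hypothetical `declarationAtStart` helper below (plain C++, not part of Xerces) catches that before the file ever reaches the parser:

```cpp
#include <string>

// Returns true if `text` has no XML declaration, or has one that starts at
// line 1, column 1. An optional UTF-8 byte-order mark is skipped first.
// Hypothetical helper for illustration; Xerces performs the real check.
bool declarationAtStart(const std::string& text) {
    std::string::size_type start = 0;
    if (text.compare(0, 3, "\xEF\xBB\xBF") == 0) start = 3;   // UTF-8 BOM
    std::string::size_type decl = text.find("<?xml ", start);
    return decl == std::string::npos || decl == start;
}
```

Reading the whole file into a string and calling this first would have flagged the indented `<?xml` line in the original track.xml.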

This is harder than I thought. Using printf caused a segmentation fault, weird.

I changed part of the above code to:

  
char* nodename = XMLString::transcode(pNodeElement->getNodeName());
cout << "NodeName=" << nodename << endl;
delete [] nodename;
cout << "NodeType=" << pNodeElement->getNodeType() << endl;


And now it prints out:
NodeName=track
NodeType=1
NodeName=#text
NodeType=3

That's correct now. But that's not all it should print.

?
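Both earlier symptoms come from how Xerces represents strings and numbers. `getNodeName()` returns `const XMLCh*`, a zero-terminated string of 16-bit UTF-16 units; cast to `char*`, the second byte of `t` (zero on a little-endian machine) ends the string, which fits the "t" and "#" output above. `getNodeType()` returns a number, so printf needs `%d`; with `%s` printf treats the number as a pointer, which is undefined behaviour and typically segfaults. The sketch uses `char16_t` as a stand-in for `XMLCh`, and `narrowAscii` is a hypothetical ASCII-only substitute for `XMLString::transcode`, which is the real, general conversion:

```cpp
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical stand-in for XMLString::transcode, valid for ASCII only:
// copies each 16-bit code unit into a char, replacing non-ASCII with '?'.
std::string narrowAscii(const char16_t* wide) {
    std::string out;
    for (; *wide; ++wide)
        out += (*wide < 0x80) ? static_cast<char>(*wide) : '?';
    return out;
}

// Formats a node the way the thread's printf calls intended:
// %s for the converted name, %d for the numeric node type.
std::string describeNode(const char16_t* name, short type) {
    char buf[128];
    std::snprintf(buf, sizeof buf, "Node name: %s, Node type: %d",
                  narrowAscii(name).c_str(), type);
    return buf;
}

// What a (char*) cast sees: the raw bytes of the UTF-16 string.
// On a little-endian machine u"track" starts with bytes 't', 0.
std::size_t castLength(const char16_t* wide) {
    return std::strlen(reinterpret_cast<const char*>(wide));
}
```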

Now I added an error handler and...
Error: Line: 3, Col: 8, Message: Unknown element 'track'
Error: Line: 4, Col: 9, Message: Unknown element 'name'
Error: Line: 5, Col: 12, Message: Unknown element 'made_by'
Error: Line: 7, Col: 12, Message: Unknown element 'terrain'
Error: Line: 8, Col: 36, Message: Unknown element 'size'
Error: Line: 8, Col: 36, Message: Attribute '{}width' is not declared for element 'size'
Error: Line: 8, Col: 36, Message: Attribute '{}height' is not declared for element 'size'
Error: Line: 9, Col: 33, Message: Unknown element 'heightmap'
Error: Line: 9, Col: 33, Message: Attribute '{}file' is not declared for element 'heightmap'
Error: Line: 10, Col: 27, Message: Unknown element 'texture'
.
.
.

But those errors went away when I removed: parser->setValidationScheme(XercesDOMParser::Val_Always);

Now there are no errors, but it doesn't work. All the nodes it prints out are:
NodeType=9
NodeName=#document
NodeType=1
NodeName=track
NodeType=3
NodeName=#text
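The numbers in this output are the standard DOM node-type codes: 9 is the document, 1 an element (`track`), and 3 a text node. Xerces exposes them as constants such as `DOMNode::ELEMENT_NODE` and `DOMNode::TEXT_NODE`; a small lookup like the hypothetical `nodeTypeName` below makes traces like this one easier to read:

```cpp
#include <string>

// Maps a DOM node-type code (as returned by getNodeType()) to its name.
std::string nodeTypeName(int type) {
    switch (type) {
        case 1:  return "ELEMENT_NODE";
        case 2:  return "ATTRIBUTE_NODE";
        case 3:  return "TEXT_NODE";
        case 4:  return "CDATA_SECTION_NODE";
        case 5:  return "ENTITY_REFERENCE_NODE";
        case 6:  return "ENTITY_NODE";
        case 7:  return "PROCESSING_INSTRUCTION_NODE";
        case 8:  return "COMMENT_NODE";
        case 9:  return "DOCUMENT_NODE";
        case 10: return "DOCUMENT_TYPE_NODE";
        case 11: return "DOCUMENT_FRAGMENT_NODE";
        case 12: return "NOTATION_NODE";
        default: return "UNKNOWN";
    }
}
```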


This is really eating at me.
Isn't there any tutorial available? Just simply how to get the tag named "aaa" and read its attribute values.
All I need now is this, and I'm just fighting to understand why Xerces is not working.
There are API docs and sample code showing how to create a parser and call parser->parse(""), but why can't they tell you how to read the parsed document???
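With the Xerces 2.x DOM, the usual route to a tag and its attributes is `doc->getElementsByTagName(X("pos"))`, casting each item to `DOMElement*` (safe once you know it is an element), calling `getAttribute(X("x"))`, and transcoding the result. The values still arrive as strings, so they need converting; note that `getAttribute` returns an empty string for a missing attribute. The sketch below covers only that last step with the standard library, using a hypothetical `Attributes` map in place of the transcoded values and a hypothetical `GridPos` struct for the `<grid>` entries in track.xml:

```cpp
#include <cerrno>
#include <cstdlib>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an element's attributes after transcoding,
// e.g. <pos id="0" x="142" y="30" dir="160"/>.
using Attributes = std::map<std::string, std::string>;

struct GridPos { int id, x, y, dir; };   // hypothetical game-side struct

// Looks up a required integer attribute; throws if missing, empty or not a number.
int intAttribute(const Attributes& attrs, const std::string& name) {
    auto it = attrs.find(name);
    if (it == attrs.end())
        throw std::runtime_error("missing attribute '" + name + "'");
    const char* text = it->second.c_str();
    char* end = nullptr;
    errno = 0;
    long value = std::strtol(text, &end, 10);
    if (end == text || *end != '\0' || errno == ERANGE)
        throw std::runtime_error("attribute '" + name + "' is not an integer");
    return static_cast<int>(value);
}

GridPos readGridPos(const Attributes& attrs) {
    return { intAttribute(attrs, "id"), intAttribute(attrs, "x"),
             intAttribute(attrs, "y"), intAttribute(attrs, "dir") };
}
```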

Hmm, you must have fun reading this monologue (if anyone is reading)

I found that everything works fine if I remove all spaces and line breaks from the XML file.

This works:

  
<?xml version="1.0"?>
<test><inner id="ohoo">HELLO</inner></test>


These don't:

  
<?xml version="1.0"?>
<test> <inner id="ohoo">HELLO</inner> </test>

-------------
<?xml version="1.0"?>
<test>
<inner id="ohoo">HELLO</inner>
</test>


Hope someone can help me.

I'm still a bit confused about exactly what you are trying to do. Do you just want to read the values of the elements? Here's some code from my engine's XML wrapper:


  
// set up our parsing booleans
bool bDoNamespaces = false;
bool bDoSchema = false;
bool bDoFullChecking = false;
bool bCreateReferenceNodes = false;

// validation schemes
DOMParser::ValSchemes valSchemes = DOMParser::Val_Auto;

// try to initialize Xerces
try {
    XMLPlatformUtils::Initialize();
}
catch( const XMLException &pXMLException )
{
    // convert our message (XMLCh*) to a local-code-page string
    char* pMsg = XMLString::transcode( pXMLException.getMessage() );
    string sMsg = "magXMLConfig XML Exception: ";
    sMsg += pMsg;
    delete [] pMsg;
    // throw our message
    throw sMsg;
}

// create a new DOMParser
m_pParser = new DOMParser();

// set up our parser
m_pParser->setValidationScheme( valSchemes );
m_pParser->setDoNamespaces( bDoNamespaces );
m_pParser->setDoSchema( bDoSchema );
m_pParser->setValidationSchemaFullChecking( bDoFullChecking );
m_pParser->setCreateEntityReferenceNodes( bCreateReferenceNodes );
m_pParser->setToCreateXMLDeclTypeNode(true);

// parse our file
m_pParser->parse( m_sXMLFileName.c_str() );

// find our element: GetElementsByName() is a helper in this wrapper (not a Xerces call)
DOM_NodeList foundNodes = GetElementsByName( sElementName );
if( foundNodes.getLength() == 0 )
    throw string( "Element not found." );

// now we need to pull the value off of the first element
DOM_Node selectedNode = foundNodes.item(0);

// check our node's type, make sure it's actually an element
if( selectedNode.getNodeType() == DOM_Node::ELEMENT_NODE )
{
    // we have an element, get its first child ( the text )
    DOM_Node nodeText = selectedNode.getFirstChild();

    // now check its value
    if( !nodeText.isNull() && nodeText.getNodeType() == DOM_Node::TEXT_NODE )
    {
        // we have our value, now pull it off; transcode() allocates with new[]
        DOMString sValue = nodeText.getNodeValue();
        char* pValue = sValue.transcode();
        sElementValue = pValue;
        delete [] pValue;
    } else {
        // we don't have a text node?
        string sMsg = "Text node not found.";
        throw sMsg;
    }
}


Hope that helps.
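Kevin's lookup (take the first matching element, make sure it really is an element, then read its first child as text) can be modelled without Xerces to see the control flow in isolation. Here `Node` is a hypothetical stand-in for `DOM_Node`, and `findElement`, a depth-first search in document order, plays the role of the wrapper's `GetElementsByName(...).item(0)`; like the original, it reports an error when the first child is not a text node.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a DOM node (DOM type codes: 1 = element, 3 = text).
struct Node {
    int type;
    std::string name;
    std::string value;            // text content, used by text nodes
    Node* firstChild = nullptr;
    Node* nextSibling = nullptr;
};

// Depth-first search for the first element called `name`, in document order.
const Node* findElement(const Node* node, const std::string& name) {
    if (node->type == 1 && node->name == name) return node;
    for (const Node* c = node->firstChild; c; c = c->nextSibling)
        if (const Node* hit = findElement(c, name)) return hit;
    return nullptr;
}

// Returns the text of the first matching element, with the same checks as above.
std::string elementValue(const Node* root, const std::string& name) {
    const Node* element = findElement(root, name);
    if (!element) throw std::runtime_error("Element not found.");
    const Node* text = element->firstChild;
    if (!text || text->type != 3) throw std::runtime_error("Text node not found.");
    return text->value;
}
```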

-----------------------------
kevin@mayday-anime.com
http://www.mayday-anime.com

Heh, I can already get all the nodes (like you showed above), but now the only problem is that my XML files can't contain any whitespace.

(By the way, what is DOMParser? It doesn't exist in the current Xerces-C 2.1.0 API documentation. Are you using a deprecated API?)

I read the Xerces mailing list archive, and someone there said there are two options:
- remove all whitespace from the XML documents (not possible, because I write them by hand)
- use a DTD to declare the whitespace meaningless (DTD??)
So I don't like either of those options.

Using Xerces is like running from problem to problem. I understand very well why people write their own parsers: a simple parser can be written in much less time than it takes to learn to use, for example, Xerces. I think I too should have just written my own parser. I've already wasted too much time on this; these libraries are just so damn badly documented, and in the end they don't work like they should.

Share this post


Link to post
Share on other sites
DTDs (Document Type Definitions) define which tags and attributes a document can contain, and what values they can have.

Check out this site for more information; it also has info on parsing XML with DOM and SAX using Xerces.

DTD Design

Edit: I just realised that those tutorials for parsing XML are in Java, but I'd expect the interface to be much the same.

Henrym
My Site

[edited by - henrym on November 20, 2002 3:09:59 PM]

Xerces validates the XML for you, but you have to tell it how to validate your custom format. To do that you need a valid DTD.

I have whitespace in my files and it still works.

These are the settings I'm using:


  
Parser = new DOMParser;
Parser->setValidationScheme(DOMParser::Val_Auto);
Parser->setDoNamespaces(false);
Parser->setDoSchema(false);


Then you should do some error checking. It will help you find any typos.


  
// StrErrorHandler (hypothetical name) is your own ErrorHandler subclass that
// stores the last message; getErrorStr() is its method, not a Xerces one.
StrErrorHandler* pErrHdl = new StrErrorHandler;
Parser->setErrorHandler(pErrHdl);

try
{
    Parser->parse("AFileName.xml");

    if (Parser->getErrorCount())
    {
        printf("Parsing Error: %s\n", pErrHdl->getErrorStr());
        ...
    }
}
catch (const XMLException& e)
{
    char* msg = XMLString::transcode(e.getMessage());
    printf("An XML Exception occurred during parsing\n Message: %s\n", msg);
    delete [] msg;
    ...
}
catch (const DOM_DOMException& e)
{
    printf("A DOM Exception occurred during parsing\n DOMException code: %d\n", e.code);
    ...
}
catch (...)
{
    printf("An unknown error occurred during parsing\n");
    ...
}

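The `getErrorStr()` call above belongs to the poster's own handler class, not to Xerces. The underlying pattern, a handler that records problems during the parse so the caller can inspect them after `parse()` returns, can be sketched in plain C++. `ParseError` and `ErrorCollector` are hypothetical names; in real code the recording would happen inside your override of `ErrorHandler::error(const SAXParseException&)`, using the exception's `getLineNumber()`, `getColumnNumber()` and transcoded `getMessage()`.

```cpp
#include <sstream>
#include <string>
#include <vector>

// One diagnostic reported by the parser (hypothetical record type).
struct ParseError {
    long line;
    long column;
    std::string message;
};

// Collects diagnostics instead of printing them, so the caller can decide
// what to do once parsing has finished.
class ErrorCollector {
public:
    void error(long line, long column, const std::string& message) {
        errors_.push_back({line, column, message});
    }
    std::size_t count() const { return errors_.size(); }

    // Formats the first recorded error as "Line: L, Col: C, Message: M".
    std::string firstErrorStr() const {
        if (errors_.empty()) return "";
        std::ostringstream out;
        out << "Line: " << errors_[0].line << ", Col: " << errors_[0].column
            << ", Message: " << errors_[0].message;
        return out.str();
    }

private:
    std::vector<ParseError> errors_;
};
```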
henrym, that DTD Design page was very good. I'll see tomorrow what I do, but it looks like a very good idea to at least try it.

Philippo, what is that DOMParser everyone is talking about? I can't find it in the Xerces API documentation here http://xml.apache.org/xerces-c/apiDocs/hierarchy.html

I've been using XercesDOMParser instead, and I tried your options, but it didn't help. Somehow, if I have whitespace in the XML file I get almost nothing, or now it just spits out thousands of empty text nodes (I had to press Ctrl-C to stop it).

But thanks for now, I'll try using a DTD tomorrow.

I'm not totally sure, but I think they renamed DOMParser to XercesDOMParser (probably for marketing reasons). It looks like it's doing exactly the same things DOMParser was doing.

I've been using an old version of the same library for a long time now. I think it's time for me to upgrade to the new one.

Yes, I saw DOMParser in deprecated code.

Before I go to sleep, could anyone test this piece of code with both test.xml and test1.xml? They have the same content, but one has the whitespace removed, and I get different results (both somehow weird).


xml_test.cpp:

    
#include <xercesc/parsers/XercesDOMParser.hpp>
#include <xercesc/dom/DOM.hpp>
#include <xercesc/sax/HandlerBase.hpp>
#include <xercesc/util/XMLString.hpp>
#include <xercesc/util/XMLUni.hpp>
#include <xercesc/util/PlatformUtils.hpp>
#include <xercesc/util/TransService.hpp>


#include <iostream>
using namespace std;

const char* xmlFile = "test.xml";


// ---------------------------------------------------------------------------
// This is a simple class that lets us do easy (though not terribly efficient)
// transcoding of char* data to XMLCh data.
// ---------------------------------------------------------------------------

class XStr
{
public :
// -----------------------------------------------------------------------
// Constructors and Destructor
// -----------------------------------------------------------------------

XStr(const char* const toTranscode)
{
// Call the private transcoding method

fUnicodeForm = XMLString::transcode(toTranscode);
}

~XStr()
{
delete [] fUnicodeForm;
}


// -----------------------------------------------------------------------
// Getter methods
// -----------------------------------------------------------------------

const XMLCh* unicodeForm() const
{
return fUnicodeForm;
}

private :
// -----------------------------------------------------------------------
// Private data members
//
// fUnicodeForm
// This is the Unicode XMLCh format of the string.
// -----------------------------------------------------------------------

XMLCh* fUnicodeForm;
};

#define X(str) XStr(str).unicodeForm()

void RecursiveParser(DOMNode* pNodeElement)
{
DOMNode *SubNode;

cout << "NodeType=" << pNodeElement->getNodeType() << endl;
char* nodename = XMLString::transcode(pNodeElement->getNodeName());
cout << "NodeName=" << nodename << endl;
delete [] nodename;

SubNode = pNodeElement->getFirstChild();
while(SubNode)
{
RecursiveParser(SubNode);
SubNode = pNodeElement->getNextSibling();
}
}


class MyErrorHandler : public ErrorHandler
{
public:
MyErrorHandler() {};
~MyErrorHandler() {};

void S(const SAXParseException &exception) {
char* message = XMLString::transcode(exception.getMessage());
cout << "Line: " << exception.getLineNumber() << ", "
<< "Col: " << exception.getColumnNumber() << ", "
<< "Message: " << message << endl;
delete [] message;
}
void warning (const SAXParseException &exception) {
cout << "Warning: ";
S(exception);
}
void error (const SAXParseException &exception) {
cout << "Error: ";
S(exception);
}
void fatalError (const SAXParseException &exception) {
cout << "Fatal: ";
S(exception);
}
void resetErrors () {
}
};


int main()
{

try {
XMLPlatformUtils::Initialize();
}
catch (const XMLException& toCatch) {
char* message = XMLString::transcode(toCatch.getMessage());
cout << "Error during initialization! :\n"
<< message << "\n";
delete [] message;
return 1;
}

//Create a parser

XercesDOMParser* parser = new XercesDOMParser();
parser->setValidationScheme(XercesDOMParser::Val_Always);
parser->setDoNamespaces(true);

ErrorHandler* errHandler = (ErrorHandler*) new MyErrorHandler(); //HandlerBase();

parser->setErrorHandler(errHandler);


try {
parser->parse(xmlFile);
}
catch (const XMLException& toCatch) {
char* message = XMLString::transcode(toCatch.getMessage());
cout << "Exception message is: \n"
<< message << "\n";
delete [] message;
return 1;
}
catch (const DOMException& toCatch) {
char* message = XMLString::transcode(toCatch.msg);
cout << "Exception message is: \n"
<< message << "\n";
delete [] message;
return 1;
}
catch (...) {
cout << "Unexpected Exception \n" ;
return 1;
}



//Get the first child node

DOMDocument *doc = parser->getDocument();
DOMNode *docRootNode = doc->getDocumentElement();

RecursiveParser(docRootNode); // doc or docRootNode?


delete parser;
delete errHandler;

XMLPlatformUtils::Terminate();

return 0;
}



Makefile

  
CC = g++
BIN = xml_test
OBJ = xml_test.o

LIBS = -L"/home/stefan/Ohjelmointi/STLport/lib" -L"/home/stefan/Ohjelmointi/lib" -L"/home/stefan/Ohjelmointi/xerces-c-src2_1_0/lib" -lxerces-c -lSTLport -lpthread

INCS = -I"/home/stefan/Ohjelmointi/STLport/stlport" -I"/home/stefan/Ohjelmointi/Gtreme/include" -I"/home/stefan/Ohjelmointi/xerces-c-src2_1_0/include" -I"/home/stefan/Ohjelmointi"

CFLAGS = $(INCS) -s


default: $(BIN)

all: clean $(BIN)

clean:
	rm -f $(OBJ) $(BIN)

$(BIN): $(OBJ)
	$(CC) $(OBJ) -o $(BIN) $(LIBS) $(CFLAGS)

xml_test.o: xml_test.cpp
	$(CC) -c xml_test.cpp -o xml_test.o $(CFLAGS)


test.xml:

  
<?xml version="1.0"?><!DOCTYPE track [
<!ELEMENT track (name, terrain)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT terrain (width, height, heightmap, texture, detail)>
<!ELEMENT width (#PCDATA)>
<!ELEMENT height (#PCDATA)>
<!ELEMENT heightmap (#PCDATA)>
<!ELEMENT texture (#PCDATA)>
<!ELEMENT detail (#PCDATA)>
]><track><name>Default Track</name><terrain><width>300</width><height>30</height><heightmap>hmap.tga</heightmap><texture>tex.tga</texture><detail>detail.tga</detail></terrain></track>



test1.xml:

  
<?xml version="1.0"?><!DOCTYPE track [
<!ELEMENT track (name, terrain)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT terrain (width, height, heightmap, texture, detail)>
<!ELEMENT width (#PCDATA)>
<!ELEMENT height (#PCDATA)>
<!ELEMENT heightmap (#PCDATA)>
<!ELEMENT texture (#PCDATA)>
<!ELEMENT detail (#PCDATA)>
]>
<track>
<name>Default Track</name>
<terrain>
<width>300</width>
<height>30</height>
<heightmap>hmap.tga</heightmap>
<texture>tex.tga</texture>
<detail>detail.tga</detail>
</terrain>
</track>



[edited by - stefu on November 20, 2002 5:35:27 PM]

By default Xerces creates text nodes wherever your whitespace is. There should be (and there is) a setting to skip ignorable whitespace.
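That setting is `setIncludeIgnorableWhitespace(false)`, and it can only drop whitespace the parser knows is ignorable, which in practice means a DTD declaring element-only content (as the internal DTDs in test.xml and test1.xml do). Without a DTD, a rough alternative is to skip whitespace-only text nodes in your own traversal. The helpers below are hypothetical; they use XML's definition of whitespace (space, tab, carriage return, line feed), and be aware that they would also drop whitespace-only text that is genuinely meaningful inside mixed content.

```cpp
#include <string>
#include <vector>

// Hypothetical child record: DOM type code (1 = element, 3 = text) plus
// the element name or the text content.
struct Child {
    int type;
    std::string value;
};

// True if `text` consists only of XML whitespace (#x20, #x9, #xD, #xA),
// the kind of text node that indentation between tags produces.
bool isWhitespaceOnly(const std::string& text) {
    return text.find_first_not_of(" \t\r\n") == std::string::npos;
}

// Keeps elements and any text node with visible characters;
// drops whitespace-only text nodes.
std::vector<Child> significantChildren(const std::vector<Child>& children) {
    std::vector<Child> kept;
    for (const Child& c : children)
        if (c.type != 3 || !isWhitespaceOnly(c.value))
            kept.push_back(c);
    return kept;
}
```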



50% of people are below average.

Little bug in your RecursiveParser function. Change

SubNode = pNodeElement->getNextSibling();

to

SubNode = SubNode->getNextSibling();

Otherwise it gives funny results (as you've seen).
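Why the original line misbehaves: `pNodeElement->getNextSibling()` gives the same answer on every pass, so the loop either stops after the first child (when the parent has no next sibling, as with the root element) or keeps re-entering the parent's next sibling forever. With indentation, the root's first child is a whitespace `#text` node, which matches the track/#text output earlier in the thread, and the endless case may well be where the thousands of empty text nodes came from. The toy comparison below uses a hypothetical `Node` struct and caps the buggy walk with a visit budget so it always terminates:

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a DOM node (not the Xerces class).
struct Node {
    std::string name;
    Node* firstChild = nullptr;
    Node* nextSibling = nullptr;
};

// The thread's original loop: advances from the parent instead of the child.
// `budget` caps the number of visits so the demo always terminates.
void buggyWalk(const Node* node, std::vector<std::string>& out, int& budget) {
    if (budget-- <= 0) return;
    out.push_back(node->name);
    const Node* sub = node->firstChild;
    while (sub && budget > 0) {
        buggyWalk(sub, out, budget);
        sub = node->nextSibling;          // bug: should be sub->nextSibling
    }
}

// Corrected loop: advance from the child just visited.
void fixedWalk(const Node* node, std::vector<std::string>& out) {
    out.push_back(node->name);
    for (const Node* sub = node->firstChild; sub; sub = sub->nextSibling)
        fixedWalk(sub, out);
}
```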



The line

SubNode = pNodeElement->getFirstChild();

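does nothing useful if pNodeElement is not an Element node: a text node, for example, has no children, so getFirstChild() just returns null. To make the intent explicit, try bracketing the recursive part of the method with an if check on the node type, i.e.,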

if( pNodeElement->getNodeType() == DOMNode::ELEMENT_NODE )
{
// recursive code
}
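With that guard in place, the walk can also report only elements, which removes the `#text` noise from the output. In the sketch below (`Node` is again a hypothetical stand-in, not the Xerces class), the guard also lets the document node (type 9) through; with an `ELEMENT_NODE`-only check, a walk started from `parser->getDocument()` instead of `getDocumentElement()` would stop immediately.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a DOM node (1 = element, 3 = text, 9 = document).
struct Node {
    int type;
    std::string name;
    Node* firstChild = nullptr;
    Node* nextSibling = nullptr;
};

// Records element names only, and descends only into elements and the document.
void elementNames(const Node* node, std::vector<std::string>& out) {
    if (node->type == 1) out.push_back(node->name);
    if (node->type == 1 || node->type == 9)
        for (const Node* sub = node->firstChild; sub; sub = sub->nextSibling)
            elementNames(sub, out);
}
```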


Hehehe, I'm so ashamed.
I should have just gone through the code rather than trying to find the reason elsewhere.

And with parser->setIncludeIgnorableWhitespace(false) it seems now to work perfectly.

Thanks!
