thre3dee

XML parser question


Hi all, I'm writing an XML parser for my framework. I'm doing it myself so that it ties in to my memory management and stays consistent with the rest of the framework. I'm using TinyXML as a guide to how I should resolve any XML ambiguities. The only one I have at the moment: if a text element has nothing but whitespace, should it be included in the DOM? For example, a simple chunk of XML like the following:
<root>
     <element>
     </element>
     <element2>Hello</element2>
</root>
Should I have:
+ $root
   - "         "
   + element
      - "  "
   - "   "
   + element2
      - "Hello"
Or should the blank 'gaps' in elements be discarded? I have a feeling that the ActionScript 2.0 XML parser left them in, which was a pain in the ass.

Quote:
Original post by thre3dee
The only one I have at the moment: if a text element has nothing but whitespace, should it be included in the DOM?


Yes. Whitespace is a perfectly legitimate and useful piece of text. The fact that you're using XML to specify a DOM as opposed to a piece of marked up text isn't really relevant to the XML parser, so it can't assume that whitespace is insignificant. However, if you want to strip out empty text nodes after parsing, because you know it has no semantic meaning to the consumer of the data, go for it.
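To illustrate the "strip after parsing" approach, here is a minimal sketch in Python using the standard library's xml.dom.minidom as a stand-in for a whitespace-preserving parser. The helper name strip_blank_text is made up for this example, and it treats only the four XML whitespace characters (space, tab, CR, LF) as blank.

```python
from xml.dom import minidom

XML_WS = " \t\r\n"  # the whitespace characters defined by the XML spec

def strip_blank_text(node):
    """Recursively remove text nodes that contain only XML whitespace."""
    for child in list(node.childNodes):  # copy: we mutate while iterating
        if child.nodeType == child.TEXT_NODE and child.data.strip(XML_WS) == "":
            node.removeChild(child)
        elif child.nodeType == child.ELEMENT_NODE:
            strip_blank_text(child)

SAMPLE = """<root>
     <element>
     </element>
     <element2>Hello</element2>
</root>"""

doc = minidom.parseString(SAMPLE)
root = doc.documentElement
before = len(root.childNodes)  # parser kept the whitespace-only text nodes
strip_blank_text(root)         # the consumer decides they carry no meaning
after = len(root.childNodes)
print(before, after)
```

The parser itself stays conservative; the stripping is a separate pass that the consumer of the data opts into.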

However, it is kind of suspicious that, in your example, the whitespace happens to make the element opening and closing line up. Normally you'd see them on the same line, as in:

<element></element>

for a truly empty string.

If you decide to preserve the whitespace, you have to do it verbatim, with any included tabs and newlines.
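As a rough illustration of what "verbatim" means here, this Python sketch (again using xml.dom.minidom purely as an example of a preserving parser) shows tabs and newlines surviving in the text nodes exactly as written:

```python
from xml.dom import minidom

# A tab after the first newline, two spaces before the second one.
doc = minidom.parseString("<a>\n\t<b/>  \n</a>")

texts = [n.data for n in doc.documentElement.childNodes
         if n.nodeType == n.TEXT_NODE]
print([repr(t) for t in texts])
```

One caveat: the XML spec requires parsers to normalize line endings, so a CR LF pair in the file reaches the application as a single LF.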

Discarding whitespace, or deciding that an element is in some sense "empty" and should be treated differently, is an obviously application-specific and element-specific decision: a parser should conservatively preserve whitespace in order to support any possible usage.

As a partial solution, you might want to support the xml:space attribute (§2.10, "White Space Handling", in the XML recommendation):

Quote:

The value "default" signals that applications' default white-space processing modes are acceptable for this element; the value "preserve" indicates the intent that applications preserve all the white space. This declared intent is considered to apply to all elements within the content of the element where it is specified, unless overridden with another instance of the xml:space attribute.


Collapsing or removing whitespace-only text nodes might be the default for your library, with "preserve" allowing for an override.
Note that you can often add "xml:space='preserve'" to the appropriate elements implicitly with a DTD or XML Schema, without altering and bloating the documents.
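A sketch of how that default-plus-override policy could look, written in Python on top of xml.dom.minidom. The function name and the "strip by default" policy are assumptions made for this example, not something the XML recommendation mandates; the only parts taken from the spec are the attribute values and the inheritance rule.

```python
from xml.dom import minidom

XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace bound to "xml:"
XML_WS = " \t\r\n"

def strip_blank_text_respecting_space(elem, preserve=False):
    """Drop whitespace-only text nodes unless xml:space='preserve' applies."""
    mode = elem.getAttributeNS(XML_NS, "space")  # "" when not specified
    if mode == "preserve":
        preserve = True
    elif mode == "default":
        preserve = False  # an inner element can override an outer "preserve"
    for child in list(elem.childNodes):
        if child.nodeType == child.ELEMENT_NODE:
            # The declared intent is inherited by everything inside.
            strip_blank_text_respecting_space(child, preserve)
        elif (child.nodeType == child.TEXT_NODE and not preserve
              and child.data.strip(XML_WS) == ""):
            elem.removeChild(child)

doc = minidom.parseString(
    "<root>\n  <a xml:space='preserve'>  </a>\n  <b>  </b>\n</root>"
)
root = doc.documentElement
strip_blank_text_respecting_space(root)
a, b = root.childNodes  # only the two elements are left under <root>
```

Here the blank text inside a is kept because of the attribute, while the blank text inside b and between the elements is removed.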

Quote:
Original post by LorenzoGatti
Discarding whitespace, or deciding that an element is in some sense "empty" and should be treated differently, is an obviously application-specific and element-specific decision: a parser should conservatively preserve whitespace in order to support any possible usage.

As a partial solution, you might want to support the xml:space attribute (§2.10, "White Space Handling", in the XML recommendation):

Quote:

The value "default" signals that applications' default white-space processing modes are acceptable for this element; the value "preserve" indicates the intent that applications preserve all the white space. This declared intent is considered to apply to all elements within the content of the element where it is specified, unless overridden with another instance of the xml:space attribute.


Collapsing or removing whitespace-only text nodes might be the default for your library, with "preserve" allowing for an override.
Note that you can often add "xml:space='preserve'" to the appropriate elements implicitly with a DTD or XML Schema, without altering and bloating the documents.


Thanks. Yeah I should have a good look at the XML spec.
