XML parser question

Engines and Middleware Programming

Started by thre3dee June 12, 2008 08:05 PM

3 comments, last by thre3dee 15 years, 9 months ago

100

Author

June 12, 2008 08:05 PM

Hi all, I'm writing an XML parser for my framework, and I'm doing it myself so that it ties in to my memory management and is consistent with the rest of my framework. I'm using TinyXML as a guide to how I should resolve any XML ambiguities. The only one I have at the moment is if a text element has nothing but whitespace, should it be included in the DOM? For example, take a simple chunk of XML like the following:

<root>
     <element>
     </element>
     <element2>Hello</element2>
</root>

Should I have:

+ $root
   - "         "
   + element
      - "  "
   - "   "
   + element2
      - "Hello"

Or should the blank 'gaps' in elements be discarded? I have a feeling that the ActionScript 2.0 XML parser left them in, which was a pain in the ass.

Kylotan

10,510

June 13, 2008 05:09 AM

Quote:Original post by thre3dee
The only one I have at the moment is if a text element has nothing but whitespace, should it be included in the DOM?

Yes. Whitespace is a perfectly legitimate and useful piece of text. The fact that you're using XML to specify a DOM as opposed to a piece of marked up text isn't really relevant to the XML parser, so it can't assume that whitespace is insignificant. However, if you want to strip out empty text nodes after parsing, because you know it has no semantic meaning to the consumer of the data, go for it.
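
A minimal sketch of that "parse everything, strip afterwards" approach. It uses Python's standard `xml.dom.minidom` as a stand-in for the poster's own parser, which is an assumption made purely for illustration; `strip_blank_text` is a hypothetical helper name, not part of any library. The parser keeps every whitespace-only text node, and the consumer removes them in a separate pass once it knows they carry no meaning.

```python
from xml.dom import minidom

# The example document from the first post, indentation included.
XML = (
    "<root>\n"
    "     <element>\n"
    "     </element>\n"
    "     <element2>Hello</element2>\n"
    "</root>"
)

# The four characters the XML recommendation treats as white space.
XML_WS = " \t\r\n"

def strip_blank_text(node):
    """Recursively remove text nodes that contain only XML white space."""
    for child in list(node.childNodes):  # copy, because we mutate the list
        if child.nodeType == child.TEXT_NODE:
            if child.data.strip(XML_WS) == "":
                node.removeChild(child)
        elif child.nodeType == child.ELEMENT_NODE:
            strip_blank_text(child)

root = minidom.parseString(XML).documentElement
before = len(root.childNodes)   # whitespace nodes are still present here
strip_blank_text(root)
after = len(root.childNodes)    # only the two element children remain

print(before, after)
```

Keeping the stripping out of the parser means the same parser still works for documents where whitespace matters, such as marked-up text.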

OldGuy

122

July 01, 2008 11:39 PM

However, it is kind of suspicious that, in your example, the whitespace happens to make the element opening and closing line up. Normally you'd see them on the same line, as in:

<element></element>

for a truly empty string.

If you decide to preserve the whitespace, you have to do it verbatim, with any included tabs and newlines.
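
To make the distinction concrete, here is a small sketch, again using Python's `xml.dom.minidom` only as an illustrative stand-in for a custom parser. It compares an element whose tags are separated by a newline, a tab and two spaces with one whose tags are adjacent.

```python
from xml.dom import minidom

# White space between the tags: newline, tab, two spaces.
spaced = minidom.parseString("<element>\n\t  </element>").documentElement
# Nothing between the tags: no text node is created at all.
empty = minidom.parseString("<element></element>").documentElement

spaced_text = spaced.firstChild.data      # character data exactly as written
has_text = empty.firstChild is not None   # False for the truly empty element

print(repr(spaced_text), has_text)
```

One caveat from the XML recommendation itself (§2.11, "End-of-Line Handling"): a conforming parser normalizes CR LF pairs and lone CR characters to a single LF before passing text to the application, so "verbatim" means verbatim after that normalization.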

LorenzoGatti

4,648

July 06, 2008 09:32 AM

Discarding whitespace, or deciding that an element is in some sense "empty" and should be treated differently, is an obviously application-specific and element-specific decision: a parser should conservatively preserve whitespace in order to support any possible usage.

As a partial solution, you might want to support the xml:space attribute (§2.10, "White Space Handling", in the XML recommendation):

Quote:
The value "default" signals that applications' default white-space processing modes are acceptable for this element; the value "preserve" indicates the intent that applications preserve all the white space. This declared intent is considered to apply to all elements within the content of the element where it is specified, unless overridden with another instance of the xml:space attribute.

Collapsing or removing whitespace-only text nodes might be the default for your library, with "preserve" allowing for an override.
Note that you can often add "xml:space='preserve'" to the appropriate elements implicitly with a DTD or XML Schema, without altering and bloating the documents.
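
A rough sketch of what "strip by default, with xml:space as an override" could look like. As before, Python's `xml.dom.minidom` stands in for the custom parser, and `strip_blank_text` is a hypothetical helper, not a library function. The `preserve` flag is inherited by descendants and can be switched back off by `xml:space="default"`, as the quoted passage describes. This sketch only sees attributes that are actually present in the parsed tree; whether attributes defaulted by a DTD or schema show up depends on the parser.

```python
from xml.dom import minidom

XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace bound to "xml:"
XML_WS = " \t\r\n"

def strip_blank_text(node, preserve=False):
    """Remove whitespace-only text nodes unless xml:space='preserve' applies."""
    mode = node.getAttributeNS(XML_NS, "space")  # "" when the attribute is absent
    if mode == "preserve":
        preserve = True
    elif mode == "default":
        preserve = False
    for child in list(node.childNodes):  # copy, because we mutate the list
        if child.nodeType == child.TEXT_NODE:
            if not preserve and child.data.strip(XML_WS) == "":
                node.removeChild(child)
        elif child.nodeType == child.ELEMENT_NODE:
            strip_blank_text(child, preserve)  # children inherit the mode

XML = (
    "<root>\n"
    "  <a>\n  </a>\n"
    "  <b xml:space='preserve'>\n  </b>\n"
    "</root>"
)

root = minidom.parseString(XML).documentElement
strip_blank_text(root)
a, b = root.childNodes  # only the two elements are left under <root>
```

After the pass, `<a>` has no children, while `<b>` keeps its whitespace-only text node because of the attribute.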

Omae Wa Mou Shindeiru

thre3dee

100

Author

July 11, 2008 02:01 AM

Quote:Original post by LorenzoGatti
Discarding whitespace, or deciding that an element is in some sense "empty" and should be treated differently, is an obviously application-specific and element-specific decision: a parser should conservatively preserve whitespace in order to support any possible usage.

As a partial solution, you might want to support the xml:space attribute (§2.10, "White Space Handling", in the XML recommendation):

Quote:
The value "default" signals that applications' default white-space processing modes are acceptable for this element; the value "preserve" indicates the intent that applications preserve all the white space. This declared intent is considered to apply to all elements within the content of the element where it is specified, unless overridden with another instance of the xml:space attribute.

Collapsing or removing whitespace-only text nodes might be the default for your library, with "preserve" allowing for an override.
Note that you can often add "xml:space='preserve'" to the appropriate elements implicitly with a DTD or XML Schema, without altering and bloating the documents.

Thanks. Yeah I should have a good look at the XML spec.

This topic is closed to new replies.