[.net] XPATH Problem

13 comments, last by zangetsu 17 years, 10 months ago
I have written an RSS reader class which has been working fine until I tried to read a particular RSS feed. The following XPath query returns null:

XmlNode nodeChnl = xDoc2.DocumentElement.SelectSingleNode("channel");

when trying to read the following XML (I have only pasted the top part due to the length of the document):

<?xml version="1.0" encoding="ISO-8859-1" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://education.guardian.co.uk/0,,472384,00.html?gusrc=rss">
    <title>Education Guardian</title>
    <link>http://education.guardian.co.uk/0,,472384,00.html?gusrc=rss</link>
    <description>The latest Education news from Guardian Unlimited.</description>

I presume it is something to do with the channel element containing an attribute, but how do I get around this problem?
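The attribute on <channel> is not what breaks the query. The feed declares a default namespace (xmlns="http://purl.org/rss/1.0/"), so <channel> and everything under it live in that namespace, and in XPath 1.0 an unprefixed name such as "channel" only matches elements in no namespace. A minimal C# sketch with two made-up documents shows the difference; the class name and the prefix "rss" are our own choices, since only the namespace URI has to match the document.

```csharp
using System;
using System.Xml;

// Sketch: an unprefixed XPath name test matches only elements in *no* namespace.
// Both sample documents are made up for illustration.
static class DefaultNamespaceDemo
{
    // <channel> has an attribute but no namespace: "channel" finds it.
    public const string NoNamespace =
        "<root><channel about='x'><title>A</title></channel></root>";

    // <channel> has no attribute but inherits a default namespace: "channel" misses it.
    public const string DefaultNamespace =
        "<root xmlns='http://purl.org/rss/1.0/'><channel><title>B</title></channel></root>";

    static XmlDocument Load(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc;
    }

    public static bool FindsUnprefixed(string xml)
    {
        return Load(xml).DocumentElement.SelectSingleNode("channel") != null;
    }

    public static bool FindsPrefixed(string xml)
    {
        XmlDocument doc = Load(xml);
        XmlNamespaceManager ns = new XmlNamespaceManager(doc.NameTable);
        // "rss" is our own prefix; only the namespace URI must match the document.
        ns.AddNamespace("rss", "http://purl.org/rss/1.0/");
        return doc.DocumentElement.SelectSingleNode("rss:channel", ns) != null;
    }

    static void Main()
    {
        Console.WriteLine(FindsUnprefixed(NoNamespace));      // True
        Console.WriteLine(FindsUnprefixed(DefaultNamespace)); // False
        Console.WriteLine(FindsPrefixed(DefaultNamespace));   // True
    }
}
```

The same idea applies to the Guardian feed: bind a prefix to http://purl.org/rss/1.0/ and write rss:channel, rss:title and so on.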
SelectSingleNode returns null when the XPath does not match any node. So to get the title you would want to do something like this:

nodeTitle = xDoc2.DocumentElement.SelectSingleNode("/rdf:RDF/channel/title");
strTitle = nodeTitle.InnerText;

Chances are you are going to want to select different nodes in your reader, so you'll want to create more detailed XPaths. There is a good XPath tutorial on w3schools.com.
The problem is that the rdf part is unique to this particular feed (and other feeds may also have namespaces), so I will not know what to put in the XPath. Is there a way to find this prefix in code?
Since you are using System.Xml.XmlDocument, it has a collection of child nodes and the DocumentElement. You can do something like this (forgive the pseudo-VB.NET syntax) to determine what sort of feed it is and react to that:
Select Case xDoc2.DocumentElement.LocalName.ToLower()
    Case "rdf"
        LoadRdfFeed(xDoc2)
    Case "rss"
        LoadRssFeed(xDoc2)
    Case "feed" ' an Atom document's root element is <feed>, not <atom>
        LoadAtomFeed(xDoc2)
End Select

' Additionally, you can look through the ChildNodes collection.
For Each cNode As XmlNode In xDoc2.DocumentElement.ChildNodes
    If cNode.LocalName.ToLower() = "channel" Then
        LoadChannelNode(cNode)
    End If
Next
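On the question of finding the prefix in code: the prefix itself does not matter to XPath, only the namespace URI does, and XmlNode.NamespaceURI exposes it. Below is a C# sketch in the spirit of the ChildNodes loop above. It assumes the <title> you want is in the same namespace as its <channel>, which holds for RSS 1.0 and RSS 2.0. The class name, the method names and the prefix "c" are made up for illustration.

```csharp
using System;
using System.Xml;

// Sketch: discover the channel's namespace at run time instead of hard-coding it.
static class ChannelLookup
{
    // Namespace URI of the first <channel> child of the root element
    // ("" when it is in no namespace), or null when there is no such child.
    public static string FindChannelNamespace(XmlDocument doc)
    {
        foreach (XmlNode child in doc.DocumentElement.ChildNodes)
        {
            if (child.NodeType == XmlNodeType.Element && child.LocalName == "channel")
                return child.NamespaceURI;
        }
        return null;
    }

    // Channel title, assuming <title> is in the same namespace as <channel>.
    public static string ChannelTitle(XmlDocument doc)
    {
        string uri = FindChannelNamespace(doc);
        if (uri == null)
            return null;

        XmlNamespaceManager ns = new XmlNamespaceManager(doc.NameTable);
        string p = "";
        if (uri.Length > 0)
        {
            ns.AddNamespace("c", uri); // "c" is an arbitrary prefix of our own
            p = "c:";
        }
        XmlNode title = doc.DocumentElement.SelectSingleNode(p + "channel/" + p + "title", ns);
        return title == null ? null : title.InnerText;
    }

    public static XmlDocument Parse(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc;
    }

    static void Main()
    {
        // Made-up, cut-down samples of an RSS 1.0 and an RSS 2.0 feed.
        XmlDocument rdf = Parse(
            "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#' xmlns='http://purl.org/rss/1.0/'>" +
            "<channel rdf:about='x'><title>Education Guardian</title></channel></rdf:RDF>");
        XmlDocument rss2 = Parse("<rss version='2.0'><channel><title>Some feed</title></channel></rss>");

        Console.WriteLine(ChannelTitle(rdf));  // Education Guardian
        Console.WriteLine(ChannelTitle(rss2)); // Some feed
    }
}
```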
Thanks to the last post I have realised that the problematic feed is RSS 1.0 (RDF), not RSS 2.0. This explains why the RDF file has its items on the same level as the channel (as siblings of the channel) instead of as children of the channel.
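The structural difference can be seen with a short C# sketch. It uses the XPath local-name() function, which compares element names while ignoring namespaces, so no XmlNamespaceManager is needed. The class name and both sample documents are made up and cut down to the relevant elements.

```csharp
using System;
using System.Xml;

static class ItemLayout
{
    // RSS 1.0 (RDF) layout: <item> elements sit next to <channel>.
    public const string Rss10 =
        "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#' xmlns='http://purl.org/rss/1.0/'>" +
        "<channel rdf:about='c'><title>T</title></channel>" +
        "<item rdf:about='a'><title>A</title></item>" +
        "<item rdf:about='b'><title>B</title></item>" +
        "</rdf:RDF>";

    // RSS 2.0 layout: <item> elements sit inside <channel>.
    public const string Rss20 =
        "<rss version='2.0'><channel><title>T</title>" +
        "<item><title>A</title></item><item><title>B</title></item>" +
        "</channel></rss>";

    // Items that are siblings of <channel>, i.e. children of the root element.
    public static int SiblingItems(string xml)
    {
        return Count(xml, "/*/*[local-name()='item']");
    }

    // Items that are children of <channel>.
    public static int ChildItems(string xml)
    {
        return Count(xml, "/*/*[local-name()='channel']/*[local-name()='item']");
    }

    static int Count(string xml, string xpath)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        // local-name() ignores namespaces, so no XmlNamespaceManager is needed.
        return doc.SelectNodes(xpath).Count;
    }

    static void Main()
    {
        Console.WriteLine(SiblingItems(Rss10) + " " + ChildItems(Rss10)); // 2 0
        Console.WriteLine(SiblingItems(Rss20) + " " + ChildItems(Rss20)); // 0 2
    }
}
```

The trade-off is that local-name() also matches an item element from any other namespace, so binding the real namespace URI through XmlNamespaceManager is the stricter option.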

I am still having problems with the namespace though.

I now have the following code:-


*************************************************************************
xDoc2.Load(txtRead);

XmlNamespaceManager namesp = new XmlNamespaceManager(xDoc2.NameTable);
namesp.AddNamespace("rdf","http://www.w3.org/1999/02/22-rdf-syntax-ns#");

XmlNodeList node1 = xDoc2.GetElementsByTagName("channel");
XmlNode node2 = xDoc2.DocumentElement.SelectSingleNode("rdf:RDF",namesp);
XmlNode node3 = xDoc2.DocumentElement.SelectSingleNode("rdf:channel",namesp);
XmlNode node4 = xDoc2.DocumentElement.SelectSingleNode("channel",namesp);
XmlNode node5 = xDoc2.DocumentElement.SelectSingleNode("channel");
XmlNode node6 = xDoc2.DocumentElement.SelectSingleNode("RDF/channel");

**************************************************************************

The only node that is not null is node1.

What am I doing wrong?
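For reference, here is what each of those expressions does against a cut-down copy of the feed, as a C# sketch. The only addition is a second prefix, "rss" (our own choice), bound to the feed's default namespace http://purl.org/rss/1.0/, because XPath 1.0 name tests cannot refer to a default namespace without a prefix. The class and method names are made up.

```csharp
using System;
using System.Xml;

static class AttemptReport
{
    // Cut-down copy of the feed from the first post.
    public const string Feed =
        "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#' " +
        "xmlns:dc='http://purl.org/dc/elements/1.1/' xmlns='http://purl.org/rss/1.0/'>" +
        "<channel rdf:about='http://education.guardian.co.uk/0,,472384,00.html?gusrc=rss'>" +
        "<title>Education Guardian</title></channel></rdf:RDF>";

    // Evaluates the XPath relative to the DocumentElement, as in the post.
    public static bool Matches(string xpath)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(Feed);
        XmlNamespaceManager ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
        ns.AddNamespace("rss", "http://purl.org/rss/1.0/");
        return doc.DocumentElement.SelectSingleNode(xpath, ns) != null;
    }

    static void Main()
    {
        string[] attempts =
        {
            "rdf:RDF",             // node2: null, looks for an rdf:RDF *child* of the root
            "rdf:channel",         // node3: null, channel is not in the rdf namespace
            "channel",             // node4/node5: null, unprefixed means "no namespace"
            "RDF/channel",         // node6: null, unprefixed RDF is also "no namespace"
            "rss:channel",         // found: channel in the feed's default namespace
            "/rdf:RDF/rss:channel" // found: the same node via an absolute path
        };
        foreach (string xpath in attempts)
            Console.WriteLine(xpath + " -> " + (Matches(xpath) ? "found" : "null"));
    }
}
```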
GetElementsByTagName expects a plain element name.
SelectSingleNode expects a valid XPath. An XPath always starts with a /, and none of your XPaths start with the slash. Try something like this:

XmlNode node6 = xDoc2.DocumentElement.SelectSingleNode("/RDF/channel");

XPaths can be a hard concept to get your head around the first time. I strongly recommend reading through the tutorial I linked earlier.

edit:
Correction: not all paths start with /, but I make a habit of it because absolute paths are less confusing to me, and you are less likely to have problems than with relative paths.
If I write

XmlNode node6 = xDoc2.DocumentElement.SelectSingleNode("/RDF/channel");

It still does not work.

Also, if I use the following statement with the RSS feed that does work, node6 is null:

XmlNode node6 = xDoc2.DocumentElement.SelectSingleNode("/channel");

but the following statement works fine:

XmlNode node6 = xDoc2.DocumentElement.SelectSingleNode("channel");

It seems that prefixing a "/" breaks everything.

Try SelectSingleNode("//channel"). The double slash means "a channel element at any depth": it searches all descendants, not just the direct children.

There's a decent XPath tutorial on W3Schools.com, which you can also use as a reference.
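A C# sketch of the // suggestion, with made-up minimal documents: the descendant search does find channel in an RSS 2.0 document, but in the RDF feed the name test still has to match the namespace, so it needs a bound prefix as well. The prefix "rss" and the class name are our own choices.

```csharp
using System;
using System.Xml;

static class DescendantDemo
{
    public const string Rss20 = "<rss version='2.0'><channel><title>T</title></channel></rss>";
    public const string Rss10 =
        "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#' xmlns='http://purl.org/rss/1.0/'>" +
        "<channel rdf:about='c'><title>T</title></channel></rdf:RDF>";

    // Number of nodes the XPath selects, evaluated from the document node.
    public static int Count(string xml, string xpath)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNamespaceManager ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("rss", "http://purl.org/rss/1.0/");
        return doc.SelectNodes(xpath, ns).Count;
    }

    static void Main()
    {
        Console.WriteLine(Count(Rss20, "//channel"));     // 1
        Console.WriteLine(Count(Rss10, "//channel"));     // 0: channel is in a namespace
        Console.WriteLine(Count(Rss10, "//rss:channel")); // 1
    }
}
```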
OK, let's see if I can be a little clearer.
An XPath that starts with / is an absolute path, so it is evaluated from the document root and its first step has to name the root element.
In this case the root node/element is:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns="http://purl.org/rss/1.0/">

The qualified name of this element is "rdf:RDF"; its LocalName is just "RDF", with the prefix "rdf" reported separately. You can confirm this by evaluating xDoc2.DocumentElement.Name and xDoc2.DocumentElement.LocalName.

Given that, the first step in an absolute XPath would be "/rdf:RDF", and the XPath "/channel" would suggest that the root element is a channel node. We know this is not the case, because the root node is the rdf:RDF node, so of course "/channel" would not work. However, channel is a child of rdf:RDF. The reason xDoc2.DocumentElement.SelectSingleNode("channel") works is that "channel" is a relative path, because it does not have the leading /. It is relative to the DocumentElement (which is the same as the root element). This XPath is evaluated as "/rdf:RDF/channel", which is valid.

these two statements should select the same node.
xDoc2.DocumentElement.SelectSingleNode("channel")
xDoc2.DocumentElement.SelectSingleNode("/rdf:RDF/channel")

The first selects the child node of DocumentElement named "channel". The second finds the rdf:RDF root element (the same as DocumentElement), then finds the channel node under that.
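The relative/absolute distinction can be checked with a C# sketch. It uses a minimal, made-up RSS 2.0 document (root element <rss>, no namespace), which is the case where "channel" works and "/channel" does not; the equivalent absolute path there is "/rss/channel". The class name is made up.

```csharp
using System;
using System.Xml;

static class PathDemo
{
    public static XmlDocument Sample()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<rss version='2.0'><channel><title>T</title></channel></rss>");
        return doc;
    }

    static void Main()
    {
        XmlElement root = Sample().DocumentElement;

        // Relative: evaluated from the context node (here the <rss> root element).
        XmlNode relative = root.SelectSingleNode("channel");
        // Absolute: evaluated from the document root, whatever the context node is.
        XmlNode absolute = root.SelectSingleNode("/rss/channel");
        // Absolute path whose first step claims the root element is <channel>.
        XmlNode wrong = root.SelectSingleNode("/channel");

        Console.WriteLine(relative == absolute); // True: the same XmlNode instance
        Console.WriteLine(wrong == null);        // True
        // Starting from a different context node does not change an absolute path.
        Console.WriteLine(relative.SelectSingleNode("/rss/channel") == relative); // True
    }
}
```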

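Just in case we aren't clear: DocumentElement is always the root element, and for the purposes of these paths "element" and "node" mean the same thing (strictly, an element is one kind of node; attributes and text are nodes too).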
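The root element's name can be inspected directly. A C# sketch, using a made-up, cut-down copy of the feed, that prints the four name-related XmlNode properties for the root and for <channel>; the class and method names are our own.

```csharp
using System;
using System.Xml;

static class NameParts
{
    public static XmlDocument Sample()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(
            "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#' xmlns='http://purl.org/rss/1.0/'>" +
            "<channel rdf:about='x'/></rdf:RDF>");
        return doc;
    }

    // Name, LocalName, Prefix and NamespaceURI of a node, in that order.
    public static string[] Parts(XmlNode node)
    {
        return new[] { node.Name, node.LocalName, node.Prefix, node.NamespaceURI };
    }

    static void Main()
    {
        XmlNode root = Sample().DocumentElement;
        XmlNode channel = root.FirstChild;
        Console.WriteLine(string.Join(" | ", Parts(root)));
        // rdf:RDF | RDF | rdf | http://www.w3.org/1999/02/22-rdf-syntax-ns#
        Console.WriteLine(string.Join(" | ", Parts(channel)));
        // channel | channel |  | http://purl.org/rss/1.0/
    }
}
```

This is why a prefixed step such as rdf:RDF can match the root while a plain channel step cannot: the XPath name test compares the local name and the namespace URI, not the prefix text.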
Ok, I am partially there now.

xDoc2.DocumentElement.SelectSingleNode("/rdf:RDF") returns a node


xDoc2.DocumentElement.SelectSingleNode("/rdf:RDF/channel") returns null
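That last expression fails because channel, and every element under it, is in the default namespace http://purl.org/rss/1.0/, so the step needs a prefix bound to that URI. Below is a minimal C# sketch of reading the channel title and the item titles from an RSS 1.0 document. It assumes both prefixes are registered in the same XmlNamespaceManager. The prefix "rss" and the class and method names are our own, and the sample document is a made-up, cut-down feed.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

static class Rss10Reader
{
    const string RdfNs = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    const string RssNs = "http://purl.org/rss/1.0/";

    static XmlNamespaceManager Namespaces(XmlDocument doc)
    {
        XmlNamespaceManager ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("rdf", RdfNs);
        ns.AddNamespace("rss", RssNs); // stands in for the unprefixed default namespace
        return ns;
    }

    public static string ChannelTitle(XmlDocument doc)
    {
        XmlNode title = doc.SelectSingleNode("/rdf:RDF/rss:channel/rss:title", Namespaces(doc));
        return title == null ? null : title.InnerText;
    }

    // In RSS 1.0 the <item> elements are siblings of <channel>, directly under rdf:RDF.
    public static List<string> ItemTitles(XmlDocument doc)
    {
        List<string> titles = new List<string>();
        foreach (XmlNode title in doc.SelectNodes("/rdf:RDF/rss:item/rss:title", Namespaces(doc)))
            titles.Add(title.InnerText);
        return titles;
    }

    public static XmlDocument Sample()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(
            "<rdf:RDF xmlns:rdf='" + RdfNs + "' xmlns='" + RssNs + "'>" +
            "<channel rdf:about='c'><title>Education Guardian</title></channel>" +
            "<item rdf:about='a'><title>First story</title></item>" +
            "<item rdf:about='b'><title>Second story</title></item>" +
            "</rdf:RDF>");
        return doc;
    }

    static void Main()
    {
        XmlDocument doc = Sample();
        Console.WriteLine(ChannelTitle(doc));                  // Education Guardian
        Console.WriteLine(string.Join(", ", ItemTitles(doc))); // First story, Second story
    }
}
```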

This topic is closed to new replies.
