coolblue

[.net] XPATH Problem


I have written an RSS reader class which had been working fine until I tried to read a particular RSS feed. The following XPath query returns null:

XmlNode nodeChnl = xDoc2.DocumentElement.SelectSingleNode("channel");

when trying to read the following XML (I have only pasted the top part due to the length of the document):

<?xml version="1.0" encoding="ISO-8859-1" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns="http://purl.org/rss/1.0/">
<channel rdf:about="http://education.guardian.co.uk/0,,472384,00.html?gusrc=rss">
<title>Education Guardian</title>
<link>http://education.guardian.co.uk/0,,472384,00.html?gusrc=rss</link>
<description>The latest Education news from Guardian Unlimited.</description>

I presume it is something to do with the channel element containing an attribute, but how do I get around this problem?

SelectSingleNode will return null when the XPath does not resolve to a node. So to get the title you would want to do something like this:

XmlNode nodeTitle = xDoc2.DocumentElement.SelectSingleNode("/rdf/channel/title");
string strTitle = nodeTitle.InnerText;

Chances are you are going to want to select different nodes in your reader, so you'll want to create more detailed XPaths. There is a good tutorial on XPath on w3schools.com.

The problem is that the rdf part is unique to this particular feed (and other feeds may also have namespaces), so I will not know what to put in the XPath. Is there a way to find this prefix in code?

Since you are using Xml.XmlDocument, you have both the DocumentElement and its collection of child nodes to work with. You can do something like this (forgive the pseudo VB.NET syntax) to determine what sort of feed it is and react to that.

Select Case xDoc2.DocumentElement.LocalName.ToLower()
    Case "rdf"
        LoadRdfFeed(xDoc2)
    Case "rss"
        LoadRssFeed(xDoc2)
    Case "feed" ' Atom documents use <feed> as the root element
        LoadAtomFeed(xDoc2)
End Select


' Additionally, you can look through the ChildNodes collection.
For Each cNode As XmlNode In xDoc2.DocumentElement.ChildNodes
    If cNode.LocalName.ToLower() = "channel" Then
        LoadChannelNode(cNode)
    End If
Next
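
If you would rather stay with XPath than loop over ChildNodes, the same "match on the local name only" idea can be sketched in C# with the XPath local-name() function. This is only a sketch: FeedProbe and FindChannel are names made up for illustration, and it assumes the channel element is a direct child of the root element.

```csharp
using System.Xml;

public static class FeedProbe
{
    // Hypothetical helper: returns the first <channel> child of the root
    // element regardless of its namespace, or null when there is none.
    public static XmlNode FindChannel(XmlDocument doc)
    {
        // local-name() looks only at the part of the name after any prefix,
        // so this query needs no XmlNamespaceManager.
        return doc.DocumentElement.SelectSingleNode("*[local-name()='channel']");
    }
}
```

The NamespaceURI property of the node that comes back tells you which namespace the feed actually uses, which is one way to discover it in code instead of hard-coding a prefix.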

Thanks to the last post I have realised that the problematic feed is RDF (RSS 1.0), not RSS 2.0. This explains why the RDF file has its items on the same level as the channel instead of as children of the channel.

I am still having problems with the namespace though.

I now have the following code:-


*************************************************************************
xDoc2.Load(txtRead);

XmlNamespaceManager namesp = new XmlNamespaceManager(xDoc2.NameTable);
namesp.AddNamespace("rdf","http://www.w3.org/1999/02/22-rdf-syntax-ns#");

XmlNodeList node1 = xDoc2.GetElementsByTagName("channel");
XmlNode node2 = xDoc2.DocumentElement.SelectSingleNode("rdf:RDF",namesp);
XmlNode node3 = xDoc2.DocumentElement.SelectSingleNode("rdf:channel",namesp);
XmlNode node4 = xDoc2.DocumentElement.SelectSingleNode("channel",namesp);
XmlNode node5 = xDoc2.DocumentElement.SelectSingleNode("channel");
XmlNode node6 = xDoc2.DocumentElement.SelectSingleNode("RDF/channel");

**************************************************************************

The only result that is not null is node1.

What am I doing wrong?

GetElementsByTagName expects a plain string name.
SelectSingleNode expects a valid xpath. An xpath always starts with a /, and none of your xpaths start with the slash. Try something like this:

XmlNode node6 = xDoc2.DocumentElement.SelectSingleNode("/RDF/channel");

xpaths can be a hard concept to get your head around for the first time. I strongly recommend reading through the tutorial I linked earlier.

edit:
Correction: not all XPaths start with /, but I make it a habit to do so since I find absolute paths less confusing, and you are less likely to have problems than when dealing with relative paths.

If I write

XmlNode node6 = xDoc2.DocumentElement.SelectSingleNode("/RDF/channel");

it still does not work.


Also if I use the following statement with the rss feed that does work, node6 is null

XmlNode node6 = xDoc2.DocumentElement.SelectSingleNode("/channel");

but the following statement works fine.

XmlNode node6 = xDoc2.DocumentElement.SelectSingleNode("channel");

it seems that prefixing a "/" breaks everything.

OK, let's see if I can be a little clearer.
An XPath that starts with / is an absolute path, so it starts at the root element.
In this case the root node/element is:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns="http://purl.org/rss/1.0/">

I think the qualified name of this element is "rdf:RDF"; you should be able to confirm this by evaluating xDoc2.DocumentElement.Name (LocalName gives just the "RDF" part).

If this is the case, then the first step in an absolute XPath would be "/rdf:RDF". Knowing that, the XPath "/channel" would suggest that the root element is a channel node. We know this is not the case, because the root node is the rdf:RDF node, so of course "/channel" would not work. However, channel is a child of rdf:RDF. The reason why xDoc2.DocumentElement.SelectSingleNode("channel"); works is that the XPath "channel" is a relative path, because it does not have the leading /. It is relative to the DocumentElement (which is the same as the root node). This XPath is evaluated as "/rdf:RDF/channel", which is valid.

These two statements should select the same node:
xDoc2.DocumentElement.SelectSingleNode("channel")
xDoc2.DocumentElement.SelectSingleNode("/rdf:RDF/channel")

The first selects the child node of DocumentElement named "channel". The second finds the rdf:RDF root node (same as DocumentElement), then finds the channel node under that.

Just in case we aren't clear, DocumentElement is always the root element, and conceptually element and node mean the same thing here.
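
To see the relative versus absolute distinction on its own, without namespaces getting in the way, here is a small C# sketch against a made-up RSS 2.0 document. PathDemo, Rss and Check are names invented for this illustration, and the sample XML is not from the feed discussed in this thread.

```csharp
using System.Xml;

public static class PathDemo
{
    // Hypothetical sample: a tiny RSS 2.0 document with no namespaces.
    public const string Rss =
        "<rss version=\"2.0\"><channel><title>Demo</title></channel></rss>";

    // Returns three flags: the relative path found a node, the absolute
    // path found the very same node, and "/channel" found nothing.
    public static bool[] Check()
    {
        var doc = new XmlDocument();
        doc.LoadXml(Rss);
        XmlElement root = doc.DocumentElement;

        // Relative: evaluated from the context node (the <rss> element).
        XmlNode relative = root.SelectSingleNode("channel");
        // Absolute: evaluated from the document root, whatever the context node is.
        XmlNode absolute = root.SelectSingleNode("/rss/channel");
        // Absolute, but only matches if <channel> were the root element.
        XmlNode wrong = root.SelectSingleNode("/channel");

        return new[]
        {
            relative != null,
            object.ReferenceEquals(relative, absolute),
            wrong == null
        };
    }
}
```

All three flags come back true for this sample, which matches the behaviour described earlier in the thread for the RSS 2.0 feed.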

Ok, I am partially there now.

xDoc2.DocumentElement.SelectSingleNode("/rdf:RDF") returns a node


xDoc2.DocumentElement.SelectSingleNode("/rdf:RDF/channel") returns null

"/rdf:RDF/channel" would select all channel nodes that are children of the rdf:RDF node. This could be multiple nodes. I guess that because the method is called SelectSingleNode, it can't return multiple nodes. Try specifying which channel node, like this:

"/rdf:RDF/channel[1]"

That should select the first node, but MS is weird. According to the W3C the XPath index should be 1-based, but in IE it is 0-based. I don't know off the top of my head if .NET is 0- or 1-based.

Neither "/rdf:RDF/channel[1]" nor "/rdf:RDF/channel[0]" works either; besides, there is only ever one channel element.

Why can't I just use "channel" like I can with rss2.0?

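I found out what's going on. The XmlNamespaceManager does not behave the way you would expect. I found this page that explains it.

As I understand it, the namespace manager handles a default namespace and an empty namespace in an unexpected way. This is interesting, since MSDN says "The prefix to associate with the namespace being added. Use String.Empty to add a default namespace." The catch is that XPath treats an unprefixed name such as "channel" as belonging to no namespace at all, so elements in the document's default namespace have to be matched through a prefix that you register yourself.

And the solution applied to your code: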

*************************************************************************
xDoc2.Load(txtRead);

XmlNamespaceManager namesp = new XmlNamespaceManager(xDoc2.NameTable);
namesp.AddNamespace("default", "http://purl.org/rss/1.0/");
namesp.AddNamespace("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
namesp.AddNamespace("dc", "http://purl.org/dc/elements/1.1/");

XmlNode node6 = xDoc2.SelectSingleNode("/rdf:RDF/default:channel", namesp);
*************************************************************************
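
For anyone who wants something to paste and run, here is a self-contained C# sketch of the same approach, cut down to the top of the feed from the first post. Rss10Demo, Feed and ReadTitle are names invented for this example, and the "rss" prefix is an arbitrary choice: only the namespace URI has to match the document. Note that every element in the default namespace needs the prefix in the XPath, including title.

```csharp
using System.Xml;

public static class Rss10Demo
{
    // Trimmed-down copy of the RSS 1.0 feed quoted at the top of the thread.
    public const string Feed =
        "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\" " +
        "xmlns=\"http://purl.org/rss/1.0/\">" +
        "<channel rdf:about=\"http://education.guardian.co.uk/0,,472384,00.html?gusrc=rss\">" +
        "<title>Education Guardian</title>" +
        "</channel></rdf:RDF>";

    // Hypothetical helper: returns the channel title, or null if not found.
    public static string ReadTitle(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
        // The document's default namespace, bound here to a prefix of our own.
        ns.AddNamespace("rss", "http://purl.org/rss/1.0/");

        // The manager must be passed in, otherwise the prefixes cannot be resolved.
        XmlNode title = doc.SelectSingleNode("/rdf:RDF/rss:channel/rss:title", ns);
        return title?.InnerText;
    }
}
```

Calling Rss10Demo.ReadTitle(Rss10Demo.Feed) should give back the title text, and the rdf:about attribute on channel plays no part in whether the node is found.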
