
# Find Elements of an HTML List using TinyXML and C++


I just added TinyXML to my project, but I am not very familiar with XML and parsing. I am downloading the website source into a std::string (done). The website is basically one large HTML list (http://www.sourcemod.net/smdrop/1.3), and I need the last two elements from the list. All of the examples and tutorials I have found for TinyXML are based around reading XML from a file or writing it to one. I am really at a loss as to how I can find the last two elements of the list (which are webpage links). Any help is appreciated!

---
I'm not entirely sure which part you're having problems with. Is it parsing the XML string?

The TiXmlDocument class has a method called Parse, so you should be able to do something like:

```cpp
TiXmlDocument doc;
doc.Parse(myString.c_str());
```

Note: You may need to remove `<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">` from the string before attempting to parse it; I'm not sure whether TinyXML can handle that part.
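If the DOCTYPE line does turn out to be a problem, it can be cut out of the string with plain std::string operations before calling Parse. A minimal sketch; `StripDoctype` is a hypothetical helper, and it only matches the uppercase `<!DOCTYPE` spelling used on this page:

```cpp
#include <string>

// Hypothetical helper: removes the first "<!DOCTYPE ...>" declaration, if
// present, so only the markup from <html> onwards is handed to the parser.
// Assumes the declaration contains no '>' before its own closing bracket.
std::string StripDoctype(const std::string& html) {
    std::string::size_type start = html.find("<!DOCTYPE");
    if (start == std::string::npos) return html;      // nothing to strip
    std::string::size_type end = html.find('>', start);
    if (end == std::string::npos) return html;        // unterminated, leave as-is
    std::string result = html;
    result.erase(start, end - start + 1);             // drop "<!DOCTYPE ... >"
    return result;
}
```

Usage: `doc.Parse(StripDoctype(myString).c_str());`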

Once you've parsed the string, you can use the available methods (FirstChild, IterateChildren, etc.) to traverse the XML tree as usual.

---
Hey, thanks for the response.

My problem is that I am extremely unfamiliar with this, and I am just not sure where to go after I parse. What exactly are the child elements? I am assuming all HTML tags such as li, a href, etc. are child elements.

Qt has something like this: `QDomNodeList e = d.elementsByTagName("li");`

So after I parse the string, what do I need to do to find the last two elements of the list? I see FirstChild(), LastChild(), etc.; I'm just not sure how to use them to do this.

This is what the HTML source looks like:

http://ampaste.net/m54367016 (Sorry, I tried to paste it here, but the code tags didn't work; they kept formatting the HTML tags.)

I basically need to get sourcemod-1.3.2-hg2947.zip and sourcemod-1.3.2-hg2947.tar.gz from that page. (These change every day, which is why I need to do this.)
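Since the file names change daily, matching on the extension may be sturdier than relying on the two links always being the very last entries. Given the href values already pulled out of the page (by whatever traversal you use), a minimal sketch that scans from the end for the most recent `.zip` or `.tar.gz`. It assumes the listing is ordered so the newest build comes last; `LastWithSuffix` and `EndsWith` are hypothetical helpers:

```cpp
#include <string>
#include <vector>

// Returns true if `s` ends with `suffix`.
bool EndsWith(const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical helper: scans `hrefs` from the end and returns the last entry
// ending in `suffix`, or an empty string if none matches.
std::string LastWithSuffix(const std::vector<std::string>& hrefs,
                           const std::string& suffix) {
    for (std::vector<std::string>::const_reverse_iterator it = hrefs.rbegin();
         it != hrefs.rend(); ++it) {
        if (EndsWith(*it, suffix)) return *it;
    }
    return "";
}
```

Usage: `LastWithSuffix(hrefs, ".zip")` and `LastWithSuffix(hrefs, ".tar.gz")`; an empty result signals that the page layout changed.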

---
OK, let's see. You should be able to do something like this (I can't test it at the moment, so it may not work):

The path in the XML tree that we are interested in is:

`<html>` → `<body>` → `<ul>` → `<li>` → `<a>` → `@href`

```cpp
TiXmlDocument doc;
doc.Parse(myString.c_str());

/* Traverse down the tree html->body->ul and then get the last <li> element under <ul>.
   FirstChild("html") skips any node with a different name (such as a DOCTYPE) that comes first.
   Note: there are no null checks, so this crashes if any step is missing. */
TiXmlNode* pLastNode = doc.FirstChild("html")->FirstChild("body")->FirstChild("ul")->LastChild("li");

/* Now that we have the last one, we can get the previous <li> sibling,
   which gives us the second-to-last one */
TiXmlNode* pSecondToLastNode = pLastNode->PreviousSibling("li");

/* Now that we have the <li> elements, we get the first <a> element inside each,
   and then we get the "href" attribute on that element */
const char* lastUrl = pLastNode->FirstChildElement("a")->Attribute("href");
const char* secondToLastUrl = pSecondToLastNode->FirstChildElement("a")->Attribute("href");
```

Something along those lines should work. I hope you manage to solve it!