blanky

HTML Parser



Does anyone know of a good HTML parser library for C/C++, if there even is one? Something like TinyXML. I'd like to do what Counter-Strike does, where a server shows an HTML file when you join. Thanks.

Writing an HTML parser is quite easy in theory; in practice you have to take into account pages that are not well formed (that is, pages full of errors such as unclosed tags or mismatched pairs). Incredibly, modern browsers accept almost anything!

As you can imagine, a bad page can be interpreted in different ways.

On the other hand, a valid XHTML Strict document (the right standard for web pages) is itself an XML document, so you could implement your XHTML parser on top of an XML parser.

If you are only searching for a specific section you can use regular expressions, but I don't know of any good libraries for C++ (perhaps boost::).
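
For the "specific section" case, something along these lines might be enough. This is only a rough sketch: it uses `std::regex` from the C++11 standard library instead of a third-party library, `extract_title` is a made-up helper name, and it assumes the section you want is a single `<title>` element that is not nested inside another element of the same name.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from any library): copies the text between the
// first <title> and </title> into `out`. Returns false if no pair is found.
bool extract_title(const std::string& html, std::string& out) {
    // [\s\S] matches any character, newlines included; *? is non-greedy,
    // so the match stops at the first closing tag.
    static const std::regex title_re(
        R"(<title[^>]*>([\s\S]*?)</title>)",
        std::regex::ECMAScript | std::regex::icase);

    std::smatch m;
    if (!std::regex_search(html, m, title_re)) {
        return false;
    }
    out = m[1].str();
    return true;
}
```

Calling `extract_title(page, title)` on a page whose head contains `<TITLE>Server MOTD</TITLE>` should fill `title` with `Server MOTD`. Anything that depends on nesting needs more than this.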

Thanks. However, I've searched my butt off and I can't find where to download either library. Can you please give me links? I would really appreciate it. I'm interested in both Gecko and CRegExp.

HTML is recursive, so at best CRegExp could tokenize it. You still have to parse the tokens into a tree and deal with poorly formed documents, which leaves a LOT of work.
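
To give an idea of the two steps, here is a minimal sketch in plain C++11 with no libraries. All names (`Token`, `Node`, `tokenize`, `build_tree`) are invented for illustration. It assumes tags without `>` inside attribute values, and it ignores attributes, comments, scripts, entities, and void elements such as `<br>`, which is exactly the "LOT of work" part. The recovery rule for bad markup (close the nearest matching open element, drop stray close tags) is just one possible choice, not how any particular browser does it.

```cpp
#include <cctype>
#include <memory>
#include <string>
#include <utility>
#include <vector>

enum class TokenKind { Open, Close, Text };

struct Token {
    TokenKind kind;
    std::string value;  // lower-cased tag name, or raw text for Text
};

// Step 1: split the input into a flat list of tokens (no nesting yet).
std::vector<Token> tokenize(const std::string& html) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < html.size()) {
        if (html[i] == '<') {
            std::size_t end = html.find('>', i);
            if (end == std::string::npos) break;  // unterminated tag: give up
            bool closing = (i + 1 < end && html[i + 1] == '/');
            std::size_t p = i + (closing ? 2 : 1);
            std::string name;
            while (p < end && std::isalnum(static_cast<unsigned char>(html[p]))) {
                name += static_cast<char>(
                    std::tolower(static_cast<unsigned char>(html[p])));
                ++p;
            }
            tokens.push_back({closing ? TokenKind::Close : TokenKind::Open, name});
            i = end + 1;
        } else {
            std::size_t end = html.find('<', i);
            if (end == std::string::npos) end = html.size();
            tokens.push_back({TokenKind::Text, html.substr(i, end - i)});
            i = end;
        }
    }
    return tokens;
}

struct Node {
    std::string tag;   // empty for text nodes
    std::string text;  // used only by text nodes
    std::vector<std::unique_ptr<Node>> children;
};

// Step 2: turn the tokens into a tree using a stack of open elements.
std::unique_ptr<Node> build_tree(const std::vector<Token>& tokens) {
    std::unique_ptr<Node> root(new Node);
    root->tag = "#root";
    std::vector<Node*> open{root.get()};  // open.back() is the current parent

    for (const Token& t : tokens) {
        if (t.kind == TokenKind::Close) {
            // Search downward for the nearest open element with this name.
            std::size_t k = open.size();
            while (k > 1 && open[k - 1]->tag != t.value) --k;
            // Found: pop it and any unclosed elements above it.
            // Not found (k == 1): stray close tag, ignore it.
            if (k > 1) open.resize(k - 1);
            continue;
        }
        std::unique_ptr<Node> node(new Node);
        if (t.kind == TokenKind::Open) node->tag = t.value;
        else node->text = t.value;
        Node* raw = node.get();
        open.back()->children.push_back(std::move(node));
        if (t.kind == TokenKind::Open) open.push_back(raw);
    }
    return root;  // elements still open at the end are closed implicitly
}
```

With input like `<p>Hello <b>world</p>`, the missing `</b>` is closed implicitly when `</p>` arrives, so the `b` node ends up as a child of `p`.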

Quote:
Original post by cpp forever
You can use CRegExp regular expressions library. In some cases it can help you.

Quote:
Original post by cpp forever
CRegExp is just good. Sources 25kB, plain C++.

HTML is not a regular language.
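
A small illustration of what that means in practice, using `std::regex` from the standard library (not CRegExp, whose API I am not assuming anything about). `naive_div_body` is a hypothetical helper that tries to grab the body of the first `<div>` with a non-greedy pattern:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns whatever a naive pattern captures as the
// body of the first <div>, or an empty string if nothing matches.
std::string naive_div_body(const std::string& html) {
    // Non-greedy: stops at the FIRST </div>, whether or not it belongs
    // to the <div> that opened the match.
    static const std::regex div_re(R"(<div>([\s\S]*?)</div>)");
    std::smatch m;
    return std::regex_search(html, m, div_re) ? m[1].str() : std::string();
}
```

For `<div>a<div>b</div>c</div>` this returns `a<div>b`, because the pattern pairs the outer opening tag with the inner closing tag. Making the pattern greedy only moves the problem: it would then swallow everything up to the last `</div>` in the document. Matching the right closing tag requires counting nesting depth, which a regular expression cannot do for arbitrary depth.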

Quote:
Original post by Dobbs
HTML is recursive, so at best CRegExp could tokenize it. You still have to parse the tokens into a tree and deal with poorly formed documents, which leaves a LOT of work.


Try boost::spirit. Closures take care of recursion. Very simple and elegant. If you read the docs and play with trivial examples first, it's quite easy to learn.

Okay, thanks. However, I'm quite lost; I don't understand what all of this is. I want a parser so that I can display an HTML page in a window, kind of like a browser, but one that just shows the page. What I'm thinking of might be a control, but I'm not sure. Also, is boost::spirit an HTML parser? And where can I find Gecko and CRegExp?
