displaying html in c++

This topic is 3874 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

I'm working on a simple web browser for a school project. I already have the networking side programmed and can retrieve the HTML file from websites, but it's only being displayed as text. What is the most official way to display the HTML file graphically instead of as text? I've searched Google but haven't been able to find anything.

That's an extremely challenging task. I assume you already know graphics programming?

Also, I hope you're not planning on being able to display actual internet pages. Even a "simple" web browser would take a big team many months to accomplish: JavaScript, CSS, nested tables, dynamic layers, embedded plugins, etc.

Simple HTML is basically just a 2D layout. But parsing the text into that layout can be incredibly difficult. You're going to need to recurse through all the table structures and build out a 2D map of where items go (taking into account the dimensions of any images in the text, etc).

I'd imagine that if you haven't done graphics programming before that this will be a nearly impossible task. Even if you do have the experience, I'd imagine that it'd take you a few weeks of full-time work to get even a simple table structure rendering.

Probably the best thing to start out with is just a simple page with text and some images (no tables, or any other complex structure). After that get a simple table rendering. Then nested tables. Forget about anything having to do with CSS or things like that.
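
To make the "text first" step concrete, here is a rough sketch of greedy line breaking, which is about the simplest version of the 2D map described above. It assumes a fixed-width font, and the names PlacedWord and layoutText are made up for illustration. A real renderer would ask its font API for the width of each word, and an image could be placed the same way using its pixel size.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One placed word: its text plus the top-left corner where it should be drawn.
struct PlacedWord {
    std::string text;
    int x;
    int y;
};

// Greedy line breaking. ASSUMPTION: every character is charWidth pixels wide
// and every line is lineHeight pixels tall (a fixed-width font).
std::vector<PlacedWord> layoutText(const std::string& text, int maxWidth,
                                   int charWidth, int lineHeight) {
    assert(maxWidth > 0 && charWidth > 0 && lineHeight > 0);
    std::vector<PlacedWord> placed;
    std::istringstream words(text);
    std::string word;
    int x = 0;
    int y = 0;
    while (words >> word) {
        const int wordWidth = static_cast<int>(word.size()) * charWidth;
        // Wrap when the word would overflow, unless the line is still empty.
        if (x > 0 && x + wordWidth > maxWidth) {
            x = 0;
            y += lineHeight;
        }
        placed.push_back({word, x, y});
        x += wordWidth + charWidth;  // advance past the word and one space
    }
    return placed;
}
```

Each PlacedWord can then be handed to whatever text-drawing call the graphics library provides. A word wider than maxWidth is simply left to overflow its line in this sketch.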

-me

There is only one official way:
http://www.w3.org/TR/html401/

That is the HTML 4.01 standard (anything beyond it involves much more, such as CSS, which makes rendering considerably more complex). Implementing it is hell, though, not to mention a huge task.

There's also the problem of standards compliance, so most applications that need to display HTML embed a browser component.

In your case that might be somewhat self-defeating, since embedded components take care of the connection as well.

I assumed you were trying to code your own renderer. That's perhaps not what you're looking for.

If not, then I'm sure there is an IE or Firefox renderer that you could just plug into a Windows app (but then those probably come with all the networking stuff already implemented, thus defeating the purpose of your app...). Check out msdn.microsoft.com for info on IE, and google around for Firefox info.

-me

I know some basics of graphics programming, and I wasn't planning on JavaScript or anything like that. Like you said, all I want is to be able to display some text and a couple of pictures.

So what you're all saying is I have to manually go through the HTML page, read the code, and display it myself?

Quote:
Original post by c_young
I know some basics of graphics programming, and I wasn't planning on JavaScript or anything like that. Like you said, all I want is to be able to display some text and a couple of pictures.

So what you're all saying is I have to manually go through the HTML page, read the code, and display it myself?


The times of HTML being text and pictures are long gone.

The HTML 4 standard is considered old, and even that one is complex. Once you get to CSS and advanced layout with the DOM, things get really bad.

Incompatibilities between today's browsers don't come from developers being incompetent. The standard is simply huge, and the majority of its features rarely get used, which leads to incomplete or ambiguous implementations of the standard.

Today's web pages rely on all these features, so a simple implementation won't be able to render even the basic ones. Hence there are only a handful of commercial/professional implementations, and none of the others come even close. Add to that the fact that any useful browser must account for incompatible, non-compliant, historically defined rules, and you get a huge mess.

Rendering HTML isn't trivial. Just look at the standard. Each element has many rules, many special cases, some inconsistencies, and plenty of over-engineering.

The most you could hope to do quickly is probably the HTML 2 standard, which doesn't include much layout. You'll need a way to render text and to calculate the image layouts. Then again, most pages rely on tables for formatting to a degree, so you can't really skip those.

But if you don't provide these and just want to render text, then you're better off forgetting HTML altogether and thinking about converting it into some rich text format, rendering text only without any formatting, or using an embedded browser.

Yes, it really is that complex, if you want to produce any reasonable output that applies features of HTML. Otherwise you'd be just displaying text.

So what should I do? I don't want to half-do the project (if converting it to a rich text format is the cheap way out); I'd like to make it a quality program. But if displaying HTML is really too complicated, where can you point me?

Might wanna check out K-Meleon. It gives you a Windows window with the Mozilla browser inside. TheOpenCD uses that technology: they made a C++ program that creates a window and launches K-Meleon in it, which makes the GUI trivial. BTW, it's GNU GPL, with sources available on SourceForge somewhere : )

Quote:
Original post by c_young
I think I will end up using that, but I hate borrowing code. I like knowing I made every line and that I know how it all works.


And yet I'm sure you're more than happy to use functions like printf or streams like std::cout or to call networking APIs. All of these things use code that wasn't written by you.

Particularly in the case of a well-tested, widely used piece of code like the Gecko rendering engine, there are FAR fewer bugs in that sort of code than in anything you write yourself.

I'm not stupid, I know I didn't program cout or WinAPI. But I like knowing that if a real developer looked at one of my programs they'd be like "this is really simple but it was done the right way".

Quote:
Original post by c_young
I'm not stupid, I know I didn't program cout or WinAPI. But I like knowing that if a real developer looked at one of my programs they'd be like "this is really simple but it was done the right way".

If you try to implement your own HTML renderer, a "real developer" will think the exact opposite of that. They'll think, "Why on earth did this person waste all that time and effort to reinvent the wheel, and have so many bugs in it?" Because no matter how hard you try, you WILL have bugs in your code, particularly for something as complicated as an HTML renderer. You're talking a good year of work for one person to implement an HTML renderer properly.

Wow, quite a challenging task.

Could you reduce it in scope and maybe do a text-only browser? You're still going to have to decide how to treat all the random layout code, though, unless you just display everything in the order the HTML file defines it.
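
For the text-only route, the simplest thing that could work is to drop every tag and print what is left in file order. Below is a naive sketch of that idea. htmlToText is a hypothetical helper, and it deliberately ignores entities, comments, and the contents of script or style elements.

```cpp
#include <cctype>
#include <string>

// Turns HTML into plain text in document order: tags are dropped, runs of
// whitespace collapse to one space, and <br> / <p> (opening or closing)
// start a new line. Consecutive breaks collapse into a single newline.
std::string htmlToText(const std::string& html) {
    std::string out;
    std::size_t i = 0;
    bool pendingSpace = false;
    while (i < html.size()) {
        if (html[i] == '<') {
            const std::size_t close = html.find('>', i);
            if (close == std::string::npos) break;  // unterminated tag: stop
            // Extract the lowercased tag name, skipping a leading '/'.
            std::size_t j = i + 1;
            if (j < close && html[j] == '/') ++j;
            std::string name;
            while (j < close && std::isalnum(static_cast<unsigned char>(html[j]))) {
                name += static_cast<char>(
                    std::tolower(static_cast<unsigned char>(html[j])));
                ++j;
            }
            if (name == "br" || name == "p") {
                if (!out.empty() && out.back() != '\n') out += '\n';
                pendingSpace = false;
            }
            i = close + 1;
        } else if (std::isspace(static_cast<unsigned char>(html[i]))) {
            pendingSpace = true;  // remember the gap, emit it lazily
            ++i;
        } else {
            if (pendingSpace && !out.empty() && out.back() != '\n') out += ' ';
            pendingSpace = false;
            out += html[i++];
        }
    }
    return out;
}
```

The result can go straight to a console or a plain text widget, which keeps the whole layout question out of the picture.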

I've written an HTML 2.0 viewer once and it was actually quite doable. If I remember correctly, it only took me one weekend to get it to work. I made it so it would ignore all tags it didn't understand, and so you could actually browse the web with it to some degree, although most pages looked like they were made in the mid 1990s.
The problem is parsing HTML because unlike XML, it's not a very strict standard and most (older) HTML code on the web tends to be invalid. So you have to make your parser quite "tolerant" so it always gets the most out of the "tag soup" that it has to deal with. Once you've created a parse-tree from your HTML, rendering isn't all too difficult since HTML 2.0 doesn't support tables or positioning of any kind. You can basically just traverse your parse tree and render everything as you go.
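
A tolerant tokenizer along those lines might look roughly like this. It is only a sketch: Token, TokenKind, tokenize, and the small list of known tags are assumptions made for the example, not part of any library.

```cpp
#include <cctype>
#include <set>
#include <string>
#include <vector>

enum class TokenKind { Text, OpenTag, CloseTag };

struct Token {
    TokenKind kind;
    std::string value;  // text content, or the lowercased tag name
};

// Splits "tag soup" into tokens without ever failing: unknown tags are
// skipped, attributes are ignored, and a '<' that is never followed by a
// '>' is kept as literal text. The known-tag list is an arbitrary subset.
std::vector<Token> tokenize(const std::string& html) {
    static const std::set<std::string> known = {"p", "br", "b", "i",
                                                "a", "h1", "img"};
    std::vector<Token> tokens;
    std::string text;
    auto flushText = [&] {
        if (!text.empty()) {
            tokens.push_back({TokenKind::Text, text});
            text.clear();
        }
    };
    std::size_t i = 0;
    while (i < html.size()) {
        const std::size_t close =
            (html[i] == '<') ? html.find('>', i) : std::string::npos;
        if (close == std::string::npos) {  // ordinary character or stray '<'
            text += html[i++];
            continue;
        }
        std::size_t j = i + 1;
        const bool closing = (j < close && html[j] == '/');
        if (closing) ++j;
        std::string name;
        while (j < close && std::isalnum(static_cast<unsigned char>(html[j]))) {
            name += static_cast<char>(
                std::tolower(static_cast<unsigned char>(html[j])));
            ++j;
        }
        if (known.count(name) != 0) {
            flushText();
            tokens.push_back(
                {closing ? TokenKind::CloseTag : TokenKind::OpenTag, name});
        }
        i = close + 1;  // unknown tags vanish; surrounding text stays joined
    }
    flushText();
    return tokens;
}
```

Because nothing here ever rejects input, a renderer can walk the token list (or a tree built from it) and draw whatever it recognises. Closing tags that never arrive simply mean a style stays active until the end of the page.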

Either way, good luck.

Since this seems like such a big task, shouldn't someone already have written some platform-independent code that takes care of all the hassle and just lets you plug in your own renderer code?
Something like:

class MyRenderer : public IHtmlRenderer
{
public:
    // Called by the page for each run of text it has laid out.
    void renderText(int x, int y, int width, int height, const FontInfo& font, const char* const text)
    {
        // .. render text here ..
    }
    // Called by the page for each image it has laid out.
    void renderImage(int x, int y, int width, int height, const ImageInfo& info, const ImageStyle& style)
    {
        // .. render image here ..
    }
    // .. and so on ..
};

...
MyRenderer* renderer = ....;
HtmlPage page(renderer, input, ...); // Cross platform component
std::string htmlCode = ....;
page.render(renderer, htmlCode, 800, 600);


Or something like that... It should be quite cool to implement a DirectX or OpenGL renderer...

Quote:
Since this seems like such a big task, shouldn't someone already have written some platform-independent code that takes care of all the hassle and just lets you plug in your own renderer code?


Yes, it's called Gecko.

I didn't check, but I'm willing to bet it supports a custom renderer. Unfortunately, rendering HTML isn't even remotely enough: you need full interaction, listening for events, maintaining state, and so on.

Since Mozilla is cross-platform, it supports all that already.

HTML rendering isn't a small task, and anything that even slightly deviates from what both Firefox and IE (4-7, all versions) expect, regardless of what the standard says, is useless in the real world. This includes full support for JavaScript, plugins, layouts, internationalization, asynchronous loading and more.

Today's web is interactive, and rendering HTML is the trivial part (yet it takes something the size of Gecko to accomplish it).

Once again: even remotely useful HTML rendering is a huge task that only a small number of people in the world have undertaken so far: the IE team (which, when in doubt, ignores standards altogether or writes its own), Mozilla (which is now on its 10th or 20th complete rewrite), Opera (a company that specializes in this), and Lynx, a text web browser.

Now look at how big "web" is. And yet, there's a handful of people in the world that dare tackle this problem. This alone should tell you there's something weird going on.

A custom web browser is in many respects a geek's wet dream. But in this case, it's simply too big.

I think that Java doesn't even provide a fully HTML 2.0 compliant browser, and all other commercial libraries are severely limited in functionality, most of them being non-interactive. All Java solutions simply delegate to the system's browser. Even Eclipse does that.

But rather than discussing it - start coding. This will be the best way to see for yourself how many problems you'll encounter when rendering even the simplest of web pages, and how many errors you'll get trying to render *anything* on web. Even the HTML 1.0 documents.

Quote:
Or something like that... It should be quite cool to implement a DirectX or OpenGL renderer...


The char * is where everything will end. Text on web pages is not plain ASCII, but one of many international character set encodings. It also needs to be pre-parsed to handle escaped symbols.
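
As an illustration of the pre-parsing step for escaped symbols, here is a hedged sketch that decodes a few character references into UTF-8. The function names are invented for the example, only five named entities are handled, and it assumes the input bytes are already ASCII-compatible. Converting a page from its declared encoding is a separate job that this does not attempt.

```cpp
#include <cstdlib>
#include <string>

// Appends the UTF-8 encoding of a Unicode code point to out.
void appendUtf8(std::string& out, unsigned long cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Tries to decode one entity body (the part between '&' and ';').
// Returns false for anything unrecognised so the caller keeps it verbatim.
bool decodeEntity(const std::string& name, std::string& out) {
    if (name == "amp")  { out += '&'; return true; }
    if (name == "lt")   { out += '<'; return true; }
    if (name == "gt")   { out += '>'; return true; }
    if (name == "quot") { out += '"'; return true; }
    if (name == "nbsp") { appendUtf8(out, 0xA0); return true; }
    if (name.size() < 2 || name[0] != '#') return false;
    // Numeric reference: "#233" is decimal, "#xE9" is hexadecimal.
    const bool hex = (name[1] == 'x' || name[1] == 'X');
    const char* digits = name.c_str() + (hex ? 2 : 1);
    char* end = nullptr;
    const unsigned long cp = std::strtoul(digits, &end, hex ? 16 : 10);
    if (end == digits || *end != '\0' || cp == 0 || cp > 0x10FFFF) return false;
    appendUtf8(out, cp);
    return true;
}

std::string decodeEntities(const std::string& in) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        if (in[i] == '&') {
            const std::size_t semi = in.find(';', i);
            // Entity bodies are short; a distant ';' means a bare ampersand.
            if (semi != std::string::npos && semi - i <= 10 &&
                decodeEntity(in.substr(i + 1, semi - i - 1), out)) {
                i = semi + 1;
                continue;
            }
        }
        out += in[i++];
    }
    return out;
}
```

Anything the decoder does not recognise is passed through untouched, which matches the tolerant approach the rest of the thread recommends for real-world pages.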

But more importantly, HTML needs to be rendered progressively. Loading the whole page and then displaying it isn't viable, and I think it isn't even possible anymore with modern XHTML formats, where the page is dynamically built as it's processed.
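
One way to picture progressive handling is a tokenizer that is fed network-sized chunks and reports whatever is complete so far. This is only a sketch under assumptions: StreamingTokenizer and its two callbacks are hypothetical names, and a real engine would also have to re-run layout as each piece arrives.

```cpp
#include <functional>
#include <string>
#include <utility>

// Accepts HTML in arbitrary chunks and reports each complete piece as soon
// as it is available. A tag split across two chunks is buffered until its
// closing '>' arrives. Text is emitted eagerly, so one run of text may be
// delivered in several callbacks.
class StreamingTokenizer {
public:
    using Callback = std::function<void(const std::string&)>;

    StreamingTokenizer(Callback onText, Callback onTag)
        : onText_(std::move(onText)), onTag_(std::move(onTag)) {}

    void feed(const std::string& chunk) {
        buffer_ += chunk;
        std::size_t pos = 0;
        while (pos < buffer_.size()) {
            if (buffer_[pos] == '<') {
                const std::size_t close = buffer_.find('>', pos);
                if (close == std::string::npos) break;  // incomplete tag: wait
                onTag_(buffer_.substr(pos + 1, close - pos - 1));
                pos = close + 1;
            } else {
                std::size_t next = buffer_.find('<', pos);
                if (next == std::string::npos) next = buffer_.size();
                onText_(buffer_.substr(pos, next - pos));
                pos = next;
            }
        }
        buffer_.erase(0, pos);  // keep only the unfinished tail
    }

private:
    std::string buffer_;
    Callback onText_;
    Callback onTag_;
};
```

Calling feed() from the socket read loop lets the display start updating before the download finishes, at the cost of having to cope with content that shows up in fragments.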

You might also check out uBrowser.
It's the Gecko renderer modified to output pixels in a format that you can use with OpenGL, or you can convert it to work with SDL, and probably even DirectX. As for interaction... it works... it tends to run a little slow, but that might be because my graphics card isn't that great and slows down the overall performance of the test application. Check it out though, you might like it.

Hope that helps,
-Wynter Woods(aka Zerotri)
