API for browser to do HTML->rendered pixels conversion?

Started by
7 comments, last by shurcool 10 years, 11 months ago
I want to see how one could create an offline version of http://dabblet.com/ in C++.

Additionally, I want to experiment with making some AI that generates html and learns from what it's doing by "seeing" the results. I need something to convert html code to rendered pixel colours.

Goal: Convert input to output
Input: HTML5 page+relevant files (in my program's memory)
Output: an X*Y array of pixel colour values (in my program's memory) of what the webpage "looks" like
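
The goal above can be written down as a small C++ interface. This is only a sketch: `Bitmap`, `HtmlRenderer` and `BlankRenderer` are hypothetical names invented for illustration, the 0xAARRGGBB pixel layout is an assumption, and the placeholder renderer does not parse any HTML. A real engine would sit behind the same interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One pixel packed as 0xAARRGGBB (an assumed layout, chosen for illustration).
using Pixel = std::uint32_t;

// The desired output: width * height pixels stored in row-major order.
struct Bitmap {
    std::size_t width = 0;
    std::size_t height = 0;
    std::vector<Pixel> pixels;  // pixels.size() == width * height

    Pixel at(std::size_t x, std::size_t y) const {
        assert(x < width && y < height);
        return pixels[y * width + x];
    }
};

// Hypothetical interface: HTML source in memory goes in, pixels in memory come out.
class HtmlRenderer {
public:
    virtual ~HtmlRenderer() = default;
    virtual Bitmap render(const std::string& html,
                          std::size_t width, std::size_t height) = 0;
};

// Placeholder implementation: ignores the HTML and returns an opaque white page.
class BlankRenderer : public HtmlRenderer {
public:
    Bitmap render(const std::string& /*html*/,
                  std::size_t width, std::size_t height) override {
        Bitmap out;
        out.width = width;
        out.height = height;
        out.pixels.assign(width * height, 0xFFFFFFFFu);
        return out;
    }
};
```

Keeping the engine behind an interface like this would let the rest of the program stay unchanged whichever rendering back end ends up being used.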

One obvious approach is to use an off-the-shelf browser, say Google Chrome. My program would save an HTML file to the hard drive, launch Google Chrome to open that file, wait a few seconds in the hope that Chrome finishes rendering, then take a snapshot of the screen and crop it to the part that shows the webpage.
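
A variant of this approach can be sketched as follows. Instead of grabbing the screen, it assumes a newer Chrome release with headless mode, where the `--headless`, `--screenshot` and `--window-size` flags write a PNG directly. The binary name `google-chrome`, the function name `buildScreenshotCommand` and all paths are assumptions for illustration, and the function only builds the command string.

```cpp
#include <cassert>
#include <string>

// Builds a shell command that asks a headless Chrome to render a local HTML
// file into a PNG. The binary name "google-chrome" is an assumption and
// differs between platforms; paths are quoted but not otherwise escaped.
std::string buildScreenshotCommand(const std::string& htmlPath,
                                   const std::string& pngPath,
                                   int width, int height) {
    assert(width > 0 && height > 0);
    return "google-chrome --headless"
           " --window-size=" + std::to_string(width) + "," + std::to_string(height) +
           " --screenshot=\"" + pngPath + "\""
           " \"file://" + htmlPath + "\"";
}
```

The resulting string could be passed to `std::system`, after which the program would still have to read the PNG back from disk, so the disk round trip described below remains.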

However, that's inefficient for a few reasons:

  1. the input goes from my program's memory -> hard-disk file. Then from hard-disk file -> browser application opens it. In theory, it might be faster if the webpage could be processed within memory (but maybe it's not a large bottleneck, is it?)

  2. have to "wait" some unknown time for Chrome to finish rendering. How would my program know when Chrome finishes? As far as I know there is no API for this; a user has to look at the screen and notice the "loading" indicator come to a stop.

  3. have to capture the browser's rendering output. It goes from browser process memory -> render. Then I capture entire screen -> put back into my program's memory. Again, maybe this won't be a bottleneck, but nevertheless it feels inefficient.


So one way to avoid all those problems would be to take the Chromium source code, and integrate it into my testing app so that everything can be performed by 1 process in memory, avoiding all 3 of the above inefficiencies.

  • How hard would it be to do that?
  • Is there any existing work to make it easier to do this?
  • Is there some API I can use to directly tap into some browser's "rendering" functions (to do what I want directly)?

If you've seen Bret Victor's Inventing on Principle talk, do you know what approach he used to create his live javascript editor?

Edited to better reflect what I'm seeking suggestions for.
[quote]do you know what approach he used to create his live html editor?[/quote]

That is live JavaScript, running locally inside a browser on a very fast machine with a very simple example.

The approach demonstrated there is not viable in the real world.

There are implementations providing functionality like that, either focused on JavaScript, CSS or similar specialized techs, but they don't scale beyond toy-style applications.


For HTML5, especially interactive stuff, you can basically forget it. Virtualized solutions do not have hardware acceleration, so interaction rates will be drastically limited.

For end-to-end testing, one simply sets up n images, one for each OS/browser combo, then loads the page in each of them and observes the result. There are various helper frameworks that can automate actions, such as keyboard or mouse input, and track the state of the DOM during execution.

For dabbling around, there's in-browser console and WebInspector/Firebug tools for Firefox/Chrome.

Selenium is one such testing framework; there are others as well.

But in short, there is no out-of-box solution. Even Google's recent talks are putting more emphasis on manual testing, since automation simply doesn't go far enough. Browser tech was never built with such approach in mind, so everything is a hack.

[quote]Output: an X*Y array of pixel colour values (in my program's memory) of what the webpage "looks" like[/quote]

One problem with this is that browser output is not standardized, so per-pixel analysis hasn't proven itself viable in practice, except for broad-phase defect detection. Hardware may also affect the results, especially for anything that touches hardware-accelerated surfaces, which is basically most of the page these days. There are also issues with various color spaces, browser screen sizes and just an endless stream of unexplainable inaccuracies.

Attempts were made by applying heuristics, but people vastly outperform those so manual testing is still the norm.

Most automated testing in the web space focuses on DOM manipulation and correctness of logic; the visual parts are validated manually.
Thanks Antheus. Sorry, perhaps using the word testing was misleading because I only meant that as an example.


[quote]do you know what approach he used to create his live html editor?

That is live JavaScript, running locally inside a browser using a very fast machine and very simple example.[/quote]
You're right, he was editing JavaScript there.

However, what I'd like is exactly that, except something that allows you to edit .html/.css/.png files and see the results live inside the C++ program. Basically, I want to see how one could create an offline version of http://dabblet.com/ in C++.

Additionally, I want to experiment with making some AI that generates html and learns from what it's doing by "seeing" the results. I need something to convert html code to rendered pixel colours.

Goal: Convert input to output
Input: HTML5 page+relevant files (in my program's memory)
Output: an X*Y array of pixel colour values (in my program's memory) of what the webpage "looks" like[/quote]
So that's what I want to do in my C++ project.

What are some recommendations on how I could get it done?

(I remember 10 years ago in the Visual Basic days, you had a way of adding an "Internet Explorer frame" object to your application, which would act as an in-app browser frame. You could adjust its parameters, like the page URL. If I could have something like that, except hidden, and just get the result of the rendering into memory, that'd be awesome.)

Imagine I want to do some automated testing of HTML5 pages...

Goal: Convert input to output
Input: HTML5 page+relevant files (in my program's memory)
Output: an X*Y array of pixel colour values (in my program's memory) of what the webpage "looks" like


You need more inputs than this. How a web page renders is dependent on locale, local environment (cookies, session, plugin availability, browser size, registry settings, security zones, etc), and likely other factors I'm not even aware of since webdev isn't my primary focus.


[quote]the input goes from my program's memory -> hard-disk file. Then from hard-disk file -> browser application opens it. In theory, it might be faster if the webpage could be processed within memory (but maybe it's not a large bottleneck, is it?)
[/quote]

Depends on how much you're looking to process. Most browser APIs have a mechanism to specify the html directly from a string.
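
Which calls accept a string differs per engine and isn't spelled out here. One engine-neutral way to hand over in-memory HTML, assuming the embedded browser accepts `data:` URLs (RFC 2397), is to percent-encode the markup into the URL itself. The sketch below does that; `htmlToDataUrl` is a hypothetical helper name, and the UTF-8 charset is an assumption.

```cpp
#include <cstdio>
#include <string>

// Wraps an HTML string in a data: URL so no file on disk is needed.
// Every byte except the RFC 3986 "unreserved" characters is percent-encoded.
std::string htmlToDataUrl(const std::string& html) {
    std::string url = "data:text/html;charset=utf-8,";
    for (unsigned char c : html) {
        const bool unreserved =
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') ||
            c == '-' || c == '_' || c == '.' || c == '~';
        if (unreserved) {
            url += static_cast<char>(c);
        } else {
            char buf[4];  // '%' + two hex digits + terminating NUL
            std::snprintf(buf, sizeof buf, "%%%02X", static_cast<unsigned>(c));
            url += buf;
        }
    }
    return url;
}
```

For example, `htmlToDataUrl("<b>hi</b>")` yields `data:text/html;charset=utf-8,%3Cb%3Ehi%3C%2Fb%3E`. Pages that reference relative files (images, stylesheets) would need those resolved some other way, since a `data:` URL has no directory to resolve against.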


[quote]have to "wait" some unknown time for Chrome to finish rendering. How would my program know when Chrome finishes?
[/quote]

It might never finish. Flash will animate, javascript can run/trigger indefinitely. An AJAX callback on document ready is okay, but requires you tweak the page which seems against your goals.


[quote]have to capture the browser's rendering output. It goes from browser process memory -> render. Then I capture entire screen -> put back into my program's memory. Again, maybe this won't be a bottleneck, but nevertheless it feels inefficient.
[/quote]

Most browsers (and browser APIs) allow you to run full-screen. It at least cuts down on clipping.


[quote]How hard would it be to do that?
[/quote]

Prolly pretty hard to do reliably enough to put into an automated framework and expect it to work.


[quote]Is there some API I can use to directly tap into some browser's "rendering" functions (to do what I want directly)?
[/quote]

Usually. IE has COM bindings and a .NET WinForm control. Firefox has Gecko. Unfortunately, the .NET control behaves subtly differently than IE does due to registry defaults and a decade's worth of hacks that separated IE the product from its core rendering engine.


The biggest problem is that even if you got all of that stuff working, it's still a bad way to do automated testing. What are the failure criteria for the test? Being off by a pixel? That will happen due to dozens of slight environmental changes, none of which denote a rendering error. Parsing the html successfully? Just write a parser for that (even though plenty of sites don't parse properly using strict rules, but render fine). In the end, humans are still the best judges of something 'looking correct'.
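
To make the "off by a pixel" problem concrete, here is a minimal sketch of a tolerant comparison between two renders. It is not an established testing method: `mismatchFraction` is a hypothetical helper, pixels are assumed to be packed as 0xAARRGGBB with alpha ignored, and both the per-channel tolerance and whatever threshold is applied to the result are arbitrary knobs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Returns the fraction of pixels in which at least one of the R, G or B
// channels differs by more than channelTolerance. Both images must have the
// same, non-zero number of pixels (assumed layout: 0xAARRGGBB, alpha ignored).
double mismatchFraction(const std::vector<std::uint32_t>& a,
                        const std::vector<std::uint32_t>& b,
                        int channelTolerance) {
    assert(a.size() == b.size() && !a.empty());
    std::size_t bad = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        // shift 0 = blue, 8 = green, 16 = red
        for (int shift = 0; shift <= 16; shift += 8) {
            const int ca = static_cast<int>((a[i] >> shift) & 0xFF);
            const int cb = static_cast<int>((b[i] >> shift) & 0xFF);
            if (std::abs(ca - cb) > channelTolerance) {
                ++bad;
                break;  // count each pixel at most once
            }
        }
    }
    return static_cast<double>(bad) / static_cast<double>(a.size());
}
```

Choosing the tolerance and the acceptable fraction is exactly the judgment call described above: too strict and harmless environmental differences fail the check, too loose and real defects slip through.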

You need more inputs than this. How a web page renders is dependent on locale, local environment (cookies, session, plugin availability, browser size, registry settings, security zones, etc), and likely other factors I'm not even aware of since webdev isn't my primary focus.

I've revised my original post to better reflect what I want.

Given that, I don't care about specifying these details. Any sane defaults will do (although it'd be nice to choose from various templates, this is hardly a priority).


[quote]Depends on how much you're looking to process. Most browser APIs have a mechanism to specify the html directly from a string.
[/quote]
Do you mean specifying the URL in the form of "about:<html>...</html>" or something similar? Which browser APIs, specifically, are you referring to?


[quote]It might never finish. Flash will animate, javascript can run/trigger indefinitely. An AJAX callback on document ready is okay, but requires you tweak the page which seems against your goals.
[/quote]

True. I'm okay with getting a static snapshot if the page has dynamic elements. I just want the page to be fully loaded, obviously. Besides, I will probably be working with static webpages most of the time.


[quote]Most browsers (and browser APIs) allow you to run full-screen. It at least cuts down on clipping.
[/quote]
Yeah, but ideally I'd prefer to do this off-screen. I want to do memory (HTML source) -> memory (pixel values) calculations, so taking screen captures is not ideal. It's a compromise I might have to make to get what I want.


[quote]Prolly pretty hard to do reliably enough to put into an automated framework and expect it to work.
[/quote]
Fair enough. But I don't want to use this for testing in a traditional sense of the word.


[quote]Usually. IE has COM bindings and a .NET WinForm control. Firefox has Gecko. Unfortunately, the .NET control behaves subtly differently than IE does due to registry defaults and a decade's worth of hacks that separated IE the product from its core rendering engine.
[/quote]

Now that's very interesting! Thanks! Gecko, especially, looks like something right up my alley, and I'll look into it. I wonder, doesn't Chrome also have a layout/rendering engine that's somehow stand-alone?


[quote]The biggest problem is that even if you got all of that stuff working, it's still a bad way to do automated testing. What are the failure criteria for the test? Being off by a pixel? That will happen due to dozens of slight environmental changes, none of which denote a rendering error. Parsing the html successfully? Just write a parser for that (even though plenty of sites don't parse properly using strict rules, but render fine). In the end, humans are still the best judges of something 'looking correct'.
[/quote]
Yep, you're absolutely right. But I don't wanna use it for testing per se; it was my bad for giving a misleading example. Thanks a lot for your informative reply!

I wonder, doesn't Chrome also have a layout/rendering engine that's somehow stand-alone?


I believe it does, but I haven't looked into it personally.


But I don't wanna use it for testing per se; it was my bad for giving a misleading example.


Then what are you using it for?

Then what are you using it for?


Please see the first two paragraphs of my (edited) original post.



[quote name='shurcool' timestamp='1333484434' post='4927990']
I wonder, doesn't Chrome also have a layout/rendering engine that's somehow stand-alone?


I believe it does, but I haven't looked into it personally.[/quote]
Ah, of course, it uses WebKit.

Is it a good idea for me to use WebKit or Gecko directly in my application? How easy is it to add it to a blank C++ project and compile? (Of course, this is something I can and am going to find out on my own, but if anyone has comments I'd be glad to hear them)
Oh. I'd look and see how FireBug works then.

For what it's worth, by now I've found https://github.com/ariya/phantomjs which is somewhat relevant to the task at hand (as is this).

Edit: Err, it told me it wouldn't let me post to an old topic. Being curious and seeing the reply box, I tried to anyway. Sorry, I didn't mean to bump an old thread, since that's considered impolite (or so it used to be, I don't know if this has changed lately), but I wanted to avoid the following situation... http://xkcd.com/979/

This topic is closed to new replies.
