do you know what approach he used to create his live html editor?
That is live JavaScript, running locally inside a browser on a very fast machine with a very simple example.
The approach demonstrated there is not viable in the real world.
There are implementations that provide functionality like that, focused on JavaScript, CSS or similar specialized technologies, but they don't scale beyond toy-style applications.
For HTML5, especially the interactive stuff, you can basically forget it. Virtualized solutions typically do not have hardware acceleration, so interaction rates will be drastically limited.
For end-to-end testing, one simply sets up n images, one for each OS/browser combo, then loads the page in each of them and observes the result. There are various helper frameworks that can automate actions, such as keyboard or mouse input, and track the state of the DOM during execution.
For dabbling around, there's the in-browser console and tools such as Firebug for Firefox and WebInspector for Chrome.
Selenium is one testing service/framework; there are others as well.
But in short, there is no out-of-the-box solution. Even Google's recent talks are putting more emphasis on manual testing, since automation simply doesn't go far enough. Browser tech was never built with such an approach in mind, so everything is a hack.
Output: an X*Y array of pixel colour values (in my program's memory) of what the webpage "looks" like
One problem with this is that browser output is not standardized, so per-pixel analysis hasn't proven viable in practice, except for broad-phase defect detection. Hardware may also affect the results, especially for anything that touches hardware-accelerated surfaces, which is basically most of the page these days. There are also issues with various color spaces, browser window sizes and just an endless stream of unexplainable inaccuracies.
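To make the "broad-phase" idea concrete, here is a minimal sketch in Python of a tolerant pixel comparison. It assumes you already have two X*Y grids of (r, g, b) tuples from somewhere. The names mismatch_ratio and looks_broken and the threshold values are made up for illustration, not taken from any particular tool. It only flags gross breakage and ignores small rendering differences, which is about as far as this kind of check can be trusted.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (r, g, b), each channel 0..255
Image = List[List[Pixel]]      # rows of pixels

def mismatch_ratio(a: Image, b: Image, tolerance: int = 8) -> float:
    """Return the fraction of pixels where any channel differs by more than `tolerance`."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same dimensions")
    total = 0
    bad = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += 1
            # Small per-channel deviations (antialiasing, color profiles) are ignored.
            if any(abs(ca - cb) > tolerance for ca, cb in zip(pa, pb)):
                bad += 1
    return bad / total if total else 0.0

def looks_broken(a: Image, b: Image, tolerance: int = 8, max_ratio: float = 0.02) -> bool:
    """Hypothetical broad-phase check: flag only when a large share of pixels is off."""
    return mismatch_ratio(a, b, tolerance) > max_ratio
```

Both thresholds would have to be tuned per page and per browser, which is exactly where this approach starts to fall apart.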
Attempts have been made to apply heuristics, but people vastly outperform them, so manual testing is still the norm.
Most automated testing in the web space focuses on DOM manipulation and correctness of logic; the visual parts are validated manually.
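As a rough illustration of what a DOM-level check looks like, here is a sketch that uses only Python's standard-library html.parser on a static markup string. Real suites would query the live DOM through a browser automation framework instead. IdCollector and collect_ids are hypothetical helper names.

```python
from html.parser import HTMLParser

class IdCollector(HTMLParser):
    """Records every element that carries an id attribute."""

    def __init__(self):
        super().__init__()
        self.ids = {}  # id -> (tag name, attribute dict)

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)  # valueless attributes map to None
        if "id" in attr_map:
            self.ids[attr_map["id"]] = (tag, attr_map)

def collect_ids(markup: str) -> dict:
    """Parse `markup` and return the elements found, keyed by id."""
    parser = IdCollector()
    parser.feed(markup)
    parser.close()
    return parser.ids

page = '<form id="login"><input id="user" type="text"><button id="go" disabled>Go</button></form>'
ids = collect_ids(page)

# Structural assertions: the elements exist and are in the expected state.
assert ids["login"][0] == "form"
assert ids["user"][1]["type"] == "text"
assert "disabled" in ids["go"][1]
```

Checks like these say nothing about how the page looks, only that the structure and state are what the logic expects.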