Differences from before:
* More performance tweaks / micro-optimizations;
* The rasterizer now supports multiple threads (2 threads were used in this test, each handling one half of the screen);
* Inline ASM is used in a few places.
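
For illustration, a rough sketch of the split-screen threading idea in C++. This is not the actual code from this rasterizer: `Framebuffer`, `rasterize_band`, and `rasterize_parallel` are hypothetical names, the split into horizontal bands is an assumption (the test only says the screen was divided in half), and the per-band "rasterization" is a plain color fill standing in for the real work.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical framebuffer: 32-bit pixels stored row by row.
struct Framebuffer {
    int width;
    int height;
    std::vector<std::uint32_t> pixels;

    Framebuffer(int w, int h)
        : width(w), height(h),
          pixels(static_cast<std::size_t>(w) * static_cast<std::size_t>(h), 0) {}
};

// Stand-in for the real per-band work: fills rows [y0, y1) with one color.
// Each thread only writes its own rows, so no locking is needed here.
void rasterize_band(Framebuffer& fb, int y0, int y1, std::uint32_t color) {
    for (int y = y0; y < y1; ++y) {
        std::uint32_t* row =
            fb.pixels.data() + static_cast<std::size_t>(y) * fb.width;
        std::fill(row, row + fb.width, color);
    }
}

// Assumed scheme: cut the screen into num_threads horizontal bands and
// hand one band to each thread. With num_threads == 2 this is top/bottom.
void rasterize_parallel(Framebuffer& fb, int num_threads) {
    assert(num_threads > 0);
    std::vector<std::thread> workers;
    workers.reserve(static_cast<std::size_t>(num_threads));
    for (int i = 0; i < num_threads; ++i) {
        int y0 = fb.height * i / num_threads;
        int y1 = fb.height * (i + 1) / num_threads;
        // Tag each band with a different color so the split is visible.
        std::uint32_t color = 0xFF000000u | static_cast<std::uint32_t>(i + 1);
        workers.emplace_back(rasterize_band, std::ref(fb), y0, y1, color);
    }
    // Wait for every band to finish before the frame is presented.
    for (auto& t : workers) {
        t.join();
    }
}
```

Calling `rasterize_parallel(fb, 2)` gives the two-thread setup from this test. Spawning threads every frame is wasteful, so a real renderer would more likely keep worker threads alive between frames; and a fixed half/half split can leave one thread idle when most of the geometry lands in one half.
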
It is still somewhat laggy, but I am not really sure how much further software rasterization can be pushed on a generic desktop PC.
CPU: Phenom II X4 3.4 GHz;
RAM: 4x4GB PC3-10600
Note: it is a fair bit faster at 1024x768 or lower.