Performance Update
It's hard to believe, but I was able to almost triple the speed of my rasterizer by changing from an iterative algorithm to a gradient-style algorithm, similar to what the Chris Hecker articles show. I am still using homogeneous coordinates for my vertex transformation, and I then use those homogeneous coordinates (prior to the w-divide) to rasterize the triangle. This departs from what Hecker does, and seems to fit the modern rendering paradigm a little better than his methods. The speed boost also comes with very little conventional optimization, so there should be a good amount of room for improvement.
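For anyone who hasn't seen the gradient idea before, here is a rough sketch of what I mean. It is not my actual rasterizer code: the `Vertex`, `Gradients`, `computeGradients`, `attrAt` and `fillSpan` names are made up for illustration, and it assumes plain screen-space vertices with a single float attribute rather than the homogeneous setup described above. The point is just that the per-pixel rates of change are solved once per triangle, so the inner loop only needs one add per attribute per pixel.

```cpp
#include <cassert>

// Hypothetical screen-space vertex: a position plus one interpolated value.
struct Vertex {
    float x, y;   // screen-space position in pixels
    float attr;   // any per-vertex value (for example u/w, v/w, 1/w or depth)
};

// Per-triangle rates of change of the attribute; constant over the triangle.
struct Gradients {
    float dAdx;   // change in attr for a one-pixel step in +x
    float dAdy;   // change in attr for a one-pixel step in +y
};

// Solve the plane attr(x, y) that passes through the three vertices.
Gradients computeGradients(const Vertex& v0, const Vertex& v1, const Vertex& v2) {
    // Twice the signed area of the triangle; zero means it is degenerate.
    float det = (v1.x - v0.x) * (v2.y - v0.y) - (v2.x - v0.x) * (v1.y - v0.y);
    assert(det != 0.0f && "degenerate triangle");
    float inv = 1.0f / det;

    Gradients g;
    g.dAdx = ((v1.attr - v0.attr) * (v2.y - v0.y) -
              (v2.attr - v0.attr) * (v1.y - v0.y)) * inv;
    g.dAdy = ((v2.attr - v0.attr) * (v1.x - v0.x) -
              (v1.attr - v0.attr) * (v2.x - v0.x)) * inv;
    return g;
}

// Evaluate the attribute at an arbitrary point, e.g. the start of a span.
float attrAt(const Vertex& v0, const Gradients& g, float x, float y) {
    return v0.attr + g.dAdx * (x - v0.x) + g.dAdy * (y - v0.y);
}

// Fill one horizontal span [xStart, xEnd) with a single add per pixel.
void fillSpan(float* out, int xStart, int xEnd, float attrAtStart, const Gradients& g) {
    float a = attrAtStart;
    for (int x = xStart; x < xEnd; ++x) {
        out[x] = a;
        a += g.dAdx;
    }
}
```

For a perspective-correct texture, the interpolated values would typically be u/w, v/w and 1/w, with a divide per pixel (or every few pixels) to recover u and v.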
As a benchmark, I am testing my rasterizer speed by rendering a single triangle that covers approximately 1/3 of the framebuffer pixels, with a bilinearly sampled texture applied to it. Before the change, I was getting about 20-25 FPS on a 320x240 framebuffer when clearing the z-buffer and backbuffer at the beginning of each frame (this is horribly slow, even for an old Pentium M, but it should show the progress made with each optimization). After the change to the gradient algorithm, the FPS went up to ~65. That is a big jump without even digging down to the assembly level.
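A measurement like this can be set up roughly as follows. This is only a sketch under my own assumptions, not the harness I actually use: `FrameBuffers` and `measureFps` are hypothetical names, the clear color and the "far" depth value are arbitrary, and the triangle drawing is passed in as a callback so the timing loop stays separate from the rasterizer.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical backbuffer + z-buffer pair for a software rasterizer.
struct FrameBuffers {
    int width, height;
    std::vector<std::uint32_t> color;  // backbuffer, one 32-bit pixel each
    std::vector<float> depth;          // z-buffer

    FrameBuffers(int w, int h)
        : width(w), height(h), color(w * h), depth(w * h) {}

    // Reset both buffers, as done at the start of every frame.
    void clear(std::uint32_t clearColor) {
        std::fill(color.begin(), color.end(), clearColor);
        std::fill(depth.begin(), depth.end(), std::numeric_limits<float>::max());
    }
};

// Clear and draw `frames` times, then return the average frames per second.
template <typename DrawFn>
double measureFps(FrameBuffers& fb, int frames, DrawFn draw) {
    using Clock = std::chrono::steady_clock;  // monotonic, suitable for intervals
    auto start = Clock::now();
    for (int i = 0; i < frames; ++i) {
        fb.clear(0xFF000000u);  // the clear cost is part of the measurement
        draw(fb);               // the rasterizer call goes here
    }
    std::chrono::duration<double> elapsed = Clock::now() - start;
    return frames / elapsed.count();
}
```

Calling something like `measureFps(fb, 500, drawTestTriangle)` with a 320x240 `FrameBuffers` gives one average figure. Averaging over a few hundred frames smooths out timer noise, and in frame-time terms 20-25 FPS is 40-50 ms per frame while ~65 FPS is about 15 ms.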