Optimization Progress
The optimization effort has been moving along much better lately. The results of my profiling show what you would expect: anything done at the per-fragment level is expensive because of the huge number of times it is done. Memory access at the fragment level in particular is really hurting me right now, but I have started making progress.
The overall design of my pipeline was to give each major function its own 'processor' object, with abstracted memory buffers sitting between them. This works well, and in theory it is a good fit for a multiprocessor system, since the work can be parallelized across processors.
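To make that design concrete, here is a minimal sketch of what one processor and its buffers might look like. Everything in it is hypothetical: `StageBuffer`, `Fragment`, `Pixel`, and `PixelShaderProcessor` are names made up for illustration, not the actual classes in my renderer, and the "shader" just writes a constant color.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical abstracted buffer that sits between two pipeline stages.
template <typename T>
class StageBuffer {
public:
    void push(const T& item) { items_.push_back(item); }
    std::size_t size() const { return items_.size(); }

    // Range-checked access: safe, but the check runs on every single call.
    const T& at(std::size_t i) const {
        if (i >= items_.size()) throw std::out_of_range("StageBuffer::at");
        return items_[i];
    }

private:
    std::vector<T> items_;
};

// Illustrative payload types passed between stages.
struct Fragment { int x; int y; float depth; };
struct Pixel    { int x; int y; unsigned color; };

// Hypothetical processor: reads from one buffer, writes to the next.
class PixelShaderProcessor {
public:
    void run(const StageBuffer<Fragment>& in, StageBuffer<Pixel>& out) const {
        for (std::size_t i = 0; i < in.size(); ++i) {
            const Fragment& f = in.at(i);            // one abstracted read per fragment
            out.push(Pixel{f.x, f.y, 0xFFFFFFFFu});  // constant white as a stand-in shader
        }
    }
};
```

The appeal of this layout is that each processor only knows about its input and output buffers, so stages could be handed to different cores. The cost is one abstracted access call per fragment per stage.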
However, all of the abstracted memory access calls between processors really add up. My first step was to make the access calls themselves a bit speedier by removing redundant range checking, which gave a speedup of about 10%! The next step was to combine the rasterizer, depth tester, and pixel shader into one processor. This removed a lot of unnecessary memory access calls and gave a large increase in overall speed as well. I'll make a more complete post on this later, but I also plan on unifying all of the geometry-side processors (vertex shader, back face culling, and polygon clipping). I don't expect as much of an increase from that right now, but once I start rendering more geometry it should keep these memory issues out of the way.
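As a rough illustration of both ideas, the sketch below fuses coverage, depth test, and shading into a single loop and does its bounds checking once per primitive instead of once per fragment. It is not my actual rasterizer: `FrameTarget` and `drawRectFused` are hypothetical names, and a screen-aligned rectangle at constant depth stands in for a real triangle to keep the example short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical render target holding a depth buffer and a color buffer.
struct FrameTarget {
    int width;
    int height;
    std::vector<float> depth;          // cleared to 1.0 (far plane)
    std::vector<std::uint32_t> color;  // cleared to 0

    FrameTarget(int w, int h)
        : width(w), height(h),
          depth(static_cast<std::size_t>(w) * h, 1.0f),
          color(static_cast<std::size_t>(w) * h, 0u) {}
};

// Fused stage: coverage, depth test, and shading happen in one loop,
// with no intermediate fragment buffer. Covers [x0, x1) x [y0, y1).
// Returns the number of fragments that passed the depth test.
inline int drawRectFused(FrameTarget& t, int x0, int y0, int x1, int y1,
                         float z, std::uint32_t rgba) {
    // Range check once per primitive by clamping to the target bounds...
    x0 = std::max(x0, 0);
    y0 = std::max(y0, 0);
    x1 = std::min(x1, t.width);
    y1 = std::min(y1, t.height);

    int written = 0;
    for (int y = y0; y < y1; ++y) {
        // ...so the inner loop can use raw, unchecked row pointers.
        float* depthRow = t.depth.data() + static_cast<std::size_t>(y) * t.width;
        std::uint32_t* colorRow = t.color.data() + static_cast<std::size_t>(y) * t.width;
        for (int x = x0; x < x1; ++x) {
            if (z < depthRow[x]) {   // depth test
                depthRow[x] = z;     // depth write
                colorRow[x] = rgba;  // "pixel shader": constant color stand-in
                ++written;
            }
        }
    }
    return written;
}
```

The trade-off is that the fused stage is harder to split across cores than three separate processors, so this gives up some of the flexibility of the original design in exchange for far fewer memory access calls per fragment.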
I still have several other things to try out, and have found a few tricks that I'll likely write about in future entries. Some of them are specific to my renderer, but others should be of use in general. Memory caches are your friends!