I would eventually like to make some SSE2 optimizations, particularly in the rasterizer. This would be my first time tackling such a beast, and it would be a great learning experience. I have wanted to add some form of SIMD support to my vector/matrix operations for a while now anyway.
However, the software rendering project is going to be winding down. I have a couple of other projects that I have been kicking around in my head that are just about ready to start, and my free time is very limited right now. I should hear back in the next couple of days about some proposals I submitted last month, so depending on the outcome, my free time will either disappear completely or open up a little.
I still haven't decided whether to attempt a software rendering article. I'm sure it would be fun, but I don't know if I can come up with enough material to make it worthwhile. Usually, if I am going to write something, inspiration strikes and I just start writing, so we'll see how it all works out...
Somehow, I got the impression that you're recalculating the pixel pointer for every pixel you draw. Don't! Compute it once at the start of each span and just increment it as you step across.
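A minimal sketch of the difference, assuming a hypothetical 32-bit framebuffer struct (`Framebuffer`, `fill_span` and the `pitch` field are illustrative names, not from any particular codebase). The `y * pitch + x0` multiply happens once per span; the inner loop only writes and bumps the pointer, instead of evaluating `pixels[y * pitch + x]` for every pixel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit framebuffer; 'pitch' is the row stride in pixels. */
typedef struct {
    uint32_t *pixels;
    int width, height, pitch;
} Framebuffer;

/* Fill the horizontal span [x0, x1) on row y with a solid color.
   The starting address is computed once; the loop just increments it. */
void fill_span(Framebuffer *fb, int y, int x0, int x1, uint32_t color)
{
    assert(y >= 0 && y < fb->height);
    assert(0 <= x0 && x0 <= x1 && x1 <= fb->width);

    uint32_t *p = fb->pixels + (size_t)y * (size_t)fb->pitch + x0;
    for (int x = x0; x < x1; ++x)
        *p++ = color;   /* no per-pixel address multiply */
}
```

The same pattern applies to the z-buffer and texture pointers: set them up per span, then step them.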
Forget about filtering your textures; it's just too damn much of a performance hit. Stick with nearest-neighbor (point) sampling.
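To make the cost concrete, here is a sketch of a point-sampled texel fetch, assuming power-of-two texture sizes and 16.16 fixed-point texture coordinates (the `Texture` struct and `sample_nearest` are hypothetical names). It costs one memory read and a couple of shifts and masks per pixel, whereas bilinear filtering would need four reads plus the blending arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical texture with power-of-two dimensions, so wrapping
   is a bitwise AND instead of a modulo. */
typedef struct {
    const uint32_t *texels;
    int log2_width;   /* width  = 1 << log2_width  */
    int log2_height;  /* height = 1 << log2_height */
} Texture;

/* Nearest-neighbor fetch: u and v are 16.16 fixed-point texel coordinates.
   The fractional bits are simply dropped (no filtering). */
uint32_t sample_nearest(const Texture *tex, int32_t u, int32_t v)
{
    assert(tex->log2_width >= 0 && tex->log2_width < 16);
    assert(tex->log2_height >= 0 && tex->log2_height < 16);

    int w_mask = (1 << tex->log2_width) - 1;
    int h_mask = (1 << tex->log2_height) - 1;
    int tx = (u >> 16) & w_mask;   /* integer texel column, wrapped */
    int ty = (v >> 16) & h_mask;   /* integer texel row, wrapped */
    return tex->texels[(ty << tex->log2_width) + tx];
}
```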
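Switch to affine texture mapping when the triangle has a constant z. With constant depth, affine and perspective-correct mapping give the same result, so you can skip the per-pixel divide entirely.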
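A sketch of an affine span loop, assuming 16.16 fixed-point u/v, power-of-two textures, and per-pixel gradients `du`/`dv` already computed during triangle setup (all names here are illustrative). The texture coordinates advance by a constant step with no division, which is exact when z does not vary across the triangle.

```c
#include <assert.h>
#include <stdint.h>

/* Affine (linear) texture mapping of one horizontal span.
   u, v: 16.16 texture coordinates at the first pixel.
   du, dv: 16.16 per-pixel steps (constant across the span). */
void draw_span_affine(uint32_t *dst, int count,
                      const uint32_t *texels, int log2_w, int log2_h,
                      int32_t u, int32_t v, int32_t du, int32_t dv)
{
    int w_mask = (1 << log2_w) - 1;
    int h_mask = (1 << log2_h) - 1;
    assert(count >= 0);

    for (int i = 0; i < count; ++i) {
        int tx = (u >> 16) & w_mask;
        int ty = (v >> 16) & h_mask;
        *dst++ = texels[(ty << log2_w) + tx];
        u += du;   /* just add the constant gradient: no divide */
        v += dv;
    }
}
```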
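Otherwise, map the texture in little affine chunks: do the perspective-correct calculation every few pixels (8 or 16, say) and interpolate linearly in between.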
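One common way to do the chunking is sketched below, under a few assumptions: u/z, v/z and 1/z are interpolated in floating point (they vary linearly in screen space), the chunk size is a hypothetical 16 pixels, and inside each chunk the affine loop from the constant-z case runs in 16.16 fixed point. Only one divide is paid per chunk instead of one per pixel. This is a sketch, not a tuned implementation; for example, texture coordinates above 32767 texels would overflow the fixed-point conversion.

```c
#include <assert.h>
#include <stdint.h>

#define SUBDIV 16   /* hypothetical chunk length in pixels */

/* Perspective-correct span drawn in small affine chunks.
   uoz, voz, ooz: u/z, v/z and 1/z at the first pixel.
   duoz, dvoz, dooz: their per-pixel screen-space gradients. */
void draw_span_subdivided(uint32_t *dst, int count,
                          const uint32_t *texels, int log2_w, int log2_h,
                          float uoz, float voz, float ooz,
                          float duoz, float dvoz, float dooz)
{
    int w_mask = (1 << log2_w) - 1;
    int h_mask = (1 << log2_h) - 1;
    assert(count >= 0);

    /* True (u, v) at the span start, converted to 16.16 fixed point. */
    float z = 1.0f / ooz;
    int32_t u = (int32_t)(uoz * z * 65536.0f);
    int32_t v = (int32_t)(voz * z * 65536.0f);

    while (count > 0) {
        int n = count < SUBDIV ? count : SUBDIV;

        /* Step the linear quantities to the end of this chunk... */
        uoz += duoz * n;
        voz += dvoz * n;
        ooz += dooz * n;

        /* ...and do the single perspective divide for the chunk. */
        z = 1.0f / ooz;
        int32_t u_end = (int32_t)(uoz * z * 65536.0f);
        int32_t v_end = (int32_t)(voz * z * 65536.0f);
        int32_t du = (u_end - u) / n;
        int32_t dv = (v_end - v) / n;

        /* Affine mapping inside the chunk. */
        for (int i = 0; i < n; ++i) {
            int tx = (u >> 16) & w_mask;
            int ty = (v >> 16) & h_mask;
            *dst++ = texels[(ty << log2_w) + tx];
            u += du;
            v += dv;
        }

        /* Resync to the exact endpoint to avoid drift from integer division. */
        u = u_end;
        v = v_end;
        count -= n;
    }
}
```

Smaller chunks look closer to true perspective at the cost of more divides; larger chunks are faster but show more of the affine "swimming" on steeply angled surfaces.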