The software rasterizer has been going well. I now have hierarchical block culling in. Basically how it works: for a 32x32 block, it first figures out whether the block is entirely inside the triangle, entirely outside, or partially inside. If it's entirely inside, it simply fills the block without doing any per-pixel comparisons. If it's entirely outside, it skips all drawing for that block. If it's partially inside, it moves on to do the same check for each 16x16 sub-block. This repeats for the 8x8 and 4x4 blocks, and after that it just checks every pixel (even if I went down to 2x2, that would be four checks for four pixels anyway). Here's a visualization of this in action:
The colour indicates at what stage the pixel was filled in: the darker the colour, the coarser the block. It goes down to near-black, which means that pixel was scanned individually. For larger triangles it's great, but I'm actually more than a little disappointed by what happens when the triangles get a bit finer, as a lot of individual pixels end up being scanned:
I'm not that surprised though; this is something I anticipated. Depending on the speed of everything after all is said and done, I may decide to look into this a bit more later, after I see some real-world results regarding performance and such...
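To make the in/out/partial test concrete, here's a rough sketch of how such a block classification might look, assuming the triangle's edges are stored as half-space functions E(x,y) = A*x + B*y + C with E >= 0 meaning inside. The names and sign conventions here are mine for illustration, not the engine's actual code:

```cpp
#include <cassert>

// One edge of the triangle, as a linear function E(x,y) = A*x + B*y + C.
struct Edge { float A, B, C; };

enum Coverage { Outside, Partial, Inside };

// Classify an axis-aligned block [x0, x0+size) x [y0, y0+size) against the
// three edges, by testing only two corners per edge: the corner where the
// edge function is largest (accept corner) and where it is smallest
// (reject corner), picked from the signs of A and B.
Coverage classifyBlock(const Edge e[3], float x0, float y0, float size)
{
    bool fullyInside = true;
    for (int i = 0; i < 3; ++i) {
        float ax = (e[i].A >= 0) ? x0 + size : x0;
        float ay = (e[i].B >= 0) ? y0 + size : y0;
        float rx = (e[i].A >= 0) ? x0 : x0 + size;
        float ry = (e[i].B >= 0) ? y0 : y0 + size;
        float acceptVal = e[i].A * ax + e[i].B * ay + e[i].C;
        float rejectVal = e[i].A * rx + e[i].B * ry + e[i].C;
        if (acceptVal < 0) return Outside;      // whole block outside this edge
        if (rejectVal < 0) fullyInside = false; // block straddles this edge
    }
    return fullyInside ? Inside : Partial;
}
```

On a Partial result the rasterizer would recurse into the four quadrants at half the block size, down to the per-pixel scan; Inside fills unconditionally and Outside skips the block entirely.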
I've also done some other optimizations on the side (e.g. for each block, I'd been remultiplying the X coord by one of the coefficients for the current triangle many times over, so instead I just calculate it once). Even without doing some loose benchmarks, I noticed a significant speedup. Right now when I run the program, with it drawing about 180 triangles to a 640x480 memory image 500 times over, it takes about 6 or 7 seconds to execute. At first I was shocked, but then I turned on compiler optimizations, and it runs in about 2 seconds, and that's on a 1GHz Athlon, without any SSE/SSE2 instructions being used or special processor code generation; that is, I have a blended CPU target. It currently only outputs an arbitrary value and nothing practical, but that seems pretty swell to me, especially since I still have more than a few micro-optimizations to do in the draw code. Next up on my list, I'm going to throw depth buffer calculations back in, as well as hierarchical z-culling.
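The "calculate it once" optimization works because the edge functions are linear: stepping one pixel right adds A, stepping one row down adds B, so the per-pixel multiply disappears entirely. A minimal sketch of the idea, using the same illustrative half-space convention as above rather than my real draw code:

```cpp
#include <cassert>
#include <vector>

struct Edge { float A, B, C; };

// Fill a width x height coverage mask for one triangle. Each edge function is
// evaluated with a multiply exactly once (at the top-left pixel centre);
// every other pixel costs three adds instead of three multiply-adds.
std::vector<char> rasterizeIncremental(const Edge e[3], int width, int height)
{
    std::vector<char> mask(width * height, 0);
    float row[3];
    for (int i = 0; i < 3; ++i)
        row[i] = e[i].A * 0.5f + e[i].B * 0.5f + e[i].C; // pixel centre (0.5, 0.5)

    for (int y = 0; y < height; ++y) {
        float v[3] = { row[0], row[1], row[2] };
        for (int x = 0; x < width; ++x) {
            if (v[0] >= 0 && v[1] >= 0 && v[2] >= 0)
                mask[y * width + x] = 1;
            for (int i = 0; i < 3; ++i) v[i] += e[i].A; // step one pixel right
        }
        for (int i = 0; i < 3; ++i) row[i] += e[i].B;   // step one row down
    }
    return mask;
}
```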
Now, procedural textures. I think everyone here has seen Ysaneya's fantastic terrain and planets, which are entirely procedural. Also, who doesn't love applying Perlin noise to textures for various things like clouds? In my previous rants on memory virtualization, I've mentioned that we almost have enough AGP bandwidth to send down nearly a full screen's worth of simple texture data. That's great and all, but sadly we have barely a fraction of that when moving data from the hard drive to system memory (or, alternatively, from the DVD/CD drive to system memory). This is a fairly significant issue, as the user might be moving too fast for the system to keep up, resulting in fairly blurry textures. A good solution is to use compression to shrink the data transfer: DXTn, PNG, JPEG (if you're totally ignorant of image quality), and so on. However, even those can be a bit impractical at times: DXTn and JPEG are lossy, and PNG doesn't allow for random data access.

An alternative may be procedural textures. Most people are used to procedural textures and data basically being a set of random values, and even the examples I opened with revolve around the use of some form of noise. While they do work pretty well, they're still special-case scenarios, and they can't be influenced by artists very easily. Even Ysaneya admits that each planet in his engine may look a bit repetitive, which is easily credited to the fact that an artist can't put a unique touch on each one (although things like the orbiting space station he has right now would help). But what if semi-procedural textures could be done, generating high-frequency data from artist-influenced values? For example, some of the hardcore Doom 3 modders have been playing with the early-gen MegaTexture stuff for Quake Wars that came out in a recent patch, and I found one tidbit in which an artist could essentially paint roads onto the landscape.
With something like vector or curve drawing, a very small sample of data could be used to generate, in the case of MegaTexture, extremely high resolution terrain that is still heavily artist-influenced (and could still use noise for a little bit of touch-up or something).
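As a toy illustration of the idea (my own made-up names and constants, not anything from the MegaTexture patch): a painted road can be stored as just a segment and a width, a handful of floats, yet sampled at any resolution on demand, since each texel only needs a distance-to-segment test.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };

// Distance from point p to segment a-b (assumes a != b for this sketch).
static float distToSegment(Vec2 p, Vec2 a, Vec2 b)
{
    float dx = b.x - a.x, dy = b.y - a.y;
    float t = ((p.x - a.x) * dx + (p.y - a.y) * dy) / (dx * dx + dy * dy);
    t = t < 0 ? 0 : (t > 1 ? 1 : t);
    float cx = a.x + t * dx - p.x, cy = a.y + t * dy - p.y;
    return std::sqrt(cx * cx + cy * cy);
}

// Sample one texel of an N x N tile. Coordinates are normalized to [0,1], so
// the same few bytes of "road" data amplify to any output resolution.
int roadTexel(int x, int y, int N, Vec2 a, Vec2 b, float halfWidth)
{
    Vec2 p = { (x + 0.5f) / N, (y + 0.5f) / N };
    return distToSegment(p, a, b) < halfWidth ? 60 /*asphalt*/ : 120 /*terrain*/;
}
```

A real road would be a curve with falloff and noise layered on top, but the compression story is the same: the artist's stroke is the data, the pixels are generated.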
So what other applications could semi-procedural textures be used for? Well, at QuakeCon 05, John Carmack proposed the idea of basically rendering a 500-page PDF file at 100 dpi, essentially hundreds of megabytes of data, even though such a PDF could be as small as tens of megabytes in size. With the theoretically lean memory usage and movement between the CPU and GPU, why not just go and make a random-access PDF generator, and page onto the GPU only the data that's needed, when it's needed? Also, how about animation? Take a Flash movie like The Demented Cartoon Movie, which, depending on your point of view, could be treated as half an hour of 640x480, 30fps lossless animation packed into less than 4MB of data, as opposed to 45GB of raw video. Another example could be compositing complex functions and/or art data. For something like a sidewalk texture, the gaps between blocks could be drawn on, with a little bit of noise run through an if-else ladder to mix small rocks and stones randomly into the cement. One could easily imagine a lot of different scenarios where custom-made semi-procedural textures are used in the game world, as a means of special-case image data compression with excellent compression ratios.
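The sidewalk case is small enough to sketch outright. Everything here is invented for illustration (the 64-texel block size, the thresholds, the hash): an analytic rule draws the gaps, and a cheap per-texel hash fed through an if-else ladder scatters rocks into the cement, all deterministic and random-access.

```cpp
#include <cassert>
#include <cstdint>

// Repeatable per-texel pseudo-random value in [0,1); any integer hash works.
static float hashNoise(int x, int y)
{
    uint32_t h = uint32_t(x) * 374761393u + uint32_t(y) * 668265263u;
    h = (h ^ (h >> 13)) * 1274126177u;
    return float(h & 0xFFFFFFu) / float(0x1000000);
}

// Grey level in [0,255] for texel (x,y) of an infinite sidewalk.
int sidewalkTexel(int x, int y)
{
    const int blockSize = 64, gapWidth = 2;
    // The gaps between concrete blocks: pure analytic rule, no stored data.
    if (x % blockSize < gapWidth || y % blockSize < gapWidth)
        return 40;                               // dark groove
    float n = hashNoise(x, y);
    if (n < 0.02f)      return 200;              // small pale rock
    else if (n < 0.05f) return 90;               // darker pebble
    else                return 150 + int(n * 20.0f); // slightly mottled cement
}
```

Because any texel can be evaluated independently, a paging scheme can generate exactly the tiles it needs, which is the random-access property that PNG lacks.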