Scanline Rasterization

So, I'm now mostly done with my rewritten scanline rasterizer. There isn't much more that I can see to optimize. Right now it's fairly straightforward and simple; I don't use many tricks to speed it up. Probably the biggest trick is the one-time calculation of edge data (e.g. dx/dy/dz and such), but even that is fairly basic. For the most part, it was just looking directly at the loops and asking "What's calculated again and again and again?" Also, it's still built for a single-frame scenario, so I haven't done anything to avoid the massive amount of clearing every frame, which I'm predicting will be a bit expensive. For that I'll probably add an extra bool to my per-pixel data struct that just says whether the pixel was last drawn on an even or odd frame, unless someone knows some low (or high) level trick to set a contiguous block of memory to some value really fast. Oh, and I still have to do texcoord data, but I'll address that later in this post. (Just remembered another thing: a hierarchical depth buffer and depth culling at the macro-pixel and triangle level are not in yet either.)
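Here is a minimal sketch of the "tag each pixel instead of clearing" idea. Note that it swaps the bool for a full frame counter: a single even/odd bit can't tell frame N apart from frame N-2, so a pixel that goes untouched for one frame would look current again on the next. The `Pixel` and `FrameBuffer` types and all field names are made up for illustration and are not the structs from my rasterizer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-pixel record; field names are illustrative only.
struct Pixel {
    float         depth = 0.0f;
    std::uint32_t color = 0;
    std::uint32_t frame = 0;  // number of the frame that last wrote this pixel
};

struct FrameBuffer {
    int width;
    int height;
    std::uint32_t currentFrame = 0;
    std::vector<Pixel> pixels;

    FrameBuffer(int w, int h)
        : width(w), height(h),
          pixels(static_cast<std::size_t>(w) * static_cast<std::size_t>(h)) {}

    // "Clearing" is just bumping the counter; no per-pixel work.
    // Must be called once before drawing each frame (including the first).
    void beginFrame() { ++currentFrame; }

    // Depth-tested write that treats stale pixels as empty.
    // Assumes smaller depth means closer to the camera.
    bool write(int x, int y, float depth, std::uint32_t color) {
        Pixel& p = pixels[static_cast<std::size_t>(y) * width + x];
        const bool stale = (p.frame != currentFrame);
        if (!stale && depth >= p.depth) {
            return false;  // something closer was already drawn this frame
        }
        p.depth = depth;
        p.color = color;
        p.frame = currentFrame;
        return true;
    }
};
```

The cost moves from one big clear to an extra compare per pixel write, and anything that reads the buffer back has to check `frame` too. Whether that actually beats a plain `std::fill` or `memset` over the buffer is something I'd have to measure.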

Anyways, right now I think I'm getting pretty good speed out of it. (Note: all times are from release builds.) For my test scene I'm getting a 4.25ms (235 fps) run time at 640x480, and a 0.85ms (1176 fps) run time at 320x240. There are a few observations I'd like to make about this performance. First, these numbers are from a slow and old CPU (a 1GHz Athlon which I'm not entirely sure supports SSE) with undoubtedly slow memory, so on a higher-end PC (e.g. my one back in Waterloo) these times would be much lower. Second, the test scene is fairly simple. It's only 160 triangles, and while even I'll admit that this is a low number, it's not as far below what I predict I'll need for a real scenario as it might seem. For one, the lower resolution means that you can drop a couple of levels of geometric detail, and for another, the lack of required image fidelity means that, as long as the texture coordinates are properly preserved, you could probably get away with dropping yet another level. So, instead of rendering a 10k poly character for an up-close shot like you would on the GPU, you could easily get away with a <1k poly version of the character even when it dominates most of the screen.

As I mentioned earlier, I don't have texture coordinates in yet either, and I'm still not entirely sure how I want to go about them. My big decision right now is what flavour of mapping I want to try out: affine texture coords, perspective texture coords, or perspective blocks of affine texture coords (see: Quake). Affine texture coords seem like a really good idea right now, in part because, again, I don't need a high quality image. Besides, affine coords are still correct at the vertices, meaning there'd be only a handful of pixels that are a bit off. I'll likely try that out first, and once I get the full algo working and can see what textures/texture detail it's picking up I might reevaluate it, but I'll be surprised if it doesn't produce adequate results.

The other big issue I'm concerned with is how to mark which texture blocks are needed, and I've got a few ideas for that. But first, to fill you in on what this part needs to do. Basically, while I'm drawing the image, I'll need to count up how many pixels are asking for which blocks of textures, and what detail is required at each block. The reason I need to count the pixels is so that I can increment and decrement the counts on the fly. If I just marked every block as I went, then areas of high occlusion would undoubtedly end up requesting a lot of extraneous texture data. For example, if I write a pixel and that pixel was already written to this frame, I'll figure out what block the overwritten pixel had asked for and decrement that block's request count, then increment the count of the block the new pixel needs. So I've got a few ideas on how to do this:

1) Have a linked list of block requests that I add to and delete from dynamically. In this case, each block request has an identifier for what texture and texture level it's from, as well as the actual request count. The big disadvantage here is that a lot of news and deletes will have to occur, which I know by now will kill performance. If I go this way, I may also try a separate linked list for each texture so that I can keep requests grouped together properly.

2) One big array with the maximum possible number of block requests allocated up front. So, if I have a main, uh, cache texture where the assembled texture data is stored that is 2048x2048, and split that into 32x32 blocks, then I'll end up with a total of 64*64=4096 block requests. The big pro here is that no re/deallocation of memory will occur; however, quickly finding the block I need to increment may be difficult. Right now though, this is my favourite idea and will be the one I do first.

3) An array of block requests for each level of each texture. I'm going to avoid this method for now because I don't think it will scale well to large textures, and having to scan each level of each texture for every request when working out all of the data I'll need does not sound very fast. I MAY try this if #2 does not work out as well as I would like.
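As a rough sketch of what idea #2 could look like, here's a fixed array of 4096 request slots with increment and decrement. To deal with the "finding the block fast" problem it uses a hash with linear probing into that array, which is just one possible answer and not something I've settled on. The key packing (`packKey`), the bit widths, and every name here are hypothetical.

```cpp
#include <array>
#include <cstdint>

constexpr std::uint32_t kBlockSlots = 64 * 64;      // 4096, from the 2048/32 example
constexpr std::uint32_t kNoSlot     = 0xFFFFFFFFu;  // returned when the table is full

// Hypothetical packing: 10 bits texture id, 4 bits mip level, 9 bits each for block y/x.
inline std::uint32_t packKey(std::uint32_t texture, std::uint32_t level,
                             std::uint32_t blockX, std::uint32_t blockY) {
    return (texture << 22) | (level << 18) | (blockY << 9) | blockX;
}

struct BlockRequest {
    std::uint32_t key   = 0;      // packed texture/level/block id
    std::uint32_t count = 0;      // pixels currently asking for this block
    bool          used  = false;  // slot has been claimed this frame
};

class BlockRequestTable {
public:
    // Increment the count for `key`, claiming a slot if needed.
    // Returns the slot index so the pixel can remember it for a later release().
    std::uint32_t request(std::uint32_t key) {
        std::uint32_t slot = hash(key) % kBlockSlots;
        for (std::uint32_t probe = 0; probe < kBlockSlots; ++probe) {
            BlockRequest& r = slots_[slot];
            if (!r.used) {
                r.used = true;
                r.key = key;
                r.count = 1;
                return slot;
            }
            if (r.key == key) {
                ++r.count;
                return slot;
            }
            slot = (slot + 1) % kBlockSlots;  // linear probing
        }
        return kNoSlot;
    }

    // Called when a pixel that requested `slot` gets overwritten.
    void release(std::uint32_t slot) {
        if (slot != kNoSlot && slots_[slot].count > 0) {
            --slots_[slot].count;
        }
    }

    std::uint32_t count(std::uint32_t slot) const {
        return slot == kNoSlot ? 0u : slots_[slot].count;
    }

    // Start-of-frame reset; no allocation, just rewrites the fixed array.
    void reset() { slots_.fill(BlockRequest{}); }

private:
    // Multiplicative hash; the constant is an arbitrary odd multiplier.
    static std::uint32_t hash(std::uint32_t k) { return k * 2654435761u; }

    std::array<BlockRequest, kBlockSlots> slots_{};
};
```

The idea would be that each pixel stores the slot index returned by `request`, so an overwrite is a `release` on the old slot followed by a `request` for the new block. At the end of the frame, the slots with a nonzero count are the blocks that are actually visible. Slots are never freed mid-frame, so a scene that touches more than 4096 distinct blocks in one frame would start returning `kNoSlot`.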

So there's where I'm at right now. By next week I hope to have my software rasterizer all fully done and polished up, and to have either a fairly fleshed out design of idea #2 from above written up or a bit of an early implementation.