Is software rasterisation processor-heavy?

Started by
9 comments, last by Krypt0n 10 years, 10 months ago

It depends on what your rasterizer does. For occlusion culling on one core you just read/write 2 or 4 bytes per pixel, and that's not an issue: you have a lot of math to do before you get to that point, and you know at the beginning of the loop which pixels you are going to touch. You can even predict the next line, so inserting some prefetch instructions can hide most of the memory access.
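To make the prefetch idea concrete, here is a minimal sketch rather than a tuned implementation. It assumes a 16-bit depth buffer, a made-up `Span` struct holding one horizontal run per scanline, and the GCC/Clang builtin `__builtin_prefetch` (on other compilers the macro below turns into a no-op). The names `fillSpans` and `OCC_PREFETCH_WRITE` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// __builtin_prefetch is a GCC/Clang builtin; elsewhere this sketch just skips it.
#if defined(__GNUC__) || defined(__clang__)
#define OCC_PREFETCH_WRITE(p) __builtin_prefetch((p), 1, 3)
#else
#define OCC_PREFETCH_WRITE(p) ((void)0)
#endif

// Hypothetical span: the half-open pixel range [x0, x1) covered on one scanline.
struct Span { int x0, x1; };

// Depth-tests and writes `depth` into each span; spans[i] lies on scanline y0 + i.
// Smaller depth means closer. Returns the number of pixels actually written.
std::size_t fillSpans(std::vector<std::uint16_t>& buf, int width, int y0,
                      const std::vector<Span>& spans, std::uint16_t depth) {
    std::size_t written = 0;
    for (std::size_t i = 0; i < spans.size(); ++i) {
        assert(spans[i].x0 >= 0 && spans[i].x0 <= spans[i].x1 && spans[i].x1 <= width);
        const std::size_t y = static_cast<std::size_t>(y0) + i;
        assert((y + 1) * static_cast<std::size_t>(width) <= buf.size());

        // The start of the next line is already known, so request it early
        // and let the fetch overlap with the work on the current line.
        if (i + 1 < spans.size()) {
            OCC_PREFETCH_WRITE(buf.data() + (y + 1) * width + spans[i + 1].x0);
        }

        std::uint16_t* row = buf.data() + y * width;
        for (int x = spans[i].x0; x < spans[i].x1; ++x) {
            if (depth < row[x]) { row[x] = depth; ++written; }
        }
    }
    return written;
}
```

Whether the prefetch helps at all depends on the CPU and on how much math sits between the lines, so it is something to profile rather than assume.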

Occlusion culling is a special case of rasterization, and it has its own special demands.

1. accuracy: it needs to be high quality. A single pixel leaking through from the background will invalidate all the rasterization you've done to create occlusion.

2. x/y resolution: you cannot really assume that a lower-resolution buffer will be enough. Say you have 128 pixels in x while actually playing at 2560x1600: that means 20 real pixels (2560 / 128) map to one occlusion-buffer pixel. If you stand at an unlucky angle to a window or door opening, the whole world you see behind it can flicker.

3. depth resolution: unless you want to waste human resources on placing and adjusting custom geometry, you have to render with the same depth accuracy as the hardware does. Usually you are trying to avoid not only polys but also drawcalls, so you want to cull decals and tiny props (e.g. a painted image on a wall), and therefore you need to rasterize accurately to avoid flickering due to z-fighting.

4. it needs to be solid (software-wise), so avoid special cases, create automatic regression tests, and profile every change. If it works 99% of the time and fails the other 1%, it won't be used, because it can have a massive impact on gameplay and visuals (e.g. choppy framerates, slowdowns, wrongly culled objects...).

(5. the most important part of occlusion culling is the amount you cull, because that's what you save through the whole pipeline. Don't fool yourself into thinking that you need the fastest occlusion culler. If you cull 90% of the drawcalls and end up with only 500, the artists will fill that up again and you are at 5k drawcalls again, and you will again end up at 10ms. Doing it in 2ms but culling just 70% instead of 90% will lower your overall framerate!)
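The trade-off in point 5 can be put into numbers with a toy cost model. This is only a sketch: it reads the 10ms and 2ms above as time spent in the culler, and the 50,000 submitted drawcalls and 2 microseconds per surviving drawcall are invented purely for illustration. `Culler` and `frameCostUs` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Toy model (an assumption, not a measurement):
//   frame cost = time spent culling + surviving drawcalls * cost per drawcall
// All times are in microseconds to keep the arithmetic in integers.
struct Culler {
    std::int64_t cullTimeUs;     // time spent inside the occlusion culler
    int          culledPercent;  // share of submitted drawcalls it removes
};

std::int64_t frameCostUs(const Culler& c, std::int64_t submittedDraws,
                         std::int64_t perDrawUs) {
    assert(c.culledPercent >= 0 && c.culledPercent <= 100);
    const std::int64_t surviving = submittedDraws * (100 - c.culledPercent) / 100;
    return c.cullTimeUs + surviving * perDrawUs;
}
```

With those made-up inputs, a 10ms culler that removes 90% leaves 5,000 drawcalls and costs 10ms + 10ms = 20ms, while a 2ms culler that removes 70% leaves 15,000 and costs 2ms + 30ms = 32ms. The faster culler produces the slower frame.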

One last word regarding the 1bit/pixel solution: I implemented something like this ages ago (it was on a Pentium, using 32bit ints), and the culling results are very inconsistent. Depending on your view angle, you might end up rendering half the room behind a wall, just because the sorting re-ordered the polys now that you look at the wall at 45 degrees. It's faster the bigger your polys are, but at the same time less accurate. You can become quite accurate if you use tons of tiny polys, but then you won't see such a big speed-up compared to the usual way of rasterization.
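For reference, a 1bit/pixel buffer packed into 32bit ints could look roughly like the sketch below. This is not the old Pentium code, just a minimal illustration under my own assumptions: polys are processed in sorted front-to-back order, each one is broken into horizontal spans, and the `CoverageBuffer` class with its method names is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One bit per pixel, 32 pixels per word. There is no depth value at all,
// so the result depends entirely on the order in which spans are marked.
class CoverageBuffer {
public:
    CoverageBuffer(int width, int height)
        : wordsPerRow_((width + 31) / 32), width_(width), height_(height),
          bits_(static_cast<std::size_t>(wordsPerRow_) * height, 0u) {}

    // True if every pixel of [x0, x1) on row y is already covered.
    bool spanCovered(int y, int x0, int x1) const {
        check(y, x0, x1);
        const std::uint32_t* row = bits_.data() + static_cast<std::size_t>(y) * wordsPerRow_;
        for (int w = x0 / 32; w <= (x1 - 1) / 32; ++w) {
            const std::uint32_t m = maskFor(w, x0, x1);
            if ((row[w] & m) != m) return false;
        }
        return true;
    }

    // Marks every pixel of [x0, x1) on row y as covered.
    void markSpan(int y, int x0, int x1) {
        check(y, x0, x1);
        std::uint32_t* row = bits_.data() + static_cast<std::size_t>(y) * wordsPerRow_;
        for (int w = x0 / 32; w <= (x1 - 1) / 32; ++w) row[w] |= maskFor(w, x0, x1);
    }

private:
    void check(int y, int x0, int x1) const {
        assert(y >= 0 && y < height_ && x0 >= 0 && x0 < x1 && x1 <= width_);
    }

    // Bits of word `w` that fall inside the pixel range [x0, x1).
    static std::uint32_t maskFor(int w, int x0, int x1) {
        const int lo = std::max(x0 - w * 32, 0);
        const int hi = std::min(x1 - w * 32, 32);
        const std::uint32_t run =
            (hi - lo == 32) ? 0xFFFFFFFFu : ((1u << (hi - lo)) - 1u);
        return run << lo;
    }

    int wordsPerRow_, width_, height_;
    std::vector<std::uint32_t> bits_;
};
```

A poly counts as hidden only if all of its spans are already covered, otherwise it gets drawn and marked. Big polys touch whole words at a time, which is where the speed comes from, and since a set bit says nothing about how far away the covering poly was, a different sort order can flip the answer.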
