RobMaddison

Is software rasterisation processor-heavy?

As part of my theory for occlusion, I'm thinking about incorporating a software rasteriser which will just render simple objects (mostly cubes, I guess) in plain colours into a simulated bitmap (basically just a 2D array of pixels/numbers).

I kind of get the theory (or can find implementations of it), but I was wondering whether it can be a quick process or whether it's processor-intensive. I'm thinking it'll just go into its own thread.

The bit I'm concerned about is filling the shapes with colour, i.e. in between the Bresenham lines.
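For what it's worth, one way to avoid filling between Bresenham edges altogether is the half-space (edge-function) approach: walk the triangle's screen bounding box and test each pixel against the three edge functions. A minimal sketch, with names and layout of my own invention rather than anyone's actual code:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Minimal half-space triangle fill: instead of tracing Bresenham edges and
// filling between them, walk the triangle's bounding box and test each pixel
// against the three edge functions. Counter-clockwise winding is assumed.
struct Point { int x, y; };

// Twice the signed area of (a, b, c); >= 0 when c is on/left of edge a->b.
static int edge(const Point& a, const Point& b, const Point& c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// fb is a row-major width*height buffer of colour values.
void fillTriangle(std::vector<uint8_t>& fb, int width, int height,
                  Point v0, Point v1, Point v2, uint8_t colour) {
    // Clamp the bounding box to the framebuffer.
    int minX = std::max(0, std::min({v0.x, v1.x, v2.x}));
    int maxX = std::min(width - 1, std::max({v0.x, v1.x, v2.x}));
    int minY = std::max(0, std::min({v0.y, v1.y, v2.y}));
    int maxY = std::min(height - 1, std::max({v0.y, v1.y, v2.y}));
    for (int y = minY; y <= maxY; ++y)
        for (int x = minX; x <= maxX; ++x) {
            Point p{x, y};
            // Pixel is inside when it lies on the interior side of all 3 edges.
            if (edge(v0, v1, p) >= 0 && edge(v1, v2, p) >= 0 &&
                edge(v2, v0, p) >= 0)
                fb[y * width + x] = colour;
        }
}
```

It touches more candidate pixels than a scanline filler, but the inner loop is branch-light and trivially parallel, which is why this style is popular for software occlusion rasterizers.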

AgentC

It depends on the occlusion buffer dimensions.

 

On a 2 GHz Core i7, it takes less than 1 millisecond to rasterize 2,000 triangles of terrain geometry into a 256-pixel-wide occlusion buffer; increasing the resolution to 1024 pixels increases the time to 2.5 ms. My routine does depth-only rasterization and is not optimized to the max: plain C++ code, no SIMD instructions.

 

I've found it hard to thread the occlusion rendering, though, since the way I use it is sequential: find the visible occluders, rasterize them, then use the result to test object AABBs for visibility.

 

kalle_h

In the occlusion rasterizer that I'm using, my frame-buffer is only 1 bit per pixel (drawn to yet, or not drawn to). I render all the triangles for occluders and occludees from front-to-back, with occluders filling pixels, and occludees testing if all their pixels are filled or not (if any of their pixels aren't filled, the occludee is visible).
Using SSE, you can fill 128 pixels at a time with this algorithm, so the actual rasterization is not at all a bottleneck. I can run at pretty much any resolution without much difference in speed. The real bottleneck for me is actually transforming all of the vertices from model space into screen space (i.e. the 'vertex shader').
 
I break the frame-buffer into 'tiles', each of them 128 pixels wide and <height> pixels tall. Each of these tiles can then be independently rasterized by a different thread.
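The word-at-a-time fill described above can be sketched with portable 64-bit masks instead of 128-bit SSE registers. The structure and names below are illustrative assumptions, not the poster's actual code:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One row of a 1-bit-per-pixel coverage buffer. Occluders OR span masks in;
// occludees AND against the buffer to see if any pixel is still uncovered.
struct BitRow {
    std::vector<uint64_t> words;
    explicit BitRow(int widthPixels) : words((widthPixels + 63) / 64, 0) {}

    // Mask covering bits [lo, hi) of a single 64-bit word, 0 <= lo < hi <= 64.
    static uint64_t spanMask(int lo, int hi) {
        uint64_t m = ~0ull << lo;
        if (hi < 64) m &= ~(~0ull << hi);
        return m;
    }

    // Occluder: set pixels [x0, x1) in this row, up to 64 at a time.
    void fillSpan(int x0, int x1) {
        for (int w = x0 / 64; w <= (x1 - 1) / 64; ++w) {
            int lo = std::max(x0 - w * 64, 0);
            int hi = std::min(x1 - w * 64, 64);
            words[w] |= spanMask(lo, hi);
        }
    }

    // Occludee: true if every pixel in [x0, x1) is already covered.
    bool spanCovered(int x0, int x1) const {
        for (int w = x0 / 64; w <= (x1 - 1) / 64; ++w) {
            int lo = std::max(x0 - w * 64, 0);
            int hi = std::min(x1 - w * 64, 64);
            uint64_t m = spanMask(lo, hi);
            if ((words[w] & m) != m) return false;
        }
        return true;
    }
};
```

With SSE the same idea just uses 128-bit registers per step, and tiling rows into fixed-width columns keeps each thread's writes confined to its own words.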

How are you sorting the triangles? How do you handle intersections? Or do you sort by max Z and rasterize the whole triangle with that depth? Sounds like a great technique, but I'd like to hear more details.

belfegor

... and occludees testing if all their pixels are filled or not (if any of their pixels aren't filled, the occludee is visible).

 

If occluder triangles take the maximum and occludees the minimum Z, then could I test just the 4 pixels/corners of an occludee's axis-aligned 2D rectangle?

Or even 1 might be enough, I think? I don't see how this could fail.

I'm going to try to implement this myself, since Vadim S. demo crashes for me at certain camera positions/angles; his code is also hard to follow and unreadable, so I cannot fix it.


Hodgman

I'm going to try to implement this myself, since Vadim S. demo crashes for me at certain camera positions/angles; his code is also hard to follow and unreadable, so I cannot fix it

I think the problem is in the clipping algorithm when triangles intersect the near plane (an assert in there is the cause of the crash)... but yeah, his code is very hard to follow.

 

could I test just the 4 pixels/corners of an occludee's axis-aligned 2D rectangle?

I'm not sure that would work. E.g. if the occludee is just beyond a window but is larger than the window, its four corners will be occluded by the wall, yet its centre will be visible through the window.

 

If you're rasterizing a traditional depth buffer instead of one of these 1-bit occlusion buffers, then you can build a hierarchical Z-buffer from it, which does allow you to test any occludee with just 4 samples. It's a very old technique (1993?), but was used very recently in a Splinter Cell game.
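A minimal single-level sketch of that hierarchical-Z idea (a full implementation builds an entire mip pyramid and picks a level where the rect covers at most 2x2 texels; the function names here are illustrative):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// From a full-resolution depth buffer (smaller depth = nearer), build one
// coarse level storing the farthest (max) depth of each 2x2 block. An
// occludee is hidden if its nearest depth is behind the stored max depth at
// every coarse texel its screen rect touches.
std::vector<float> buildMaxLevel(const std::vector<float>& depth, int w, int h) {
    std::vector<float> coarse((w / 2) * (h / 2));
    for (int y = 0; y < h / 2; ++y)
        for (int x = 0; x < w / 2; ++x) {
            float a = depth[(2 * y) * w + 2 * x];
            float b = depth[(2 * y) * w + 2 * x + 1];
            float c = depth[(2 * y + 1) * w + 2 * x];
            float d = depth[(2 * y + 1) * w + 2 * x + 1];
            // Max keeps the test conservative: a cell only occludes if its
            // farthest covering pixel is still in front of the occludee.
            coarse[y * (w / 2) + x] = std::max(std::max(a, b), std::max(c, d));
        }
    return coarse;
}

// Conservative test of rect [x0,x1)x[y0,y1) in coarse texels against the
// occludee's nearest depth minZ. Returns true only when provably occluded.
bool rectOccluded(const std::vector<float>& coarse, int cw,
                  int x0, int y0, int x1, int y1, float minZ) {
    for (int y = y0; y < y1; ++y)
        for (int x = x0; x < x1; ++x)
            if (minZ <= coarse[y * cw + x]) return false; // maybe visible
    return true;
}
```

Chaining this reduction per mip level gives the 4-sample test mentioned above: at the right level the rect spans at most 2x2 coarse texels.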


Jason Z

I always found that memory bandwidth was the biggest issue with software rasterization, which ties performance to the frame-buffer size. However, using one bit per pixel as Hodgman describes would probably alleviate that issue nicely...

Krypt0n

It depends on what your rasterizer does. For occlusion culling with one core, you just read/write 2 or 4 bytes per pixel; that's not an issue, since you have a lot of math to do before you get to that point, and you know at the start of the loop which pixels you're going to touch (you can even predict the next line). Inserting some prefetch instructions can hide most of the memory-access cost.

 

Occlusion culling is a special case of rasterization, and it has special demands:

1. Accuracy: it needs to be high quality. One leaking pixel from the background will invalidate all the rasterization you've done to build occlusion.

2. X/Y resolution: you cannot really assume some low-resolution buffer will be enough. Say you have 128 pixels in x while actually playing at 2560x1600; that means 20 real pixels map to one occlusion-buffer pixel. If you stand at an unlucky angle to a window or door opening, the whole world visible through it can flicker.

3. Depth resolution: unless you want to spend human resources placing and adjusting custom occluder geometry, you have to render with the same depth accuracy as the hardware does. You usually try to cull not only polys but also draw calls, so you want to cull decals and tiny props (e.g. a painted image on a wall), and therefore you need to rasterize accurately enough to avoid flickering due to z-fighting.

4. It needs to be solid (software-wise): avoid special cases, create automatic regression tests, profile every change. If it works 99% of the time and fails 1% of the time, it won't be used; it can have a massive impact on gameplay and visuals (e.g. choppy framerates, slowdowns, wrongly culled objects...).

(5. The most important part of occlusion culling is the amount you cull; that's what you save through the whole pipeline. Don't fool yourself into thinking you need the fastest occlusion culler. If you cull 90% of the draw calls and end up with only 500, artists will fill that up again and you're back at 5k draw calls and 10 ms. Doing it in 2 ms but culling only 70% instead of 90% will lower your overall framerate!)

 

 

One last word regarding the 1 bit/pixel solution: I implemented something like this ages ago (on a Pentium, using 32-bit ints), and the culling results were very inconsistent. Depending on your view angle, you might render half the room behind a wall, just because the sorting re-ordered the polys when you look at the wall at 45 degrees. It's faster the bigger your polys are, but at the same time less accurate; you can get quite accurate with tons of tiny polys, but then you won't see such a big speedup compared to the usual way of rasterizing.

