RobMaddison

Is software rasterisation processor-heavy?


As part of my approach to occlusion culling, I'm thinking about incorporating a software rasteriser that will just render simple objects (mostly cubes, I guess) in plain colours into a simulated bitmap (basically just a 2D array of pixels/numbers).

I kind of get the theory (or can find implementations of it), but I was wondering: can it be a quick process, or is it processor-intensive? I'm thinking it'll just go in its own thread.

The bit I'm concerned about is filling the shapes with colour, i.e. in between the Bresenham edge lines.
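For what it's worth, the usual approach is not to trace Bresenham lines and then fill between them, but to walk the triangle one scanline at a time, interpolating the edge x-coordinates and writing each row as a straight span. A minimal sketch of that idea (all names and the `Bitmap` type here are illustrative, not anyone's actual code):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical minimal framebuffer: just a flat array of colour values.
struct Bitmap {
    int w, h;
    std::vector<uint32_t> pixels;
    Bitmap(int w_, int h_) : w(w_), h(h_), pixels(w_ * h_, 0) {}
};

// Fill a triangle by interpolating its edges per scanline, so the "fill"
// is simply a contiguous row write between the two edge crossings.
void fillTriangle(Bitmap& bmp, float x0, float y0, float x1, float y1,
                  float x2, float y2, uint32_t colour)
{
    // Sort vertices by y so (x0,y0) is topmost and (x2,y2) is bottommost.
    if (y1 < y0) { std::swap(x0, x1); std::swap(y0, y1); }
    if (y2 < y0) { std::swap(x0, x2); std::swap(y0, y2); }
    if (y2 < y1) { std::swap(x1, x2); std::swap(y1, y2); }
    if (y2 - y0 < 1e-6f) return;  // degenerate: zero height

    // x-coordinate where the edge (xa,ya)-(xb,yb) crosses scanline y.
    auto edgeX = [](float xa, float ya, float xb, float yb, float y) {
        return xa + (xb - xa) * (y - ya) / (yb - ya);
    };

    int yStart = std::max(0, (int)std::ceil(y0));
    int yEnd   = std::min(bmp.h - 1, (int)std::floor(y2));
    for (int y = yStart; y <= yEnd; ++y) {
        // The long edge (v0->v2) spans every scanline; the short edge is
        // v0->v1 above the middle vertex and v1->v2 below it.
        float xa = edgeX(x0, y0, x2, y2, (float)y);
        float xb = (y < y1 || y1 == y2) ? edgeX(x0, y0, x1, y1, (float)y)
                                        : edgeX(x1, y1, x2, y2, (float)y);
        if (xa > xb) std::swap(xa, xb);
        int xs = std::max(0, (int)std::ceil(xa));
        int xe = std::min(bmp.w - 1, (int)std::floor(xb));
        for (int x = xs; x <= xe; ++x)
            bmp.pixels[y * bmp.w + x] = colour;  // straight row write
    }
}
```

The inner loop touches memory sequentially, which is the main reason scanline filling tends to be cheap compared with the per-pixel bookkeeping a flood fill would need.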


It depends on the occlusion buffer dimensions.

 

On a 2 GHz Core i7, it takes less than 1 millisecond to rasterize 2,000 triangles of terrain geometry into a 256-pixel-wide occlusion buffer. Increasing the resolution to 1024 pixels increases the time to 2.5 ms. My routine does depth-only rasterization and is not optimized to the max: plain C++ code, no SIMD instructions.

 

I've found it hard to thread the occlusion rendering, though, as the way I use it is a sequential operation: determine the visible occluders, rasterize them, then use the result to test object AABBs for visibility.

 

In the occlusion rasterizer that I'm using, my frame-buffer is only 1 bit per pixel (drawn to yet, or not drawn to). I render all the triangles for occluders and occludees from front-to-back, with occluders filling pixels, and occludees testing if all their pixels are filled or not (if any of their pixels aren't filled, the occludee is visible).
Using SSE, you can fill 128 pixels at a time with this algorithm, so the actual rasterization is not at all a bottleneck. I can run at pretty much any resolution without much difference in speed. The real bottleneck for me is actually transforming all of the vertices from model space into screen space (i.e. the 'vertex shader').
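A portable sketch of that 1-bit scheme, using 64-bit words in place of the 128-bit SSE registers (the principle is the same; all names here are my own, not the poster's code):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// 1-bit-per-pixel coverage buffer: each uint64_t covers 64 pixels of a row.
struct CoverageBuffer {
    int wWords, h;                 // width in 64-pixel words, height in rows
    std::vector<uint64_t> words;
    CoverageBuffer(int wWords_, int h_)
        : wWords(wWords_), h(h_), words(wWords_ * h_, 0) {}

    // Mask with bits [x0, x1) set within one 64-pixel word.
    static uint64_t spanMask(int x0, int x1) {
        uint64_t lo = ~0ull << x0;
        uint64_t hi = (x1 >= 64) ? ~0ull : ~(~0ull << x1);
        return lo & hi;
    }

    // Occluder span [x0, x1): OR pixels into the buffer, 64 at a time.
    void fillSpan(int y, int x0, int x1) {
        for (int w = x0 / 64; w <= (x1 - 1) / 64; ++w) {
            int lo = std::max(0, x0 - w * 64);
            int hi = std::min(64, x1 - w * 64);
            words[y * wWords + w] |= spanMask(lo, hi);
        }
    }

    // Occludee span: hidden only if every pixel is already filled.
    bool spanFullyCovered(int y, int x0, int x1) const {
        for (int w = x0 / 64; w <= (x1 - 1) / 64; ++w) {
            int lo = std::max(0, x0 - w * 64);
            int hi = std::min(64, x1 - w * 64);
            uint64_t m = spanMask(lo, hi);
            if ((words[y * wWords + w] & m) != m) return false;
        }
        return true;
    }
};
```

With SSE you would replace the `uint64_t` ORs and compares with `_mm_or_si128`-style operations on 128-bit registers, but the span-mask logic is identical.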
 
I break the frame-buffer into 'tiles', each of them 128 pixels wide and <height> pixels tall. Each of these tiles can then be independently rasterized by a different thread.

How are you sorting the triangles? What about intersections? Or do you sort by maximum Z and rasterize the whole triangle at that depth? It sounds like a great technique, but I'd like to hear more details.

... and occludees testing if all their pixels are filled or not (if any of their pixels aren't filled, the occludee is visible).

 

If occluder triangles take their maximum Z and occludees their minimum Z, could I test just the 4 pixels/corners of an occludee's axis-aligned 2D rectangle?

Or even 1 might be enough, I think? I don't see how this could fail.
 
I'm going to try to implement this myself, since Vadim S.'s demo is crashing for me at certain camera positions/angles; his code is also hard to follow and unreadable, so I cannot fix it.

Edited by belfegor

I'm going to try to implement this myself, since Vadim S.'s demo is crashing for me at certain camera positions/angles; his code is also hard to follow and unreadable, so I cannot fix it

I think the problem is in the clipping algorithm, which sometimes fails when triangles intersect the near plane (an assert in there is the cause of the crash)... but yeah, his code is very hard to follow.

 

i could test just 4 pixels/corners of occludees axis aligned 2D rectangle?

I'm not sure that would work -- e.g. if the occludee is just beyond a window, but is larger than the window. Its four corners will be occluded by the wall, but its centre will be visible through the window.

 

If you're rasterizing a traditional depth buffer instead of one of these 1-bit occlusion buffers, then you can create a hierarchical Z-buffer from it, which does allow you to test any occludee with just 4 samples. This is a very old technique (1993?), but was used very recently in a Splinter Cell game.
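For illustration, here is a minimal hierarchical-Z sketch along those lines, assuming a square power-of-two depth buffer where larger depth means farther away (the class and method names are mine, purely hypothetical):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hierarchical Z-buffer sketch: each coarser mip stores the MAX (farthest)
// depth of a 2x2 block from the level below, so a conservative visibility
// test can read only a handful of coarse texels.
struct HiZ {
    std::vector<std::vector<float>> mips;  // mips[0] = full-res depth buffer
    int size0;                             // full-res side length (power of two)

    HiZ(const std::vector<float>& depth, int size) : size0(size) {
        mips.push_back(depth);
        for (int s = size / 2; s >= 1; s /= 2) {
            const auto& prev = mips.back();
            int ps = s * 2;
            std::vector<float> cur(s * s);
            for (int y = 0; y < s; ++y)
                for (int x = 0; x < s; ++x)
                    cur[y * s + x] = std::max(
                        std::max(prev[(2*y) * ps + 2*x],   prev[(2*y) * ps + 2*x + 1]),
                        std::max(prev[(2*y+1) * ps + 2*x], prev[(2*y+1) * ps + 2*x + 1]));
            mips.push_back(cur);
        }
    }

    // Is a screen rect [x0,x1]x[y0,y1] with nearest depth minZ possibly visible?
    bool visible(int x0, int y0, int x1, int y1, float minZ) const {
        // Pick a coarse level where the rect spans only a couple of texels
        // per axis, so the loop below reads just a few samples (typically 4).
        int level = 0, span = std::max(x1 - x0, y1 - y0);
        while ((span >> level) > 1 && level + 1 < (int)mips.size()) ++level;
        int s = size0 >> level;
        int tx0 = x0 >> level, ty0 = y0 >> level;
        int tx1 = std::min(s - 1, x1 >> level), ty1 = std::min(s - 1, y1 >> level);
        for (int y = ty0; y <= ty1; ++y)
            for (int x = tx0; x <= tx1; ++x)
                if (minZ < mips[level][y * s + x])
                    return true;  // nearer than the farthest occluder depth here
        return false;             // occluded everywhere the rect touches
    }
};
```

Because every coarse texel is an upper bound on the depths beneath it, a "not visible" answer is always safe; the price is that some occluded objects are conservatively reported visible.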

Edited by Hodgman


I always found that memory bandwidth was the biggest issue with software rasterization, which ties the performance to the frame-buffer size. However, using one bit per pixel like Hodgman does would probably alleviate that issue nicely...
