fir

threading the scanline rasterizer


Some time ago I was threading (parallelizing) a simple and ugly raytracer of mine. It was very easy, as it only read the shared immutable scene data and wrote pixels to distinct rectangles on the screen, so there were no collisions.

 

When parallelizing the rasterizer the situation looks different: the scene data is also immutable, so no problem there, but the screen output from different threads will be heavily overlapping. *

 

so how to do that?

 

I was wondering if I should, for example, make each thread rasterize to its own set of frame and depth buffers, and then in a last stage "depth add" (merge by depth) those buffers into one.
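A minimal sketch of that "depth add" step, assuming each thread owns a private color and depth buffer and that a smaller depth value means closer to the eye. The names `Target` and `depth_merge` are made up for illustration, not taken from any existing renderer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// One private render target per thread (hypothetical layout).
struct Target {
    std::vector<std::uint32_t> color;  // packed RGBA, one entry per pixel
    std::vector<float> depth;          // assumption: smaller = closer to the eye

    explicit Target(std::size_t pixels)
        : color(pixels, 0u),
          depth(pixels, std::numeric_limits<float>::infinity()) {}
};

// "Depth add": for every pixel keep whichever sample is closer to the eye.
void depth_merge(Target& dst, const Target& src) {
    assert(dst.depth.size() == src.depth.size());
    for (std::size_t i = 0; i < dst.depth.size(); ++i) {
        if (src.depth[i] < dst.depth[i]) {
            dst.depth[i] = src.depth[i];
            dst.color[i] = src.color[i];
        }
    }
}
```

Each thread would rasterize into its own `Target` with no locking at all, and afterwards the buffers are merged one by one into the final one. The merge reads every pixel of every per-thread buffer each frame, so whether this pays off depends on the resolution and the thread count.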

 

* I wonder what the effect would be if this ran without any locking at all: would it be a snowstorm on the screen, only very rare pixel failures, or what?

Does such colliding access to the framebuffer only bring visibility artifacts, or does it also bring some slowdown (does heavy shared access to a RAM array without locking cause slowdowns)?

 


If you have four threads for the rasterizer, assign the scanlines to them in an interleaved way.

The first thread takes lines 0, 4, 8 and so on.

The second thread takes lines 1, 5, 9, ...

....

All parallel. No synchronization needed. No artifacts.
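The interleaved scheme could look roughly like this. It is only a sketch under assumptions: flat-colored 2D triangles, no depth test, and a simple bounding-box plus edge-function fill instead of a real scanline walker. `Tri2D`, `draw_rows` and `draw_interleaved` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical 2D triangle with a flat color.
struct Tri2D { float x0, y0, x1, y1, x2, y2; std::uint32_t color; };

// Edge function: the sign tells on which side of edge a->b the point p lies.
static float edge(float ax, float ay, float bx, float by, float px, float py) {
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax);
}

// Draws only the rows y with y % num_threads == id, so two threads
// never write the same pixel.
void draw_rows(const std::vector<Tri2D>& tris, std::vector<std::uint32_t>& fb,
               int width, int height, int id, int num_threads) {
    for (const Tri2D& t : tris) {
        int xmin = std::max(0, int(std::min({t.x0, t.x1, t.x2})));
        int xmax = std::min(width - 1, int(std::max({t.x0, t.x1, t.x2})));
        int ymin = std::max(0, int(std::min({t.y0, t.y1, t.y2})));
        int ymax = std::min(height - 1, int(std::max({t.y0, t.y1, t.y2})));
        // First row at or below ymin that belongs to this thread.
        int y = ymin + ((id - ymin % num_threads) + num_threads) % num_threads;
        for (; y <= ymax; y += num_threads) {
            for (int x = xmin; x <= xmax; ++x) {
                float px = x + 0.5f, py = y + 0.5f;  // sample at pixel center
                float w0 = edge(t.x1, t.y1, t.x2, t.y2, px, py);
                float w1 = edge(t.x2, t.y2, t.x0, t.y0, px, py);
                float w2 = edge(t.x0, t.y0, t.x1, t.y1, px, py);
                bool inside = (w0 >= 0 && w1 >= 0 && w2 >= 0) ||
                              (w0 <= 0 && w1 <= 0 && w2 <= 0);
                if (inside) fb[y * width + x] = t.color;
            }
        }
    }
}

// Every thread walks the full triangle list but touches only its own rows.
void draw_interleaved(const std::vector<Tri2D>& tris,
                      std::vector<std::uint32_t>& fb,
                      int width, int height, int num_threads) {
    assert(num_threads > 0);
    std::vector<std::thread> pool;
    for (int id = 0; id < num_threads; ++id)
        pool.emplace_back([&, id] {
            draw_rows(tris, fb, width, height, id, num_threads);
        });
    for (auto& th : pool) th.join();
}
```

Because each thread still processes the triangles in list order, every pixel ends up with the same value as in a single-threaded run. The price is that the per-triangle setup is repeated in every thread.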


 

I'm afraid these would be too short tasks for parallelizing. I'm rasterizing 100k triangles or more per frame, and those triangles are mostly small ones (I don't know, 10x10 on average?). So, say, 10 milliseconds / 100k triangles is 100 nanoseconds per triangle. Can such simple tasks be managed by a pool of threads in parallel? (I'm not sure, but I'm afraid not at all.)


 

This scheme makes no assumptions about the order of processing. But synchronization must take place on writing if multiple threads write to the same location. If you make sure that the writes don't interfere, everything works fine.
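For the case where writes can hit the same location, one possible form of "synchronization on writing" is a lock-free depth-tested store. This is a sketch of one option, not something proposed in the thread: it assumes depth and color are packed into a single 64-bit word per pixel and that depth values are non-negative. `pack` and `write_pixel` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

// Depth goes into the high 32 bits, color into the low 32 bits.
// For non-negative IEEE-754 floats the bit pattern orders like the value,
// so comparing two packed words compares their depths first.
std::uint64_t pack(float depth, std::uint32_t color) {
    assert(depth >= 0.0f);
    std::uint32_t bits;
    std::memcpy(&bits, &depth, sizeof bits);
    return (std::uint64_t(bits) << 32) | color;
}

// Stores the sample only if it is closer than what the pixel holds now.
// Safe to call from several threads on the same pixel.
void write_pixel(std::atomic<std::uint64_t>& pixel,
                 float depth, std::uint32_t color) {
    std::uint64_t candidate = pack(depth, color);
    std::uint64_t current = pixel.load(std::memory_order_relaxed);
    while (candidate < current &&
           !pixel.compare_exchange_weak(current, candidate,
                                        std::memory_order_relaxed)) {
        // A failed exchange reloads `current`; the loop then repeats
        // the depth test against the new value.
    }
}
```

A pixel would start out as all ones (`~std::uint64_t(0)`) so that any real sample wins. Plain unsynchronized writes from several threads to the same pixel are a data race in C++, which is undefined behaviour, so some scheme like this (or non-overlapping writes) is needed for a well-defined result.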


 

I don't really understand which method you mean. Do you mean running the whole pipeline twice, once on each core (as it runs now on one), but when it comes to rasterizing, skipping the odd lines on the first and the even lines on the second?

That could probably work, but it may waste the first stages of the pipeline, since that work would be duplicated.

 

input array of triangles ->

 - translation/rotation (model to light)

 - shading

 - translation/rotation (model to eye)

 - 3d triangles out

 - projection to 2d

 - 2d triangles out

 - rasterization

-> frame buffer

 

As far as I measured, the last step (drawing the triangles) takes only roughly 35% of the total time (not sure here).

 

I'm not sure it would be good to compute the first 65% of the pipeline on each core. Parallelizing that part as well would probably need storing the intermediate results, and that storing would slow things down too. I must rethink it a bit, though it is an idea worth considering.

 

Maybe someone knows some other approaches?

Edited by fir


You could use a tiled renderer. Like the PowerVR series of GPUs.

 

http://en.wikipedia.org/wiki/Tiled_rendering

 

I didn't quite understand it; could someone explain a bit?

Does that mean that after the first stage of the rasterizer, when I have produced the 2D triangle data but before the actual rasterization, I store this 2D intermediate data into some grid array of lists, and then in a second step rasterize the tile lists one after another? *

 

Or does it do something else? (I've never heard of this before, though the name is somewhat familiar.) Thanks for the info.

 

* If so, what if a triangle spans more than one tile?

 

I wonder if that would speed up my program... I could try, though it's a bit of messing around.

Edited by fir


From what I understand (this could be wrong, someone correct me)

 

1. Transform triangles to screen space

 

For each tile:

 

2. Create triangle list. Triangles can span multiple tiles, so just insert them into each touched tile.

 

3. Sort triangles and draw front to back.

 

At this point if a triangle covers the whole tile and is opaque you can quickly abort the rendering process if you can prove that following triangles will be fully occluded.

For this you could maintain a min/max depth range for the whole tile or something similarly simple.

Furthermore, triangles that cover the whole tile should be fairly easy to rasterize as you only have to render a quad instead of a triangle.
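The binning step described above (step 2) could be sketched like this, assuming triangles are already in screen space and are binned by their bounding box. `Tri2D`, `bin_triangles` and the `tile` size parameter are hypothetical names chosen for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical screen-space triangle.
struct Tri2D { float x0, y0, x1, y1, x2, y2; };

// Returns one list of triangle indices per tile, tiles in row-major order.
// A triangle is inserted into every tile its bounding box touches.
std::vector<std::vector<std::size_t>> bin_triangles(
        const std::vector<Tri2D>& tris, int width, int height, int tile) {
    assert(tile > 0 && width > 0 && height > 0);
    int tiles_x = (width + tile - 1) / tile;
    int tiles_y = (height + tile - 1) / tile;
    std::vector<std::vector<std::size_t>> bins(
        std::size_t(tiles_x) * std::size_t(tiles_y));

    for (std::size_t i = 0; i < tris.size(); ++i) {
        const Tri2D& t = tris[i];
        float minx = std::min({t.x0, t.x1, t.x2});
        float maxx = std::max({t.x0, t.x1, t.x2});
        float miny = std::min({t.y0, t.y1, t.y2});
        float maxy = std::max({t.y0, t.y1, t.y2});
        // Skip triangles that are completely off-screen.
        if (maxx < 0 || maxy < 0 || minx >= width || miny >= height) continue;

        // Clamp the bounding box to the screen, then convert to tile indices.
        int tx0 = std::max(0, int(minx)) / tile;
        int tx1 = std::min(width - 1, int(maxx)) / tile;
        int ty0 = std::max(0, int(miny)) / tile;
        int ty1 = std::min(height - 1, int(maxy)) / tile;

        for (int ty = ty0; ty <= ty1; ++ty)
            for (int tx = tx0; tx <= tx1; ++tx)
                bins[std::size_t(ty) * tiles_x + tx].push_back(i);
    }
    return bins;
}
```

Bounding-box binning is conservative: a thin diagonal triangle can land in tiles it does not actually cover, which costs some wasted work but no wrong pixels as long as the per-tile rasterizer clips to the tile. Since every pixel belongs to exactly one tile, the tiles can then be handed to different threads without any locking on the framebuffer.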


Another approach would be to make chunks of some number of actions, let's say 1000. Each stage has a list of such chunks as input and produces actions for the next stage. The workers take chunks of actions and produce actions for the next stage. When your processing moves on from stage to stage, you synchronize the threads. Synchronization is only needed if you have strict dependencies between stages, but I would suggest doing it anyway.
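A rough sketch of the chunked approach, assuming the items of one stage live in a plain array and the workers grab chunks through a shared atomic counter. `run_stage` is a hypothetical helper; for brevity it spawns its threads on every call, where a real renderer would more likely keep a persistent pool.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Runs fn(begin, end) over the index range [0, count) in chunks.
// Each worker repeatedly claims the next unprocessed chunk.
// Returning from this function is the synchronization point between stages.
template <class Fn>
void run_stage(std::size_t count, std::size_t chunk, int workers, Fn fn) {
    assert(chunk > 0 && workers > 0);
    std::atomic<std::size_t> next(0);
    auto work = [&] {
        for (;;) {
            std::size_t begin = next.fetch_add(chunk);
            if (begin >= count) break;
            fn(begin, std::min(begin + chunk, count));
        }
    };
    std::vector<std::thread> pool;
    for (int i = 0; i < workers; ++i) pool.emplace_back(work);
    for (auto& t : pool) t.join();
}
```

A stage such as the model-to-eye transform would then be called as `run_stage(tris.size(), 1000, 4, ...)` with a callback that reads the input array and writes the intermediate array at the same indices. With 1000 items per chunk the scheduling cost is one atomic add per chunk rather than per triangle, which is what makes very short per-triangle tasks manageable.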
