threading the scanline rasterizer

Started by
7 comments, last by Tribad 9 years, 10 months ago

Some time ago I was threading (parallelizing) a simple and ugly raytracer of mine.

It was very easy, as it only read the shared immutable scene data and wrote pixels to distinct rectangles on the screen - so no collisions.

When parallelizing the rasterizer the situation looks different: the scene data is still immutable, so no problem there, but the screen output will overlap heavily. *

So how should I do that?

I was wondering if I should, for example, make each thread rasterize to its own set of frame and depth buffers and then, in a last stage, "depth add" (merge by depth) those buffers into one.
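The "depth add" idea can be sketched as a per-pixel merge that keeps the nearest sample. This is a minimal sketch, not the poster's code: the `Target` struct, the `depthMerge` function, and the convention that a smaller depth value means closer to the eye are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-thread render target: one color and one depth per pixel.
struct Target {
    std::vector<std::uint32_t> color;
    std::vector<float> depth;  // assumption: smaller value = closer to the eye

    Target(std::size_t pixels, float farDepth)
        : color(pixels, 0u), depth(pixels, farDepth) {}
};

// Merge every per-thread target into `out`: for each pixel keep the sample
// with the smallest depth. `out` should start cleared to the far depth.
void depthMerge(const std::vector<Target>& perThread, Target& out) {
    for (const Target& t : perThread) {
        assert(t.depth.size() == out.depth.size());
        for (std::size_t i = 0; i < out.depth.size(); ++i) {
            if (t.depth[i] < out.depth[i]) {
                out.depth[i] = t.depth[i];
                out.color[i] = t.color[i];
            }
        }
    }
}
```

The cost of this approach is one extra pass over (number of threads) x (number of pixels) per frame, plus the memory for the extra buffers. The merge itself touches each output pixel independently, so it could also be split across threads by rows.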

* I wonder what effect it would have if run without any locking at all - would it be a snowstorm on the screen, only very rare pixel failures, or what?

Does such colliding access to the framebuffer only bring some visibility artifacts, or does it also bring some slowdown? (Does heavy shared access to a RAM array without locking cause slowdowns?)

If you have four threads for the rasterizer, assign each thread a single line at a time.

The first thread takes lines 0, 4, 8 and so on.

The second thread takes lines 1, 5, 9, ...

....

All parallel. No synchronization needed. No artifacts.
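A minimal sketch of this interleaved-line scheme, assuming every thread walks the full list of screen-space triangles but only fills the rows it owns. The `Tri` and `Frame` types, the flat per-triangle depth `z`, and the function names are hypothetical and only serve the illustration; a real rasterizer would interpolate depth and clamp the x range too.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical screen-space triangle with one flat depth value (assumption).
struct Tri {
    float x[3], y[3];
    float z;
    std::uint32_t color;
};

struct Frame {
    int w, h;
    std::vector<std::uint32_t> color;
    std::vector<float> depth;
    Frame(int w_, int h_)
        : w(w_), h(h_),
          color(static_cast<std::size_t>(w_) * h_, 0u),
          depth(static_cast<std::size_t>(w_) * h_, 1.0f) {}
};

static float edge(float ax, float ay, float bx, float by, float px, float py) {
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax);
}

// Rasterize only the rows y with y % stride == first. No other thread
// touches these rows, so the depth test and the writes need no locking.
void drawRows(const std::vector<Tri>& tris, Frame& f, int first, int stride) {
    assert(stride > 0 && first >= 0 && first < stride);
    for (const Tri& t : tris) {
        int yMin = std::max(0, static_cast<int>(std::floor(std::min({t.y[0], t.y[1], t.y[2]}))));
        int yMax = std::min(f.h - 1, static_cast<int>(std::ceil(std::max({t.y[0], t.y[1], t.y[2]}))));
        // First row at or below yMin that belongs to this thread.
        int y = yMin + ((first - yMin % stride) + stride) % stride;
        for (; y <= yMax; y += stride) {
            for (int x = 0; x < f.w; ++x) {
                float px = x + 0.5f, py = y + 0.5f;
                float e0 = edge(t.x[0], t.y[0], t.x[1], t.y[1], px, py);
                float e1 = edge(t.x[1], t.y[1], t.x[2], t.y[2], px, py);
                float e2 = edge(t.x[2], t.y[2], t.x[0], t.y[0], px, py);
                bool inside = (e0 >= 0 && e1 >= 0 && e2 >= 0) ||
                              (e0 <= 0 && e1 <= 0 && e2 <= 0);
                std::size_t i = static_cast<std::size_t>(y) * f.w + x;
                if (inside && t.z < f.depth[i]) {
                    f.depth[i] = t.z;
                    f.color[i] = t.color;
                }
            }
        }
    }
}

// Thread t owns rows t, t + numThreads, t + 2 * numThreads, ...
void drawThreaded(const std::vector<Tri>& tris, Frame& f, int numThreads) {
    std::vector<std::thread> pool;
    for (int t = 0; t < numThreads; ++t)
        pool.emplace_back(drawRows, std::cref(tris), std::ref(f), t, numThreads);
    for (std::thread& th : pool) th.join();
}
```

Note that the unit of work here is a set of rows for the whole frame, not a single triangle, so the per-task overhead is paid once per thread per frame. The price is that every thread repeats the per-triangle setup for every triangle.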

I'm afraid these would be too short tasks to parallelize. I'm rasterizing 100k triangles or more per frame (mostly small ones, maybe 10x10 pixels on average?), so, say, 10 milliseconds / 100k triangles is 100 nanoseconds per triangle.

Can such small tasks be managed by a pool of threads in parallel? (I'm not sure, but I'm afraid not at all.)

You could use a tiled renderer. Like the PowerVR series of GPUs.

http://en.wikipedia.org/wiki/Tiled_rendering

The line-per-thread scheme makes no assumptions about the order of processing. Synchronization only has to take place on writes, and only if multiple threads write to the same location. If you make sure the writes don't interfere, everything works fine.

I don't really understand which method you mean - do you mean running the whole pipeline once on each core (as I do now on one), but when it comes to rasterizing, skipping the odd lines on the 1st core and the even lines on the 2nd?

That could probably work, but maybe it wastes the first stages of the pipeline, as that work would be duplicated:

input array of triangles ->

- translation/rotation (model to light)

- shading

- translation/rotation (model to eye)

- 3d triangles out

- projection to 2d

- 2d triangles out

- rasterization

-> frame buffer

As far as I measured, the last step, drawing the triangles, takes only roughly 35% of the total time (not sure here).

I'm not sure it would be good to compute the first 65% of the pipeline on each core - and parallelizing it would maybe require storing the intermediate results, and that storing would slow things down too. I must rethink it a bit, though it is an idea worth considering.

Maybe someone knows some other approaches?

I didn't quite understand the tiled rendering suggestion, could someone explain it a bit?

Does that mean that after the first stage in the rasterizer, when I have produced the 2d triangle data but before the actual rasterization, I store this 2d intermediate data in some grid-array of lists, and then in the second step rasterize it one tile-list after another? *

Or does it do something else? (I never heard of this before, except that the name is a bit familiar.) Thanks for the info.

* If so, what if a triangle spans more than one tile?

I wonder if that would speed up my program... I could try, though it is a bit of messing around.

From what I understand (this could be wrong, someone correct me)

1. Transform triangles to screen space

For each tile:

2. Create triangle list. Triangles can span multiple tiles, so just insert them into each touched tile.

3. Sort triangles and draw front to back.

At this point if a triangle covers the whole tile and is opaque you can quickly abort the rendering process if you can prove that following triangles will be fully occluded.

For this you could maintain a min/max depth range for the whole tile or something similarly simple.

Furthermore, triangles that cover the whole tile should be fairly easy to rasterize as you only have to render a quad instead of a triangle.
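The binning step (step 2 above) can be sketched as follows, assuming a fixed square tile size and a conservative bounding-box test. The `Tri2D` and `TileGrid` types and the `binTriangles` function are hypothetical names for illustration, not part of any real library, and this is not a description of how PowerVR hardware does it.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical screen-space triangle (positions only).
struct Tri2D {
    float x[3], y[3];
};

// Grid of tiles; each tile stores the indices of the triangles touching it.
struct TileGrid {
    int tileSize, tilesX, tilesY;
    std::vector<std::vector<int>> bins;

    TileGrid(int width, int height, int tileSize_)
        : tileSize(tileSize_),
          tilesX((width + tileSize_ - 1) / tileSize_),
          tilesY((height + tileSize_ - 1) / tileSize_),
          bins(static_cast<std::size_t>(tilesX) * tilesY) {}
};

// Insert every triangle into each tile overlapped by its bounding box.
// This is conservative: a thin diagonal triangle may land in tiles it does
// not actually cover, which costs time but never correctness.
void binTriangles(const std::vector<Tri2D>& tris, TileGrid& g) {
    assert(g.tileSize > 0);
    for (int i = 0; i < static_cast<int>(tris.size()); ++i) {
        const Tri2D& t = tris[i];
        float minX = std::min({t.x[0], t.x[1], t.x[2]});
        float maxX = std::max({t.x[0], t.x[1], t.x[2]});
        float minY = std::min({t.y[0], t.y[1], t.y[2]});
        float maxY = std::max({t.y[0], t.y[1], t.y[2]});
        int tx0 = std::max(0, static_cast<int>(std::floor(minX / g.tileSize)));
        int tx1 = std::min(g.tilesX - 1, static_cast<int>(std::floor(maxX / g.tileSize)));
        int ty0 = std::max(0, static_cast<int>(std::floor(minY / g.tileSize)));
        int ty1 = std::min(g.tilesY - 1, static_cast<int>(std::floor(maxY / g.tileSize)));
        // Fully off-screen triangles produce an empty range and are skipped.
        for (int ty = ty0; ty <= ty1; ++ty)
            for (int tx = tx0; tx <= tx1; ++tx)
                g.bins[static_cast<std::size_t>(ty) * g.tilesX + tx].push_back(i);
    }
}
```

Once the bins are filled, each tile's list can be rasterized by a single thread with the drawing clipped to the tile rectangle. Different tiles write to disjoint screen regions, so no locking is needed on the framebuffer, and a triangle that spans several tiles is simply drawn once per tile.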

Another approach would be to make chunks of some number of actions, let's say 1000. Each stage has a list of such chunks as input and produces actions for the next stage. The workers take chunks of actions and produce actions for the next stage. When processing moves from one stage to the next, you synchronize the threads. Synchronization is only strictly needed if you have strict dependencies between stages, but I would suggest doing it anyway.
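One way to read the chunk idea is a shared atomic counter that hands out index ranges to worker threads, with a join at the end of each stage acting as the synchronization point. The sketch below is an interpretation under that assumption; `runStage` and its parameters are hypothetical names, and spawning fresh threads for every stage is a simplification (a persistent pool would avoid the startup cost).

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Run work(begin, end) over the index range [0, count) in chunks of
// chunkSize. Workers grab the next free chunk from an atomic counter.
// The function returns only after all workers have finished, which is the
// synchronization between one stage and the next.
void runStage(std::size_t count, std::size_t chunkSize, int numThreads,
              const std::function<void(std::size_t, std::size_t)>& work) {
    assert(chunkSize > 0 && numThreads > 0);
    std::atomic<std::size_t> next{0};
    auto worker = [&] {
        for (;;) {
            std::size_t begin = next.fetch_add(chunkSize);
            if (begin >= count) break;
            work(begin, std::min(begin + chunkSize, count));
        }
    };
    std::vector<std::thread> pool;
    for (int t = 0; t < numThreads; ++t) pool.emplace_back(worker);
    for (std::thread& th : pool) th.join();
}
```

With chunks of 1000 triangles, one atomic increment is paid per chunk instead of per triangle. If the estimate of about 100 nanoseconds per triangle from earlier in the thread holds, a chunk is roughly 100 microseconds of work. Each stage must write its results to distinct output slots (for example, output index equal to input index) so that the workers do not collide.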

This topic is closed to new replies.
