Multi-threaded frustum culling - how to gather and where to store visibility results?

Graphics and GPU Programming Programming

Started by Anfaenger March 26, 2017 05:57 PM

4 comments, last by Alundra 7 years ago

Anfaenger

830

Author

March 26, 2017 05:57 PM

Hi, I'd like to implement multi-threaded frustum culling in my graphics engine,

but I don't know how to best collect and 'merge' the results of visibility testing (i.e. an array of visible entities).

I'm rewriting the renderer for fast frustum culling (i.e. storing AABBs (in SoA format) and pointers to graphics entities in two contiguous arrays), but I feel that having each worker thread writing pairs of {object_type, object_pointer} into a single array would be inefficient because of locking.

What is a good (best?) way to gather visibility results from multiple threads and merge them into a single array for further processing?

(I'm ditching hierarchical (octree-based) frustum culling in favor of the DOD approach, because the former is 'branchy' and un-threadable (and also more messy).)

JoeJ

4,181

March 26, 2017 07:17 PM

I see two ways for frustum culling that are the most efficient:

* Brute force frustum culling on GPU feeding indirect draws.
* Hierarchical frustum AND occlusion culling on single threaded CPU.

What is better depends on the scene, probably coarse occlusion culling and fine grained frustum culling if you have indoor and outdoor scenes.

Occlusion culling is difficult and may require a low poly version of the scene, but even it's not suitable for multi threading it's very efficient.
E.g. i use a system that uses a software rasterizer to generate spans instead of pixels, so it's much more efficient than stupid GPU occlusion tests.
If the camera is inside a room the system detects quickly that neither further occlusion testing nor further rendering behind the walls is necessary.
If the camera is outdoor at the street, the system is expensive but still a win because it culls the interior of buildings except of what i can see through windows.

However, for current games a system like mine is probably overkill because houses don't have any interior, distant houses may be just a cheap low poly mesh, etc.

But the point is, just because it's un-threadable it's not necessarily a bad thing - there are always enough other threadable things to run at the same time.
Hierarichical methods are often un-threadable because they rely on early termination but they are perfectly work efficient for the same reason.
Don't fall into the trap to believe hardware is so fast nowadays that stupid brute force is better just because we have multi core and GPUs - that's a dead end.

but I feel that having each worker thread writing pairs of {object_type, object_pointer} into a single array would be inefficient because of locking.

Why don't you use one array per thread?
After the work is done do a prefix sum over the rusulting counts, and then you could do a multi threaded copy to fuse everything to one big array.

ErnieDingo

619

March 26, 2017 07:52 PM

I approached my culling in my quad tree a little differently. Remember the culling test is what costs, so at the start of the frame I run the culling test and traverse the entire tree, setting a flag at each node and leaf. I stop testing on a branch once a node has failed but i proceed down to the leaf to set the flag false. You could also just gather all the nodes in a list as you go and hold on to it till render time.

In my game i have multiple zones with a quad tree in each. So i can launch each in their own thread. You could easily do the same at your root node.

When it comes to rendering time. You can either traverse the tree or use a list as above. With a list you could break that into sections and submit on seperate threads. Job management ain't too bad ???

Indie game developer - Game WIP

Strafe (Working Title) - Currently in need of another developer and modeler/graphic artist (professional & amateur's artists welcome)

Insane Software Facebook

Hodgman

52,717

March 26, 2017 09:09 PM

What is a good (best?) way to gather visibility results from multiple threads and merge them into a single array for further processing?

You can always rewrite the consumer of this data so that it can work directly with N visibility lists, then there's no need to merge :D

Otherwise, make a merge task that has all the generation tasks as a dependency, and the consumer is dependent on the merge task... or, have the thread that finishes last (e.g. if ++numVisJobsCompleted == numVisJobsTotal, in atomic logic) do the merging.

. 22 Racing Series .

Digitalfragment

1,506

March 26, 2017 10:59 PM

What is a good (best?) way to gather visibility results from multiple threads and merge them into a single array for further processing?
You can always rewrite the consumer of this data so that it can work directly with N visibility lists, then there's no need to merge :D

This is it really - if you run a merge operation then you're going to be reading and writing memory that you don't otherwise need to. Unless your arrays are smaller than a cache line looping over multiple visibility lists isn't going to add any overhead; at least no where near the cost of the merge step would.

Alundra

2,325

March 26, 2017 11:26 PM

I like the way Hodgman suggest, it should be basically the way a task manager works with dependency to be flexible.

Multi-threaded frustum culling - how to gather and where to store visibility results?

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Multi-threaded frustum culling - how to gather and where to store visibility results?

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Reticulating splines