Rendering Thread Architecture

6 comments, last by Hodgman 6 years, 9 months ago

So I wanted to ask a question pertaining to rendering thread architecture. I currently know of two main designs.

In the first approach, visibility, draw-call logic, and GPU submission are all done on the rendering thread, and render-related game data is sent to the rendering thread at the end of the frame.

In the second approach, the sole responsibility of the rendering thread is to send GPU commands to the API. Engine-abstracted command buffers are generated in parallel or on the game thread, then sent to the rendering thread to be translated into GPU commands.

I know that engines such as Unreal Engine use the first approach, while Unity and Source use the second approach, and CryEngine uses a hybrid of both. I know some of the benefits of both approaches, but wanted to get more insight into the architecture. Could anyone give some advice?


Use Vulkan/DX12 and let any CPU thread communicate with the GPU, maybe?

So on D3D9/10/11 and OpenGL, you need a dedicated rendering thread, because only one thread can talk to the GPU at a time. D3D11 is a weird exception because other threads can also call most API functions, even preparing command buffers, but the main rendering thread still needs to submit them to the GPU (and most of the per-draw cost happens at submission time, so this is not an optimization opportunity).

On D3D12/Vulkan you can start moving away from the idea of a dedicated rendering thread...

The first option that you mentioned is an early attempt to make good use of multi-core CPUs -- split the game into two parts, and run those parts over two threads. If the gameplay work and the rendering work are roughly equal in CPU usage, then this will halve your frametime on a dual-core CPU. This requires that you very carefully segregate gameplay data and rendering data (to avoid threading bugs) and also likely requires that you double-buffer your gameplay state, so that, for example, you can hand over a copy of every object's current position to the renderer, and then continue updating the next frame while the render thread draws stuff.
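
A minimal sketch of that double-buffered hand-off, in case it helps to see it. All of the names here (`RenderSnapshot`, `SnapshotExchange`, `runFrames`) are made up for illustration, and the snapshot only carries positions. The game thread fills one slot while the render thread reads the other, so the game can run at most one frame ahead.

```cpp
#include <array>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical render-side copy of gameplay state; a real one would also
// carry orientations, material parameters, and so on.
struct RenderSnapshot {
    int frame = -1;
    std::vector<Vec3> positions;
};

// Two slots: the game thread fills one while the render thread reads the other.
class SnapshotExchange {
public:
    // Game thread: wait until the previously published slot was picked up.
    RenderSnapshot& beginWrite() {
        std::unique_lock<std::mutex> lock(mutex_);
        changed_.wait(lock, [this] { return !pending_; });
        return slots_[writeIndex_];
    }
    // Game thread: publish the slot that was just filled.
    void endWrite() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!pending_);
            pending_ = true;
        }
        changed_.notify_all();
    }
    // Render thread: wait for a published slot, then hand the other slot
    // (the one read during the previous frame) back to the game thread.
    const RenderSnapshot& acquireRead() {
        std::size_t readIndex;
        {
            std::unique_lock<std::mutex> lock(mutex_);
            changed_.wait(lock, [this] { return pending_; });
            readIndex = writeIndex_;
            writeIndex_ ^= 1;
            pending_ = false;
        }
        changed_.notify_all();
        return slots_[readIndex];
    }
private:
    std::mutex mutex_;
    std::condition_variable changed_;
    std::array<RenderSnapshot, 2> slots_;
    std::size_t writeIndex_ = 0;
    bool pending_ = false;
};

// Runs the "game" on the calling thread and a "renderer" on a second thread.
// Returns, per frame, the sum of x positions the renderer observed.
std::vector<float> runFrames(int frameCount, std::size_t objectCount) {
    SnapshotExchange exchange;
    std::vector<float> seen;
    std::thread render([&] {
        for (int f = 0; f < frameCount; ++f) {
            const RenderSnapshot& snap = exchange.acquireRead();
            float sum = 0.0f;
            for (const Vec3& p : snap.positions) sum += p.x;
            seen.push_back(sum);
        }
    });
    for (int f = 0; f < frameCount; ++f) {
        RenderSnapshot& snap = exchange.beginWrite();
        snap.frame = f;
        snap.positions.assign(objectCount, Vec3{static_cast<float>(f), 0.0f, 0.0f});
        exchange.endWrite();
    }
    render.join();
    return seen;
}
```

With `runFrames(4, 3)` the renderer sees one consistent snapshot per frame (sums 0, 3, 6, 9), even though the game thread is already writing the next frame while the previous one is being read.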

However, that approach does not scale to quad/hex/octo-core CPUs... So most modern engines are built around a "job system", where the engine makes a thread pool with one thread per core (2 on dual-core, 4 on quad-core...), and then you try to write all of your code as a collection of small jobs that get automatically distributed to the thread pool.
With this approach, your gameplay code can run on 4 cores, and then your rendering code (culling/etc) can run on 4 cores, and then final draw-call submission can run on one core.
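
Roughly, a job system of that kind could look like the sketch below. This is a deliberately simple version with one shared queue and a mutex (production job systems usually use per-thread queues and work stealing), and `JobSystem` and `countVisible` are hypothetical names, not from any particular engine. The culling test is a stand-in: an object counts as visible if its x position lies within `[-limit, limit]`.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class JobSystem {
public:
    explicit JobSystem(unsigned workerCount = std::thread::hardware_concurrency()) {
        if (workerCount == 0) workerCount = 1;  // hardware_concurrency() may return 0
        for (unsigned i = 0; i < workerCount; ++i)
            workers_.emplace_back([this] { workerLoop(); });
    }
    ~JobSystem() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (std::thread& t : workers_) t.join();
    }
    void submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
            ++unfinished_;
        }
        wake_.notify_one();
    }
    // Blocks until every job submitted so far has finished.
    void waitIdle() {
        std::unique_lock<std::mutex> lock(mutex_);
        idle_.wait(lock, [this] { return unfinished_ == 0; });
    }
private:
    void workerLoop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stopping, and the queue is drained
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();
            std::lock_guard<std::mutex> lock(mutex_);
            if (--unfinished_ == 0) idle_.notify_all();
        }
    }
    std::mutex mutex_;
    std::condition_variable wake_, idle_;
    std::queue<std::function<void()>> jobs_;
    std::size_t unfinished_ = 0;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};

// Splits a culling pass into small jobs of `chunk` objects each.
std::size_t countVisible(JobSystem& jobs, const std::vector<float>& xs,
                         float limit, std::size_t chunk) {
    assert(chunk > 0);
    std::atomic<std::size_t> visible{0};
    for (std::size_t begin = 0; begin < xs.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, xs.size());
        jobs.submit([&xs, &visible, begin, end, limit] {
            std::size_t local = 0;
            for (std::size_t i = begin; i < end; ++i)
                if (xs[i] >= -limit && xs[i] <= limit) ++local;
            visible += local;  // one atomic add per job, not per object
        });
    }
    jobs.waitIdle();
    return visible.load();
}
```

The same pool would serve gameplay jobs and rendering jobs alike; only the final draw-call submission needs to be funnelled through a single thread.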

You can still use the hard split between gameplay work and rendering work with this modern job system architecture, but it's optional. You can either write your code so that gameplay and rendering structures are mixed, OR, you can pretend that you've got a "game thread" and a "render thread" like in the old model, even though you don't...

The downside of creating the hard split is that it takes effort. You have to be disciplined to segregate rendering structures from gameplay structures, and figure out how to create snapshots of gameplay data that's required by the renderer... The benefits are that your game code can be cleaner when responsibilities are nicely segregated into different sub-projects (it can also be messier if done badly!), and that if you decouple them fully, you can actually start processing multiple frames in parallel -- while the renderer is busy submitting draw-calls for frame #2, the rest of the thread-pool can be operating on frame #3.

My current architecture is one game thread, one rendering thread, and two worker threads. Both the rendering thread and the game thread can access the workers, i.e. the rendering thread does visibility culling on up to three threads (including the rendering thread itself). I have "scene proxy" objects that contain all data for rendering and are synced at the end of the game frame. My issue with this architecture is that it doesn't fully utilize the strengths of DX12 and Vulkan, where I would not need a dedicated thread. I guess to solve this, I could check which API the user has and scale accordingly? I.e. if the user is using DX11 or OpenGL, I use the dedicated thread approach, and if the user is using Vulkan, I will not have a dedicated rendering thread and just use all four cores for API command buffer generation. I've seen that some engines still use the dedicated rendering thread but also use their job systems to create the command lists, and then call ExecuteCommandList on the dedicated thread, but that doesn't seem like an optimal approach.

I use a similar arch, but the gameplay / render threads are also part of the pool and will attempt to execute jobs when idle. This means that rendering work can run on all 4 threads.

The rendering thread can kick off draw-call submission to the job system if you've got a lot of draws. Yeah, I expose a caps variable describing whether threaded command buffers are supported (D3D11) and whether they're actually recording HW commands (D3D12/Vulkan) or not.
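
As a rough illustration of such a caps variable (the enum names, `queryCaps`, and the `recordingJobCount` policy are all assumptions made up for this sketch, not anyone's actual engine code):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

enum class Backend { OpenGL, D3D11, D3D12, Vulkan };

// What the backend can do with command buffers recorded off the main thread.
enum class ThreadedRecording {
    Unsupported,      // no threaded command buffers at all
    DeferredOnly,     // supported (D3D11), but not recording HW commands
    HardwareCommands  // worker threads record real HW commands (D3D12/Vulkan)
};

struct RendererCaps {
    ThreadedRecording threadedRecording;
};

RendererCaps queryCaps(Backend backend) {
    switch (backend) {
        case Backend::D3D11:  return {ThreadedRecording::DeferredOnly};
        case Backend::D3D12:
        case Backend::Vulkan: return {ThreadedRecording::HardwareCommands};
        case Backend::OpenGL: break;
    }
    return {ThreadedRecording::Unsupported};
}

// Assumed policy: only fan draw recording out to jobs when workers record
// real HW commands, and only when each job gets enough draws to be worth it.
std::size_t recordingJobCount(const RendererCaps& caps, std::size_t drawCount,
                              std::size_t workerCount, std::size_t minDrawsPerJob) {
    assert(workerCount > 0 && minDrawsPerJob > 0);
    if (caps.threadedRecording != ThreadedRecording::HardwareCommands) return 1;
    const std::size_t wanted = std::max<std::size_t>(1, drawCount / minDrawsPerJob);
    return std::min(wanted, workerCount);
}
```

The rest of the renderer then branches on the caps value instead of on the API name, so the D3D11/GL path and the D3D12/Vulkan path share the same culling and collection code.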

On D3D11, I use a deferred context to record my GUI commands on another thread, just because GUI traversal is expensive and it's intertwined with draw submission.

For the main scene, I cull and collect drawables in a platform-independent manner. Then on D3D11/GL, one thread records them to the immediate context.

On D3D12/Vulkan, they're broken into batches that are thrown into the job queue for recording to many command buffers. These jobs could submit those command buffers to the GPU themselves, but that would result in non-deterministic draw ordering. To preserve ordering, one thread submits all of the command buffers in a single call after those jobs have completed.
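
The batching and ordered submit can be sketched like this. To keep it self-contained, a "command buffer" is just a vector of draw ids and the "GPU queue" is the returned vector, and it spawns one `std::thread` per batch where a real engine would push jobs into its job queue. `Drawable` and `recordAndSubmit` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

struct Drawable { int id; };

// Stand-in for an API command buffer: it just records draw ids.
using CommandBuffer = std::vector<int>;

// Records batches in parallel, then submits them from one thread in batch
// order. Returns the order in which the "GPU" would see the draws.
std::vector<int> recordAndSubmit(const std::vector<Drawable>& draws,
                                 std::size_t batchSize) {
    assert(batchSize > 0);
    const std::size_t batchCount = (draws.size() + batchSize - 1) / batchSize;
    std::vector<CommandBuffer> buffers(batchCount);

    // Each recorder writes only to its own buffer, so no locking is needed.
    std::vector<std::thread> recorders;
    for (std::size_t b = 0; b < batchCount; ++b) {
        recorders.emplace_back([&draws, &buffers, batchSize, b] {
            const std::size_t begin = b * batchSize;
            const std::size_t end = std::min(begin + batchSize, draws.size());
            for (std::size_t i = begin; i < end; ++i)
                buffers[b].push_back(draws[i].id);
        });
    }
    for (std::thread& t : recorders) t.join();

    // Single ordered submit: the finish order of the recorders is irrelevant,
    // because the buffers are appended by batch index.
    std::vector<int> submitted;
    for (const CommandBuffer& cb : buffers)
        submitted.insert(submitted.end(), cb.begin(), cb.end());
    return submitted;
}
```

However the recording threads happen to be scheduled, the submitted order matches the original drawable order, which is the point of deferring submission to one thread.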

Ah, I see. Do you cull and collect drawables on the rendering thread (i.e. the rendering thread + worker threads)? Or do you cull/collect drawables on the game thread (game thread + worker threads) and then send "drawitems" to the rendering thread to be translated into DX11 calls?

Yeah, I take a snapshot of the gameplay state and pass it from the game thread to the render thread, which includes object transforms, etc...

The render thread then culls and extracts draw-items. Generally there are many draw-items for a single model, which the game doesn't care about. It just wants to place a model in the world, and not care that it's made up of 100 sub-meshes.
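
A small sketch of that extraction step, under some loud assumptions: the struct layouts are invented for illustration, and the culling test is a simple distance check against the camera rather than a real frustum test.

```cpp
#include <cassert>
#include <vector>

struct SubMesh { int meshId; int materialId; };

struct Model {
    std::vector<SubMesh> subMeshes;
    float boundingRadius;
};

// What the game hands over in its snapshot: "this model is at this position".
struct ModelInstance {
    const Model* model;
    float x, y, z;
};

// What the renderer actually draws: one item per visible sub-mesh.
struct DrawItem {
    int meshId;
    int materialId;
    float x, y, z;
};

// Culls instances by distance from the camera (stand-in for frustum culling)
// and expands each surviving model into its sub-mesh draw-items.
std::vector<DrawItem> extractDrawItems(const std::vector<ModelInstance>& snapshot,
                                       float camX, float camY, float camZ,
                                       float viewDistance) {
    std::vector<DrawItem> items;
    for (const ModelInstance& inst : snapshot) {
        assert(inst.model != nullptr);
        const float dx = inst.x - camX;
        const float dy = inst.y - camY;
        const float dz = inst.z - camZ;
        const float reach = viewDistance + inst.model->boundingRadius;
        if (dx * dx + dy * dy + dz * dz > reach * reach) continue;  // culled
        for (const SubMesh& sm : inst.model->subMeshes)
            items.push_back(DrawItem{sm.meshId, sm.materialId, inst.x, inst.y, inst.z});
    }
    return items;
}
```

The game side only ever touches `ModelInstance`; the sub-mesh breakdown stays a renderer detail.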

