Multi-threaded rendering engine

Started by
7 comments, last by ErnieDingo 7 years, 7 months ago
I'm going to add multithreading to my engine, but I don't know how many techniques exist for
a multithreaded rendering engine. I only know one, where the engine launches a thread to update the scene graph.
Something interesting is building cross-frame render data, as Frostbite does.
[attachment=33172:360??20160904235041077.jpg]
Can anyone explain how it works?

Multithreading is something special in game engines (because most graphics programmers don't want to deal with CPU synchronization techniques). Unreal Engine 3, for example, has just a single-threaded renderer, but it runs in its own graphics thread. Real multithreaded rendering can happen on the same graphics device but also on different devices (for example in DirectX).

The secret is simple in principle. You set up, for example, two rendering threads and a queue. You put a render task into the queue and one of the threads picks it up. Now you need to lock resources (vertex data, textures, matrices) and anything else in memory that could be changed by your main thread, e.g. when loading a model or moving the camera. The APIs offer a bunch of different synchronization primitives, like fences, events and locks in GL/Vulkan, so you need to test and decide what works best for your purpose. The second step is to take the result of rendering and push it back into the queue as a post-processing task. The other thread picks it up and runs the post-processing shader on it while the first thread renders the next frame.

When the post-processing thread finishes, it can either display the result on screen itself or post a show task.
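The queue-and-two-threads idea described above can be sketched roughly as follows. This is only a minimal illustration, not how any particular engine does it: `TaskQueue` and `run_frames` are hypothetical names, the tasks are empty stand-ins for real render, post-processing and show work, and no GPU resources or API fences are involved.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical blocking task queue shared by the render threads.
class TaskQueue {
public:
    using Task = std::function<void()>;

    void push(Task task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!closed_ && "push after close");
            tasks_.push(std::move(task));
        }
        ready_.notify_one();
    }

    // Blocks until a task is available; returns false once closed and drained.
    bool pop(Task& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !tasks_.empty(); });
        if (tasks_.empty()) return false;
        out = std::move(tasks_.front());
        tasks_.pop();
        return true;
    }

    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<Task> tasks_;
    bool closed_ = false;
};

// Pushes `frame_count` frames through render -> post-process -> show on two
// worker threads and returns the frame indices in the order they were shown.
std::vector<int> run_frames(int frame_count) {
    assert(frame_count > 0);
    TaskQueue queue;
    std::mutex shown_mutex;
    std::vector<int> shown;

    for (int frame = 0; frame < frame_count; ++frame) {
        queue.push([&, frame] {            // render task (stand-in)
            queue.push([&, frame] {        // post-processing task (stand-in)
                queue.push([&, frame] {    // show task
                    std::lock_guard<std::mutex> lock(shown_mutex);
                    shown.push_back(frame);
                    // Once every frame has been shown, nothing else will be pushed.
                    if (static_cast<int>(shown.size()) == frame_count) queue.close();
                });
            });
        });
    }

    auto worker = [&queue] {
        TaskQueue::Task task;
        while (queue.pop(task)) task();
    };
    std::thread first(worker);
    std::thread second(worker);
    first.join();
    second.join();
    return shown;
}
```

Either thread may pick up any stage, so the order in which frames are shown is not guaranteed here. A real renderer would need extra ordering and resource locking on top of this.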

In Vulkan, and I assume DX12 too, you also need to manage the backbuffers yourself, and synchronization becomes a trickier task here. It is solvable, though: if you know CPU synchronization, you should be able to solve GPU synchronization as well.

Note: Different engines may do things differently, but that is the basis of how it happens in general.

It works just like the graph shows...

One thread starts the rendering process by doing the culling, then building, then the proper rendering.

As soon as the first thread finishes the culling pass, the second thread starts its rendering process by doing culling. Before this second thread can start rendering, it has to wait for the first thread to finish its own rendering.

It is important that no thread does its rendering (or even asks the GPU to do any work) until the other thread has finished its own. If you request GPU computation in the meantime, the thread that is rendering will be slowed down. If you render in the meantime, you will corrupt the final image.

There's no difference between threading graphics or any other engine system. An extremely common approach is a "job system" as described here: https://blog.molecular-matters.com/2015/08/24/job-system-2-0-lock-free-work-stealing-part-1-basics/

[Apparently the information here, despite it being my job, was bad... at least according to whatever drive-by downvoters metric... as I don't want to spread bad information it has been removed]
A multithreaded render engine is not the same as a multithreaded game engine. To me it is about whether the render system or component and the render backend are multithreaded or not.
DX11 supports multithreading with deferred contexts, but in practice that doesn't yield much of a gain.

With Mantle, DX12 and Vulkan it is possible to use multithreading efficiently for rendering, along with the ACEs (Asynchronous Compute Engines),
i.e. the separate render, compute and copy engines.
This matters if your game load and GPU settings leave you CPU bound, for example with a huge number of draw calls.
With these new, efficient, low-overhead APIs, roughly 10x or so more draw calls are possible compared to the old APIs.

So to me it seems that if you want to render multithreaded and go wild with draw calls, you need DX12 or Vulkan.

The game engine itself is often already multithreaded, and the same issue applies there: you can do it the easy way with low or limited gains, or in better, more advanced ways where you can gain a lot.

A multithreaded game engine is something different from multithreaded rendering. The latter is specific to the new APIs, while multithreading the whole engine is a complex topic: it must not only be thread-safe, but also efficient enough to scale well across more cores, including beyond 4 cores. But the render component is part of the larger engine, so their architectures need to fit together very well, which makes the two strongly related.

So I would read up on DX12 multithreading.
I don't know yet whether my DX12 book dives deep into that matter.

There are literally hundreds of possible architectures. I tried to implement a multithreaded engine, and here's what I almost achieved (almost, because I had to stop it for work):

What I tried:

- Client creates render queues

- The engine compiles render queues (read as: an optimized sequence of GL commands)

- The render thread executes GL commands without any locking at all (it only needs to lock when swapping the pointers to render queues)
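One way to picture the "lock only when swapping pointers" step from the list above is the small sketch below. It is an assumption-laden illustration rather than the poster's actual code: `RenderQueueExchange` is a hypothetical name, a render queue is modelled as a list of strings instead of compiled GL commands, and an unconsumed queue is simply replaced by a newer one.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Stand-in for a compiled render queue (an optimized sequence of GL commands).
using RenderQueue = std::vector<std::string>;

// Hypothetical hand-off point between the main thread and the render thread.
// The mutex is held only while a pointer changes owner, never while a queue
// is being compiled or executed.
class RenderQueueExchange {
public:
    // Main thread: publish a freshly compiled queue. If the render thread has
    // not taken the previous one yet, it is replaced (assumption of this sketch).
    void publish(std::unique_ptr<RenderQueue> queue) {
        assert(queue != nullptr);
        std::lock_guard<std::mutex> lock(mutex_);
        pending_ = std::move(queue);
    }

    // Render thread: take ownership of the latest queue, or get nullptr if
    // nothing new has been published since the last call.
    std::unique_ptr<RenderQueue> acquire() {
        std::lock_guard<std::mutex> lock(mutex_);
        return std::move(pending_);
    }

private:
    std::mutex mutex_;
    std::unique_ptr<RenderQueue> pending_;
};
```

Because ownership moves with the pointer, the render thread can walk its queue without locks: the main thread no longer holds a reference to it once it has been acquired.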

Downsides:
- Compiling commands has a little overhead. I have a pretty low-level abstraction; maybe a higher-level abstraction would allow generating already optimized commands, but I can assure you that I had tons of trouble getting even something this simple correctly synchronized anyway

Upsides:
- The render thread does not block the main thread

- The main thread does not block the render thread

- Reduced lag between input and visualization (seems counterintuitive, but yes, using only 2 threads gives better responsiveness. In theory you have to wait for the queue to be processed, but in reality a new queue is almost always created before the previous queue has finished rendering)

I think a truly multithreaded engine is possible, but it would require so much work to squeeze out only a few % of performance improvement (and note that most low-end machines just have 2 cores anyway) that it is not worth it. Achieving something like my solution (3 years of trial and error and preparation just to create a proof of concept) is already too much effort, and since everyone wants to make their own engine, there's literally no more interest in custom engines anyway.

Wait, let me search for my thesis slides to give you an idea:

Here it is:
UrVloNy.png

This is a really simple solution, not really multithreaded, but it gives a nice boost in performance and is already much more complex than a single-threaded engine.

A multithreaded game engine is something different from multithreaded rendering. The latter is specific to the new APIs, while multithreading the whole engine is a complex topic: it must not only be thread-safe, but also efficient enough to scale well across more cores.

As well as the API-specific stuff (draw calls), there are a lot more parts to a renderer. All of the API-independent parts can be multi-threaded -- animating the scene, updating dynamic buffers, streaming new content, frustum culling, occlusion culling, scene traversal, selecting shader permutations for each object, sorting render queues.
All that kind of work can be parallelized in the same way as any other "regular" game engine work, such as with a job system :)
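As a rough illustration of parallelizing one of those API-independent steps, here is a fork-join culling sketch. It is deliberately simpler than a real job system (no work stealing, threads are created per call), and everything in it is an assumption for the example: `Sphere`, `is_visible` and `cull_parallel` are hypothetical names, and the "frustum" is reduced to a single half-space test.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

struct Sphere {
    float x, y, z, radius;
};

// Hypothetical visibility test: keep spheres that touch the half-space z >= 0.
// A real frustum test would check several planes.
bool is_visible(const Sphere& sphere) {
    return sphere.z + sphere.radius >= 0.0f;
}

// Splits the objects into contiguous chunks and culls each chunk on its own
// thread. Each thread writes only to its own slice of `visible`, so no locks
// are needed.
std::vector<unsigned char> cull_parallel(const std::vector<Sphere>& objects,
                                         unsigned worker_count) {
    assert(worker_count > 0);
    std::vector<unsigned char> visible(objects.size(), 0);
    const std::size_t chunk = (objects.size() + worker_count - 1) / worker_count;

    std::vector<std::thread> workers;
    for (unsigned w = 0; w < worker_count; ++w) {
        const std::size_t begin = std::min(objects.size(), w * chunk);
        const std::size_t end = std::min(objects.size(), begin + chunk);
        if (begin == end) break;  // no work left for the remaining workers
        workers.emplace_back([&objects, &visible, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                visible[i] = is_visible(objects[i]) ? 1 : 0;
            }
        });
    }
    for (std::thread& worker : workers) worker.join();
    return visible;
}
```

`std::vector<unsigned char>` is used instead of `std::vector<bool>` because the latter packs bits, so neighbouring elements could not be written safely from different threads.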

As for D3D12 command lists (the counterpart of D3D11's deferred contexts), I found that you need to have several hundred draw calls per command list to get any decent savings. So if you've got some kind of rendering queue, you can divide the number of items in the queue by ~300 (rounding up), and then spawn that many jobs to convert each section of the queue into a command list, and then submit the command lists in the appropriate order with a single call to ExecuteCommandLists once that group of jobs has completed.
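The "divide by ~300, rounding up" step can be written down as a small helper. This sketch covers only the arithmetic of splitting the queue; `split_into_jobs` is a hypothetical name, the default of 300 is just the rough figure quoted above, and recording or submitting actual command lists is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns half-open [begin, end) index ranges, one per job. Each job would
// record its section of the render queue into its own command list.
std::vector<std::pair<std::size_t, std::size_t>>
split_into_jobs(std::size_t item_count, std::size_t items_per_list = 300) {
    assert(items_per_list > 0);
    // Integer division rounded up: 1000 items at 300 per list gives 4 jobs.
    const std::size_t job_count =
        (item_count + items_per_list - 1) / items_per_list;

    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    ranges.reserve(job_count);
    for (std::size_t job = 0; job < job_count; ++job) {
        const std::size_t begin = job * items_per_list;
        const std::size_t end = std::min(begin + items_per_list, item_count);
        ranges.emplace_back(begin, end);
    }
    return ranges;
}
```

Keeping the ranges in index order means the resulting command lists can be submitted in the same order as the original queue, whichever job finishes first.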

My multi-threaded engine implementation is fairly simple. It's interleaved, but it's effective.

Spawn Thread A - Build terrain in Deferred Context/Command List.

Thread A completes Command list

Spawn Thread B - Build Water and other components

Execute Command List from Thread A in immediate context.

Thread B Completes

Execute Command List from Thread B in immediate context.

By rights, I could add C and D etc. and it would still be interleaved. As has been said, you need a few hundred draw calls (or calls of significant size) to get a benefit. I try not to break it down further at the moment because this sort of change doesn't give me much more. I also spawn other threads to traverse the quad tree and do things like occluding assets before it is time to render (although at the moment I'm rolling the dice here and not checking that the occlusion is done in time; I have yet to add a blocking component).
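The interleaving described in this post can be mimicked without any graphics API, which may make the ordering easier to see. In the sketch below everything is a stand-in: `CommandList` is just a list of strings, `record` plays the role of a thread filling a deferred context, and appending to `immediate` plays the role of executing a command list on the immediate context.

```cpp
#include <cassert>
#include <future>
#include <string>
#include <vector>

// Stand-in for a recorded command list.
using CommandList = std::vector<std::string>;

// Hypothetical recording step: pretends to record `draw_calls` draws for a pass.
CommandList record(const std::string& pass, int draw_calls) {
    assert(draw_calls >= 0);
    CommandList list;
    for (int i = 0; i < draw_calls; ++i) {
        list.push_back(pass + " draw " + std::to_string(i));
    }
    return list;
}

// Thread A records terrain; once it is done, thread B starts recording water
// while the terrain list is executed on the "immediate context".
std::vector<std::string> render_frame_interleaved() {
    std::vector<std::string> immediate;  // stand-in for the immediate context

    auto thread_a = std::async(std::launch::async, record, std::string("terrain"), 3);
    CommandList terrain = thread_a.get();  // thread A completes its list

    auto thread_b = std::async(std::launch::async, record, std::string("water"), 2);

    // Execute A's list while B is still recording.
    immediate.insert(immediate.end(), terrain.begin(), terrain.end());

    CommandList water = thread_b.get();    // thread B completes
    immediate.insert(immediate.end(), water.begin(), water.end());
    return immediate;
}
```

Further passes (C, D, ...) would follow the same pattern: start recording the next list, then execute the one that has just finished.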

Indie game developer - Game WIP

Strafe (Working Title) - Currently in need of another developer and modeler/graphic artist (professional & amateur's artists welcome)

Insane Software Facebook

This topic is closed to new replies.
