RenderQueue + Multithreading in OpenGL?


Although this isn't particularly specific to OpenGL, it's in the context of my game engine which currently only deals with OpenGL so...

I was thinking about an approach to the rendering system, similar to how I think Command Buffers in the upcoming APIs are supposed to work.

My game engine has a multithreaded game loop where each thread deals with specific functions: one for updating physics, another for animations, another for general updates, and another for world management, all running on fixed timesteps. The problem is that, being in C# and OpenTK, the rendering is tied to the WinForms Paint function, which sort of bugs me; I'd like to make everything totally independent. So I came up with the following idea:

Have a render queue that a separate render thread writes its render commands to.

The commands on the render queue (up to SwapBuffers) are then executed on every refresh. If the system is having trouble keeping up and a frame is missed, then the commands relevant to it are dropped from the queue. With this setup I think it should take out a lot of the complexity of dealing with variable rendering timesteps while also making it easy to transition to Vulkan and DX12 when they're released.
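Roughly, as a sketch of what I have in mind (plain Action delegates stand in for real render commands here; these are placeholder names, not my actual engine code):

```csharp
// Rough sketch only: Action delegates stand in for real render commands.
using System;
using System.Collections.Generic;

class FrameQueue
{
    readonly object _lock = new object();
    readonly Queue<List<Action>> _frames = new Queue<List<Action>>();

    // Render thread: push a complete frame's worth of commands (ending with SwapBuffers).
    public void Submit(List<Action> frameCommands)
    {
        lock (_lock) _frames.Enqueue(frameCommands);
    }

    // Paint handler (GL thread): execute only the newest frame; older missed frames are dropped.
    public void ExecuteLatest()
    {
        List<Action> frame = null;
        lock (_lock)
        {
            while (_frames.Count > 0) frame = _frames.Dequeue();
        }
        if (frame == null) return;
        foreach (var cmd in frame) cmd();
    }
}
```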

But just before I go in and start adjusting the API to work like this, I wanted to know if there were any immediately obvious flaws with such a system that I'm overlooking.


Having a render command buffer (or render queue, as you called it) for each thread that potentially submits render commands is the right thing to do.

Maybe I didn't fully understand your command buffer execution, but when *exactly* are you going to process the individual command buffers? When the Paint function is called? Where are you going to submit the OpenGL calls? Keep in mind that each OpenGL context is bound to the thread on which it was created.

You also have to keep rendering order in mind.

Ideally you'd want to have a designated render thread whose only purpose is to process command buffers that have been submitted.

To ensure that the order in which the command buffers are submitted is the same order in which they are processed by the render thread, you could implement some kind of command buffer dispatcher that receives command buffers in a defined order. Once all command buffers from all threads have been dispatched, you can kick off the render thread, which will immediately start pulling command buffers from the dispatcher.

To improve performance you could also double buffer the command buffers and the command buffer dispatcher so that you can start adding new render commands while the render thread is still busy processing the render commands that have been submitted in the previous frame.
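As a very rough sketch of what I mean (CommandBuffer here is just a placeholder type, not a real API):

```csharp
// Rough sketch; CommandBuffer is a placeholder for whatever records your render commands.
using System.Collections.Generic;
using System.Threading;

class CommandBuffer { /* recorded render commands */ }

class CommandBufferDispatcher
{
    // Two slots: one being filled by the game threads, one being consumed by the render thread.
    readonly List<CommandBuffer>[] _slots = { new List<CommandBuffer>(), new List<CommandBuffer>() };
    int _fill;
    readonly object _lock = new object();
    readonly SemaphoreSlim _frameReady = new SemaphoreSlim(0, 1);
    readonly SemaphoreSlim _slotFree = new SemaphoreSlim(1, 1);

    // Game threads dispatch their finished buffers in a well-defined order.
    public void Dispatch(CommandBuffer buffer)
    {
        lock (_lock) _slots[_fill].Add(buffer);
    }

    // Once every thread has dispatched for this frame: flip the slots and wake the render thread.
    public void KickRenderThread()
    {
        _slotFree.Wait(); // don't flip again until the render thread has taken the previous frame
        lock (_lock) _fill = 1 - _fill;
        _frameReady.Release();
    }

    // Render thread: wait for a frame, then process the buffers in submission order.
    public List<CommandBuffer> WaitForFrame()
    {
        _frameReady.Wait();
        List<CommandBuffer> frame;
        lock (_lock)
        {
            var consume = _slots[1 - _fill];
            frame = new List<CommandBuffer>(consume);
            consume.Clear();
        }
        _slotFree.Release();
        return frame;
    }
}
```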

Visit my blog, follow me on twitter or check out my bitbucket repositories.


being in C# and OpenTK, the rendering is tied to the WinForm Paint function
Wait, what? I'm pretty sure that isn't/shouldn't be necessary. Aren't you doing something specific (combining Windows widgets with an OpenGL draw surface)? Keep in mind that it's a library that works on Linux and OSX, so the WinForms requirement must come from something specific you're doing.

http://www.opentk.com/doc/chapter/0

From that page it seems GameWindow provides a SwapBuffers function.

"I AM ZE EMPRAH OPENGL 3.3 THE CORE, I DEMAND FROM THEE ZE SHADERZ AND MATRIXEZ"

My journals: dustArtemis ECS framework and Making a Terrain Generator

One possible problem is that thread-per-subsystem scales rather poorly and may have synchronization/efficiency issues. For example, if "skinning" needs "physics" and "draw" needs "skinning", but "physics" needs "ai", and they all run in a separate thread then they all basically run lockstep, single-threaded (most threads are just waiting for some other thread to produce something they need before they can work, and then there's only one doing something).

On the other hand, having 4 threads compete over, say, two cores is no good. Similarly, running only 4 threads on a 6-core machine isn't good either, since cores sit idle.

I prefer pushing tasks (with dependencies) to a thread pool sized to one thread per core, minus one (but at least one). This scales really well both upwards and downwards to any number of cores, and things actually happen in parallel most of the time. Yes, you still have sync points that suck, but you can usually run much of the stuff between them in parallel, and with some thought you can design your work queues so "useless time" (while a stage is not ready) can be used for background tasks.

You most likely have 1-2 moderately busy extra threads running anyway which are out of your control (e.g. audio mixer, GL server), hence having "minus one" on the total number of cores is probably a wise approach (this is a somewhat "unscientific" approach, but works for me).
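As a trivial sketch of what "tasks with dependencies" can look like (using the built-in TPL pool here rather than a hand-rolled pool; the stage names are just placeholders):

```csharp
// Trivial sketch: dependencies expressed as task continuations on the built-in pool.
using System;
using System.Threading.Tasks;

static class FrameTasks
{
    public static Task RunFrame()
    {
        // Independent work starts immediately and runs in parallel.
        Task ai        = Task.Run(() => Console.WriteLine("ai"));
        Task streaming = Task.Run(() => Console.WriteLine("background streaming"));

        // Dependent stages only start once their inputs are ready.
        Task physics  = ai.ContinueWith(_ => Console.WriteLine("physics"));
        Task skinning = physics.ContinueWith(_ => Console.WriteLine("skinning"));
        Task draw     = skinning.ContinueWith(_ => Console.WriteLine("build command buffers"));

        return Task.WhenAll(draw, streaming);
    }
}
```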

To keep synchronization low, instead of submitting commands to the render thread's queue one at a time (this would require a lot of atomic operations or locking), you most probably want to do something similar to what OpenGL is already doing in a very limited way with glDrawElementsIndirect, and what Vulkan will be doing much more elaborately:

Have a thread (any worker thread, or rather any number of them!) build a whole large buffer (or several) of several hundred or so commands, and submit that whole thing to the render queue from where the single thread that owns the GL context submits it with a single API call.
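For reference, the per-draw record that glDrawElementsIndirect (and glMultiDrawElementsIndirect) reads from GL_DRAW_INDIRECT_BUFFER looks roughly like this; shown here as a C# layout sketch, not tied to any particular binding:

```csharp
// Sketch of the indirect draw record (OpenGL 4.2+ layout; in 4.0 the last field is reserved).
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
struct DrawElementsIndirectCommand
{
    public uint Count;         // number of indices for this draw
    public uint InstanceCount; // number of instances ("primCount" in older docs)
    public uint FirstIndex;    // offset into the bound index buffer
    public int  BaseVertex;    // added to every index
    public uint BaseInstance;  // first instance ID (requires GL 4.2 / ARB_base_instance)
}

// Worker threads fill a DrawElementsIndirectCommand[]; the thread that owns the GL context
// uploads it to a buffer bound to GL_DRAW_INDIRECT_BUFFER and issues a single
// glMultiDrawElementsIndirect call for the whole batch.
```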

With "dropping commands", you hopefully mean dropping a complete frame worth of commands (I wasn't sure, initially it sounded to me like if e.g. te physics system can't keep up, the respective draws are skipped and only static geometry is rendered, that would be... catastrophic).

Maybe I didn't fully understand your command buffer execution but when *exactly* are you going to process the individual command buffers? When the Paint function is called? [...] To improve performance you could also double buffer the command buffers and the command buffer dispatcher so that you can start adding new render commands while the render thread is still busy processing the previous frame.

Currently, yes, the command buffers would be processed when the Paint function is called, which runs on the thread on which the context was created, so no problems there. Double buffering would be an interesting idea.


Wait, what? I'm pretty sure that isn't/shouldn't be necessary. Aren't you doing something specific (combining Windows widgets with an OpenGL draw surface)?

Yes, it isn't necessary; I'm just doing it like that because I don't want to have to rely on the OpenTK GameWindow. The game engine is designed to have the underlying renderer swapped out easily while still letting the higher-level parts handle the window itself; this is done by requiring the renderer to provide a Control object.
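Roughly, the contract looks something like this (illustrative names only, not my exact API):

```csharp
// Illustrative only: the renderer hands the engine a WinForms Control to host.
using System.Windows.Forms;

interface IRenderer
{
    Control Surface { get; } // the engine embeds this into its own window/UI
    void BeginFrame();
    void EndFrame();         // e.g. triggers the buffer swap on the GL thread
}
```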

I prefer pushing tasks (with dependencies) to a thread pool sized to one thread per core, minus one (but at least one). [...] Have a thread (any worker thread, or rather any number of them!) build a whole large buffer of several hundred or so commands, and submit that whole thing to the render queue. [...] With "dropping commands", you hopefully mean dropping a complete frame worth of commands.

The thread pooling setup would be an interesting idea; the thing is that some of the functions aren't heavy enough to justify an entire core devoted solely to them.

I guess I slightly misunderstood how the command buffers are supposed to work in Vulkan, but I see how having each thread maintain its own buffer and then pushing that to the render queue would reduce synchronization.

Also, yes, by dropping commands I do mean dropping an entire frame's worth of commands; skipping out on physics objects, as you said, would be very bad.

pushing that to the render queue would reduce synchronization.

A render-queue is for sorting objects for the best draw order. It is a high-level construct that is used to reduce state changes.
You are only talking about command buffers. A render-queue is used before a command buffer to sort draw calls, reducing shader swaps and texture swaps, ordering draws near-to-far or far-to-near, etc.
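As a bare-bones illustration of the kind of sorting a render-queue does (the key layout here is arbitrary, just to show the idea):

```csharp
// Bare-bones sort-key illustration; bit widths and field order are arbitrary.
using System;
using System.Collections.Generic;

struct DrawItem
{
    public ulong Key;
    public int   MeshId; // whatever you need later to record the actual draw

    public static ulong MakeKey(uint shaderId, uint textureId, float normalizedDepth)
    {
        // Most significant bits change least often: shader, then texture, then depth.
        uint depthBits = (uint)(Math.Min(Math.Max(normalizedDepth, 0f), 1f) * 0xFFFF);
        return ((ulong)(shaderId  & 0xFFF)   << 52)
             | ((ulong)(textureId & 0xFFFFF) << 32)
             | depthBits;
    }
}

static class RenderQueueSort
{
    // Sorting by key groups draws that share shaders/textures, minimizing state changes.
    public static void Sort(List<DrawItem> items)
        => items.Sort((a, b) => a.Key.CompareTo(b.Key));
}
```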


Anyway, what you describe is mostly a standard multi-threaded-render set-up. There are plenty of resources online regarding this, and it is heavily industry-proven.

If implemented correctly, you really don't need to worry much about synchronization issues and overhead. You wouldn't be locking every time you added a command; you would be locking a command buffer once, filling it, unlocking, and signalling that it is ready to use.

There is also little to gain by doing this across multiple threads. 1 thread fills and 1 thread eats. At tri-Ace a 3rd thread does pre-render set-up to prepare for rendering, but you don’t need to worry about being that advanced for quite a while.
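In sketch form, that hand-off can be as small as this (strings stand in for real render commands); one lock per frame, not per command, is all the synchronization needed:

```csharp
// Sketch: one thread fills, one thread eats; a single lock per frame publishes the buffer.
using System.Collections.Generic;
using System.Threading;

class FrameHandoff
{
    readonly object _lock = new object();
    readonly AutoResetEvent _ready = new AutoResetEvent(false);
    List<string> _submitted; // latest complete frame (strings stand in for commands)

    // Producer: build the list with no locking at all, then lock once to publish it.
    public void Submit(List<string> frameCommands)
    {
        lock (_lock) _submitted = frameCommands; // overwrites (drops) a frame the consumer missed
        _ready.Set();
    }

    // Consumer (render thread): block until a frame is ready, then take ownership of it.
    public List<string> Take()
    {
        _ready.WaitOne();
        lock (_lock)
        {
            var frame = _submitted;
            _submitted = null;
            return frame;
        }
    }
}
```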


L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

Sorry for the late reply.

Thanks for your advice L.Spiro.

I did mean to talk about Command Buffers, I just suck at terminology :P

I understand that there isn't much to be gained on the OpenGL side, but I think this should help speed up my resource loading, since that data will have more time to get loaded before it's needed (then again, I recently learned that designing the model format so it can be put straight onto the GPU without much parsing helps load times, so I'm not sure how much this will matter once that's in place as well).


then again, I recently learned that designing the model format so it can be put straight onto the GPU without much parsing helps load times, so I'm not sure how much this will matter once that's in place as well

You still want to be able to load the model on a background thread, and then have the upload to the GPU take place on the main (render) thread.

That might look like an UPLOAD command in your command buffer, with a read-only reference to the data it needs to upload. That way you can issue the upload from the background thread and asynchronously upload the data to the GPU between frames.
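A hypothetical shape for that (names made up for illustration): the loader thread constructs the command with a read-only view of the data, and Execute() runs later on the thread that owns the GL context.

```csharp
// Hypothetical UPLOAD command; the actual GL upload happens inside Execute() on the GL thread.
using System;

interface IRenderCommand
{
    void Execute(); // runs on the thread that owns the GL context
}

class UploadCommand : IRenderCommand
{
    readonly ArraySegment<byte> _data;                // read-only view of the loaded model data
    readonly Action<ArraySegment<byte>> _uploadToGpu; // e.g. wraps GL.BufferData on the GL thread

    public UploadCommand(ArraySegment<byte> data, Action<ArraySegment<byte>> uploadToGpu)
    {
        _data = data;
        _uploadToGpu = uploadToGpu;
    }

    public void Execute() => _uploadToGpu(_data);
}
```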

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

Yeah, that's how I'm doing it.
