Mutexes and the render thread


Hi,

I have a render thread that uses mutexes to access data modified by other threads (vertex buffers etc.). I have made sure that the operations the other threads perform while holding the mutexes are short and should therefore take no longer than a few microseconds. However, once or twice per second the render thread has to wait several milliseconds for a mutex because it was not released soon enough. This causes stuttering in my game.

Is there a way to prevent this from happening? I already set the other threads' priorities to "idle" and the render thread's priority to "time critical". I tried setting processor affinities for the threads, but that didn't help either.

However, if I start only 3 other threads instead of 12 (my processor's hardware thread count), the stuttering disappears, but the world loads much more slowly...


From "therefore should not take longer than several microseconds" to "the render thread has to wait for the mutexes for several milliseconds" is a factor 1000 time.

If the mutexes really are locked for only a few microseconds, the other threads couldn't even hold up the render thread for 1 millisecond: 12 * a few microseconds is still only around 0.01 milliseconds (well, let's make it 0.02 to be safe).

That suggests that either your assumption of microsecond-scale locking is wrong, or something else is happening. Before you blame locking, it is probably worthwhile to understand where the remaining 1.98 milliseconds or so are coming from (if "few" means 2; otherwise it's even more).

I found the problem: thread 1 locks mutex A, gets preempted by thread 2, thread 2 does some work, and only then is thread 1 re-scheduled and able to release mutex A. So the mutex was held for many milliseconds instead of microseconds (classic lock-holder preemption)...

I kind of fixed the problem by changing thread priorities. However, I am curious whether there is a better fix for this problem. What if I have more threads than the processor has hardware threads (plus the render thread), all of which lock mutex A at some point?



Don't use mutexes or any locking. Your entire render architecture sounds sub-optimal.

The render thread shouldn't even be trying to use a vertex buffer unless a command to use it has been enqueued. The only code able to enqueue commands using the buffer should be the sole code that owns (and updates) the buffer. Hence there's not a huge reason to ever lock it.

Further, for streaming buffers that change every frame, you _do not want to change the buffer in place_. That's inefficient even at the hardware level. You want an explicit streaming-buffer system that treats a vertex buffer more like a circular buffer. Each update writes into a subsequent range of bytes, never overlapping with the previous few draws, using either DISCARD or a fence to handle the case where you reach the end of the buffer. This way, the rendering thread can safely read from the byte ranges that were previously written while the update thread is safely writing into a different byte range.
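A minimal sketch of such a circular streaming buffer, assuming D3D11 (whose DISCARD/NO_OVERWRITE map flags exist for exactly this pattern); the `StreamingVB` type and all member names are illustrative, not something from this thread:

```cpp
#include <d3d11.h>
#include <cstring>

// Circular streaming vertex buffer. Assumes `buffer` was created with
// D3D11_USAGE_DYNAMIC and D3D11_CPU_ACCESS_WRITE.
struct StreamingVB {
    ID3D11Buffer* buffer = nullptr;
    UINT capacity = 0;   // total size in bytes
    UINT cursor   = 0;   // next free byte

    // Copies `bytes` of vertex data into the next free range and returns
    // the byte offset the caller should draw from.
    UINT Write(ID3D11DeviceContext* ctx, const void* data, UINT bytes) {
        D3D11_MAP mapType = D3D11_MAP_WRITE_NO_OVERWRITE;
        if (cursor + bytes > capacity) {
            // Wrapped around: DISCARD hands us a fresh region so we never
            // stomp on ranges the GPU may still be reading.
            mapType = D3D11_MAP_WRITE_DISCARD;
            cursor = 0;
        }
        D3D11_MAPPED_SUBRESOURCE mapped = {};
        if (FAILED(ctx->Map(buffer, 0, mapType, 0, &mapped)))
            return 0;   // sketch: real code would handle the error
        std::memcpy(static_cast<char*>(mapped.pData) + cursor, data, bytes);
        ctx->Unmap(buffer, 0);

        UINT offset = cursor;
        cursor += bytes;   // later writes land after this range
        return offset;
    }
};
```

With this scheme the previous frames' ranges stay valid for in-flight draws, so no cross-thread lock on the buffer contents is ever needed.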

Sean Middleditch – Game Systems Engineer – Join my team!

You want to avoid mutexes and locking altogether. You want to be able to queue work up on the render thread from another thread, but you shouldn't be creating renderable resources on other threads ((a) because you would need a context on that thread to begin with, and (b) because it's not the thread where that data gets rendered anyway).

Create a way for your other threads to instruct the render thread to create the data it needs to render; then the render thread can just check whether that data has been built: if so, render it; if not, check whether there are instructions to build it, do so, and then render it.
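A minimal sketch of that hand-off, assuming a mutex only ever guards the request queue itself (never the GPU work); `BuildRequest`, `SubmitBuild`, `DrainBuildRequests`, and `CreateVertexBufferFor` are all hypothetical names:

```cpp
#include <mutex>
#include <queue>
#include <vector>

// CPU-side vertex data prepared by a worker thread.
struct BuildRequest {
    int chunkId = 0;
    std::vector<float> vertices;
};

void CreateVertexBufferFor(const BuildRequest& req); // hypothetical GPU upload

std::mutex g_requestLock;             // guards only the queue itself
std::queue<BuildRequest> g_requests;

// Worker threads: hand finished CPU data to the renderer.
void SubmitBuild(BuildRequest req) {
    std::lock_guard<std::mutex> lock(g_requestLock);
    g_requests.push(std::move(req));
}

// Render thread, once per frame: create GPU buffers for pending requests.
void DrainBuildRequests() {
    for (;;) {
        BuildRequest req;
        {
            std::lock_guard<std::mutex> lock(g_requestLock);
            if (g_requests.empty()) return;
            req = std::move(g_requests.front());
            g_requests.pop();
        }
        CreateVertexBufferFor(req);  // GPU work happens outside the lock
    }
}
```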

This doesn't solve your problem or answer your question, but ultimately I think the source of your problem is what @SeanMiddleditch said.

Your entire render architecture sounds sub-optimal.

The problem is that I have a voxel game with chunks (similar to Minecraft) and I generate these chunks multi-threaded. I definitely need some sort of map for the chunks, and I think I need a lock every time I access the map. I already create the buffers themselves in the render thread, but I need a lock for accessing the data these buffers should be filled with.

Any idea on how to create the chunks multi-threaded but without locking - and remove them if they are too far away?


Generally speaking, look at a copy-on-write model instead of a lock-on-write model. This should get rid of any VB/IB locking outside of the renderer, at the cost of a memory copy once you have prepared all the data and sent it to the renderer, which then performs the lock/copy/unlock without sharing issues. Playing with thread priorities and such is generally bad practice, as it may work well on one machine and not on another due to different virus scanners, running services, etc.
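A minimal sketch of that copy-on-write hand-off, using a single-slot atomic pointer exchange; all names (`ChunkMesh`, `PublishMesh`, `UploadToVertexBuffer`) are hypothetical:

```cpp
#include <atomic>
#include <vector>

// A finished CPU-side copy; the worker never mutates data the renderer reads.
struct ChunkMesh {
    std::vector<float> vertices;
};

void UploadToVertexBuffer(const std::vector<float>& verts); // hypothetical

std::atomic<ChunkMesh*> g_pendingMesh{nullptr};

// Worker thread: build a complete new copy, then publish it in one shot.
// Whoever exchanges the slot owns the pointer exclusively, so no lock is
// ever needed on the mesh data itself.
void PublishMesh(std::vector<float> verts) {
    ChunkMesh* fresh = new ChunkMesh{std::move(verts)};
    ChunkMesh* old = g_pendingMesh.exchange(fresh, std::memory_order_acq_rel);
    delete old;   // renderer never saw it, or already took its own pointer
}

// Render thread: take ownership of the latest copy, if any, and do the
// usual lock/copy/unlock upload into the vertex buffer.
void ConsumeMesh() {
    ChunkMesh* mesh = g_pendingMesh.exchange(nullptr, std::memory_order_acq_rel);
    if (mesh) {
        UploadToVertexBuffer(mesh->vertices);
        delete mesh;
    }
}
```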

The problem is that I have a voxel game with chunks (similar to Minecraft) and I generate these chunks multi-threaded


Which is not at odds with anything we just said. Minecraft-style games are not unique or special or different from any other game when it comes to rendering architecture. Their scene graphs will usually be different, but their renderers can be identical. A chunk is just a dynamic mesh.

Creating chunks multi-threaded:

- Request new buffer/range (lock-free via atomics and multi-threaded resource creation, or done via asynchronous callback)
- Stream into buffer (asynchronously safe because the buffer/range is unused by any other threads at that moment)
- Update scene graph (this is specific to your code; can be as simple as a lock-free atomic store though)
- Enqueue command to release the old buffer/range (relies on API ref-counting or manual fence gating)

Rendering multi-threaded:

- For each visible chunk, atomically load the current buffer/range, then enqueue a command to draw it
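A minimal sketch of the atomic publish (step 3 of the first list) and the render-side atomic load; packing the offset and count into one 64-bit word so they can be read with a plain atomic is an assumption, as are all the names:

```cpp
#include <atomic>
#include <cstdint>

void EnqueueDraw(uint32_t offset, uint32_t count); // hypothetical command queue

// Handle to a byte range inside a shared vertex buffer.
struct BufferRange {
    uint32_t offset;
    uint32_t count;
};

uint64_t Pack(BufferRange r)   { return (uint64_t(r.offset) << 32) | r.count; }
BufferRange Unpack(uint64_t v) { return { uint32_t(v >> 32), uint32_t(v) }; }

struct Chunk {
    std::atomic<uint64_t> currentRange{0};
};

// Update thread: after streaming vertices into a fresh, unused range,
// publish it with a single atomic store.
void PublishRange(Chunk& c, BufferRange fresh) {
    c.currentRange.store(Pack(fresh), std::memory_order_release);
}

// Render thread: atomically load whatever range is current and enqueue a
// draw for it; no mutex is ever taken.
void DrawChunk(Chunk& c) {
    BufferRange r = Unpack(c.currentRange.load(std::memory_order_acquire));
    if (r.count != 0)
        EnqueueDraw(r.offset, r.count);
}
```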

Sean Middleditch – Game Systems Engineer – Join my team!

Use one thread per core for something this processing-intensive.

The transaction protected by the lock should be as atomic as possible (not interruptible) during the locked period.

Many programs process in scheduled frame waves, in a pipeline operating on independent chunks of data per core, something like:

- prework (game mechanics resolution): frame T-2
- physics & geometry: frame T-1
- rendering (misc CPU managing, the GPU doing much of the work): frame T
- other 'secondary' work fitted into the schedule's idle times: network / AI / interface

Thus the data for each frame is kept segregated, is worked on by just one core, and is passed down the pipeline as one entire chunk, with no locking except the flags for the handoff. Processing is spread over time, but the cores can be kept fully busy with minimal lock use/overhead.
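A minimal sketch of that handoff-flag pipeline, assuming three frames in flight and one atomic stage flag per frame slot; the stage names follow the description above, everything else is illustrative:

```cpp
#include <array>
#include <atomic>

enum class Stage { Prework, Physics, Render, Free };

// Each in-flight frame owns its own data block; the stage flag is the only
// synchronization between cores.
struct FrameData {
    std::atomic<Stage> stage{Stage::Free};
    // ... per-frame game state, geometry, draw lists ...
};

std::array<FrameData, 3> g_frames;   // frames T-2, T-1, T in flight

// Each stage's thread works only on slots flagged for it; the data is
// touched by exactly one core at a time, then handed to the next stage.
void PhysicsStep(FrameData& f) {
    if (f.stage.load(std::memory_order_acquire) == Stage::Physics) {
        // ... resolve physics & build geometry for this frame only ...
        f.stage.store(Stage::Render, std::memory_order_release);  // handoff
    }
}
```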

Ratings are Opinion, not Fact

In general, if you are working at the level of mutexes, you are approaching multiprocessing at too low a level.

What you almost certainly need here is one or more asynchronous queues. One queue to feed work orders for new chunks, and one queue feeding back the resulting chunks.

Those queues may be implemented using mutexes, of course, but the important part is that the mutexes only guard access to the queue itself (not its contents), and work on the items in the queue never involves locks.
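A minimal sketch of such a queue, assuming C++17; a pair of these (work orders out, finished chunks back) gives the two queues described above, and the mutex guards nothing but the push/pop:

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>

template <typename T>
class AsyncQueue {
public:
    void Push(T item) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(item));
        }
        cv_.notify_one();
    }

    // Non-blocking pop for the render thread, which must never stall.
    std::optional<T> TryPop() {
        std::lock_guard<std::mutex> lock(m_);
        if (q_.empty()) return std::nullopt;
        T item = std::move(q_.front());
        q_.pop();
        return item;
    }

    // Blocking pop for worker threads, which can afford to sleep.
    T WaitPop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        T item = std::move(q_.front());
        q_.pop();
        return item;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> q_;
};
```

Workers call `WaitPop()` on the order queue and `Push()` their results; the render thread calls `TryPop()` on the result queue once per frame, so it never blocks on chunk generation.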

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]
