Unity Multithreading my game engine -- slower than expected performance


My game engine, vastly simplified, does two things: simulate the game world, then render the game world. Note that I'm including getting input and some other subsystems as part of the "simulate" step. Based on this forum discussion, I decided to try moving the rendering to a second thread. Unfortunately, the performance is worse running in multithreaded mode than in single threaded mode on my dual core CPU... by a fair margin. Let me explain in a little more detail what's going on. Here's what the execution looks like in single threaded mode:
|---sim---|---sync---|-------render-------|
and repeat. Note: the "sync" represents the time spent sending the simulation updates to the rendering system. Here's what the execution looks like in multi-threaded mode:
main thread:   |---sim---|---wait---|---sync---|
render thread: |-------render-------|---wait---|
main thread:   |---sim---|---wait---|---sync---|
render thread: |-------render-------|---wait---|
The way I'm implementing the waiting is with SDL (libsdl.org) semaphores. I'm on a dual core linux 32-bit system. Also, just another data point, if I make the simulation much simpler and reduce the scene to something very simple (which reduces sync and render times), I can get upwards of 400 FPS out of the multithreaded mode and maybe 600 FPS from single threaded mode. Thanks.
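The lock-step scheme in the diagram above can be sketched roughly as follows. This is a minimal illustration, not the engine's actual code: the `Semaphore` class is a hypothetical stand-in built on the C++ standard library rather than SDL's semaphore API, and `run_lockstep`, `sim_state` and `render_state` are made-up names, with a plain integer standing in for the scene data.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Minimal counting semaphore (assumption: stands in for an SDL semaphore).
class Semaphore {
public:
    explicit Semaphore(int initial = 0) : count_(initial) {}
    void post() {
        {
            std::lock_guard<std::mutex> lock(m_);
            ++count_;
        }
        cv_.notify_one();
    }
    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_;
};

// Runs `frames` iterations of sim/wait/sync against render/wait and
// returns the values the render thread saw, one per frame.
std::vector<int> run_lockstep(int frames) {
    Semaphore render_done(0);  // render -> main: frame finished, safe to sync
    Semaphore sync_done(0);    // main -> render: new data is in place
    int sim_state = 0;         // touched only by the main thread
    int render_state = 0;      // written during sync, read during render
    std::vector<int> rendered;

    std::thread render_thread([&] {
        for (int i = 0; i < frames; ++i) {
            rendered.push_back(render_state);  // "render"
            render_done.post();
            sync_done.wait();                  // "wait" until sync completes
        }
    });

    for (int i = 0; i < frames; ++i) {
        ++sim_state;               // "sim", overlaps with the render above
        render_done.wait();        // "wait" for the renderer to finish
        render_state = sim_state;  // "sync"
        sync_done.post();
    }
    render_thread.join();
    assert(rendered.size() == static_cast<std::size_t>(frames));
    return rendered;
}
```

In this arrangement neither thread can start its next frame until the sync has finished, so the frame time is roughly the render time plus the sync time whenever the sim is the shorter of the two.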

Share on other sites
Do any of your steps do anything extra in multithreaded mode that they don't need to do in single-threaded mode (other than the semaphores)?

It doesn't seem like semaphores alone would cause that much penalty, unless you're accidentally setting up a situation where the sim step holds onto a semaphore that the render step wants (or vice versa):

|---sim----|-------wait--------|---sync---|
|---wait---|------render-------|---wait---|
or maybe
|---sync---|------wait---------|---sim----|
|---wait---|-----render--------|---wait---|

Share on other sites
I'll try to quickly sketch out the "right" way to do a multithreaded game engine:

Game logic runs continuously, pushing deltas to a message queue after each step.
Renderer runs continuously, grabbing and applying deltas before each frame.

This requires the game logic to use an entirely independent data set, and be completely decoupled from the renderer -- which is as it should be. The game logic will determine how objects in the game world move around, then send any changes to the renderer, which rearranges the scene as necessary.

No waiting! Unless you want to cap ticks per second for whatever reason. The data structure you use for IPC obviously needs to be threadsafe, which can be accomplished by any number of techniques. The lockless queue (see Google) is probably your friend in this case.
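As a rough sketch of the delta queue idea, here is a mutex-protected queue. Note that this is deliberately the simpler technique, not the lockless queue mentioned above; the `Delta` fields and the `DeltaQueue` name are assumptions made for illustration only.

```cpp
#include <cassert>
#include <mutex>
#include <queue>
#include <vector>

// Hypothetical change record: "object N moved to (x, y)".
struct Delta {
    int object_id;
    float x;
    float y;
};

// Thread-safe queue guarded by a mutex (not lock-free).
class DeltaQueue {
public:
    // Called by the game logic thread after each step.
    void push(const Delta& d) {
        std::lock_guard<std::mutex> lock(m_);
        q_.push(d);
    }

    // Called by the renderer before each frame. Takes everything that is
    // pending in one short critical section and never blocks waiting for
    // new data.
    std::vector<Delta> drain() {
        std::queue<Delta> pending;
        {
            std::lock_guard<std::mutex> lock(m_);
            pending.swap(q_);
        }
        std::vector<Delta> out;
        while (!pending.empty()) {
            out.push_back(pending.front());
            pending.pop();
        }
        return out;
    }

private:
    std::mutex m_;
    std::queue<Delta> q_;
};
```

The renderer would call `drain()` once per frame and apply whatever it gets, including nothing at all. The lock is held only for the swap, so the two threads contend very briefly; a lockless queue would remove even that.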

For details, take a look at this thread on the OGRE forums, and pay careful attention to what xavier says:
http://www.ogre3d.org/phpBB2/viewtopic.php?t=26496

It eventually gets pretty deep into the implementation details, including preallocation and reuse of message objects.

Share on other sites
Quote:
 Do any of your steps do anything extra in multithreaded mode that they don't need to do in single-threaded mode (other than the semaphores)?

No, even the syncing uses the same code.

Quote:
 it doesn't seem like semaphores alone would cause that much penalty

I agree. The high performance that I can get with no sim and very little data to sync or render indicates to me that the semaphores themselves probably add little overhead.

I'll look at my code a little more closely this afternoon to make sure I don't have a simple error somewhere that's causing excessive waiting beyond what I showed in the ASCII diagram.

Share on other sites
Quote:
 Original post by venzon
 main thread:   |---sim---|---wait---|---sync---|
 render thread: |-------render-------|---wait---|
 The way I'm implementing the waiting is with SDL (libsdl.org) semaphores. I'm on a dual core linux 32-bit system. Also, just another data point, if I make the simulation much simpler and reduce the scene to something very simple (which reduces sync and render times), I can get upwards of 400 FPS out of the multithreaded mode and maybe 600 FPS from single threaded mode.

Why are you waiting? The execution should look something like this:
main thread:   |-sim1-|-sim2-|-sim3-|-sim4-|-sim5-|-sim6|
render thread: |-------render0-------|-------render3-------|
Simply put, the renderer takes the latest complete simulation step and renders that.

You may need to duplicate the state, one that's being simulated, and another which is being rendered.

Whether you pass the data between threads, or use read-only shared state is a matter of choice.
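One possible shape for the "latest complete state" approach, as a sketch only: the sim publishes an immutable snapshot after each step and the renderer grabs whichever snapshot is newest when it starts a frame. `WorldState` and `LatestState` are hypothetical names, and copying the whole state every step is an assumption that only suits small states.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical simulation state; a real one would hold positions etc.
struct WorldState {
    int step = 0;
};

class LatestState {
public:
    // Sim thread: publish a finished step. Older snapshots stay alive
    // for as long as a renderer still holds a pointer to them.
    void publish(WorldState s) {
        std::shared_ptr<const WorldState> next =
            std::make_shared<WorldState>(std::move(s));
        std::lock_guard<std::mutex> lock(m_);
        latest_ = std::move(next);
    }

    // Render thread: get the newest complete snapshot (read-only).
    std::shared_ptr<const WorldState> acquire() const {
        std::lock_guard<std::mutex> lock(m_);
        assert(latest_ != nullptr);  // never null by construction
        return latest_;
    }

private:
    mutable std::mutex m_;
    std::shared_ptr<const WorldState> latest_ = std::make_shared<WorldState>();
};
```

With this layout the sim never waits for the renderer: steps that finish while a frame is still being drawn are simply superseded, which matches the render0/render3 pattern in the diagram.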

Of course, it's perfectly possible you have a trivial coding error.

Share on other sites
A quick update. After restarting my system I realized that my earlier multithreaded performance data was invalid because I had a task running in the background on one of the cores that wasn't as idle as I thought. Whoops. Now I see 90% utilization on one core and 30% on the other in multithreaded mode, with 70 FPS. Single threading still shows high 60s. This is more in line with my expectations, since the rendering at the moment takes much longer than the simulation or the sync. Put pseudo-mathematically, I expect the time per rendered frame (with my current architecture) to be (assuming Tsim < Trender):

Tsingle = Tsim + Tsync + Trender
Tmulti = Tsync + Trender

so:

Tsingle - Tmulti = Tsim

Antheus and drakostar, a note on my simulation: I use a fixed timestep of 10 ms (game time). Each sim step will do multiple 10 ms updates until the game time matches wall clock time (which doesn't take very long because the sim is quick). After that it sits and waits until the render finishes so it can send it the latest data, then it repeats and does more updates until the game time matches wall clock time again. So, the fixed timestep of the sim means it will be doing waiting in one form or another. But, I think I understand the essence of your points, which is that I can get better performance by eliminating that sync portion that ties up both threads and running things continuously. I'll definitely look into that.
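The fixed-timestep catch-up described above might look something like this sketch. The 10 ms step comes from the post; the `Simulation` struct, the `catch_up` function and the use of integer milliseconds (to avoid floating-point drift) are assumptions for illustration.

```cpp
#include <cassert>

const long long kStepMs = 10;  // fixed timestep of 10 ms game time

// Hypothetical simulation; a real step() would update physics, AI, etc.
struct Simulation {
    long long game_time_ms = 0;
    void step() { game_time_ms += kStepMs; }
};

// Runs fixed 10 ms updates until game time has caught up with the wall
// clock, without overshooting it. Returns the number of updates run.
int catch_up(Simulation& sim, long long wall_time_ms) {
    assert(wall_time_ms >= 0);
    int ran = 0;
    while (sim.game_time_ms + kStepMs <= wall_time_ms) {
        sim.step();
        ++ran;
    }
    return ran;
}
```

When the sim is cheap, `catch_up` returns quickly and often runs zero updates, which is why the main thread spends most of each frame idle regardless of how the threads are synchronized.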

Share on other sites
Alright, I added a second buffer to the render thread and eliminated the lock-step sync, so now it looks like this:

main thread:   |---sim---|---sync---|---wait---|
render thread: |-------------render------------|

and repeat.

My processor usage is up to:
100% core usage from the render thread
34% core usage from the main thread

This is nifty because now I should be able to add considerable complexity to the simulation side without affecting the performance in multithreaded mode at all. Thanks for the help guys!

As a side note, my game is a racing simulation (vdrift.net) so I have a handy way to scale simulation complexity that won't tick off single-threaded users: allow racing against more AI cars if you have more cores!
