
Multithreading my game engine -- slower than expected performance


My game engine, vastly simplified, does two things: it simulates the game world, then renders it. (I'm counting input handling and some other subsystems as part of the "simulate" step.) Based on this forum discussion, I decided to try moving the rendering to a second thread. Unfortunately, on my dual-core CPU the multithreaded mode performs worse than single-threaded mode, by a fair margin.

Let me explain in a little more detail what's going on. Here's what the execution looks like in single-threaded mode:
|---sim---|---sync---|-------render-------|
and repeat. Note: the "sync" represents the time spent sending the simulation updates to the rendering system. Here's what the execution looks like in multi-threaded mode:
main thread:   |---sim---|---wait---|---sync---|
render thread: |-------render-------|---wait---|
and repeat. Note that the main thread's simulation step finishes much faster than the rendering, so it has to wait for the rendering to finish before it starts a sync (sending data to the render system). While the main thread is doing a sync, the render thread has to wait.

Performance data indicates that single-threaded mode uses 100% of one core, as expected, and I get about 65 FPS. In multithreaded mode, each thread uses a little over 50% of its core, spending the rest of the time waiting as per the diagram above, and I get about 45 FPS.

I was expecting the multithreaded version to do a fair amount of waiting, because right now my simulation of the game world isn't very CPU intensive and my sync step is poorly optimized and takes longer than it should. I expected the multithreaded version to be maybe marginally faster than the single-threaded version, and that as I added complexity to the sim and sped up the sync, it would start to really outpace the single-threaded version. However, at this point the multithreaded version is so much slower that I'm wondering if I've done something terribly, terribly wrong in the way I architected it.

So, I thought I'd ask the gamedev folks: am I doing something fundamentally wrong with my multithreaded architecture? Here it is again:
main thread:   |---sim---|---wait---|---sync---|
render thread: |-------render-------|---wait---|
The way I'm implementing the waiting is with SDL (libsdl.org) semaphores. I'm on a dual core linux 32-bit system. Also, just another data point, if I make the simulation much simpler and reduce the scene to something very simple (which reduces sync and render times), I can get upwards of 400 FPS out of the multithreaded mode and maybe 600 FPS from single threaded mode. Thanks.
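For readers following along, the lockstep handoff in the diagram can be sketched roughly as below. This is only an illustration under stated assumptions, not the engine's actual code: it uses a small hand-rolled `Semaphore` built on the C++ standard library in place of SDL semaphores so that it is self-contained, and `run_lockstep`, `sim_state`, and `render_state` are hypothetical names with an integer standing in for the whole game world.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Minimal counting semaphore (a stand-in for an SDL semaphore).
class Semaphore {
public:
    explicit Semaphore(int initial) : count_(initial) {}
    void post() {
        {
            std::lock_guard<std::mutex> lock(m_);
            ++count_;
        }
        cv_.notify_one();
    }
    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_;
};

// Runs `frames` lockstep frames and returns the values the renderer "drew".
std::vector<int> run_lockstep(int frames) {
    Semaphore render_done(0);  // posted by the renderer after each frame
    Semaphore sync_done(0);    // posted by the main thread after each sync
    int render_state = 0;      // renderer's copy of the world (hypothetical)
    std::vector<int> drawn;    // touched only by the render thread until join

    std::thread render_thread([&] {
        for (int i = 0; i < frames; ++i) {
            drawn.push_back(render_state);  // "render"
            render_done.post();
            sync_done.wait();               // "wait" while the main thread syncs
        }
    });

    int sim_state = 0;
    for (int i = 0; i < frames; ++i) {
        ++sim_state;               // "sim"
        render_done.wait();        // "wait" for the renderer to finish
        render_state = sim_state;  // "sync": exclusive access to render_state
        sync_done.post();
    }
    render_thread.join();
    return drawn;
}
```

Each thread touches `render_state` only while the other is blocked on a semaphore, which is exactly why both threads spend part of every frame idle in this arrangement.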

Do any of your steps do anything extra in multithreaded mode that they don't need to do in single-threaded mode (other than the semaphores)?

It doesn't seem like semaphores alone would cause that much penalty, unless you're accidentally setting up a situation where the sim step holds onto a semaphore that the render step wants (or vice versa):


|---sim----|-------wait--------|---sync---|
|---wait---|------render-------|---wait---|

or maybe

|---sync---|------wait---------|---sim----|
|---wait---|-----render--------|---wait---|

I'll try to quickly sketch out the "right" way to do a multithreaded game engine:

Game logic runs continuously, pushing deltas to a message queue after each step.
Renderer runs continuously, grabbing and applying deltas before each frame.

This requires the game logic to use an entirely independent data set, and be completely decoupled from the renderer -- which is as it should be. The game logic will determine how objects in the game world move around, then send any changes to the renderer, which rearranges the scene as necessary.

No waiting! Unless you want to cap ticks per second for whatever reason. The data structure you use for IPC obviously needs to be threadsafe, which can be accomplished by any number of techniques. The lockless queue (see Google) is probably your friend in this case.

For details, take a look at this thread on the OGRE forums, and pay careful attention to what xavier says:
http://www.ogre3d.org/phpBB2/viewtopic.php?t=26496

It eventually gets pretty deep into the implementation details, including preallocation and reuse of message objects.
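As a rough illustration of the delta-queue idea described above, here is a minimal sketch. It makes several assumptions: the `Delta`, `DeltaQueue`, and `Scene` types are hypothetical, and the queue is protected by a plain mutex rather than being lockless, which is the simplest thread-safe option and not the lock-free technique mentioned in the post.

```cpp
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

// Hypothetical delta message: "object `id` moved to (x, y)".
struct Delta {
    std::size_t id;
    float x, y;
};

// Mutex-protected queue shared by the game logic and the renderer.
class DeltaQueue {
public:
    // Game logic thread: called after each simulation step.
    void push(const Delta& d) {
        std::lock_guard<std::mutex> lock(m_);
        pending_.push_back(d);
    }
    // Render thread: non-blocking, returns everything queued so far
    // (possibly nothing) and leaves the queue empty.
    std::vector<Delta> drain() {
        std::lock_guard<std::mutex> lock(m_);
        std::vector<Delta> out(pending_.begin(), pending_.end());
        pending_.clear();
        return out;
    }
private:
    std::mutex m_;
    std::deque<Delta> pending_;
};

// Renderer-owned scene data: positions indexed by object id.
struct Scene {
    std::vector<float> xs, ys;
    explicit Scene(std::size_t n) : xs(n, 0.0f), ys(n, 0.0f) {}
    // Applies deltas in order, so later deltas for the same object win.
    void apply(const std::vector<Delta>& deltas) {
        for (const Delta& d : deltas) {
            xs[d.id] = d.x;
            ys[d.id] = d.y;
        }
    }
};
```

In this sketch the renderer would call `scene.apply(queue.drain())` before each frame. Neither side waits on the other beyond the short time the mutex is held.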

Quote:
Do any of your steps do anything extra in multithreaded mode that they don't need to do in single-threaded mode (other than the semaphores)?


No, even the syncing uses the same code.

Quote:
it doesn't seem like semaphores alone would cause that much penalty


I agree. The high performance that I can get with no sim and very little data to sync or render indicates to me that the semaphores themselves probably add little overhead.

I'll look at my code a little more closely this afternoon to make sure I don't have a simple error somewhere that's causing excessive waiting beyond what I showed in the ASCII diagram.

Quote:
Original post by venzon

main thread:   |---sim---|---wait---|---sync---|
render thread: |-------render-------|---wait---|


The way I'm implementing the waiting is with SDL (libsdl.org) semaphores. I'm on a dual core linux 32-bit system. Also, just another data point, if I make the simulation much simpler and reduce the scene to something very simple (which reduces sync and render times), I can get upwards of 400 FPS out of the multithreaded mode and maybe 600 FPS from single threaded mode.


Why are you waiting? The code should look something like this:

main thread:   |-sim1-|-sim2-|-sim3-|-sim4-|-sim5-|-sim6-|
render thread: |-------render0-------|-------render3-------|
Simply put, the renderer takes the latest complete simulation step and renders that.

You may need to duplicate the state: one copy that's being simulated and another that's being rendered.

Whether you pass the data between threads or use read-only shared state is a matter of choice.

Of course, it's perfectly possible you have a trivial coding error.
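One way to picture "the renderer takes the latest complete simulation step" is a single shared slot that the simulation overwrites and the renderer copies from. The sketch below is an assumption-laden illustration rather than a recommended design: `Snapshot` and `LatestSnapshot` are hypothetical names, the state is copied on both sides, and a mutex guards the slot.

```cpp
#include <mutex>
#include <vector>

// Hypothetical world snapshot: positions plus the sim step that produced them.
struct Snapshot {
    long step = -1;  // -1 means "nothing simulated yet"
    std::vector<float> positions;
};

class LatestSnapshot {
public:
    // Simulation thread: called after each completed step. Older unread
    // snapshots are simply overwritten.
    void publish(const Snapshot& s) {
        std::lock_guard<std::mutex> lock(m_);
        latest_ = s;
    }
    // Render thread: called before each frame. Copies the newest snapshot
    // into `out` and returns true only if it is newer than what `out` holds.
    bool fetch_if_newer(Snapshot& out) {
        std::lock_guard<std::mutex> lock(m_);
        if (latest_.step <= out.step) return false;
        out = latest_;
        return true;
    }
private:
    std::mutex m_;
    Snapshot latest_;
};
```

With this shape the renderer's private `Snapshot` is the duplicated state: the simulation can publish several steps while one frame is drawn, and the renderer picks up only the most recent one.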

A quick update. After restarting my system I realized that my earlier multithreaded performance data was invalid: a background task was running on one of the cores, which wasn't as idle as I thought. Whoops. Now I see 90% utilization on one core and 30% on the other in multithreaded mode, with 70 FPS. Single-threaded mode still shows high 60s. This is more in line with my expectations, since the rendering at the moment takes much longer than the simulation or the sync. Put pseudo-mathematically, I expect the time per rendered frame (with my current architecture) to be (assuming Tsim < Trender):

Tsingle = Tsim + Tsync + Trender
Tmulti = Tsync + Trender

so:

Tsingle - Tmulti = Tsim
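The frame-time model for the two modes can also be written as a couple of small functions. This is an illustrative sketch only: `StageTimes` and the function names are hypothetical, and the numbers in the note below are made up rather than measured. The overlapped part is written as max(Tsim, Trender), which reduces to Trender under the assumption Tsim < Trender.

```cpp
#include <algorithm>

// Per-frame stage durations in milliseconds.
struct StageTimes {
    double sim;
    double sync;
    double render;
};

// Single-threaded: the three stages run back to back.
double frame_time_single(const StageTimes& t) {
    return t.sim + t.sync + t.render;
}

// Lockstep multithreaded: sim and render overlap, then both threads are
// tied up for the sync.
double frame_time_lockstep(const StageTimes& t) {
    return std::max(t.sim, t.render) + t.sync;
}

// Converts a frame time in milliseconds to frames per second.
double fps(double frame_ms) {
    return 1000.0 / frame_ms;
}
```

For hypothetical stage times of 2 ms sim, 3 ms sync, and 10 ms render, `frame_time_single` gives 15 ms and `frame_time_lockstep` gives 13 ms. The saving equals the sim time, so a cheap simulation yields only a small gain from threading in this arrangement.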

Antheus and drakostar, a note on my simulation: I use a fixed timestep of 10 ms (game time). Each sim step does multiple 10 ms updates until the game time matches wall clock time (which doesn't take very long because the sim is quick). After that it sits and waits until the render finishes so it can send it the latest data, then it repeats and does more updates until the game time matches wall clock time again. So the fixed timestep means the sim will be waiting in one form or another. But I think I understand the essence of your points, which is that I can get better performance by eliminating the sync portion that ties up both threads and running things continuously. I'll definitely look into that.
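The fixed-timestep catch-up described here can be sketched as a small loop. The names `FixedStepClock`, `catch_up`, and `kStepMs` are hypothetical, and the wall clock is passed in as a plain millisecond count so the example stays deterministic; a real loop would read it from a timer.

```cpp
// Fixed simulation step in milliseconds of game time.
constexpr long kStepMs = 10;

// Tracks game time and advances it in fixed steps toward the wall clock.
struct FixedStepClock {
    long game_time_ms = 0;

    // Runs `step(kStepMs)` until another full step would overshoot
    // `wall_time_ms`. Returns the number of updates performed.
    template <typename StepFn>
    int catch_up(long wall_time_ms, StepFn step) {
        int updates = 0;
        while (game_time_ms + kStepMs <= wall_time_ms) {
            step(kStepMs);
            game_time_ms += kStepMs;
            ++updates;
        }
        return updates;
    }
};
```

When the sim is cheap, each call returns quickly and usually performs zero or one update. That is why the simulation thread ends up idle for most of each frame regardless of how the threads are synchronized.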

Alright, I added a second buffer to the render thread and eliminated the lock-step sync, so now it looks like this:

main thread:   |---sim---|---sync---|---wait---|
render thread: |-------------render------------|

and repeat.

My processor usage is up to:
100% core usage from the render thread
34% core usage from the main thread

This is nifty because now I should be able to add considerable complexity to the simulation side without affecting the performance in multithreaded mode at all. Thanks for the help, guys!
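A second buffer on the render side might look something like the sketch below. It is a guess at the general shape, not the actual implementation: `DoubleBuffer`, `RenderState`, `write_back`, and `swap_if_ready` are hypothetical names, and it assumes exactly one writer (the main thread) and one reader (the render thread).

```cpp
#include <condition_variable>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical render-side state: one value per object.
using RenderState = std::vector<float>;

class DoubleBuffer {
public:
    // Main thread: waits only while the previous update is still unconsumed
    // (the "wait" in the diagram), then fills the back buffer ("sync").
    void write_back(const RenderState& s) {
        {
            std::unique_lock<std::mutex> lock(m_);
            consumed_.wait(lock, [this] { return !back_ready_; });
        }
        back_ = s;  // no lock needed: the renderer ignores back_ until flagged
        std::lock_guard<std::mutex> lock(m_);
        back_ready_ = true;
    }
    // Render thread: at a frame boundary, swaps in the back buffer if a
    // complete update is ready. Holds the lock only for the swap.
    bool swap_if_ready() {
        {
            std::lock_guard<std::mutex> lock(m_);
            if (!back_ready_) return false;
            std::swap(front_, back_);
            back_ready_ = false;
        }
        consumed_.notify_one();
        return true;
    }
    // Render thread only: the buffer currently being drawn.
    const RenderState& front() const { return front_; }
private:
    std::mutex m_;
    std::condition_variable consumed_;
    RenderState front_, back_;
    bool back_ready_ = false;
};
```

In this sketch the render thread calls `swap_if_ready()` between frames and never waits for a sync, while the main thread blocks in `write_back` only if it gets a full update ahead of the renderer.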

As a side note, my game is a racing simulation (vdrift.net) so I have a handy way to scale simulation complexity that won't tick off single-threaded users: allow racing against more AI cars if you have more cores!
