Klutzershy

Designing an efficient multithreading architecture


I'm designing an engine for my big 3D game project, and I want to make sure everything is very scalable to processors with many cores, without having more threads than necessary spawned at a time.


This is the architecture I'm considering right now in terms of different threads:

  • 1 scheduling thread
    • Runs the main loop
    • Spawns work and I/O jobs
    • Sends draw/compute calls to the GPU
  • 1 I/O thread
    • Blocks on calls to fread and fwrite
    • Spawns work jobs for decoding
  • 1 sound thread
    • Runs from within OpenAL or satisfies SDL audio callbacks
  • n worker threads, where n = ncores - 3
    • Run serial work jobs (embarrassingly parallel jobs will be run on the GPU)

While this makes sense to me for processors like Intel i7s or AMD FX chips, which generally have more than 4 (logical) cores, a 4-core processor like an i5 would be left with only one worker thread.
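For reference, the sizing rule in the list above can be written as a small C++17 function. This is a sketch: `worker_count_for` and `detected_worker_count` are hypothetical names, `std::thread::hardware_concurrency()` is assumed to stand in for "ncores" (it reports logical cores and may return 0 when unknown), and the floor of one worker and the fallback of 4 cores are assumptions rather than part of the plan.

```cpp
#include <cassert>
#include <thread>

// Threads reserved for dedicated roles in the proposed design:
// scheduling (main loop), I/O, and sound.
constexpr unsigned kDedicatedThreads = 3;

// Worker count for a given logical-core count, using n = ncores - 3.
// Clamped to at least one worker (an assumption) so small machines
// still have somewhere to run jobs.
unsigned worker_count_for(unsigned logical_cores) {
    if (logical_cores <= kDedicatedThreads) return 1;
    return logical_cores - kDedicatedThreads;
}

// hardware_concurrency() is only a hint and may return 0.
unsigned detected_worker_count() {
    unsigned cores = std::thread::hardware_concurrency();
    if (cores == 0) cores = 4;  // hypothetical fallback value
    unsigned n = worker_count_for(cores);
    assert(n >= 1);
    return n;
}
```

With this rule a 4-core i5 gets one worker and an 8-thread i7 gets five, which is the imbalance the question is about.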

Should the scheduling thread also be able to run work jobs? If so, is it safe to let any thread send draw/compute calls to the GPU (using OpenGL 4)?


You probably know a lot more than me, but it may be worth pointing out that every time I see someone ask about multithreading, the overwhelming reply is "Don't!". Unless you really know all the pitfalls it can bring, it can be more trouble than it's worth.

However, maybe you do know your stuff, in which case, good luck :)


Oh, I know what I'm getting myself into. I know all the issues with data races and the like across threads, so I'm going to minimize the number of times that threads need to synchronize, and use locks properly when I need to.
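One common way to keep the number of synchronization points low, sketched here for a hypothetical summing job (`parallel_sum` is an illustrative name): each thread accumulates into a local variable and takes the shared lock once when it is done, instead of once per element.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <numeric>
#include <thread>
#include <vector>

// Sums `data` using `num_threads` threads. Each thread works on its own
// slice and locks the shared total only once, when it has finished.
long long parallel_sum(const std::vector<int>& data, unsigned num_threads) {
    assert(num_threads >= 1);
    long long total = 0;
    std::mutex total_mutex;
    std::vector<std::thread> threads;
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;

    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t first = std::min(data.size(), t * chunk);
        const std::size_t last = std::min(data.size(), first + chunk);
        threads.emplace_back([&, first, last] {
            // Thread-local accumulation: no shared state is touched here.
            long long local = std::accumulate(data.begin() + first,
                                              data.begin() + last, 0LL);
            // The single synchronization point for this thread.
            std::lock_guard<std::mutex> lock(total_mutex);
            total += local;
        });
    }
    for (auto& th : threads) th.join();
    return total;
}
```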


If you have multiple OpenGL contexts (one for each thread, with object sharing set up between them), you will be able to make OpenGL calls from several threads simultaneously, but you must ensure yourself that you're not, e.g., updating the same OpenGL object in one thread while rendering with it in another. I also wouldn't be surprised if you discover more driver bugs that way compared to single-threaded use, but I don't have personal experience of that.


If your jobs are of the "update culling" or "animate n entities" kind, definitely run them on the main thread as well, especially if your frame processing cannot proceed further without completing them first.
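A sketch of what running jobs on the main thread as well can look like, assuming a simple parallel-for over an index range. The calling thread pulls chunks from the same atomic counter as the helper threads, so it contributes work instead of idling until the frame's jobs finish. `parallel_for` is a hypothetical name, and the helpers are created per call only to keep the example short; an engine would reuse persistent workers.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Runs body(i) for every i in [0, count). `helpers` extra threads are used,
// and the calling thread also processes chunks until none are left.
void parallel_for(std::size_t count, unsigned helpers, std::size_t chunk_size,
                  const std::function<void(std::size_t)>& body) {
    assert(chunk_size > 0);
    std::atomic<std::size_t> next{0};

    // Each caller repeatedly claims the next chunk until the range is exhausted.
    auto drain = [&] {
        for (;;) {
            const std::size_t first = next.fetch_add(chunk_size);
            if (first >= count) return;
            const std::size_t last = std::min(count, first + chunk_size);
            for (std::size_t i = first; i < last; ++i) body(i);
        }
    };

    std::vector<std::thread> threads;
    for (unsigned h = 0; h < helpers; ++h) threads.emplace_back(drain);
    drain();  // the main thread does its share instead of just waiting
    for (auto& t : threads) t.join();
}
```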


Do you think it's better to run jobs on the main thread or instead spawn an extra worker thread and let the main thread sleep while there are jobs still pending?


I believe that is best answered by profiling, but each time a thread goes to sleep, it may not wake up as promptly as you'd want due to OS scheduling. That goes for both the main thread and the workers. Therefore my gut feeling is against the extra worker thread.
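Since the answer depends on the OS scheduler, one way to profile it is to measure how late a sleeping thread actually wakes. This sketch (a hypothetical helper, not a benchmark from this thread) times how far `std::this_thread::sleep_for` overshoots the requested duration; the standard only guarantees that it blocks for at least that long.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Returns the average extra time, in microseconds, that sleep_for(requested)
// took beyond the requested duration, over `samples` runs.
double average_oversleep_us(std::chrono::microseconds requested, int samples) {
    assert(samples > 0);
    using clock = std::chrono::steady_clock;
    double total_extra_us = 0.0;
    for (int i = 0; i < samples; ++i) {
        const auto start = clock::now();
        std::this_thread::sleep_for(requested);
        const auto elapsed = clock::now() - start;
        total_extra_us +=
            std::chrono::duration<double, std::micro>(elapsed - requested).count();
    }
    return total_extra_us / samples;
}
```

Comparing the result against the frame budget gives a rough idea of what parking the main thread could cost on a given machine.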


Creating threads has a cost. Why pay it when you don't have to?
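To put a number on that cost, the following sketch times creating and joining a batch of empty `std::thread`s. It is only a rough probe (`average_spawn_join_us` is a hypothetical name), and results vary by OS and hardware.

```cpp
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>

// Average microseconds to create and join one std::thread that does nothing.
double average_spawn_join_us(int count) {
    assert(count > 0);
    using clock = std::chrono::steady_clock;
    std::vector<std::thread> threads;
    threads.reserve(static_cast<std::size_t>(count));
    const auto start = clock::now();
    for (int i = 0; i < count; ++i) threads.emplace_back([] {});
    for (auto& t : threads) t.join();
    const auto elapsed = clock::now() - start;
    return std::chrono::duration<double, std::micro>(elapsed).count() / count;
}
```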


I'd suggest making the number of worker threads equal to the number of cores, or maybe even double that. (Forum member Frob has said that in his experience, 2 times the number of virtual cores is usually a good way to go.)

The reason you want as many worker threads as cores, even though you have other threads, is that the worker threads probably have a different workload than the task-specific threads (or whatever the word for that is). Those threads may occasionally sleep or block on an I/O signal, and so underuse a single core. You want your worker threads to saturate the remaining resources, so it's better to have too many worker threads some of the time than too few at any time. You can, however, give task-specific threads higher priority.

Edited by King Mir


fread/fwrite are blocking wrappers around the OS's internal non-blocking file-system API.

Instead of using a non-blocking API, wrapped in a blocking API, wrapped in a thread to make it non-blocking, you should just use the native OS APIs ;)

You still might want an "IO" thread for running decompression, or alternatively just treat decompression as a regular job for your worker threads.


The threads spawned by your audio middleware probably spend most of their time sleeping, so I wouldn't allocate them an entire core. I'd probably ignore the sound threads when figuring out how many worker threads to spawn.


I personally use my "main thread" as a worker thread too. Whenever any of my threads has nothing to do (e.g. it has to wait for the results of another thread before it can continue), then they make themselves useful by popping jobs from the job queue and doing some work. I basically have a "WaitFor..." busy loop, that continually checks if the condition has been met to exit, else tries to run a job, else after enough tries with no jobs in the queue it yields or sleeps.
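A minimal sketch of that kind of wait loop, assuming a mutex-protected job queue. `JobQueue`, `wait_and_help` and `kSpinsBeforeYield` are hypothetical names, not Hodgman's actual code, and the spin count of 64 is arbitrary. While the condition is false, the waiting thread runs queued jobs, and only after a run of empty polls does it yield its time slice (this sketch only yields; it never sleeps).

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

class JobQueue {
public:
    void push(std::function<void()> job) {
        assert(job);  // empty jobs are not allowed
        std::lock_guard<std::mutex> lock(mutex_);
        jobs_.push_back(std::move(job));
    }
    // Pops one job if available; returns false when the queue is empty.
    bool try_pop(std::function<void()>& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (jobs_.empty()) return false;
        out = std::move(jobs_.front());
        jobs_.pop_front();
        return true;
    }
private:
    std::mutex mutex_;
    std::deque<std::function<void()>> jobs_;
};

// Busy-waits until done() is true, running queued jobs in the meantime.
// After kSpinsBeforeYield consecutive empty polls, it yields its time slice.
template <typename Predicate>
void wait_and_help(JobQueue& queue, Predicate done) {
    constexpr int kSpinsBeforeYield = 64;
    int empty_polls = 0;
    std::function<void()> job;
    while (!done()) {
        if (queue.try_pop(job)) {
            job();
            empty_polls = 0;
        } else if (++empty_polls >= kSpinsBeforeYield) {
            std::this_thread::yield();
            empty_polls = 0;
        }
    }
}
```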


Regarding the GPU, on PC it's probably still the fastest choice to have just one thread as the dedicated GPU thread. Other threads can perform rendering work (e.g. doing frustum culling, building render queues, performing state sorting or redundant state-change removal), but only one thread actually draws things.

Multi-threaded drawing is possible, but AFAIK, the drivers do not perform very well at the moment.
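To illustrate the split between preparing and submitting rendering work, here is a sketch in which a list of hypothetical `DrawItem`s (which other threads could build) is sorted by a state key, and a single submit loop skips redundant state changes. The shader-then-texture key order is an assumption, and the comments mark where the one GPU thread would make the actual API calls.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One queued draw; in a real renderer these could be built by worker threads.
struct DrawItem {
    std::uint32_t shader;
    std::uint32_t texture;
    std::uint32_t mesh;
};

// Sorts so that items sharing a shader (then a texture) are adjacent.
void sort_by_state(std::vector<DrawItem>& items) {
    std::sort(items.begin(), items.end(), [](const DrawItem& a, const DrawItem& b) {
        if (a.shader != b.shader) return a.shader < b.shader;
        return a.texture < b.texture;
    });
}

// Walks the queue as the single GPU thread would, counting the state binds
// that are still needed once redundant changes are skipped.
int count_state_changes(const std::vector<DrawItem>& items) {
    int changes = 0;
    bool first = true;
    std::uint32_t bound_shader = 0, bound_texture = 0;
    for (const DrawItem& item : items) {
        if (first || item.shader != bound_shader) {
            bound_shader = item.shader;    // the shader bind would go here
            ++changes;
        }
        if (first || item.texture != bound_texture) {
            bound_texture = item.texture;  // the texture bind would go here
            ++changes;
        }
        first = false;
        // the draw call for item.mesh would go here
    }
    // At most one shader bind and one texture bind per draw.
    assert(changes <= 2 * static_cast<int>(items.size()));
    return changes;
}
```

Sorting groups draws that share state, so the submit loop binds each shader and texture fewer times.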


Regarding embarrassingly parallel jobs -- in order to move these off to the GPU, you also need the consumers of those jobs to be ok with extremely long latencies. It's not possible to get short CPU->GPU->CPU latencies on PC without destroying overall performance.

Edited by Hodgman


I would distribute them as follows:


  • Core 1—Several resource-light threads.
    • Sound—Ticks only a few times per second to keep sound buffers filled, requiring very few resources.
    • Input (keyboard/mouse/etc.)—High-priority but mostly sleeping until a button is pressed, requiring very few resources.
    • Network thread—Medium priority but still mostly just waiting for events, which generally amount to at most 20 per second at peak times.
    • 1 low-priority worker thread for background loading etc.—With sound, input, and networking all mostly in a sleep or wait state (and only waking to do very quick tasks before going back to waiting), there is still enough core left for a low-priority worker for any kind of task that is not very time-sensitive.
  • Core 2—Logic.
    • Game thread—Reads queued inputs, performs game logic, performs frustum culling, sorting, and submits render commands.
    • 1 worker thread—The game thread will have the heaviest load when it needs to update game logic, which is typically bound to be as infrequent as possible (and in racing games in which it can be more than 100 times per second, the load is balanced anyway so that not much logic actually takes place) and on the down-time the game thread simply interpolates object positions for re-submission to the render thread.  There is typically enough CPU left over for a worker thread.  It can either be constantly running at a medium-low priority or it can be the same priority and forced awake when the game thread is waiting and forced to wait when the game thread is awake.
  • Core 3—Rendering and worker scheduling.
    • Render thread—Sends render commands to the GPU.
    • Worker scheduler—Runs when the render thread is waiting, waits when the render thread is awakened.  It takes very little time to read over a list of requested tasks and awaken worker threads.
  • Cores 4 to Total-3—Anything else that needs to be done.
    • 1 worker thread—Extra cores used for extra work.  File loading, decoding, decompressing, whatever.  Can change depending on the game.  Each thread is high-priority, but waits until a job is there for it to do.

With this plan you still have multiple workers on 4-core systems, and the main 2 components (rendering and game logic) each basically have their own cores—they share it with a worker thread but that thread leaves them alone while they are active (though this should be handled with care on the game thread, since it doesn’t necessarily have to have down-time).
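L. Spiro's layout can be written down as a small table-building function to check the claim about 4-core systems. This sketch uses hypothetical names (`CorePlan`, `plan_cores`, `count_workers`) and reads "Cores 4 to Total-3" as the remaining Total − 3 cores, i.e. cores 4 through Total, each with one worker; that reading is an assumption. It only describes the plan and does not set thread affinity, which needs platform-specific APIs.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct CorePlan {
    int core;                          // 1-based core index
    std::vector<std::string> threads;  // thread roles placed on this core
};

// Builds the distribution described above for `total_cores` cores (>= 3).
// Every core from 4 upward receives one worker thread.
std::vector<CorePlan> plan_cores(int total_cores) {
    assert(total_cores >= 3);
    std::vector<CorePlan> plan = {
        {1, {"sound", "input", "network", "low-priority worker"}},
        {2, {"game logic", "worker"}},
        {3, {"render", "worker scheduler"}},
    };
    for (int core = 4; core <= total_cores; ++core) {
        plan.push_back({core, {"worker"}});
    }
    return plan;
}

// Counts the threads whose role name ends in "worker".
int count_workers(const std::vector<CorePlan>& plan) {
    int workers = 0;
    for (const auto& entry : plan)
        for (const auto& role : entry.threads)
            if (role.size() >= 6 && role.compare(role.size() - 6, 6, "worker") == 0)
                ++workers;
    return workers;
}
```

Under that reading a 4-core machine ends up with three worker threads (on cores 1, 2 and 4) and an 8-core machine with seven.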


Also, don’t “spawn” threads, awaken them.  They should already exist and just be idling in a waiting state, waiting for an event to set them in motion.

And a “wait” state is not a “sleep” state.
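A minimal sketch of a thread that is created once and then waits on a `std::condition_variable` until a job is posted, instead of a new thread being spawned per job. `Worker` and `post` are hypothetical names, and this is one possible shape, not L. Spiro's implementation. While waiting, the thread is blocked by the OS and uses no CPU, and it becomes runnable as soon as it is notified rather than after a fixed sleep interval.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

class Worker {
public:
    Worker() : thread_([this] { run(); }) {}
    ~Worker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_one();
        thread_.join();  // jobs still queued are finished before the thread exits
    }
    // Queues a job and wakes the already-running thread.
    void post(std::function<void()> job) {
        assert(job);
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push_back(std::move(job));
        }
        wake_.notify_one();
    }
private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            // Waits without consuming CPU until there is work or a stop request.
            wake_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
            if (jobs_.empty()) return;  // stopping and nothing left to do
            std::function<void()> job = std::move(jobs_.front());
            jobs_.pop_front();
            lock.unlock();
            job();  // run the job without holding the lock
            lock.lock();
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::deque<std::function<void()>> jobs_;
    bool stopping_ = false;
    std::thread thread_;  // declared last so the members above exist before run() starts
};
```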


L. Spiro

Edited by L. Spiro
