Zweistein2

How many Threads to use?


Hi,

I wonder how many threads give a speed gain. If I had a quad-core machine, would it be best to run 4 threads at once? Or is it better to run 128 threads?

I am only talking about actual runtime. Don't factor in the synchronisation work on my side.

I am just wondering whether I should finish the work of the first 4 threads before giving new worker threads to the processor (a worker queue), or whether I should just launch all 128 threads at once and wait. Which would be faster?

At the very least, 128 threads at once would mean more memory in use.

That depends on how busy the threads are, how often they stall, etc. In general you would be quite hard pressed to completely saturate a core using a single thread, but it all depends.

Anything more than one thread per CPU core requires the OS to time-slice them, which reduces efficiency.

The rest is up to how the application schedules work across them.

For certain asynchronous models the thread pool will contain hundreds or thousands of threads, but work is scheduled in such a way that they are reused efficiently.

To expand on promit's answer: say you had a thread whose only job was to load assets from the hard drive before they were needed. Disk is pretty slow, so the thread sets up the I/O and then waits for it to complete before moving on to the next task. CPU utilization of this thread is likely to be pretty low, probably less than 50% of a core. Spinning up a compute-heavy thread to share its processor is a good idea. Spinning up another I/O-heavy thread just adds to bus contention and will more than likely mean a slowdown. For instance, the rule of thumb I have always seen for compiling is 1.5 jobs per core.
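As a quick worked example of that rule of thumb, here is a minimal Python sketch. The helper name `build_jobs` and its `jobs_per_core` parameter are made up for illustration, and the 1.5 factor is just the rule of thumb above, not a measured optimum.

```python
import math


def build_jobs(cores, jobs_per_core=1.5):
    """Number of parallel compile jobs for a given core count, rounded up.

    Both the function name and the default factor are illustrative
    assumptions based on the "1.5 jobs per core" rule of thumb.
    """
    return math.ceil(cores * jobs_per_core)


if __name__ == "__main__":
    # Quad core: 4 * 1.5 = 6 jobs.
    print(build_jobs(4))
```

The extra half job per core is there to cover the time a compile job spends waiting on disk rather than computing.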

Quote:
Original post by Zweistein2
I wonder how many threads give a speed gain. If I had a quad-core machine, would it be best to run 4 threads at once? Or is it better to run 128 threads?


The more threads you have, the more time the OS scheduler will spend swapping between them. Context switches can be extremely expensive on some platforms. In general, you want to aim to minimize context switching and core contention, which tends to mean a handful of very busy threads.

Quote:
I am only talking about actual runtime. Don't factor in the synchronisation work on my side.


This is an incredibly dangerous perspective. Ignoring synchronization, cache issues, and resource contention is a recipe for massive disaster when working with heavily threaded code. You need to consider all of those factors or you will end up with suboptimal results.

Quote:
I am just wondering whether I should finish the work of the first 4 threads before giving new worker threads to the processor (a worker queue), or whether I should just launch all 128 threads at once and wait. Which would be faster?


This depends entirely on what the threads are doing, how they do it, what resources they consume, how much cache locality you can depend on between threads, and how much resource contention you have to deal with. If you need to just crank a bunch of work through as many cores as possible as fast as possible, the most efficient way is generally to set things up so you have 1 thread per core and then run each thread at capacity with its own work queue until everything is done.

That's a bit oversimplified, of course, as it assumes your workload is perfectly parallelizable. In reality you will need to consider whether or not you can do cache-local work simultaneously across cores to minimize issues like false sharing; whether or not you will need to synchronize at all between worker threads; and how much time you will be spending waiting on things like the I/O bus or memory fetches.
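A minimal sketch of the "one thread per core, each with its own work" layout, assuming a perfectly parallelizable job (summing squares). The names `split_evenly` and `sum_squares_partitioned` are made up for illustration. Note that in CPython the global interpreter lock keeps pure-Python compute from actually running in parallel, so read this as a sketch of the structure, not as a benchmark.

```python
import threading


def split_evenly(items, parts):
    """Split items into `parts` contiguous chunks whose sizes differ by at most one."""
    items = list(items)
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def sum_squares_partitioned(items, num_workers=4):
    """Give each worker its own chunk and its own result slot, then combine."""
    chunks = split_evenly(items, num_workers)
    partial = [0] * num_workers  # one slot per worker, so no lock is needed

    def worker(index):
        total = 0  # accumulate locally instead of touching shared state
        for x in chunks[index]:
            total += x * x
        partial[index] = total  # a single write once the chunk is done

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)


if __name__ == "__main__":
    print(sum_squares_partitioned(range(128)))
```

Each worker only reads its own contiguous chunk and writes its result once, which is the kind of arrangement that keeps synchronization and cross-thread sharing to a minimum.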


Nothing is quick, simple, or easy when it comes to doing heavy threading, I'm afraid [smile]

Another thing to note: although everyone here is advocating "1 thread per core", you should really be looking at something closer to "1 software thread per hardware thread". Most Intel CPUs support "hyperthreading" to yield 2 threads per core, and some SPARC server processors run 8 threads per core. Your GPU supports hundreds or thousands of threads per "core" (really, a GPU "core" tends to be thousands of smaller "stream" cores).
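For what it's worth, most runtimes report hardware threads rather than physical cores. A small Python sketch, where `default_worker_count` and its `reserve` parameter are hypothetical names; `os.cpu_count()` returns the number of logical CPUs (hyperthreads included) or `None` if it cannot tell.

```python
import os


def default_worker_count(reserve=0):
    """Pick one software thread per hardware thread, minus an optional reserve.

    `reserve` is an illustrative knob for leaving some hardware threads
    free, e.g. for a main or render thread.
    """
    logical = os.cpu_count() or 1  # fall back to 1 if the count is unknown
    return max(1, logical - reserve)


if __name__ == "__main__":
    print(default_worker_count())
```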

Quote:
Original post by ApochPiQ
This is an incredibly dangerous perspective. Ignoring synchronization, cache issues, and resource contention is a recipe for massive disaster when working with heavily threaded code. You need to consider all of those factors or you will end up with suboptimal results.

Agreed! Processors are really fast compared to memory. If you are doing only tiny amounts of work per byte, you can eat through your cache and memory bandwidth, slowing down all your threads.

Quote:
Original post by ApochPiQ
Nothing is quick, simple, or easy when it comes to doing heavy threading, I'm afraid [smile]

Worth noting that it doesn't *have* to be so painful. Programming languages designed with parallelism in mind (for example, Erlang or Stackless Python) hide many of the complexities from the programmer, and let you get on with everything else...

Quote:
Original post by Zweistein2
I am just wondering whether I should finish the work of the first 4 threads before giving new worker threads to the processor (a worker queue), or whether I should just launch all 128 threads at once and wait. Which would be faster?

You shouldn't create threads to do one bit of work, and then throw them out and replace them with new ones. You should keep your worker threads alive, and give them more work to do.

E.g. you make 4 threads and give them 4 pieces of work. Now there are 124 pieces of work in the queue. As the threads finish their work, they grab more from the queue until there's no work left in it.
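A rough sketch of that pattern in Python, using the standard `queue` and `threading` modules. The function name `run_pool` and the "square the number" placeholder work are made up for illustration; the point is that the same 4 threads stay alive and drain all 128 items.

```python
import queue
import threading


def run_pool(tasks, num_workers=4):
    """Process every task with a fixed set of long-lived worker threads."""
    work = queue.Queue()
    for task in tasks:
        work.put(task)  # all work is queued before any worker starts

    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            try:
                item = work.get_nowait()
            except queue.Empty:
                return  # queue drained, so this worker exits
            value = item * item  # placeholder for the real work
            with results_lock:
                results.append(value)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    done = run_pool(range(128), num_workers=4)
    print(len(done))
```

Results arrive in whatever order the workers finish, so sort or index them if order matters. If work can be added while the workers are running, exiting on an empty queue is no longer safe and you would need a sentinel or similar shutdown signal instead.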

Quote:
Original post by swiftcoder
Worth noting that it doesn't *have* to be so painful. Programming languages designed with parallelism in mind (for example, Erlang or Stackless Python) hide many of the complexities from the programmer, and let you get on with everything else...



Yeah, there are tools that can make it substantially easier, sure - but I maintain that if you don't have a solid understanding of what issues those tools actually help you deal with, you'll still end up with suboptimal results. Certainly less suboptimal than screwing it up in, say, C++, but still suboptimal [smile]

Also, it's not about how many threads you have in total, it's about how many threads you have running or ready to run at any one time.

1 software thread per hardware thread will not be optimal if those threads make blocking calls.
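A small Python sketch of why blocking calls change the picture. Here `time.sleep` stands in for a blocking call such as disk or network I/O, and the names `fake_blocking_call` and `timed_run` as well as the task count and delay are arbitrary choices for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_blocking_call(delay):
    """Stand-in for a blocking call: the thread waits and uses no CPU."""
    time.sleep(delay)
    return delay


def timed_run(num_workers, num_tasks=8, delay=0.05):
    """Run num_tasks blocking calls on num_workers threads; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        list(pool.map(fake_blocking_call, [delay] * num_tasks))
    return time.perf_counter() - start


if __name__ == "__main__":
    print("2 workers:", timed_run(2))
    print("8 workers:", timed_run(8))
```

With only 2 workers the 8 waits have to be served in several rounds, while 8 workers can overlap them all, so the wider pool should finish sooner even on a machine with few hardware threads. Exact timings will vary from machine to machine.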
