SetThreadAffinityMask and performance

Started by
8 comments, last by janta 17 years, 10 months ago
Hi! I've quickly read the other gdnet threads dealing with QPF/QPC and multicore issues. The widely recommended fix seems to be to use SetThreadAffinityMask to force the main thread to always run on the same CPU/core.

Now my question is: say we have two threads, T1 (the main thread) and T2, and a dual-core CPU with cores C1 and C2. I believe the underlying OS (WinXP) decides which core runs which thread quite transparently to the programmer, so C1 and C2 will each end up running T1 or T2 at some point. What happens if, thanks to SetThreadAffinityMask, I force T1 to run only on C1? I believe that would not be reciprocal: T1 would run only on C1, but C1 would NOT run only T1. That's why I'm concerned (or, say, curious) about what happens if the OS schedules T2 on C1. To me it seems that:
1) T1 does not receive CPU time during that time slice (bad!)
2) the C2 time slice is wasted (I don't doubt the OS will use it for something external to my game, but I want to suck up as much CPU processing power as possible)

I believe that pinning the main thread to one core could thus be a performance issue, since the main thread is responsible for quite a lot of tasks besides timing.
- I can either have a particular thread dedicated to timing, and pay the syncing + context switching overheads
- or I can leave the timing calls in the main thread and pay the cost of receiving less processor time.

Any thoughts about what theoretically sounds best? Or even better, have you experienced this?

Best regards,
Janta
1) You're assuming that an OS with a very busy C1 and an idle C2 will choose C1 to run T2? In my experience, the OS is usually not that stupid.

2) It is difficult to get 100% utilisation from both CPUs. Obviously one thread can only use 50% of the processing power available (1 CPU), and the only way to use the other CPU effectively as well is to have another thread in your program which is also very processor intensive. Chances are, you won't be driving both CPUs to the maximum anyway. Other parts of the system, such as the RAM, will create more of a bottleneck than they would on a single-core system.

Do you already have a specific performance problem as such, or are you just theorising about how to get the most power out of a dual-core system?
"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms
If one core is idle, the OS will choose that core for the next thread that needs to run.

If you're really worried, then you could set the main thread to run on core 1, and all other threads to run on all other cores besides 1 (assuming there is more than one :-)
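
A minimal sketch of that split, assuming cores are numbered from bit 0 of the mask and that the machine has at most 64 logical processors. The helper names (`single_core_mask`, `all_but_core_mask`, `restrict_current_thread`) are made up for illustration; only `SetThreadAffinityMask` and `GetCurrentThread` are real Win32 calls, and they are compiled only on Windows.

```cpp
#include <cassert>
#include <cstdint>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: affinity mask with only bit `core` set (0-based).
std::uint64_t single_core_mask(unsigned core) {
    assert(core < 64);
    return std::uint64_t{1} << core;
}

// Hypothetical helper: mask covering cores [0, core_count) except `core`.
// On a single-core machine this yields 0, which is not a valid affinity mask.
std::uint64_t all_but_core_mask(unsigned core, unsigned core_count) {
    assert(core_count >= 1 && core_count <= 64 && core < core_count);
    const std::uint64_t all = (core_count == 64)
                                  ? ~std::uint64_t{0}
                                  : (std::uint64_t{1} << core_count) - 1;
    return all & ~single_core_mask(core);
}

#ifdef _WIN32
// Restricts the calling thread to `mask`. SetThreadAffinityMask returns the
// previous mask on success and 0 on failure.
bool restrict_current_thread(std::uint64_t mask) {
    return SetThreadAffinityMask(GetCurrentThread(),
                                 static_cast<DWORD_PTR>(mask)) != 0;
}
#endif
```

On a dual-core machine the main thread would call `restrict_current_thread(single_core_mask(0))` and each worker would call `restrict_current_thread(all_but_core_mask(0, 2))`. A caller should skip the worker call when the computed mask is 0, which is the single-core case.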
enum Bool { True, False, FileNotFound };
So you're telling me that the OS will take into account that T1 can only run on C1 and will schedule everything accordingly in a clever way. That is, when the time comes to choose which core runs T1 and which core runs T2, C1 will be preferred to C2 for running T1 because the OS is aware of the affinities that were set. (I hope those figures are not too confusing)

From MSDN:
Quote:Setting an affinity mask for a process or thread can result in threads receiving less processor time, as the system is restricted from running the threads on certain processors. In most cases, it is better to let the system select an available processor

So there would really be a performance hit. Minor or important is probably specific to every particular real-life situation, of course.

I guess another problem is that I have an oversimplified view of all this, and maybe I should trust the OS until I get real issues, for I'm only theorising right now :) However, since it's mostly a design issue and I must design before I code, I thought I'd get good insight from people here to help me make a clever choice rather than a random one.
Quote:Original post by hplus0603
If one core is idle, the OS will choose that core for the next thread that needs to run.

If I understand correctly, if C1 is idle and T2 needs to run, then C1 will run T2. And if a nanosecond later T1 needs to run, it will have to wait till C1 is available again, thus wasting time.

Quote:If you're really worried, then you could set the main thread to run on core 1, and all other threads to run on all other cores besides 1 (assuming there is more than one :-)

Sounds interesting, it's kind of like making a bijection (not sure that word is really English, probably Franglais, so I hope you understand...) Cores <--> Threads. Wouldn't that interfere with the OS internals? I mean, isn't the OS supposed to manage this on its own more efficiently than a programmer?

I wonder... :)
Quote:Original post by janta
I believe that setting the main thread on one core could thus be a performance issue since the main thread is responsible for quite a lot of tasks besides timing.



Quote:
Use SetThreadAffinityMask with caution!
-May be useful for assigning "heavy" work threads
-This mask is technically a hint, not a commitment
-RDTSC-based instrumenting will require locking the game threads to a single core
-Otherwise let the Windows scheduler do the right thing
-CreateDevice/Reset might have a side-effect on the calling thread's affinity with software vertex processing enabled


This is an extract from the GDC 2006 presentation "Coding for multiple cores".

The thing to remember is to let the Windows scheduler do the right thing and not force a thread onto one particular core, except that when you want to use timing you will need to lock the thread to one core. They also said that the affinity mask is a hint and not a commitment: you can't be sure what the OS will do in the end, but it will try its best to run the thread on the core you selected.
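
One way to reconcile the two points above is to restrict the thread only while the counter is being read and then put the previous mask back. This is a sketch under that assumption, not something the presentation prescribes. `ticks_to_seconds`, `read_counter_on` and `counter_frequency` are hypothetical names; the Win32 calls inside them are real and compiled only on Windows.

```cpp
#include <cassert>
#include <cstdint>

#ifdef _WIN32
#include <windows.h>
#endif

// Converts a counter difference to seconds, given the counter frequency
// in ticks per second (what QueryPerformanceFrequency reports).
double ticks_to_seconds(std::int64_t start, std::int64_t end,
                        std::int64_t frequency) {
    assert(frequency > 0);
    return static_cast<double>(end - start) / static_cast<double>(frequency);
}

#ifdef _WIN32
// Reads the performance counter while the calling thread is restricted to
// `timing_mask`, then restores the mask the thread had before the call.
std::int64_t read_counter_on(DWORD_PTR timing_mask) {
    HANDLE self = GetCurrentThread();
    // Returns the previous mask, or 0 if the call failed.
    DWORD_PTR previous = SetThreadAffinityMask(self, timing_mask);
    LARGE_INTEGER now;
    QueryPerformanceCounter(&now);
    if (previous != 0) {
        SetThreadAffinityMask(self, previous);
    }
    return now.QuadPart;
}

// Counter frequency in ticks per second.
std::int64_t counter_frequency() {
    LARGE_INTEGER frequency;
    QueryPerformanceFrequency(&frequency);
    return frequency.QuadPart;
}
#endif
```

With this shape, the main thread is free to run on any core between readings, and every reading is taken on the same core as long as the same `timing_mask` (for example `1` for the first core) is passed each time.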

JFF
Quote:This mask is technically a hint, not a commitment

What does that mean?
Quote:CreateDevice/Reset might have a side-effect on the calling thread's affinity with software vertex processing enabled

Does it mean it is safer to call SetThreadAffinityMask again after a Reset or a device creation?

Thanks a lot for that nice link.

I presume you have seen SetThreadIdealProcessor?
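
For reference, a small sketch of how that call might be used. `SetThreadIdealProcessor` states a preferred processor rather than restricting the thread to it. The round-robin policy in `ideal_processor_for` and the wrapper name `prefer_processor` are assumptions made for illustration.

```cpp
#include <cassert>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical policy: spread worker threads over the cores round-robin.
unsigned ideal_processor_for(unsigned thread_index, unsigned core_count) {
    assert(core_count > 0);
    return thread_index % core_count;
}

#ifdef _WIN32
// Tells the scheduler which processor `thread` would prefer to run on.
// SetThreadIdealProcessor returns the previous ideal processor on success
// and (DWORD)-1 on failure.
bool prefer_processor(HANDLE thread, unsigned processor) {
    return SetThreadIdealProcessor(thread, static_cast<DWORD>(processor)) !=
           static_cast<DWORD>(-1);
}
#endif
```

A worker with index `i` on a machine with `n` cores would call `prefer_processor(GetCurrentThread(), ideal_processor_for(i, n))` once at startup.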
Quote:Original post by janta
Quote:This mask is technically a hint, not a commitment

What does that mean?


From what I understood, it means that you can't be 100% sure the thread will run on the selected CPU. So even if you set it to use C1, for example, it might get some running time on C2 if the OS can't schedule it on C1.


Quote:
Quote:CreateDevice/Reset might have a side-effect on the calling thread's affinity with software vertex processing enabled

Does it mean it is safer to call SetThreadAffinityMask again after a Reset or a device creation?


Probably... I'm not sure about this one, and I don't have a multicore or multi-CPU machine around to test. Maybe CreateDevice and Reset set or reset the calling thread's affinity.
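
Since SetThreadAffinityMask returns the mask that was in effect before the call, re-applying the desired mask after CreateDevice or Reset also shows whether something changed it in the meantime. The sketch below assumes that approach; `reapply_affinity` and `reapply_current_thread_affinity` are hypothetical names, and the setter is passed in as a parameter so the comparison logic can be exercised without Windows.

```cpp
#include <cassert>
#include <cstdint>

#ifdef _WIN32
#include <windows.h>
#endif

// `set_mask(m)` must apply mask `m` and return the previous mask (0 on
// failure), which is how SetThreadAffinityMask behaves.
// Returns true if the previous mask differed from `desired`, meaning some
// other code changed the affinity since it was last set.
template <typename SetMask>
bool reapply_affinity(SetMask set_mask, std::uint64_t desired) {
    assert(desired != 0);
    const std::uint64_t previous = set_mask(desired);
    return previous != 0 && previous != desired;
}

#ifdef _WIN32
// Intended to be called right after CreateDevice or Reset returns.
bool reapply_current_thread_affinity(std::uint64_t desired) {
    return reapply_affinity(
        [](std::uint64_t mask) {
            return static_cast<std::uint64_t>(SetThreadAffinityMask(
                GetCurrentThread(), static_cast<DWORD_PTR>(mask)));
        },
        desired);
}
#endif
```

Logging the return value on a multicore machine would answer the question empirically: `true` after a device creation means the call did touch the thread's affinity.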


JFF
From what I've read, bad issues can arise if QPF/QPC are executed from different CPUs/cores. So right now I feel a bit confused when you tell me that SetThreadAffinityMask can't 100% enforce that.
The fact is I don't have a dual-core machine either, and I won't be able to test the behaviour of my program till I get one... But I still want to design things to take advantage of multicore CPUs (or at least I want to try).
I guess I'll use SetThreadAffinityMask for now, as it seems to be the most generally advised choice.

Thanks string for the link, I did not know that function.

This topic is closed to new replies.
