Bind single-threaded apps to a single core

This topic is 2122 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.


Hello guys, I'm working on an old game that needs to be ported to modern Windows systems. The game is single-threaded, and one of the bugs assigned to me is to bind the main thread to a fixed core on multi-core CPUs. While this should be easy enough, I was wondering whether there is really a good reason for it, and how it might affect performance and the OS. Should I really do it?

[i]Edit:[/i] As I said, it's an old game, so extreme performance is not an issue at all. Cache misses due to core swapping are hardly a concern.

[quote]I was wondering if there would really be a good reason[/quote]

Timing is the most common source of bugs when a thread swaps cores. There could be other, more obscure ones.

Issues might also be related to third-party libraries or, worse, the drivers.

[quote]Should I really do that?[/quote]

You could always fix the code... though that isn't really viable, since such bugs are hard to reproduce.

Or just make a batch file "start /affinity 1 foo.exe".

[quote]
Timing is the most common source of bugs when swapping cores. There could be others, more obscure ones.
[/quote]
Indeed, I didn't think of QueryPerformanceCounter(). The timer implementation uses it, and being old code written for single-core machines, it doesn't even try to compensate for the possible discrepancies.

[quote]
You could always fix the code.... Which isn't really viable since such bugs are hard to reproduce.
[/quote]
Also, the timing code isn't wrapped in just one or two classes... the original programmer had a vocation for copy & paste, so there are hundreds of direct calls, each followed by its own conversion into milliseconds. Sometimes you even see the same comments repeated over and over in the code.

[quote]
Or just make a batch file "start /affinity 1 foo.exe".
[/quote]
Tbh I was rather thinking of calling SetThreadAffinityMask(GetCurrentThread(), 1) in WinMain. Are the two solutions equivalent, or are there arguments in favour of one over the other?

[quote]Also timing code isn't wrapped into only one or two classes... the original programmer had a vocation for copy & paste, so there's hundreds of direct calls with the subsequent transformation into milliseconds.[/quote]

Then it's anyone's guess why it works or doesn't, and what threads have to do with it.

Time is usually read once per frame, and that value then gets passed around. Reading the time many times per frame is conceptually problematic, since the simulation clock and the wall clock aren't synchronized. For various reasons, different stages of a frame can take different amounts of time, especially considering that new hardware has different characteristics: while everything is faster, the relative time per operation differs.

It's even worse if the frame tick goes through several stages, perhaps advancing the simulation five times. On faster hardware, and due to QPC quirks, these stages could easily end up taking 0 time, causing all kinds of mess.

Fixing the code probably isn't viable for a simple maintenance job, but the timing really is a mess.

Hopefully, floats aren't involved...

What's the preferred method of dealing with the "QPC on old dual-core machines" problem? Do developers simply ignore the issue? I was one of the unlucky people to experience the bug, and it was definitely present in several big-name titles I played.

Is it safe to conclude that QPC is fixed on modern (all quad core and above) platforms?

[quote name='taz0010' timestamp='1329617606' post='4914397']
What's the preferred method of dealing with the "QPC on old dual-core machines" problem? Do developers simply ignore the issue? I was one of the unlucky people to experience the bug, and it was definitely present in several big-name titles I played.

Is it safe to conclude that QPC is fixed on modern (all quad core and above) platforms?
[/quote]

After spending a good week straight scouring Google, MSDN, GameDev, Stack Overflow, and pretty much anything else I could find about QPC, I'm pretty sure it's fixed on any modern platform. From what I understand it was only a few isolated platforms, and BIOS/OS patches were released for most of them. The general consensus was to just ignore the issue. Unfortunately there is no single definitive article explaining it all in detail, and the MSDN docs are just terrible (which is surprising, given that it's such a key issue). I wish Microsoft would just list the affected platforms somewhere; they seemed to be far fewer than what the docs made it sound like. There's also an easy workaround for those systems that do have errors.

In a nutshell, the problem is that older systems didn't have the [url="http://en.wikipedia.org/wiki/High_Precision_Event_Timer"]HPET[/url], and QPC would instead use the Time Stamp Counter (TSC), which of course wasn't synchronized between the cores. It turns out there is another timer on most systems, called the PM timer, which is pretty much the HPET but a little slower. Why Windows decided to use the TSC over the PM timer for QPC I'm not entirely sure, but it's easy to force it (explained here: [url="http://blogs.technet.com/b/perfguru/archive/2008/02/18/explanation-for-the-usepmtimer-switch-in-the-boot-ini.aspx"]http://blogs.technet.com/b/perfguru/archive/2008/02/18/explanation-for-the-usepmtimer-switch-in-the-boot-ini.aspx[/url]).
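For reference, the /usepmtimer switch from that article goes on the OS line of boot.ini; a sketch (the disk path and the other flags shown here are illustrative and vary per machine):

```
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Microsoft Windows XP Professional" /fastdetect /usepmtimer
```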

Was there a system out there that had dual cores and didn't have the PM timer either? I have no idea, nor did I ever see that possibility mentioned. In the end my conclusion was that the QPC issues are a thing of the past, and easy to rectify in the rare chance that someone is still using one of the old dual-core systems and hasn't updated the OS.

A bit late, since I only just saw this thread.

[quote name='King_DuckZ' timestamp='1329408096' post='4913681']Hello guys, I'm working on an old game that needs to be ported to modern windows systems. The game is single-threaded, and one of the bugs they assigned to me is to bind the main thread to a fixed core on multi-core cpus. While this should be easy enough, I was wondering if there would really be a good reason for that, and how it may affect the performance and the os. Should I really do that?[/quote]
Timing issues, as mentioned in this thread, are one reason; another could be potential race conditions that would never arise on a single core (since only one thread runs at a time) but could become an issue on multiple cores (since multiple threads run and access hardware resources at the same time).

[quote name='King_DuckZ' timestamp='1329472265' post='4913843'][quote]
Or just make a batch file "start /affinity 1 foo.exe".
[/quote]
Tbh I was rather thinking to use SetThreadAffinityMask(GetCurrentThread(), 1) in the WinMain. Are the two solutions equivalent or are there arguments in favour of one rather than the other?[/quote]
Well, your idea would always work, even if the user runs the executable directly instead of using the batch file =P

[quote name='Ryan_001' timestamp='1329674158' post='4914565']Was there a system out there that had dual cores and didn't have the PMTimer either? I have no idea, nor did I see this possibility ever mentioned.[/quote]
No idea, but there could have been systems where using the PM timer itself caused bugs. Looking it up, the PM timer seems to be related to power management, so maybe using it on those systems could actually break proper power management? I can see why Microsoft would want to avoid that.

[quote name='Sik_the_hedgehog' timestamp='1329685750' post='4914612']
[quote name='King_DuckZ' timestamp='1329408096' post='4913681']Hello guys, I'm working on an old game that needs to be ported to modern windows systems. The game is single-threaded, and one of the bugs they assigned to me is to bind the main thread to a fixed core on multi-core cpus. While this should be easy enough, I was wondering if there would really be a good reason for that, and how it may affect the performance and the os. Should I really do that?[/quote]
Timing issues as mentioned in this thread are one reason, another reason could be some potential race conditions that would never arise on a single core (since only one thread is running at a time) but could possibly become an issue on multiple cores (since multiple threads are running and accessing hardware resources at the same time).
[/quote]
Care to expand on what race conditions could arise in a single threaded application running on a multi-core system?

[quote name='Cornstalks' timestamp='1329695156' post='4914660']
Care to expand on what race conditions could arise in a single threaded application running on a multi-core system?[/quote]
It's pretty darn hard on a modern system, although interaction with external hardware devices is a possibility. However, I assume Sik_the_hedgehog was discussing the possibility for a multi-threaded application.

[quote name='swiftcoder' timestamp='1329695816' post='4914664']
[quote name='Cornstalks' timestamp='1329695156' post='4914660']
Care to expand on what race conditions could arise in a single threaded application running on a multi-core system?[/quote]
It's pretty darn hard on a modern system, although interaction with external hardware devices is a possibility. However, I assume Sik_the_hedgehog was discussing the possibility for a multi-threaded application.
[/quote]
Yeah, pretty much this. Not to mention that any APIs the program uses could be multithreaded behind the scenes (e.g. for timers or sound playback). Note that by "issues" I don't necessarily mean erroneous behavior; it could also just be excessive performance loss and the like.

Also, consider two cores accessing the same area of RAM at the same time. Modern processors can reorder memory writes to improve memory access performance, and if the code relied on things being written in a specific order this can cause bugs (e.g. a "ready" flag getting set before the full data for the new state has actually been written). For a single core this isn't a problem, since the core already knows what data should be there anyway, but other cores may not know at the moment they read from those locations, and will get stale information instead.

Proper handling of shared variables (including proper use of mutexes, etc.) should make all this a non-issue, but with old code you never know what the original programmer might have attempted.

Whenever I encounter a program that binds affinity to a single core, I feel urged to seek out the programmer and tell him to take off his glasses (so I can hit him harder without hurting my hand). [img]http://public.gamedev.net//public/style_emoticons/default/laugh.png[/img]

In my experience, forcing thread affinity has no effect other than greatly [i]increasing[/i] CPU load while greatly reducing performance at the same time. If you think about it, that's kind of logical too, and not very surprising.

There are dedicated programs that take a process snapshot every second and revert thread affinity back to "all Cores". I've used such a program [u]for years[/u] on two titles that I play in my free time. Both are multithreaded, both use OpenAL for sound and OpenGL for graphics. On both titles, load time with added stupidity is about 6 times as long as it is with reset-to-default affinity (on a 4-core machine, go Amdahl[sup][1][/sup]). On both titles, average CPU load while playing [i]as-is [/i]is around 30% (one core used 100% by the game, and some odd 5% for everything else on the computer) compared to 2-3% with reverted affinity. Also, both titles, after reverting affinity, merely have around 2,000 context switches per second rather than 50,000-60,000.

Now of course there are some rare cases where CPU affinity [i]is needed[/i], and some reasons why it may be an advantage. One example is certain Windows tasks (at least under Windows XP; later versions might not need this). Some drivers might need interrupt affinity, too (which is not quite the same thing, but goes in the same direction).

There is the well-known QueryPerformanceCounter issue, but realistically it is non-existent nowadays. Even so, [i]assuming it is still an issue[/i], a better solution would be to have a single dedicated timer thread that binds to one CPU and updates a global variable atomically, rather than limiting the entire process to a single CPU.

There is the theoretical consideration of NUMA, but frankly that is a decision best left to the operating system. Binding blindly to core 0, as seems to be the "common recipe" with thread affinity, will do more harm than good in this case too (unless by sheer coincidence that is the correct CPU). Besides, how many people use Itanium to play a desktop game?

When doing extreme number crunching over huge data sets, you [i]might possibly[/i] get better cache behaviour by having exactly one thread per core and binding each to a particular core. It remains to be shown whether that is really the case. In my experience, leaving the decision of where to schedule a thread to the OS works just fine and gets you close to 100% resource usage.

Regarding data consistency: if a programmer writes improperly threaded code, thread affinity will not save him either. Nothing prevents the operating system from pre-empting a thread, e.g. in the middle of a (non-atomic) read-modify-write operation.
Setting affinity also opens the door to funny effects, such as one thread waiting on a lock held by another thread pinned to the same core, which cannot run to release the lock until the waiter is swapped out. Spinlocks in particular can be a lot of fun with affinity.


[u] [/u]
[size=3][sup][1][/sup] I have to admit that this beats me, but it's something I've really seen. It must be some funny combination of keeping the hard disk going, unpacking data, and initializing sound/graphics hardware, all without stalling in between. Or... whatever. In any case, the observed single-core performance is [i]much [/i]worse than just 1/num_core.[/size]
