Bind single-threaded apps to a single core

12 replies to this topic

#1 King_DuckZ   Members   -  Reputation: 102

Posted 16 February 2012 - 10:01 AM

Hello guys, I'm working on an old game that needs to be ported to modern Windows systems. The game is single-threaded, and one of the bugs they assigned to me is to bind the main thread to a fixed core on multi-core CPUs. While this should be easy enough, I was wondering if there would really be a good reason for that, and how it may affect the performance and the OS. Should I really do that?

Edit: As I said it's an old game, so extreme performance is not an issue at all. Cache misses due to core swapping are hardly a concern.
[ King_DuckZ out-- ]

#2 Antheus   Members   -  Reputation: 2303

Posted 16 February 2012 - 10:11 AM

Quote

I was wondering if there would really be a good reason

Timing is the most common source of bugs when swapping cores. There could be others, more obscure ones.

Issues might also be related to third-party libraries or worse, the drivers.

Quote

Should I really do that?

You could always fix the code.... Which isn't really viable since such bugs are hard to reproduce.

Or just make a batch file "start /affinity 1 foo.exe".

#3 King_DuckZ   Members   -  Reputation: 102

Posted 17 February 2012 - 03:51 AM

Quote

Timing is the most common source of bugs when swapping cores. There could be others, more obscure ones.
Indeed, I didn't think of QueryPerformanceCounter(). The timer implementation uses it, and since it's old code written for single-core machines, it doesn't even try to compensate for possible discrepancies between cores.

Quote

You could always fix the code.... Which isn't really viable since such bugs are hard to reproduce.
Also timing code isn't wrapped into only one or two classes... the original programmer had a vocation for copy & paste, so there's hundreds of direct calls with the subsequent transformation into milliseconds. Sometimes you even see the same comments over and over in the code.
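
One way to tame that kind of duplication, sketched under assumptions: route every call site through a single helper that owns the tick-to-millisecond conversion. The names `ticks_to_ms` and `game_time_ms` are hypothetical and not from the game's code, and the non-Windows branch uses `std::chrono::steady_clock` purely so the sketch builds elsewhere.

```cpp
#include <cassert>
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#else
#include <chrono>
#endif

// Hypothetical helper: converts raw counter ticks to whole milliseconds.
// Splitting into seconds + remainder avoids overflowing ticks * 1000.
std::int64_t ticks_to_ms(std::int64_t ticks, std::int64_t ticks_per_second) {
    assert(ticks_per_second > 0);
    const std::int64_t whole_seconds = ticks / ticks_per_second;
    const std::int64_t remainder = ticks % ticks_per_second;
    return whole_seconds * 1000 + remainder * 1000 / ticks_per_second;
}

// Hypothetical single entry point that the copy-pasted call sites could use.
std::int64_t game_time_ms() {
#ifdef _WIN32
    LARGE_INTEGER frequency, counter;
    QueryPerformanceFrequency(&frequency);
    QueryPerformanceCounter(&counter);
    return ticks_to_ms(counter.QuadPart, frequency.QuadPart);
#else
    // Assumption: a monotonic clock stands in for QPC off Windows.
    using namespace std::chrono;
    return duration_cast<milliseconds>(
        steady_clock::now().time_since_epoch()).count();
#endif
}
```

With a wrapper like this, any later fix to the timing code only has to be made in one place.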

Quote

Or just make a batch file "start /affinity 1 foo.exe".
Tbh I was rather thinking of using SetThreadAffinityMask(GetCurrentThread(), 1) in WinMain. Are the two solutions equivalent, or are there arguments in favour of one rather than the other?
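
A minimal sketch of the in-code variant, under stated assumptions. Instead of the literal mask 1, it picks the lowest core the process is actually allowed to run on, so the call does not fail if core 0 has been excluded. The helper names `lowest_set_bit` and `pin_current_thread_to_one_core` are hypothetical, and the Linux branch exists only so the sketch builds outside Windows.

```cpp
#include <cstdint>
#if defined(_WIN32)
#include <windows.h>
#elif defined(__linux__)
#include <sched.h>
#endif

// Hypothetical helper: isolates the lowest set bit of an affinity mask
// (returns 0 for an empty mask).
std::uint64_t lowest_set_bit(std::uint64_t mask) {
    return mask & (~mask + 1);
}

// Hypothetical helper: restricts the calling thread to one core that the
// process may use. Returns false on failure or on platforms not covered.
bool pin_current_thread_to_one_core() {
#if defined(_WIN32)
    DWORD_PTR process_mask = 0, system_mask = 0;
    if (!GetProcessAffinityMask(GetCurrentProcess(), &process_mask, &system_mask))
        return false;
    const DWORD_PTR one_core =
        static_cast<DWORD_PTR>(lowest_set_bit(process_mask));
    // SetThreadAffinityMask returns the previous mask, or 0 on failure.
    return one_core != 0 &&
           SetThreadAffinityMask(GetCurrentThread(), one_core) != 0;
#elif defined(__linux__)
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return false;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &allowed)) {
            cpu_set_t one;
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            return sched_setaffinity(0, sizeof(one), &one) == 0;
        }
    }
    return false;
#else
    return false;
#endif
}
```

Calling `pin_current_thread_to_one_core()` at the top of WinMain would correspond to the plan above; passing a literal 1 to SetThreadAffinityMask instead gives exactly the call from the post.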
[ King_DuckZ out-- ]

#4 Washu   Senior Moderators   -  Reputation: 2448

Posted 17 February 2012 - 03:59 AM

No, they do the same thing.
In time the project grows, the ignorance of its devs it shows, with many a convoluted function, it plunges into deep compunction, the price of failure is high, Washu's mirth is nigh.
ScapeCode - Blog | SlimDX

#5 King_DuckZ   Members   -  Reputation: 102

Posted 17 February 2012 - 05:18 AM

Great, thanks! :)
[ King_DuckZ out-- ]

#6 Antheus   Members   -  Reputation: 2303

Posted 17 February 2012 - 06:45 AM

Quote

Also timing code isn't wrapped into only one or two classes... the original programmer had a vocation for copy & paste, so there's hundreds of direct calls with the subsequent transformation into milliseconds.

Then it's anyone's guess as to why it works or not, and what threads have to do with it.

Time will usually be read once per frame, then that value gets passed around. Reading time many times per frame is conceptually difficult, since the simulation clock and the wall clock aren't synchronized. For various reasons, different stages of a frame may take different amounts of time, especially considering that new hardware has different characteristics. While everything is faster, the relative time per operation differs.

It's even worse if the frame tick goes through several stages, perhaps advancing the simulation five times. On faster hardware, and due to QPC quirks, these stages could easily end up taking 0 time, causing all kinds of mess.

Fixing the code as part of a simple maintenance job likely isn't viable, but timing really is a mess.

Hopefully, floats aren't involved...
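
To illustrate the "read once per frame" idea, here is a minimal sketch, not the game's actual code. The class name `FrameClock` is hypothetical, `std::chrono::steady_clock` stands in for whatever timer the game uses, and the delta is kept in integer milliseconds to sidestep the float concern. A timestamp that appears to go backwards is clamped to a zero delta.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical per-frame clock: sample the timer once, then hand the same
// delta to every stage of the frame.
class FrameClock {
public:
    using Clock = std::chrono::steady_clock;

    explicit FrameClock(Clock::time_point start = Clock::now())
        : last_(start) {}

    // Call once at the top of each frame. The parameter exists so the
    // timestamp can be injected (for example in tests).
    void begin_frame(Clock::time_point now = Clock::now()) {
        auto elapsed = now - last_;
        if (elapsed < Clock::duration::zero())
            elapsed = Clock::duration::zero();  // never report negative time
        const auto whole =
            std::chrono::duration_cast<std::chrono::milliseconds>(elapsed);
        delta_ms_ = whole.count();
        // Advance only by the milliseconds handed out, so the sub-millisecond
        // remainder carries over to the next frame instead of being lost.
        last_ += whole;
    }

    // Every stage of the frame reads this same value.
    std::int64_t delta_ms() const { return delta_ms_; }

private:
    Clock::time_point last_;
    std::int64_t delta_ms_ = 0;
};
```

Stages that would otherwise measure 0 elapsed time on fast hardware all see the same frame delta instead of each taking their own reading.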

#7 taz0010   Members   -  Reputation: 186

Posted 18 February 2012 - 08:13 PM

What's the preferred method of dealing with the "QPC on old dual core machines" problem? Do developers simply ignore the issue? I was one of the unlucky people to experience the bug and it was definitely present in several big name titles I played.

Is it safe to conclude that QPC is fixed on modern (all quad core and above) platforms?

#8 Ryan_001   Members   -  Reputation: 101

Posted 19 February 2012 - 11:55 AM

taz0010, on 18 February 2012 - 08:13 PM, said:

What's the preferred method of dealing with the "QPC on old dual core machines" problem? Do developers simply ignore the issue? I was one of the unlucky people to experience the bug and it was definitely present in several big name titles I played.

Is it safe to conclude that QPC is fixed on modern (all quad core and above) platforms?

After spending a good week straight scouring Google, MSDN, gamedev, Stack Overflow, and pretty much anything else I could find about QPC, I'm pretty sure it's fixed on any modern platform. From what I understand it only affected a few isolated platforms, and there were BIOS/OS patches released for most of them. The general consensus was to just ignore the issue. Unfortunately there is no single definitive article explaining it all in detail, and the MSDN docs are just terrible (which is surprising given that it is such a key issue). I wish Microsoft would just list the platforms that had issues somewhere; it seemed to be far fewer than the docs made it sound. There's also an easy workaround for those systems that do have errors.

In a nutshell, the problem is that older systems didn't have the HPET, so QPC would instead use the Time Stamp Counter (TSC), which of course wasn't synchronized between the cores. It turns out there is another timer on most systems called the PM timer, which is pretty much just the HPET but a little slower. Why Windows decided to use the TSC over the PM timer for QPC I'm not entirely sure, but it's easy to force it (explained here: http://blogs.technet...e-boot-ini.aspx).

Was there a system out there that had dual cores and didn't have the PM timer either? I have no idea, nor did I ever see this possibility mentioned. In the end my conclusion was that the QPC issues are a thing of the past, and easy to rectify in the rare case that someone is still using one of the old dual core systems and hasn't updated their OS.

#9 Sik_the_hedgehog   Members   -  Reputation: 235

Posted 19 February 2012 - 03:09 PM

A bit late, since I only just saw this thread.

King_DuckZ, on 16 February 2012 - 10:01 AM, said:

Hello guys, I'm working on an old game that needs to be ported to modern Windows systems. The game is single-threaded, and one of the bugs they assigned to me is to bind the main thread to a fixed core on multi-core CPUs. While this should be easy enough, I was wondering if there would really be a good reason for that, and how it may affect the performance and the OS. Should I really do that?
Timing issues, as mentioned in this thread, are one reason. Another could be potential race conditions that would never arise on a single core (since only one thread is running at a time) but could become an issue on multiple cores (since multiple threads are running and accessing hardware resources at the same time).

King_DuckZ, on 17 February 2012 - 03:51 AM, said:

Quote

Or just make a batch file "start /affinity 1 foo.exe".
Tbh I was rather thinking of using SetThreadAffinityMask(GetCurrentThread(), 1) in WinMain. Are the two solutions equivalent, or are there arguments in favour of one rather than the other?
Well, your idea would always work, even if the user runs the executable directly instead of using the batch file =P

Ryan_001, on 19 February 2012 - 11:55 AM, said:

Was there a system out there that had dual cores and didn't have the PMTimer either? I have no idea, nor did I see this possibility ever mentioned.
No idea, but there could have been systems where using the PM timer caused bugs. Looking it up, the PM timer seems to be related to power management, so maybe using it on those systems could actually interfere with proper power management? I can see why Microsoft would want to avoid that.
Don't pay much attention to "the hedgehog" in my nick, it's just because "Sik" was already taken =/ By the way, Sik is pronounced like seek, not like sick.

#10 Cornstalks   Members   -  Reputation: 1216

Posted 19 February 2012 - 05:45 PM

Sik_the_hedgehog, on 19 February 2012 - 03:09 PM, said:

King_DuckZ, on 16 February 2012 - 10:01 AM, said:

Hello guys, I'm working on an old game that needs to be ported to modern Windows systems. The game is single-threaded, and one of the bugs they assigned to me is to bind the main thread to a fixed core on multi-core CPUs. While this should be easy enough, I was wondering if there would really be a good reason for that, and how it may affect the performance and the OS. Should I really do that?
Timing issues, as mentioned in this thread, are one reason. Another could be potential race conditions that would never arise on a single core (since only one thread is running at a time) but could become an issue on multiple cores (since multiple threads are running and accessing hardware resources at the same time).
Care to expand on what race conditions could arise in a single threaded application running on a multi-core system?
[ Realistic Rendering ] [ School + Dublin = Boom ] [ I've been ninja'd 70 times ] [ f.k.a. MikeTacular ] [ My Blog ]

#11 swiftcoder   Senior Moderators   -  Reputation: 1617

Posted 19 February 2012 - 05:56 PM

Cornstalks, on 19 February 2012 - 05:45 PM, said:

Care to expand on what race conditions could arise in a single threaded application running on a multi-core system?
It's pretty darn hard on a modern system, although interaction with external hardware devices is a possibility. However, I assume Sik_the_hedgehog was discussing the possibility for a multi-threaded application.
Tristam MacDonald - swiftcoding [new blog post: bidding a freelance contract]

#12 Sik_the_hedgehog   Members   -  Reputation: 235

Posted 19 February 2012 - 07:35 PM

swiftcoder, on 19 February 2012 - 05:56 PM, said:

Cornstalks, on 19 February 2012 - 05:45 PM, said:

Care to expand on what race conditions could arise in a single threaded application running on a multi-core system?
It's pretty darn hard on a modern system, although interaction with external hardware devices is a possibility. However, I assume Sik_the_hedgehog was discussing the possibility for a multi-threaded application.
Yeah, pretty much this, not to mention that any APIs the program uses could be multithreaded behind the scenes (e.g. for timers or sound playback). Note that "issues" doesn't have to mean erroneous behavior; it could also just be excessive performance loss and such.

Another case is two cores trying to access the same area of RAM at the same time. The problem is that modern processors can change the order in which memory writes are done to improve memory access performance, and if the code relies on things being written in a specific order this can cause bugs (e.g. a "ready" flag getting set before the full data for the new state is actually written). For a single core this isn't a problem, since the core already knows what data should be there anyway, but other cores may not know at the moment they try to read from those locations, and they will get old information instead.

Proper handling of shared variables (including proper use of mutexes, etc.) should make all this a non-issue, but with old code you can never know what the original programmer could have attempted to do.
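
As a rough illustration of the "ready" flag case, here is a minimal sketch in standard C++ (which old game code would not have had available). The names `Mailbox`, `publish`, `consume` and `run_handoff` are hypothetical. The release store on the flag guarantees that the payload write is visible to any thread that observes the flag with an acquire load.

```cpp
#include <atomic>
#include <thread>

// Hypothetical shared state: plain data guarded by an atomic flag.
struct Mailbox {
    int payload = 0;
    std::atomic<bool> ready{false};
};

// Writer: fill in the data first, then publish the flag with release
// ordering so the payload write cannot appear to happen after it.
void publish(Mailbox& box, int value) {
    box.payload = value;
    box.ready.store(true, std::memory_order_release);
}

// Reader: wait for the flag with acquire ordering; once it reads true,
// the payload written before the release store is guaranteed visible.
int consume(Mailbox& box) {
    while (!box.ready.load(std::memory_order_acquire))
        std::this_thread::yield();
    return box.payload;
}

// Runs the hand-off across two threads and returns what the reader saw.
int run_handoff(int value) {
    Mailbox box;
    std::thread producer([&box, value] { publish(box, value); });
    const int seen = consume(box);
    producer.join();
    return seen;
}
```

If the flag were a plain `bool`, the program would have a data race, and the reader could in principle see the flag set while still reading a stale payload.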
Don't pay much attention to "the hedgehog" in my nick, it's just because "Sik" was already taken =/ By the way, Sik is pronounced like seek, not like sick.

#13 samoth   Members   -  Reputation: 690

Posted 20 February 2012 - 04:43 AM

Whenever I encounter a program that binds affinity to a single core, I feel urged to seek out the programmer and tell him to take off his glasses (so I can hit him harder without hurting my hand).

In my experience, thread affinity has no other effect than greatly increasing CPU load and greatly reducing performance at the same time. If you think about it, that's kind of logical too, and not very surprising.

There are dedicated programs that take a process snapshot every second and revert thread affinity back to "all cores". I've used such a program for years on two titles that I play in my free time. Both are multithreaded, both use OpenAL for sound and OpenGL for graphics. On both titles, load time with added stupidity is about 6 times as long as it is with reset-to-default affinity (on a 4-core machine, go Amdahl[1]). On both titles, average CPU load while playing as-is is around 30% (one core used 100% by the game, and some odd 5% for everything else on the computer) compared to 2-3% with reverted affinity. Also, both titles, after reverting affinity, merely have around 2,000 context switches per second rather than 50,000-60,000.

Now of course there are some rare cases where CPU affinity is needed, and some reasons why it may be an advantage. One example where it is needed is some Windows tasks (at least under Windows XP; later versions might not need this). Some drivers might need interrupt affinity (which is not quite the same thing, but goes in the same direction), too.

There is the well-known QueryPerformanceCounter issue, but realistically it is non-existent nowadays. Even so, assuming it is still an issue, a better solution would be to have a single dedicated timer thread that binds to one CPU and updates a global variable atomically, rather than limiting the entire process to a single CPU.
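
A minimal sketch of that timer-thread idea, under stated assumptions: the class name `TimerThread` is hypothetical, `std::chrono::steady_clock` stands in for QueryPerformanceCounter, the 1 ms update interval is arbitrary, and the affinity call is only made on Windows. Only the timer thread is pinned; the rest of the process stays free to run on any core.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <thread>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical dedicated timer: one pinned thread samples the clock and
// publishes it atomically; every other thread just reads the value.
class TimerThread {
public:
    TimerThread() : worker_([this] { run(); }) {}

    ~TimerThread() {
        stop_.store(true);
        worker_.join();
    }

    // Microseconds since the timer thread started, as last published.
    std::int64_t now_us() const {
        return now_us_.load(std::memory_order_acquire);
    }

private:
    void run() {
#ifdef _WIN32
        // Pin only this thread, so all samples come from the same core.
        SetThreadAffinityMask(GetCurrentThread(), 1);
#endif
        using namespace std::chrono;
        const auto start = steady_clock::now();
        while (!stop_.load()) {
            const auto elapsed = steady_clock::now() - start;
            now_us_.store(duration_cast<microseconds>(elapsed).count(),
                          std::memory_order_release);
            std::this_thread::sleep_for(milliseconds(1));
        }
    }

    std::atomic<std::int64_t> now_us_{0};
    std::atomic<bool> stop_{false};
    std::thread worker_;  // declared last: atomics exist before it starts
};
```

The trade-off is resolution: readers only ever see the time as of the last update, so this suits per-frame timing better than fine-grained profiling.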

There is the theoretical consideration of NUMA, but frankly this is something that would be best left to the operating system to decide. Binding blindly to Core0, as seems to be the "common recipe" with thread affinity, will do more harm than good in this case too (unless by sheer coincidence that is the correct CPU). Also, how many people use Itanium to play a desktop game?

When doing some extreme number crunching over huge sets of data, you might possibly get better cache behaviour by having exactly one thread per core and binding each to a particular core. It remains to be shown whether this is really the case. In my experience, leaving the decision where to schedule a thread to the OS works just fine and gets you close to 100% resource usage.

About data consistency, if a programmer writes improperly threaded code, then thread affinity will not save him either. Nothing prevents the operating system from pre-empting a thread e.g. in the middle of a (non-atomic-insn) read-modify-write operation.
Setting affinity also adds the possibility of getting funny effects, such as one thread waiting on a lock held by another thread that cannot run until the waiter is swapped out, because both are confined to the same core. Spinlocks in particular can be a lot of fun with affinity.
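
To make the read-modify-write point concrete, here is a minimal sketch with a hypothetical function `count_with_atomic`. Each increment is a single atomic operation, so a pre-emption between the read and the write cannot lose an update, regardless of how many cores the threads run on.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical demo: several threads bump one shared counter.
// fetch_add performs the whole read-modify-write as one indivisible step.
int count_with_atomic(int threads, int increments_per_thread) {
    std::atomic<int> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i)
                counter.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& worker : pool)
        worker.join();
    return counter.load();
}
```

With a plain `int` and `++counter` the same loop would be a data race, and pinning every thread to one core would not change that, since the scheduler can still switch threads between the load and the store.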



[1] I have to admit that this beats me, but it's something I've really seen. It must be some funny combination of keeping the hard disk going, unpacking data, and initializing sound/graphics hardware, all without stalling in between. Or... whatever. In any case, the observed single-core performance is much worse than just 1/num_core.





