Bind single-threaded apps to a single core
#1 Members - Reputation: 102
Posted 16 February 2012 - 10:01 AM
Edit: As I said it's an old game, so extreme performance is not an issue at all. Cache misses due to core swapping are hardly a concern.
#2 Members - Reputation: 2303
Posted 16 February 2012 - 10:11 AM
Timing is the most common source of bugs when swapping cores. There could be others, more obscure ones.
Issues might also be related to third-party libraries or worse, the drivers.
You could always fix the code... which isn't really viable, since such bugs are hard to reproduce.
Or just make a batch file "start /affinity 1 foo.exe".
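For reference, the number after /affinity is a hexadecimal bitmask of allowed logical cores, so 1 means "core 0 only". A minimal sketch of how such a mask is built; affinity_mask is a hypothetical helper written for illustration, not part of any Windows API:

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical helper: bit N of the result is set when logical core N is allowed.
// The 64-core limit is an assumption that matches a 64-bit mask.
inline std::uint64_t affinity_mask(std::initializer_list<unsigned> cores) {
    std::uint64_t mask = 0;
    for (unsigned core : cores) {
        assert(core < 64 && "core index does not fit in a 64-bit mask");
        mask |= (std::uint64_t{1} << core);
    }
    return mask;
}
```

With this, affinity_mask({0}) is 0x1 (the value used in the batch file above) and affinity_mask({0, 2}) is 0x5.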
#3 Members - Reputation: 102
Posted 17 February 2012 - 03:51 AM
#4 Senior Moderators - Reputation: 2448
Posted 17 February 2012 - 03:59 AM
#6 Members - Reputation: 2303
Posted 17 February 2012 - 06:45 AM
Then it's anyone's guess on why it works or not and what threads have to do with it.
Time will usually be read once per frame, and that value then gets passed around. Reading the time many times per frame is conceptually difficult, since the simulation clock and the wall clock aren't synchronized. For various reasons, different stages of a frame may take different amounts of time, especially considering that new hardware has different characteristics. While everything is faster, the relative time per operation differs.
It's even worse if the frame tick goes through several stages, perhaps advancing the simulation five times. On faster hardware, and due to QPC quirks, these stages could easily end up taking 0 time, causing all kinds of mess.
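To make the "read once per frame" idea concrete, here is a rough sketch, not the game's actual code. FrameClock, kMinDt and kMaxDt are made-up names, the clamp bounds are arbitrary, and std::chrono::steady_clock stands in for whatever timer the game really uses:

```cpp
#include <algorithm>
#include <chrono>

// Assumed bounds, chosen for illustration only.
constexpr double kMinDt = 1.0 / 1000.0;  // never report less than 1 ms
constexpr double kMaxDt = 1.0 / 10.0;    // never report more than 100 ms

// Hypothetical per-frame clock: the timer is sampled once per frame and the
// resulting delta is handed to every stage, instead of each stage reading it.
class FrameClock {
public:
    using Clock = std::chrono::steady_clock;

    explicit FrameClock(Clock::time_point start = Clock::now()) : last_(start) {}

    // Call once at the top of the frame. Returns the frame delta in seconds,
    // clamped so a zero or negative reading never reaches the simulation.
    double tick(Clock::time_point now = Clock::now()) {
        const double raw = std::chrono::duration<double>(now - last_).count();
        last_ = now;
        return std::min(std::max(raw, kMinDt), kMaxDt);
    }

private:
    Clock::time_point last_;
};
```

A frame would then do something like "const double dt = clock.tick();" once and pass dt to every stage. Whether clamping is acceptable depends on the game; it hides the symptom rather than fixing the timer.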
Fixing the code for a simple maintenance job likely isn't viable, but timing really is a mess.
Hopefully, floats aren't involved...
#7 Members - Reputation: 186
Posted 18 February 2012 - 08:13 PM
Is it safe to conclude that QPC is fixed on modern (all quad core and above) platforms?
#8 Members - Reputation: 101
Posted 19 February 2012 - 11:55 AM
taz0010, on 18 February 2012 - 08:13 PM, said:
Is it safe to conclude that QPC is fixed on modern (all quad core and above) platforms?
After spending a good week straight scouring Google, MSDN, GameDev, Stack Overflow, and pretty much anything else I could find about QPC, I'm pretty sure it's fixed on any modern platform. From what I understand, only a few isolated platforms were affected, and BIOS/OS patches were released for most of them. The general consensus was to just ignore the issue. Unfortunately, there is no single definitive article explaining it all in detail, and the MSDN docs are just terrible (which is surprising, given that it is such a key issue). I wish Microsoft would just list the affected platforms somewhere; there seem to have been far fewer than the docs made it sound. There's also an easy workaround for those systems that do have errors.
In a nutshell, the problem is that older systems didn't have the HPET, so QPC would instead use the Time Stamp Counter (TSC), which of course wasn't synchronized between the cores. It turns out there is another timer on most systems called the PM timer, which is pretty much just the HPET but a little slower. Why Windows decided to use the TSC over the PM timer for QPC I'm not entirely sure, but it's easy to force it (explained here: http://blogs.technet...e-boot-ini.aspx).
Was there a system out there that had dual cores and didn't have the PM timer either? I have no idea, nor did I ever see this possibility mentioned. In the end, my conclusion was that the QPC issues are a thing of the past, and easy to rectify in the rare case that someone is still using one of the old dual-core systems without an updated OS.
#9 Members - Reputation: 235
Posted 19 February 2012 - 03:09 PM
King_DuckZ, on 16 February 2012 - 10:01 AM, said:
King_DuckZ, on 17 February 2012 - 03:51 AM, said:
Ryan_001, on 19 February 2012 - 11:55 AM, said:
#10 Members - Reputation: 1216
Posted 19 February 2012 - 05:45 PM
Sik_the_hedgehog, on 19 February 2012 - 03:09 PM, said:
King_DuckZ, on 16 February 2012 - 10:01 AM, said:
#11 Senior Moderators - Reputation: 1617
Posted 19 February 2012 - 05:56 PM
Cornstalks, on 19 February 2012 - 05:45 PM, said:
#12 Members - Reputation: 235
Posted 19 February 2012 - 07:35 PM
swiftcoder, on 19 February 2012 - 05:56 PM, said:
Cornstalks, on 19 February 2012 - 05:45 PM, said:
Also, there's the case of two cores trying to access the same area of RAM at the same time. The problem is that modern processors can change the order in which memory writes are done to improve memory access performance, and if the code relied on things being written in a specific order, this can cause bugs (e.g. a "ready" flag getting set before the full data for the new state is actually written). For a single core this isn't a problem, since the core already knows what data should be there anyway, but other cores may not know at the moment they try to read from those locations, and they will get old information instead.
Proper handling of shared variables (including proper use of mutexes, etc.) should make all this a non-issue, but with old code you can never know what the original programmer could have attempted to do.
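As an illustration of the "ready" flag case, here is a minimal C++11 sketch using release/acquire ordering. All names (Message, g_message, g_ready, producer, consumer, run_once) are invented for the example, and this is only one of several correct ways to do it; a mutex would work too:

```cpp
#include <atomic>
#include <thread>

struct Message {  // hypothetical payload
    int a = 0;
    int b = 0;
};

Message g_message;                 // plain data, written before the flag
std::atomic<bool> g_ready{false};  // the "ready" flag

void producer() {
    g_message.a = 1;
    g_message.b = 2;
    // Release: the writes above cannot become visible after this store.
    g_ready.store(true, std::memory_order_release);
}

int consumer() {
    // Acquire: once we see true, the producer's earlier writes are visible too.
    while (!g_ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return g_message.a + g_message.b;
}

// Runs one producer and one consumer on separate threads.
int run_once() {
    g_message = Message{};
    g_ready.store(false);
    int result = 0;
    std::thread reader([&result] { result = consumer(); });
    std::thread writer(producer);
    writer.join();
    reader.join();
    return result;
}
```

If g_ready were a plain bool, the compiler or CPU would be free to reorder the stores, which is exactly the bug described above, and pinning to one core would only make it harder to hit.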
#13 Members - Reputation: 690
Posted 20 February 2012 - 04:43 AM
In my experience, thread affinity has no other effect than greatly increasing CPU load and greatly reducing performance at the same time. If you think about it, that's kind of logical too, and not very surprising.
There are dedicated programs that take a process snapshot every second and revert thread affinity back to "all Cores". I've used such a program for years on two titles that I play in my free time. Both are multithreaded, both use OpenAL for sound and OpenGL for graphics. On both titles, load time with added stupidity is about 6 times as long as it is with reset-to-default affinity (on a 4-core machine, go Amdahl[1]). On both titles, average CPU load while playing as-is is around 30% (one core used 100% by the game, and some odd 5% for everything else on the computer) compared to 2-3% with reverted affinity. Also, both titles, after reverting affinity, merely have around 2,000 context switches per second rather than 50,000-60,000.
Now of course there are some rare cases where CPU affinity is needed, and some reasons why it may be an advantage. One example where it is needed is some Windows tasks (at least under Windows XP; later versions might not need this). Some drivers might need interrupt affinity (which is not quite the same thing, but goes in the same direction), too.
There is the well-known QueryPerformanceCounter issue, but realistically it is non-existent nowadays. Even so, assuming it is still an issue, a better solution would be to have a single dedicated timer thread that binds to one CPU and updates a global variable atomically, rather than limiting the entire process to a single CPU.
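A rough sketch of what such a timer thread could look like. TimerThread and now_us are made-up names, std::chrono::steady_clock stands in for QueryPerformanceCounter, the 1 ms update interval is an arbitrary choice, and the actual pinning call is platform-specific and left as a comment:

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical dedicated timer thread: only this thread reads the clock,
// everyone else reads the atomically published value.
class TimerThread {
public:
    TimerThread() : worker_([this] { run(); }) {}
    ~TimerThread() {
        stop_.store(true, std::memory_order_relaxed);
        worker_.join();
    }
    TimerThread(const TimerThread&) = delete;
    TimerThread& operator=(const TimerThread&) = delete;

    // Microseconds since the timer thread started, as last published.
    std::int64_t now_us() const { return now_us_.load(std::memory_order_acquire); }

private:
    void run() {
        // Assumption: this is where the thread would be pinned to one core,
        // e.g. with SetThreadAffinityMask on Windows.
        using namespace std::chrono;
        const auto start = steady_clock::now();
        while (!stop_.load(std::memory_order_relaxed)) {
            const auto elapsed = duration_cast<microseconds>(steady_clock::now() - start);
            now_us_.store(elapsed.count(), std::memory_order_release);
            std::this_thread::sleep_for(milliseconds(1));
        }
    }

    std::atomic<std::int64_t> now_us_{0};
    std::atomic<bool> stop_{false};
    std::thread worker_;  // declared last so the atomics exist before the thread starts
};
```

The published time is only as fine-grained as the update interval, so this trades resolution for consistency. The rest of the process stays free to run on any core.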
There is the theoretical consideration of NUMA, but frankly this is something that would be best left to the operating system to decide. Binding blindly to Core0, as seems to be the "common recipe" with thread affinity, will do more harm than good in this case too (unless by sheer coincidence that is the correct CPU). Also, how many people use Itanium to play a desktop game?
When doing some extreme number crunching over huge sets of data, you might possibly get better cache behaviour by having exactly one thread per core and binding each to a particular core. It remains to be shown whether this is really the case. In my experience, leaving the decision where to schedule a thread to the OS works just fine and gets you close to 100% resource usage.
About data consistency: if a programmer writes improperly threaded code, then thread affinity will not save him either. Nothing prevents the operating system from pre-empting a thread e.g. in the middle of a read-modify-write operation that isn't a single atomic instruction.
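To illustrate, the fix for a shared counter is to make the read-modify-write itself atomic, which works regardless of how threads are scheduled. A minimal sketch; count_with_atomic is a hypothetical function written for this example:

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical demo: several threads bump one counter with an atomic
// read-modify-write, so no increment is lost even if a thread is pre-empted.
long count_with_atomic(int threads, int increments_per_thread) {
    assert(threads > 0 && increments_per_thread >= 0);
    std::atomic<long> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return counter.load();
}
```

With a plain "long counter" and "++counter" instead, increments can be lost whenever a thread is interrupted between the load and the store, whether the threads share one core or not.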
Setting affinity also adds the possibility of getting funny effects, such as one thread waiting on a lock held by another thread on the same core, where the waiter has to be swapped out before the holder can run and release the lock. Spinlocks in particular can be a lot of fun with affinity.
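A pure busy-wait spinlock is the worst case when two threads are pinned to the same core: the waiter burns its time slice while the holder cannot run. One common mitigation, sketched here under assumptions (YieldingSpinlock and count_with_spinlock are made-up names, and this is not claimed to be what any particular game does), is to yield while waiting:

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical spinlock that gives up its time slice while waiting, so a lock
// holder sharing the same core gets a chance to run and release the lock.
class YieldingSpinlock {
public:
    void lock() {
        while (flag_.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Demo: a plain (non-atomic) counter protected by the spinlock.
long count_with_spinlock(int threads, int iterations) {
    assert(threads > 0 && iterations >= 0);
    YieldingSpinlock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&lock, &counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                std::lock_guard<YieldingSpinlock> guard(lock);
                ++counter;
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return counter;
}
```

Yielding does not make spinlocks a good idea under forced affinity; an ordinary mutex that blocks in the OS is usually the safer choice.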
[1] I have to admit that this beats me, but it's something I've really seen. It must be some funny combination of keeping the hard disk going, unpacking data, and initializing sound/graphics hardware, all without stalling in between. Or... whatever. In any case, the observed single-core performance is much worse than just 1/num_core.


















