QueryPerformanceCounter and threading

6 comments, last by Martin 13 years, 8 months ago
Hi,

It's been some time since I've done any programming in PC land. My problem is that I have a multi-threaded engine running on PC, and any thread may request to read the performance counter at any time.

Using RDTSC is clearly a no-go: it counts CPU cycles on the host core, so variable-rate CPU frequencies are problematic, as is a thread switching onto a different core.
Clock functions are far too low resolution.
QueryPerformanceCounter is only OK as long as all calls are made from the same thread.

The idea is then to kick off a thread that just sits there waiting to be asked to call QueryPerformanceCounter; it then uses a 64-bit atomic exchange instruction to write the result back to the calling thread. Easy enough. The issue is how to do this fast without the timing thread taking up a lot of CPU resources by running all the time. Having the thread wait on an event would work, but it could take considerable time to wake up: the event signal would be slow, so the timing would be inaccurate. (It is used in an inbuilt profiler as well as for other operations.)

Has anyone seen any good articles on this subject, or have any experience or thoughts to share, before I go off reinventing the wheel?

Many thanks in advance,
Martin
Cheers, Martin. If I've helped you, a rating++ would be appreciated.
Firstly, note that QueryPerformanceCounter() usually is reliable across multiple CPU cores; it's just that on some hardware it isn't. For profiling you can usually get away with just using it, since you should be able to avoid profiling on PCs where it's buggy (assuming your dev PC isn't one of the buggy ones).

For my game timing needs I've found reading the time once per frame is enough. Apart from profiling why do you need sub millisecond timing accuracy across multiple threads?

If you don't then timeBeginPeriod(1) combined with timeGetTime() should be accurate enough - it'll give you a reliable 1ms timer.
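
A minimal sketch of the timeBeginPeriod(1) / timeGetTime() pairing mentioned above. The class and function names (MillisecondTimerScope, now_ms, elapsed_ms) are made up for illustration, and the std::chrono branch is only a stand-in so the sketch also builds off Windows. On Windows it assumes linking against winmm.lib.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

#if defined(_WIN32)
#include <windows.h>
#include <mmsystem.h>
#pragma comment(lib, "winmm.lib")  // MSVC-style link request; other toolchains link winmm manually
#endif

// Hypothetical RAII helper: requests 1 ms timer resolution for its lifetime.
class MillisecondTimerScope {
public:
    MillisecondTimerScope() {
#if defined(_WIN32)
        active_ = (timeBeginPeriod(1) == TIMERR_NOERROR);
#endif
    }
    ~MillisecondTimerScope() {
#if defined(_WIN32)
        if (active_) timeEndPeriod(1);  // every timeBeginPeriod needs a matching timeEndPeriod
#endif
    }
    MillisecondTimerScope(const MillisecondTimerScope&) = delete;
    MillisecondTimerScope& operator=(const MillisecondTimerScope&) = delete;

    bool active() const { return active_; }

private:
    bool active_ = false;
};

// Milliseconds since an arbitrary starting point.
inline std::uint32_t now_ms() {
#if defined(_WIN32)
    return timeGetTime();  // 32-bit value, wraps roughly every 49.7 days
#else
    // Stand-in for non-Windows builds (an assumption of this sketch).
    using namespace std::chrono;
    return static_cast<std::uint32_t>(
        duration_cast<milliseconds>(steady_clock::now().time_since_epoch()).count());
#endif
}

// Unsigned subtraction stays correct across one wrap of the 32-bit counter.
inline std::uint32_t elapsed_ms(std::uint32_t start, std::uint32_t end) {
    return end - start;
}
```

Keeping the resolution request in a scoped object makes it harder to forget the timeEndPeriod call, which matters because the request affects the whole system while it is active.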

If you do really need reliable high precision timing across multiple threads then the only option I know of is to sacrifice a single CPU core to running a loop that's just reading the time and storing it in a global variable which the other threads can then read. This is obviously only really useful if your CPU has more than two cores.
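
The "sacrifice a core" approach could look roughly like the sketch below. PublishedClock is a hypothetical name, and std::chrono::steady_clock stands in for QueryPerformanceCounter so the example stays portable. The spinning thread really does burn a full core, which is exactly the cost described above.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical helper: one thread reads the clock in a tight loop and
// publishes the latest value; every other thread only does an atomic load.
class PublishedClock {
public:
    PublishedClock()
        : latest_ns_(read_ns()), running_(true), worker_([this] { spin(); }) {}

    ~PublishedClock() {
        running_.store(false, std::memory_order_relaxed);
        worker_.join();
    }

    PublishedClock(const PublishedClock&) = delete;
    PublishedClock& operator=(const PublishedClock&) = delete;

    // Safe to call from any thread; never touches the underlying timer.
    std::int64_t now_ns() const {
        return latest_ns_.load(std::memory_order_acquire);
    }

private:
    // Stand-in for QueryPerformanceCounter (an assumption of this sketch).
    static std::int64_t read_ns() {
        using namespace std::chrono;
        return duration_cast<nanoseconds>(
                   steady_clock::now().time_since_epoch()).count();
    }

    // Busy loop: all timer reads happen on this one thread.
    void spin() {
        while (running_.load(std::memory_order_relaxed)) {
            latest_ns_.store(read_ns(), std::memory_order_release);
        }
    }

    // Declaration order matters: the worker must be constructed last so the
    // atomics it uses are already initialised when it starts running.
    std::atomic<std::int64_t> latest_ns_;
    std::atomic<bool> running_;
    std::thread worker_;
};
```

Because only one thread ever reads the timer, the published value cannot go backwards as long as the underlying clock is monotonic on that thread. Readers see a value that is at most one loop iteration stale.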
For frame-timing purposes, just lock down your main thread to a particular core (with thread affinity), and do the per-frame timing on that thread.

If you're doing some kind of profiling or something, where each thread needs to know sub-frame times, then
- as above, hope you don't have one of these buggy CPU/motherboard combinations.
- lock each thread to a particular core, and treat absolute time values as being thread-private (relative timings can still be shared).
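
As a rough sketch of the second bullet: pin the thread, keep absolute timestamps private to it, and only hand out durations. pin_current_thread_to_core, ThreadLocalStopwatch and thread_stopwatch are hypothetical names, and std::chrono::steady_clock again stands in for the raw counter. Pinning is only implemented for Windows here.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

#if defined(_WIN32)
#include <windows.h>
#endif

// Pins the calling thread to one logical core.
// Returns false on failure or on platforms this sketch does not cover.
inline bool pin_current_thread_to_core(unsigned core_index) {
#if defined(_WIN32)
    if (core_index >= sizeof(DWORD_PTR) * 8) return false;
    const DWORD_PTR mask = static_cast<DWORD_PTR>(1) << core_index;
    // SetThreadAffinityMask returns the previous mask, or 0 on failure.
    return SetThreadAffinityMask(GetCurrentThread(), mask) != 0;
#else
    (void)core_index;
    return false;
#endif
}

// Absolute timestamps never leave this object; only durations do.
class ThreadLocalStopwatch {
public:
    void start() { start_ns_ = raw_ns(); }

    // Relative time since start(): safe to share with other threads.
    std::int64_t stop_ns() const { return raw_ns() - start_ns_; }

private:
    // Stand-in for the per-core counter (an assumption of this sketch).
    static std::int64_t raw_ns() {
        using namespace std::chrono;
        return duration_cast<nanoseconds>(
                   steady_clock::now().time_since_epoch()).count();
    }

    std::int64_t start_ns_ = 0;  // absolute value: thread-private by design
};

// One stopwatch per thread, created on first use.
inline ThreadLocalStopwatch& thread_stopwatch() {
    thread_local ThreadLocalStopwatch stopwatch;
    return stopwatch;
}
```

A profiler built this way would record (thread id, duration) pairs rather than raw timestamps, so counters that disagree between cores never get compared directly.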
Quote:Original post by Martin
It's been some time since I've done any programming in PC land. My problem is that I have a multi-threaded engine running on PC, and any thread may request to read the performance counter at any time.

There is another thread going up in flames over pros and cons of OOP. And while only tangentially related, it exposes the issue of encapsulation at design level. Some things cannot be encapsulated.

Why must every thread query the timer at any time? Especially when time is not local to thread nor does its value change conceptually over time, but is advanced when each work unit is complete.

Simulations progress via time step. There is a main loop which measures time. When a new tick is required, it issues new tasks to be processed by workers, and those work off the same value.

Threads are simply not the driving force of the simulation - wall clock is. So it works something like this:
while (running) {
  if (currentTime - lastTime > update_rate) {
    lastTime += update_rate;
    send_new_work_to_threads(lastTime);
  }
  render(lastTime, currentTime);
}
This solves all problems.

At the end of the day, all work needs to progress one step at a time. So if a thread knows that it's done the current work, it simply adds update_rate to its local time, and works on that. This is less desirable, since individual work units typically cannot progress indefinitely without coordinating with other systems.
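
For illustration, the tick-issuing part of that loop can be sketched as a small testable unit. FixedStepDriver and its members are hypothetical names, time is a plain integer so the behaviour is easy to follow, and the vector stands in for send_new_work_to_threads. One deliberate difference from the loop above: this version uses `while` rather than `if`, so it catches up if several ticks become due at once.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical fixed-timestep driver: workers never read a clock, they are
// handed the tick time they should simulate.
struct FixedStepDriver {
    std::int64_t update_rate;          // length of one simulation tick
    std::int64_t last_time;            // time of the most recently issued tick
    std::vector<std::int64_t> issued;  // stand-in for work sent to threads

    // Issues every tick that has become due by current_time.
    void advance(std::int64_t current_time) {
        assert(update_rate > 0);
        while (current_time - last_time > update_rate) {
            last_time += update_rate;
            issued.push_back(last_time);  // i.e. send_new_work_to_threads(last_time)
        }
    }
};
```

With update_rate = 10 and last_time = 0, calling advance(25) issues ticks 10 and 20. The leftover 5 units stay available to the renderer for interpolation between last_time and the current time.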
Y'all seem to have missed the part about "inbuilt profiler" ;)

Quote: Using RDTSC is clearly a no-go: it counts CPU cycles on the host core, so variable-rate CPU frequencies are problematic, as is a thread switching onto a different core.

The times, they are a changin'.
Your concern about variable frequency is mooted by the "invariant TSC" feature, which is guaranteed to increment at a constant rate despite P- and C-state transitions (including STPGNT). There are also goodies such as the performance monitoring counters that give you unhalted clocks (arguably exactly what you want for an in-game profiler) or a counter in the uncore clock domain (Nehalem/Westmere-specific).

Synchronizing the separate TSCs can be done manually (requires a small kernel-mode driver to write MSRs), or you can pin threads to a processor and maintain separate timer states, or use the RDTSCP instruction to read a CPU identifier atomically.
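
A hedged sketch of the RDTSCP option: the instruction returns the time-stamp counter together with the contents of IA32_TSC_AUX, which the OS typically fills with a processor identifier. TscSample, has_rdtscp, read_tsc_sample and ticks_between are hypothetical names. The sketch refuses to subtract samples whose tags differ, which is a conservative policy for when the TSCs are not known to be synchronized.

```cpp
#include <cassert>
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#include <intrin.h>
#define PROFILER_HAVE_X86_TSC 1
#elif defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#include <x86intrin.h>
#define PROFILER_HAVE_X86_TSC 1
#else
#define PROFILER_HAVE_X86_TSC 0
#endif

struct TscSample {
    std::uint64_t ticks;    // time-stamp counter value
    std::uint32_t cpu_tag;  // IA32_TSC_AUX contents; meaning is OS-defined
};

// CPUID leaf 0x80000001, EDX bit 27 reports RDTSCP support.
inline bool has_rdtscp() {
#if PROFILER_HAVE_X86_TSC && defined(_MSC_VER)
    int regs[4] = {0, 0, 0, 0};
    __cpuid(regs, static_cast<int>(0x80000000u));
    if (static_cast<unsigned>(regs[0]) < 0x80000001u) return false;
    __cpuid(regs, static_cast<int>(0x80000001u));
    return (static_cast<unsigned>(regs[3]) & (1u << 27)) != 0;
#elif PROFILER_HAVE_X86_TSC
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx)) return false;
    return (edx & (1u << 27)) != 0;
#else
    return false;
#endif
}

// Reads counter and tag in one instruction. Call only if has_rdtscp() is true.
inline TscSample read_tsc_sample() {
    TscSample sample{0, 0};
#if PROFILER_HAVE_X86_TSC
    unsigned aux = 0;
    sample.ticks = __rdtscp(&aux);
    sample.cpu_tag = aux;
#endif
    return sample;
}

// Produces a tick delta only when both samples carry the same tag.
inline bool ticks_between(const TscSample& a, const TscSample& b,
                          std::uint64_t& out) {
    if (a.cpu_tag != b.cpu_tag) return false;
    out = b.ticks - a.ticks;
    return true;
}
```

A profiler could simply discard (or flag) any measurement for which ticks_between returns false, since that indicates the thread migrated between the two reads.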

The HPET is another reliable "high"-resolution timer (at least 14 MHz), but requires a recent Intel chipset and Windows 7 or a slightly more involved kernel-mode driver that is able to map physical memory.

The common denominator? These are all system-specific, but I'd argue that is acceptable for a profiler, because you have control over your development hardware.
E8 17 00 42 CE DC D2 DC E4 EA C4 40 CA DA C2 D8 CC 40 CA D0 E8 40E0 CA CA 96 5B B0 16 50 D7 D4 02 B2 02 86 E2 CD 21 58 48 79 F2 C3
Hi All,

Thanks for the replies.

One of the reasons I need this is because there are no good tools for profiling multiple threads on consoles. I simply want the PC version of my engine to work consistently with the console version.

I'm working to try and do away with the concept of 'the main thread'; there are simply tasks, and threads available to service them. Losing an entire core to timing would be extremely painful, and would skew performance metrics more than having inaccurate timers.

Good to hear that problems with QueryPerformanceCounter aren't common on multi-core machines. I might be able to write some code which detects that there are issues and informs the user that their profiles are suspect on that PC.
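
One way such a detector might work, as a sketch only: every thread checks its counter reading against a shared high-water mark and counts readings that appear to go backwards. BackwardsTimeDetector is a hypothetical name, and the counter is passed in as a callable, so on Windows it would wrap QueryPerformanceCounter while a test can use a fake clock.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical detector for a counter that runs backwards across threads.
class BackwardsTimeDetector {
public:
    // read_counter: any callable returning the raw counter as std::int64_t.
    template <typename ReadCounter>
    std::int64_t sample(ReadCounter read_counter) {
        // Load the mark BEFORE reading the counter. The mark was produced by
        // an earlier read, so a smaller value now is a genuine regression and
        // not just two threads racing to publish.
        const std::int64_t seen = high_water_.load(std::memory_order_acquire);
        const std::int64_t now = read_counter();
        if (now < seen) {
            regressions_.fetch_add(1, std::memory_order_relaxed);
        } else {
            raise_high_water(now);
        }
        return now;
    }

    std::uint64_t regressions() const {
        return regressions_.load(std::memory_order_relaxed);
    }

    // True once any backwards step has been seen: profiles are suspect.
    bool suspect() const { return regressions() != 0; }

private:
    // Lock-free "store the maximum" using a compare-exchange loop.
    void raise_high_water(std::int64_t value) {
        std::int64_t current = high_water_.load(std::memory_order_relaxed);
        while (current < value &&
               !high_water_.compare_exchange_weak(current, value,
                                                  std::memory_order_release,
                                                  std::memory_order_relaxed)) {
            // On failure `current` holds the latest mark; loop re-checks it.
        }
    }

    std::atomic<std::int64_t> high_water_{
        std::numeric_limits<std::int64_t>::min()};
    std::atomic<std::uint64_t> regressions_{0};
};
```

A real check would probably want a small tolerance and some idea of which cores were involved, but even a bare count is enough to warn that profiles from that machine should not be trusted.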

Thanks,
Martin
Quote:
One of the reasons I need this is because there are no good tools for profiling multiple threads on consoles.

I find that hard to believe. PIX on the 360 is excellent. Sony has equally in-depth profilers we've used at work, though I've never personally used them.
Quote:Original post by KulSeran
Quote:
One of the reasons I need this is because there are no good tools for profiling multiple threads on consoles.

I find that hard to believe. PIX on the 360 is excellent. Sony has equally in-depth profilers we've used at work, though I've never personally used them.


PIX on 360 is excellent; however, it has a few failings, and profiling threads is one of them.

