Jump to content

  • Log In with Google      Sign In   
  • Create Account


Like
0Likes
Dislike

Timing Pitfalls and Solutions

By Jan Wassenberg | Published Apr 15 2004 10:17 PM in Game Programming

timer time system safe resolution clock tsc thread frequency
If you find this article contains errors or problems rendering it unreadable (missing images or files, mangled code, improper text formatting, etc) please contact the editor so corrections can be made. Thank you for helping us improve this resource

I recently wrote a timer (in the sense of returning a timestamp on request), and am writing this report in the spirit of "learn now, pay forward", with the sincere hope that it helps someone avoid the same experience. Windows time APIs are discussed, but the concepts apply to all PCs. Acronyms are explained in the Glossary; let's jump right in.


Timer Requirements
The timer was intended for 0ad and whatever future needs may present themselves.

The design goals were:

  • reliability - it must work around several known timer issues (see below)
  • high resolution, useable for profiling
  • minimal jitter / jumping, monotonic
  • tracking the system time
The timer would only be necessary on Windows - Linux already has gettimeofday, which fits the bill.


PC time sources
  • Tick counter: This is basically a variable incremented every clock interrupt, which is triggered by the PIT or RTC and typically comes every 10 or 15 ms.
  • Chipset timers (PIT / PMT): These are hardware counters running at 1.19 / 3.57 MHz. They are quite slow to read (requires port I/O), on the order of 1 µs.
  • CPU timestamp counter (TSC): This is incremented every clock. The overhead is small - about 11 clocks (1).
There are some other esoteric ways to get the time, e.g. polling the DRAM refresh detect bit, but these aren't relevant here.


Windows time APIs
  • GetTickCount (GTC): as billed. 8 clock overhead (load, mul, shift).
  • GetSystemTimeAsFileTime (GSTAFT): also basically a tick counter, but is returned in higher resolution units (hectonanoseconds), and isn't updated evenly (time of day anomalies, as explained later). Each call takes 6 clocks (64-bit load/store + synchronization (4)).
  • timeGetTime (TGT): third tick counter, but its resolution can be improved to 1 ms via timeBeginPeriod (the clock interrupt frequency is increased). Note that this slows down the system disproportionately - interrupt latency goes up, and the CPU cache suffers. This function takes 102 clocks; I have not analysed what it does in detail.
  • QueryPerformanceCounter: uses either the PIT (Win2k) or PMT (WinXP) chipset timers, or the TSC (on SMP HAL systems). These take ~3 µs, 700 ns, and 100 clocks (guesstimate), respectively.
libc functions such as _ftime and clock are built on top of GSTAFT.


Timer problems
The GTC and GSTAFT resolution is far too low - 10 ms isn't enough to resolve individual frames in a game, not to mention it's useless for profiling.

TGT falls behind over time due to delayed or lost clock interrupts; I read that this sometimes broke StarTopia multiplayer games, and the problem wasn't noticed because it didn't happen on the development machines. Unfortunately, I do not remember where, nor can I find it.

QPC has several issues:

  • Hardware bug: race condition when updating parts of the register? I have seen speculation to this effect more than once. ([qpc_race])
  • It jumps forward several hundred ms sometimes under heavy PCI bus load. ([qpc_jump])
  • The PIT is slow to read - it requires sending a latch command, and 2 8-bit reads, for a total of ~3 µs.
  • When using the TSC, it is documented to return different values depending on which processor is running the code, on some (buggy) systems.
.. and how to solve them
2 approaches were considered:

  • Have all 3 timers 'vote' - idea from [pc-timers]. This is somewhat shaky though - which timer do we trust if they differ? What if one of them is always incorrect? (e.g. SpeedStep enabled, and QPC uses the TSC)
  • Choose a 'safe' timer for the particular machine, read it every second from a thread (bonus: accurate frequency measurement), and compare it against a known good time source, e.g. the thread sleep interval. This is what I went with, for robustness.
Choosing a safe timer is done by checking (in order of preference, based on resolution and overhead) if each timer's known problems occur on this system and whether we can work around them.

Examples: the TSC must not be used on SMP systems (inconsistent between processors), or when SpeedStep (throttling CPU frequency for cooling, or to save power) is active. The QPC race condition is worked around by reading its value every second from a thread; the jump problem is dodged by recognizing that on such systems (single processor desktop PIII-class) as it is known to occur, the TSC would be chosen, because it is safe to use there.


Locking timer to system time
So far, so good. We have a high resolution timer (HRT), and it deals with the above errors. What's left is to lock it to the system time (ST).

Again, there were 2 possibilities:

  • 'Bound' it with the system time, i.e. let the HRT free-run until it differs by 1.5 ticks, then bring it back. This has the advantage that we don't need to know the exact ST, i.e. where in the current 10 ms tick we are, but the timer would not be monotonic, and would jump often.
  • Given the exact ST, calculate the difference WRT to the HRT, and slew its frequency accordingly. This hinges on the ST query (see below); it's also difficult to build a stable control loop that converges on the system time, and doesn't overshoot too much / bounce back and forth. I chose this method because jumps in time (especially backwards) are unacceptable.
This design is basically a PLL, as described in [tcl_timer] and [prec_time]. If we have the ST only good to 10 ms, we can obviously never return the current time closer to its true value. The only time the ST value is (almost) what it should be is right after it's updated. Without OS support, the best we can do is rely on the assumption that GSTAFT and the scheduler are driven by the same clock, i.e. after our calibration thread wakes up, the ST has just been updated. (Note: we can't poll for the start of the next tick - 10 ms delays are unacceptable). If this assumption is false, we are toast, otherwise, it works quite well. A simpler way to schedule periodic wakeups is via timeSetEvent, which has its own (high priority) thread.

With the basic code in place, I set about finding an error correction function, h(hrt_to_st_difference) |-> hrt_freq_slew. The simple method given in [tcl_timer] is a p-controller (2) with clamped maximum adjustment. In my tests, its gain was way too high, and it was basically bouncing back and forth at the maximum allowed adjustment value. I hit the books on control theory, and understood enough to write a PID controller (3). This worked surprisingly well: the HRT was typically only a few microseconds (!) off from the given ST value.


.. only to get bitten
I left Winamp running during a long test, and the results were horrible. The calibration thread wasn't waking up on time (as shown by differing HRT and ST time deltas since last wakeup) - up to 5 ms late. With inaccurate system time values, the controller was constantly adjusting the frequency back and forth. I couldn't quite explain these values - my thread was _HIGH priority, so it must be drivers, and streaming a little music from the hard drive and sending it to the sound card doesn't account for that much latency. I added band-aids like using TGT boosted to 1 ms resolution to get more accurate ST, and even using the HRT to guess how late the wakeup came (contrary to the goal of syncing to the actual system time). This ran to 1200 lines of C++ code. It helped, but the timer still drifted too much.


Shakedown
Having spent insane amounts of time on the simple matter of a timer, I came across a discussion of these problems ([ntdev]). I now completely agree with the assessment that long-term accurate, high resolution timing is, lacking a hardware timer capable of this, very difficult and not worth the trouble. Time of day doesn't usually need to be high resolution, and timers for profiling don't need to be free of long term drift. Moreover, you don't want to saddle a simple timer with time of day corrections. The Windows ST is updated in a strange fashion: every 57th or so update comes later, every 32nd comes earlier (from memory). Sometimes, the ST had not been updated after thread wakeup - is it done from a DPC? Finally, the time can jump due to NTP correction (built into WinXP).


Aftermath
I stripped out the PLL, PID controller, and error correction code, leaving the HRT and the part of the calibration thread that measured its frequency, resulting in a (more) simple timer that works, and is only 430 lines. A few days later, I found out the excessive wakeup delays when running Winamp were due to a dying hard drive - it needed many retries to read the data. I now think the timer could have worked, but I believe splitting into high resolution timestamp and time of day functions is better. A decent timer (HPET) has finally been added to the architecture, but isn't widespread yet; failing that, the OS really should take care of this business. In the meantime, this timer will do.

I hope this helps someone; at least I learned a lot during the whole mess ;)

If you're interested in the (GPLed) source, drop me a line. Also, questions/comments/suggestions/bug reports welcome! u9rktiiistud.uni-karlsruhe.de (remove digit, replace "iii" with "@")


Notes
All CPU clock timings are for my system, an Athlon XP.

(1) For compatibility, a CPUID instruction is usually executed before RDTSC, to make sure that it is executed in-order. This adds another 42 clocks. (2) Proportional: adjust = gain * error. A typical example is a float regulator. Disadvantages: depending on gain, it either doesn't track the signal rapidly, or it is unstable, and diverges. Also, if slightly above/below the target value, it will never correct the difference, because its adjust goes to 0. (3) P, and Integral / Differential parts. I sums up previous error values, so that slight offsets can be corrected. D sees a 'trend' early, and adjusts for it, thereby tracking the signal more rapidly. (4) The system time is mapped into process memory. GSTAFT must make sure it returns consistent results, because the value may change at any time. This is done here by comparing against a second copy of the MSW, and reading again if it is different (avoids the need for a spin lock).
Glossary
PIT: Programmable Interval Timer (8254). 1.19 MHz counter; can generate interrupts; programmable divisor. PMT: Power Management Timer (part of ACPI). 3.57 MHz counter. RTC: Realtime Clock. Battery-buffered clock, with 32 KHz oscillator; can generate interrupts; programmable 2^n divisor. TSC: Timestamp Counter, a counter incremented every CPU cycle. HPET: High Precision Event Timer, a new x MHz hardware timer (x >= 10). GTC: GetTickCount* TGT: timeGetTime* QPC: QueryPerformanceCounter* GSTAFT: GetSystemTimeAsFileTime* SMP HAL: Windows Hardware Abstraction Layer for Symmetric Multiprocessing systems. HRT: High Resolution Timer. ST: Windows System time. PLL: Phase Lock Loop. PID: Proportional, Integral, Differential controller - see (2) and (3). * Windows time APIs


References
[pc-timers] <a href="http://www.mindcontrol.org/%7Ehplus/pc-timers.html">http://www.mindcontrol.org/~hplus/pc-timers.html [mm-timer] http://www.microsoft.com/whdc/hwdev/platform/proc/mm-timer.mspx [qpc_race] http://www.etestinglabs.com/bi/cont2000/200012/perfcnt.asp [qpc_jump] http://support.microsoft.com/default.aspx?scid=KB;EN-US;Q274323& [tcl_timer] http://www.tcl.tk/cgi-bin/tct/tip/7.html [prec_time] http://www.eecis.udel.edu/~mills/database/rfc/rfc1589.txt [ntdev] http://www.ntdev.org/archive/ntdev9908/msg0217.html
Code
// decide upon a HRT implementation, checking if we can work around // each timer's issues on this platform, but allow user override // in case there are unforeseen problems with one of them. // order of preference (due to resolution and speed): TSC, QPC, TGT. // split out of reset_impl so we can just return when impl is chosen. static void choose_impl() { bool safe; #define SAFETY_OVERRIDE(impl)\ if(overrides[impl] == HRT_DISABLE)\ safe = false;\ if(overrides[impl] == HRT_FORCE)\ safe = true; #if defined(_M_IX86) && !defined(NO_TSC) // CPU Timestamp Counter (incremented every clock) // ns resolution, moderate precision (poor clock crystal?) // // issues: // - multiprocessor systems: may be inconsistent across CPUs. // could fix by keeping per-CPU timer state, but we'd need // GetCurrentProcessorNumber (only available on Win Server 2003). // spinning off a thread with set CPU affinity is too slow // (we may have to wait until the next timeslice). // we could discard really bad values, but that's still inaccurate. // => unsafe. // - deep sleep modes: TSC may not be advanced. // not a problem though, because if the TSC is disabled, the CPU // isn't doing any other work, either. // - SpeedStep/'gearshift' CPUs: frequency may change. // this happens on notebooks now, but eventually desktop systems // will do this as well (if not to save power, for heat reasons). // frequency changes are too often and drastic to correct, // and we don't want to mess with the system power settings. // => unsafe. if(cpu_caps & TSC && cpu_freq > 0.0) { safe = (cpus == 1 && !cpu_speedstep); SAFETY_OVERRIDE(HRT_TSC); if(safe) { hrt_impl = HRT_TSC; hrt_nominal_freq = (i64)cpu_freq; return; } } #endif // TSC #if defined(_WIN32) && !defined(NO_QPC) // Windows QueryPerformanceCounter API // implementations: // - PIT on Win2k - 838 ns resolution, slow to read (~3 µs) // - PMT on WinXP - 279 ns ", moderate overhead (700 ns?) // issues: // 1) Q274323: may jump several seconds under heavy PCI bus load. // not a problem, because the older systems on which this occurs // have safe TSCs, so that is used instead. // 2) "System clock problem can inflate benchmark scores": // incorrect value if not polled every 4.5 seconds? solved // by calibration thread, which reads timer every second anyway. // - TSC on MP HAL - see TSC above. // cache freq because QPF is fairly slow. static i64 qpc_freq = -1; // first call - check if QPC is supported if(qpc_freq == -1) { LARGE_INTEGER i; BOOL qpc_ok = QueryPerformanceFrequency(&i); qpc_freq = qpc_ok? i.QuadPart : 0; } // QPC is available if(qpc_freq > 0) { // PIT and PMT are safe. if(qpc_freq == 1193182 || qpc_freq == 3579545) safe = true; // make sure QPC doesn't use the TSC // (if it were safe, we would have chosen it above) else { // can't decide yet - assume unsafe if(cpu_freq == 0.0) safe = false; else { // compare QPC freq to CPU clock freq - can't rule out HPET, // because its frequency isn't known (it's at least 10 MHz). double freq_dist = fabs(cpu_freq / qpc_freq - 1.0); safe = freq_dist > 0.05; // safe if freqs not within 5% (i.e. it doesn't use TSC) } } SAFETY_OVERRIDE(HRT_QPC); if(safe) { hrt_impl = HRT_QPC; hrt_nominal_freq = qpc_freq; return; } } #endif // QPC // // TGT // hrt_impl = HRT_TGT; hrt_nominal_freq = 1000; return; assert(0 && "hrt_choose_impl: no safe timer found!"); hrt_impl = HRT_NONE; hrt_nominal_freq = -1; return; } The rest of the code is straightforward and mostly uninteresting.






Comments

Note: Please offer only positive, constructive comments - we are looking to promote a positive atmosphere where collaboration is valued above all else.




PARTNERS