How to properly get delta time on modern hardware?

Microsoft publishes an agonizing amount of detail on QPC: https://msdn.microsoft.com/en-us/library/windows/desktop/dn553408(v=vs.85).aspx
Short version is that it's highly consistent on modern systems, even multiprocessor systems, but it can be a relatively expensive call depending on which hardware timer backs it. If you're not getting consistent values on hardware made in the last ten years, there's a code bug on your end.
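
For reference, a minimal delta-time loop built on QPC looks roughly like this (a sketch, not battle-tested code; the names and structure are my own):

#include <windows.h>

int main()
{
    LARGE_INTEGER frequency, last;
    QueryPerformanceFrequency(&frequency);   // fixed at boot, query it once
    QueryPerformanceCounter(&last);

    for (;;)   // your game loop
    {
        LARGE_INTEGER now;
        QueryPerformanceCounter(&now);
        double dt = double(now.QuadPart - last.QuadPart) / double(frequency.QuadPart);
        last = now;
        // Update(dt); Render();
    }
}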

Also, in order to use timeGetTime effectively, you need to call timeBeginPeriod(1) to raise the system timer resolution to 1 ms.
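
Something along these lines (a sketch with error handling omitted; winmm.lib has to be linked):

#include <windows.h>
#include <mmsystem.h>                 // timeGetTime, timeBeginPeriod, timeEndPeriod
#pragma comment(lib, "winmm.lib")

int main()
{
    timeBeginPeriod(1);               // raise the system timer resolution to 1 ms

    DWORD last = timeGetTime();
    for (int frame = 0; frame < 1000; ++frame)
    {
        DWORD now = timeGetTime();
        float dt = (now - last) / 1000.0f;   // milliseconds -> seconds
        last = now;
        // Update(dt); Render();
    }

    timeEndPeriod(1);                 // restore the default resolution on the way out
    return 0;
}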

Why are you locking to a single core for QueryPerformanceCounter? The whole reason you use QPC instead of RDTSC is so you don't have to lock it to a specific core.

That is what Microsoft tells you, but it's a lie. If you break in the debugger just before QPC is called and step into it, you will see that it is nothing but a simple call to RDTSC.

QPC arguably does work more reliably on very old systems, because (at least according to Microsoft) it reads a more reliable timer on those select old platforms that don't support RDTSC properly.

In my opinion, the "correct" approach on any kind of modern hardware is, however, to use RDTSCP, for which the __rdtscp() intrinsic exists. Note the extra "P".

This instruction does two things. First, it also returns the ID of the CPU core it executes on, which is... uh... nice to know, but not very thrilling. Second, and more importantly, it performs a one-sided pipeline flush: all instructions that appear before it have finished before the value is returned, but instructions that follow may start before the result is available. This is much better for performance than serializing with CPUID, and it is correct, as opposed to not serializing at all, like QPC does.

On processors that support the RDTSCP instruction (pretty much every processor less than 8-10 years old) you are guaranteed to have an invariant TSC as well, so you are really measuring time, not cycles. The only two problems are that a) you need to know how much time one tick is worth, and b) while you are guaranteed that the tick rate doesn't change while the CPU is up and running, you have no guarantee that it corresponds to the CPU's maximum clock frequency (it certainly doesn't, I've checked that!) or that it is always the same on different days (it certainly isn't either, I've checked that as well!).
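
To make that concrete, here is a minimal sketch of reading the TSC through the MSVC intrinsic (converting ticks to seconds is covered below):

#include <intrin.h>    // __rdtscp
#include <cstdint>
#include <cstdio>

int main()
{
    unsigned int core = 0;

    // Everything issued before this read has retired; later instructions may
    // already be in flight, which is the "one-sided" behaviour described above.
    uint64_t start = __rdtscp(&core);

    // ... the work you want to measure ...

    uint64_t end = __rdtscp(&core);
    printf("elapsed: %llu ticks (last read on core %u)\n",
           (unsigned long long)(end - start), core);
    return 0;
}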

An approach that works very reliably for me is to initialize a static global from an initializer function which records both system time and cpu ticks, and then sleeps for 50ms just to be sure that there is a guaranteed minimum delay of sorts. This is necessary because the system clock has limited precision, so you need a minimum delay between two measurements (more is better).

Then, some time later in the program, at the first occasion a TSC timer is created, its constructor reads the system time and current ticks, subtracts each of the two from the respective value at startup, and there you go, now you know with reasonable accuracy how much time one tick is worth, you're one multiply away from a proper micro- or nanosecond time.

The later the better, of course: the more time passes between static initialization and the first use of a TSC timer, the less the system timer's limited resolution weighs in. For "micro", the guaranteed 50 ms delay is already just about good enough, but if you need "nano" (which, although you count in nanoseconds, is really closer to 5-8 ns resolution) you may want to let another half second or so pass before you start the first measurement.
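
A rough sketch of that calibration approach on Windows (the structure and the choice of QueryPerformanceCounter as the reference clock are my own; treat the 50 ms as illustrative):

#include <windows.h>
#include <intrin.h>
#include <cstdint>

static uint64_t      gStartTicks;    // TSC value recorded at startup
static LARGE_INTEGER gStartTime;     // reference clock recorded at startup
static LARGE_INTEGER gFreq;

static bool InitCalibration()
{
    unsigned int aux;
    QueryPerformanceFrequency(&gFreq);
    QueryPerformanceCounter(&gStartTime);
    gStartTicks = __rdtscp(&aux);
    Sleep(50);                        // enforce a minimum delay before any later measurement
    return true;
}
static bool gCalibrated = InitCalibration();   // runs during static initialization

// Called when the first TSC timer is created; the later, the more accurate.
double TicksPerSecond()
{
    unsigned int aux;
    uint64_t ticksNow = __rdtscp(&aux);
    LARGE_INTEGER timeNow;
    QueryPerformanceCounter(&timeNow);

    double elapsed = double(timeNow.QuadPart - gStartTime.QuadPart) / double(gFreq.QuadPart);
    return double(ticksNow - gStartTicks) / elapsed;   // divide a tick delta by this to get seconds
}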

This is as good as you can get with timing, the hardware simply does not support anything better. Even HPET doesn't help, although you might be inclined to believe that. A lot of people thought that when HPET was the new kid on the block.

But the reality is that HPET is big hype and little practical use. It lets the operating system schedule an interrupt to occur at a set time in the future, with very high precision. Which is great, but for making fine-grained measurements with high precision, it's just useless.

I mean unlocked QPC doesn't work. I removed the affinity mask and everything went haywire. My animation is playing faster now when I interact with the window by moving or clicking mouse, and when I stop sending the window messages the animation starts to stagger. I think that's a good indication that QPC is malfunctioning because of being asked from separate cores. So what should I do now?

Correlation does not mean causation. Did you confirm the program switched threads, and that switch aligned with your issues? Are you using RDTSC or anything derived from it?

Your issue is somewhere else, but I suspect you'd rather blame QPC than accept that fact. If QPC had the issues you seem to think it has, sfml would be breaking every single application that uses their clock. SDL would break anything using SDL_GetPerformanceCounter.

Believe me or don't. It's really your call at this point, and I think you've gotten all the help you can receive until you provide more details that reveal where the actual bug is.

That is what Microsoft tells you, but it's a lie. If you break in the debugger just before QPC is called and step into it, you will see that it is nothing but a simple call to RDTSC.

On systems with invariant TSC, yes. It's an optimized call in those situations. A quick google search shows CPUID.80000007H:EDX[8] is what indicates invariant TSC. But on older systems, QPC will absolutely be a far more expensive call. You can even google about complaints on its performance.
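
If you want to check that bit yourself, here's a small sketch using the MSVC __cpuid intrinsic (assuming x86/x64):

#include <intrin.h>   // __cpuid

// True if CPUID leaf 0x80000007 reports an invariant TSC (EDX bit 8).
bool HasInvariantTsc()
{
    int regs[4] = { 0 };              // EAX, EBX, ECX, EDX

    __cpuid(regs, 0x80000000);        // highest supported extended leaf
    if (static_cast<unsigned>(regs[0]) < 0x80000007u)
        return false;

    __cpuid(regs, 0x80000007);
    return (regs[3] & (1 << 8)) != 0;
}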


I will have to agree with @richardurich. Your QPC seems just fine. It's more likely the problem is in another part of your engine.

My QPC code is here :

while main loop:

peekmessage OR <QPC> update <QPC>

What could be wrong in this case?


What could be wrong in this case?

Probably the fact that that doesn't compile.

If you want to discuss actual problems with actual code and what might be done during the resulting machine code, actual code is kind of important (as well as details about which compiler you're using), so please post some.

My experience is that timeGetTime (albeit with a timeBeginPeriod(1) call) is more than sufficiently accurate for my game on any hardware I've run into. Now, my game is an RPG without really strict timing accuracy requirements, but things would still be a bit jerky if timeGetTime wasn't sufficiently accurate. I haven't seen that at all, even on moderately older hardware.

I suspect that much of the QueryPerformanceCounter advice comes from a previous era of flakier hardware and slower processors. My advice to anyone writing a game now would be to try timeGetTime to start, and only move on from there if you run into an actual issue.

I'm gonna try real hard not to be overly snarky here, but apologies in advance if I fail.

We do computer science. Act like an engineer. Measure and test your hypotheses, don't just blindly copy paste opinion pieces from the internet and hope for the best.

When your hypotheses are invalidated, seek empirical data to explain why. Don't settle for superstition.

QPC is reliable. timeGetTime and timeBeginPeriod are shaky recommendations at best considering anyone else on the machine can make your timers vomit. And always post your code.

Rant over. Carry on.


If you aren't actively seeing an issue, just use QPC with certainty that it will work how it is supposed to work. If you happen to run into a bug down the road (doubtful), it's easy enough to patch the code by replacing QPC calls with calls to read a timer you publish from a thread that doesn't change cores. But honestly, that will never happen. This theoretical bug you're wasting time trying to avoid just doesn't exist today. Sure it existed in the first multicore systems. But unless you have a time machine, don't worry about it.

Please stop repeating QPC works fine on multicore. It does not.

QPC has been historically a source of constant pain.

QPC has known issues on older AMD machines, and Microsoft even released a hotfix to patch it, even though it's not perfect. AMD released its own patch as well, named "Dual Core Optimizer".

If you're an AAA game, you probably don't care about these AMD CPUs since they won't be powerful enough to run your game. But for everybody else, these machines are pretty much still out there in the wild, as the affected chips were manufactured around 8 years ago.

As samoth described, the most accurate way to get time on multicore is to use RDTSCP when available, and where it's not supported, fall back to a thread locked to a single physical core that continuously polls QPC. If you don't want to poll, you can run a test first to see whether QPC behaves correctly (doesn't go backwards in time; time deltas don't have a high variance). If it does, then you can use it fine. Otherwise you need to lock it to a core.
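
If you go the test-first route, such a check might look roughly like this (hopping the current thread across cores with SetThreadAffinityMask and verifying the counter never runs backwards; the structure and thresholds are just illustrative):

#include <windows.h>

bool QpcLooksConsistent()
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);

    HANDLE    thread  = GetCurrentThread();
    DWORD_PTR oldMask = SetThreadAffinityMask(thread, 1);
    LONGLONG  last    = 0;
    bool      ok      = true;

    for (DWORD core = 0; core < si.dwNumberOfProcessors && ok; ++core)
    {
        SetThreadAffinityMask(thread, DWORD_PTR(1) << core);
        Sleep(0);                                  // give the scheduler a chance to move us

        LARGE_INTEGER now;
        QueryPerformanceCounter(&now);
        if (now.QuadPart < last)                   // counter went backwards across cores
            ok = false;
        last = now.QuadPart;
    }

    SetThreadAffinityMask(thread, oldMask);        // restore the original affinity
    return ok;
}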


Those articles are from 2006. That's when multicore CPUs just started appearing. There's nothing weird about bugs in new technology. There hasn't been a problem for 10 years since then.

Alternatively, it's possible to use std::chrono::high_resolution_clock, but on Windows it probably uses QPC internally anyway.
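
A portable sketch of that, using steady_clock (which, as far as I know, recent MSVC versions implement on top of QPC):

#include <chrono>

int main()
{
    using clock = std::chrono::steady_clock;   // monotonic; high_resolution_clock is
                                               // typically just an alias for it on MSVC
    auto last = clock::now();

    for (int frame = 0; frame < 1000; ++frame)
    {
        auto now = clock::now();
        std::chrono::duration<double> dt = now - last;   // delta time in seconds
        last = now;
        // Update(dt.count()); Render();
    }
    return 0;
}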

