How to properly get delta time on modern hardware?

Started by
62 comments, last by Finalspace 7 years ago

Alternatively, it's possible to use std::chrono::high_resolution_clock, but on Windows systems it probably uses QPC internally anyway.

This.

Why does it seem like no one has even heard about std::chrono? It feels like I've been repeating myself about it in a couple of threads about time. Yeah, std::chrono::high_resolution_clock uses QPC internally to access time on newer Windows compilers at least. But the plus side is that you won't have to bother with calls to the Windows API and it's very easy to use.


Use <chrono>, ffs. And yes, it does call QPC internally on Windows.

Simple example:


#include <chrono>

class Timer {
public:
  Timer() { prev = now(); }

  //get seconds elapsed since last getDT (or construction)
  float getDT() {
    auto cur = now();
    std::chrono::duration<float> seconds = cur - prev;
    prev = cur;
    return seconds.count();
  }

private:
  static std::chrono::time_point<std::chrono::high_resolution_clock> now() {
    return std::chrono::high_resolution_clock::now();
  }

  std::chrono::time_point<std::chrono::high_resolution_clock> prev;

};

If you look over the docs, there are a lot of units for representing various durations, points in time, etc. You can cast between units to do conversions in a moderately human-readable way.
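
As a minimal sketch of those conversions (the helper names below are made up for illustration): converting to a coarser integer unit needs an explicit std::chrono::duration_cast because it truncates, while converting to a floating-point duration is implicit.

```cpp
#include <chrono>

// Truncate a microsecond count to whole milliseconds.
// Integer-to-coarser-integer conversions lose data, so they need an explicit cast.
inline long long wholeMilliseconds(std::chrono::microseconds us) {
  return std::chrono::duration_cast<std::chrono::milliseconds>(us).count();
}

// Convert to fractional milliseconds. Conversions to a floating-point
// representation are implicit because nothing is truncated.
inline double fractionalMilliseconds(std::chrono::microseconds us) {
  std::chrono::duration<double, std::milli> ms = us;
  return ms.count();
}

// Convert to seconds as a float, the same unit Timer::getDT() returns.
inline float seconds(std::chrono::microseconds us) {
  std::chrono::duration<float> s = us;
  return s.count();
}
```

For a 16667-microsecond frame, wholeMilliseconds gives 16, while the other two keep the fractional part.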

void hurrrrrrrr() {__asm sub [ebp+4],5;}

There are ten kinds of people in this world: those who understand binary and those who don't.

Quoting articles that are 10+ years old really helps no one. Honestly, if they didn't patch their hardware (and yes, in most cases patches of some sort existed), it is their problem. Do not code for the fringe cases unless that is your market.

If you're using C++11 or higher, do exactly as Khatharr shows and move on with your life. This is a pretty solved problem; that this thread is so long scares me.

"Those who would give up essential liberty to purchase a little temporary safety deserve neither liberty nor safety." --Benjamin Franklin

Quoting articles that are 10+ years old really helps no one. Honestly, if they didn't patch their hardware (and yes, in most cases patches of some sort existed), it is their problem. Do not code for the fringe cases unless that is your market.

If you're using C++11 or higher, do exactly as Khatharr shows and move on with your life. This is a pretty solved problem; that this thread is so long scares me.

The articles aren't the problem, and neither is using the Win API directly or using std::chrono. The problem is that it is impossible to track down what's wrong without any sample code.

* flies away like a penguin *

Those articles are from 2006. That's when multicore CPUs just started appearing. There's nothing weird about bugs in new technology. There hasn't been a problem for 10 years since then.
As I tried to explain, there is a problem in 2017, and the problem is that QueryPerformanceCounter is implemented incorrectly.

If I break in the debugger and single-step instructions on my machine, I get this:


0x401613    callq  *0x1bd53(%rip) <__imp_QueryPerformanceCounter>
...
0x771559a0    jmp    0x771559a8 <QueryPerformanceCounter+8>
...
0x771559a8    jmpq   *0x882a2(%rip)
...
0x77389fd0    sub    $0x28,%rsp
0x77389fd4    testb  $0x1,0x7ffe02ed
0x77389fdc    mov    %rcx,%r9
0x77389fdf    je     0x773edde0 <ntdll!EtwEventSetInformation+64608>
0x77389fe5    mov    0x7ffe03b8,%r8
0x77389fed    rdtsc
0x77389fef    movzbl 0x7ffe02ed,%ecx
0x77389ff7    shl    $0x20,%rdx
0x77389ffb    or     %rdx,%rax
0x77389ffe    shr    $0x2,%ecx
0x7738a001    add    %r8,%rax
0x7738a004    shr    %cl,%rax

The translation of that is:


jump around
jump around
jump around
rdtsc
do some shit
return

The code uses rdtsc and does not serialize. The correct pattern prior to the availability of rdtscp was cpuid; rdtsc, since rdtsc is not a serializing instruction and cpuid is otherwise the only instruction available in user mode that does that job. It is, however, a tad expensive: an extra 30 or so cycles. As a result, no measurement that you make is accurate to anywhere near the presumed precision.

The pipeline will be full or half-full or empty, depending on what processor you run on, what its pipeline depth is, what instructions were executed prior to calling QueryPerformanceCounter, and whether those three jumps incidentally caused enough delay to retire all in-flight operations. If you only care about millisecond or possibly microsecond resolution then that's of course alright, because in that case... who cares anyway. But if you talk in terms of ten-nanosecond resolution like QPC does, then this is just shit. It's nothing more and nothing less than an incorrect implementation.

If you use the rdtscp instruction inline instead, you don't save anything performance-wise compared to calling into ntdll (being half-serializing is still surprisingly expensive), but what matters is that your measurements are correct. That is, the point in time that your measurement refers to is well-defined, not random.
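
For reference, a minimal sketch of reading the counter with rdtscp through compiler intrinsics. `__rdtscp` is the real intrinsic (from `<intrin.h>` on MSVC, `<x86intrin.h>` on GCC/Clang); the helper names `readTimestamp` and `ticksFor` are made up here, and the fallback to std::chrono::steady_clock on non-x86-64 targets is an assumption added only so the snippet compiles elsewhere. The result is in raw TSC ticks, not seconds.

```cpp
#include <chrono>
#include <cstdint>

#if defined(__x86_64__) || defined(_M_X64)
#  if defined(_MSC_VER)
#    include <intrin.h>
#  else
#    include <x86intrin.h>
#  endif
#  define HAVE_RDTSCP 1
#else
#  define HAVE_RDTSCP 0
#endif

// Hypothetical helper: returns a timestamp in TSC ticks on x86-64.
// rdtscp waits until all earlier instructions have executed before it reads
// the counter; it does not stop later instructions from starting early.
inline std::uint64_t readTimestamp() {
#if HAVE_RDTSCP
  unsigned int aux = 0;  // receives the IA32_TSC_AUX value; unused here
  return __rdtscp(&aux);
#else
  // Fallback (assumption): no rdtscp here, so use the steady clock's tick count.
  return static_cast<std::uint64_t>(
      std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Hypothetical helper: ticks spent in `work`, measured on the calling thread.
template <typename Fn>
std::uint64_t ticksFor(Fn&& work) {
  const std::uint64_t start = readTimestamp();
  work();
  const std::uint64_t end = readTimestamp();
  return end - start;
}
```

This only makes sense for timing short sections on one thread; converting ticks to seconds needs the TSC frequency, which this sketch does not attempt to obtain.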

Quoting articles that are 10+ years old really helps no one. Honestly, if they didn't patch their hardware (and yes, in most cases patches of some sort existed), it is their problem. Do not code for the fringe cases unless that is your market.

If you're using C++11 or higher, do exactly as Khatharr shows and move on with your life. This is a pretty solved problem; that this thread is so long scares me.

The articles aren't the problem, and neither is using the Win API directly or using std::chrono. The problem is that it is impossible to track down what's wrong without any sample code.

* flies away like a penguin *

Actually, they are a problem, as some people might think they still matter. Yes, the OP did nothing to help himself by providing no code and then blaming everything but himself. But at the same time, a few people stated that QPC was bad or said not to use it. That is not the case though. That is what I was referring to.

"Those who would give up essential liberty to purchase a little temporary safety deserve neither liberty nor safety." --Benjamin Franklin

I've never seen a thread be so divided, it seems like half are pro QPC and half are against. In any case this is my WinMain loop:


resetEntry:

    CreateEngineWindow();
    GameEngineInitialization();

    while (TRUE)
    {
        // If there is a message in the queue, get it and remove it from the queue
        if (PeekMessage(&msg, NULL, 0, 0, PM_REMOVE))
        {
            if (msg.message == WM_QUIT)
                break;

            TranslateMessage(&msg);
            DispatchMessage(&msg);
        }
        else
        {
            LARGE_INTEGER currentTime;
            QueryPerformanceCounter(&currentTime);

            // Divide as doubles: dividing the two 64-bit integers first would
            // truncate any delta shorter than one second to 0
            Timer.deltaTime = float(double(currentTime.QuadPart - Timer.lastTime.QuadPart) / double(Timer.frequency.QuadPart));

            // for first run
            if (Timer.deltaTime > 3)
                Timer.deltaTime = 0;

            int code = GameEngineUpdate();

            if (code == APPUPDATE_RESET)
            {
                ApplicationSettings();
                GameEngineShutdown();

                goto resetEntry;
            }

            // Sampled at the end of the frame, so time spent handling
            // messages between frames is not counted in the next delta
            QueryPerformanceCounter(&Timer.lastTime);
        }
    }

You didn't come into this world. You came out of it, like a wave from the ocean. You are not a stranger here. -Alan Watts

This is it:

LARGE_INTEGER currentTime;
QueryPerformanceCounter(&currentTime);
Timer.deltaTime = float(currentTime.QuadPart - Timer.lastTime.QuadPart) / Timer.frequency.QuadPart;
That's at frame start; at frame end I call QPC(&Timer.lastTime).
Those QuadParts are very large 64 bit numbers - converting them to floats can absolutely destroy accuracy. The way I've usually seen this done is more like:
double prevTimeInSeconds = timeInSeconds;
timeInSeconds = ((double)now.QuadPart)/((double)freq.QuadPart);
float delta = (float)(timeInSeconds - prevTimeInSeconds);
Frequency and absolute time in seconds must be 64bit values. Delta time can be a 32bit value.
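
A minimal, platform-neutral sketch of that pattern. `FrameClock`, `tick`, and the raw-tick parameters are hypothetical names; on Windows the tick values would come from QueryPerformanceCounter and the frequency from QueryPerformanceFrequency.

```cpp
#include <cstdint>

// Hypothetical helper: keeps absolute time as a double and hands out a float delta.
struct FrameClock {
  double frequency;            // ticks per second
  double timeInSeconds = 0.0;  // absolute time of the previous sample

  FrameClock(std::int64_t ticksPerSecond, std::int64_t startTicks)
      : frequency(static_cast<double>(ticksPerSecond)),
        timeInSeconds(static_cast<double>(startTicks) / frequency) {}

  // Feed in the current raw tick count; returns seconds since the previous call.
  float tick(std::int64_t nowTicks) {
    const double prevTimeInSeconds = timeInSeconds;
    timeInSeconds = static_cast<double>(nowTicks) / frequency;     // absolute time stays 64-bit
    return static_cast<float>(timeInSeconds - prevTimeInSeconds);  // small value, float is enough
  }
};
```

Only the final difference is narrowed to float, so the large absolute values never pass through a 32-bit type.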

Every game I've worked on has used QPC without caring about multicore bugs and we've never had a problem...

Samoth's objection seems to be valid when using QPC for profiling small code sections, but won't really affect frame timing?

What's the difference between converting to float before the division or after? The frequency is in the range of 2 million, and the actual 64-bit difference between the times is usually about 6k, so in most cases it's doing something like float(6k) / freq. The result should be something like 0.00x, well within what a float can represent.
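
A small sketch of the variants in question, using plain 64-bit integers in place of QuadPart (the function names are made up). With a difference of a few thousand ticks, converting the difference to float before dividing loses nothing. The cases that do go wrong are dividing as integers first, and converting the absolute counter values to float before subtracting.

```cpp
#include <cstdint>

// Difference converted to float, then divided: fine for small differences.
inline float deltaConvertThenDivide(std::int64_t now, std::int64_t last, std::int64_t freq) {
  return static_cast<float>(now - last) / static_cast<float>(freq);
}

// Integer division first: truncates to whole seconds, so sub-second frames give 0.
inline float deltaDivideThenConvert(std::int64_t now, std::int64_t last, std::int64_t freq) {
  return static_cast<float>((now - last) / freq);
}

// Absolute counters converted to float before subtracting: float has a 24-bit
// significand, so large counter values get rounded and the difference is distorted.
inline float deltaFloatAbsolute(std::int64_t now, std::int64_t last, std::int64_t freq) {
  return (static_cast<float>(now) - static_cast<float>(last)) / static_cast<float>(freq);
}
```

With a frequency of 2,000,000 and a 6,000-tick difference, the first variant returns about 0.003 and the second returns 0.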

You didn't come into this world. You came out of it, like a wave from the ocean. You are not a stranger here. -Alan Watts

So I just want to be clear that we came to some conclusion here. I have to always give up one core no matter what I do to use QPC. So on my quad core machine I'm only left with 6 physical threads? How is this possible?

Edit: But how would that even work? If I'm timing a non-active thread and my main code is on another core, then QPC will still return an invalid result.

Edit: OK, after some more reading, I think Microsoft is saying QPC should not be used for inter-core timing. The examples they give just time within a single thread, I'm guessing mostly for benchmarking purposes. The proper way seems to be using timeGetTime. I'm not sure why you guys are saying that its resolution is too low for game code. It can go down to 1 ms, and my update interval is at least 16 ms, which gives me a 60 Hz refresh. For a 30 Hz refresh it could be as much as 33 ms.

Edit: I already mentioned this but the reason I'm not using timeGetTime is because it's returning zero for delta.
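
To illustrate how a millisecond timer can report a zero delta, here is a small simulation. It does not call timeGetTime; `quantizedDeltasMs` is a made-up helper that truncates exact timestamps to whole milliseconds, under the assumption that the timer really does tick every 1 ms (its default granularity can be coarser).

```cpp
#include <cstdint>
#include <vector>

// Hypothetical simulation: frames of `frameMicros` each, timed with a clock
// that only reports whole milliseconds. Returns the delta seen on each frame.
inline std::vector<std::int64_t> quantizedDeltasMs(std::int64_t frameMicros, int frames) {
  std::vector<std::int64_t> deltas;
  std::int64_t exactMicros = 0;  // true elapsed time
  std::int64_t lastMs = 0;       // what the millisecond timer last reported
  for (int i = 0; i < frames; ++i) {
    exactMicros += frameMicros;
    const std::int64_t nowMs = exactMicros / 1000;  // truncate to timer resolution
    deltas.push_back(nowMs - lastMs);
    lastMs = nowMs;
  }
  return deltas;
}
```

A constant 0.4 ms frame yields deltas of 0, 0, 1, 0, 1, and a constant 16.667 ms frame yields 16, 17, 17, so any frame shorter than the timer's tick will often read as zero.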

You didn't come into this world. You came out of it, like a wave from the ocean. You are not a stranger here. -Alan Watts

This topic is closed to new replies.
