How to properly get delta time on modern hardware?

62 comments, last by Finalspace 7 years, 1 month ago

As for my multithreaded approach, I would like to have this:

Logic -> 4 threads

Physics -> 0 (no physics in my apps)

Rendering -> 4 threads (software rendering 3d/2d worlds)

Edit: so for these things I have to have inter-core timing, right?

What the hell, where'd everyone go?

So I just want to be clear that we came to some conclusion here. I have to always give up one core no matter what I do to use QPC.

No, you don't need to do anything like that. QPC does just work without any need for dedicated threads, etc...

You have a bug in your game loop somewhere.
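
For what it's worth, here is a minimal sketch of the usual pattern (single-threaded; the Update call is a stand-in for your actual game code):


#include <windows.h>

// Minimal QPC delta-time sketch: no dedicated threads or affinity
// tricks needed. The frequency is fixed at boot, so query it once.
int main()
{
    LARGE_INTEGER frequency, lastTime, currentTime;
    QueryPerformanceFrequency(&frequency);
    QueryPerformanceCounter(&lastTime);

    for (;;)
    {
        QueryPerformanceCounter(&currentTime);
        // Cast to double BEFORE dividing; an all-integer division
        // truncates sub-second deltas to zero.
        double deltaSeconds =
            double(currentTime.QuadPart - lastTime.QuadPart) /
            double(frequency.QuadPart);
        lastTime = currentTime;

        // Update(deltaSeconds);   // your game code goes here
        (void)deltaSeconds;
    }
}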

Edit: I already mentioned this but the reason I'm not using timeGetTime is because it's returning zero for delta.

That would be expected if your game is running at 1000fps, or 60fps without you calling timeBeginPeriod(1).
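
To illustrate, a sketch of what using timeBeginPeriod(1) looks like (link against winmm.lib; the Sleep is a stand-in for a frame of work):


#include <windows.h>
#include <mmsystem.h>   // timeGetTime/timeBeginPeriod, link winmm.lib

// Without timeBeginPeriod(1) the timer often advances in ~15.6ms
// steps, making 60fps deltas coarse and unreliable.
int main()
{
    timeBeginPeriod(1);              // request 1ms timer resolution

    DWORD last = timeGetTime();
    for (int frame = 0; frame < 100; ++frame)
    {
        Sleep(16);                   // stand-in for one frame of work
        DWORD now = timeGetTime();
        DWORD deltaMs = now - last;  // ~16-17ms with 1ms resolution
        last = now;
        (void)deltaMs;
    }

    timeEndPeriod(1);                // always pair with timeBeginPeriod
}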

I'm not sure why you guys are saying that it's low resolution for game code. It can go down to 1ms, and my update code must take at least 16ms, which would give me a 60Hz refresh.

If you're measuring deltas and accumulating them, then think of 1ms resolution as meaning that you can lose around 0.5ms worth of real time per frame. So a 60Hz game loop is potentially losing 1.8 seconds of real time per minute, or over a minute of real time per hour. I would not buy an alarm clock with that kind of time-keeping ability.

But I can also gain 0.5ms. I'd lose 1.8 seconds only if it happened that every call to timeGetTime rounded down, but I think realistically it balances out to almost nil.

Edit: so if that is correct, then I can just use timeGetTime with a frame limit and should be OK, right? Something like if(overflow > limit) then Update?

But I can also gain 0.5ms. I'd lose 1.8 seconds only if it happened that every call to timeGetTime rounded down, but I think realistically it balances out to almost nil.
That's the worst case; you could also gain 1.8 seconds, or be lucky and have all the errors balance out.

The 0ms delta shows the worst-case scenario. If your game is running at 1000fps, your delta will always be 0ms, so no matter how much real-world time passes, no in-game time will ever pass. The effect is similar at other frame rates: at one fps the errors will be more likely to be +0.5ms, and at a different fps the errors will be more likely to be -0.5ms. The chance of the errors cancelling out perfectly is slim (and down to luck). If you care about keeping real-world time (many games don't have to!), then millisecond-accurate timing is not a valid option.

so if that is correct, then I can just use timeGetTime with a frame limit and should be OK, right? Something like if(overflow > limit) then Update?
Yep. This won't keep accurate time - it will likely be out by a few seconds after a while, but the bigger your 'limit' value, the more accurate it will be.

Again though, QPC does work. Everyone uses it. Most people use it without any special threaded magic. If it's not working for you then there's probably a bug in your game loop.
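
In code, the 'overflow > limit' idea looks something like this (a sketch; the names are illustrative, not taken from your code):


#include <windows.h>
#include <mmsystem.h>   // timeGetTime, link winmm.lib

// Accumulate measured milliseconds and only step the game once a full
// frame's worth has elapsed. A bigger limitMs means less relative
// rounding error, as noted above.
void Update() { /* hypothetical fixed-step update */ }

void RunLoop()
{
    const DWORD limitMs = 16;        // ~60Hz step
    DWORD accumulatedMs = 0;
    DWORD last = timeGetTime();

    for (;;)
    {
        DWORD now = timeGetTime();
        accumulatedMs += now - last; // per-call rounding error lands here
        last = now;

        while (accumulatedMs >= limitMs)   // catch up if we fell behind
        {
            Update();
            accumulatedMs -= limitMs;
        }
    }
}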

Those articles are from 2006. That's when multicore CPUs had just started appearing, and there's nothing weird about bugs in new technology. There hasn't been a problem in the 10 years since then.
As I tried to explain, there is a problem in 2017, and the problem is that QueryPerformanceCounter is implemented incorrectly.

If I break in the debugger and single-step instructions on my machine, I get this:


0x401613    callq  *0x1bd53(%rip) <__imp_QueryPerformanceCounter>
...
0x771559a0    jmp    0x771559a8 <QueryPerformanceCounter+8>
...
0x771559a8    jmpq   *0x882a2(%rip)
...
0x77389fd0    sub    $0x28,%rsp
0x77389fd4    testb  $0x1,0x7ffe02ed
0x77389fdc    mov    %rcx,%r9
0x77389fdf    je     0x773edde0 <ntdll!EtwEventSetInformation+64608>
0x77389fe5    mov    0x7ffe03b8,%r8
0x77389fed    rdtsc
0x77389fef    movzbl 0x7ffe02ed,%ecx
0x77389ff7    shl    $0x20,%rdx
0x77389ffb    or     %rdx,%rax
0x77389ffe    shr    $0x2,%ecx
0x7738a001    add    %r8,%rax
0x7738a004    shr    %cl,%rax

The translation of that is:


jump around
jump around
jump around
rdtsc
do some shit
return

The code uses rdtsc and does not serialize. (The correct pattern prior to the availability of rdtscp was cpuid; rdtsc, since rdtsc is not a serializing instruction and cpuid is the only usermode instruction that does that job; it is, however, a bit expensive, an extra 30 or so cycles.) Therefore no measurement that you make is accurate anywhere near the presumed precision.

The pipeline will be full or half-full or empty, depending on what processor you run on, what its pipeline depth is, what instructions were executed prior to calling QueryPerformanceCounter, and on whether those three jumps incidentally caused enough delay to retire all in-flight operations. If you only care about millisecond or possibly microsecond resolution then that's of course alright, because in that case... who cares anyway. But if you talk in terms of ten-nanosecond resolution like QPC does, then this is just shit. It's nothing more and nothing less than an incorrect implementation.

Using the rdtscp instruction instead, even inlined, doesn't save you anything performance-wise compared to calling into ntdll (being half-serializing is still surprisingly expensive), but what matters is that your measurements are correct. That is, the point in time that your measurement refers to is well-defined, not random.
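
For reference, the intrinsic version of that looks roughly like this (a sketch; converting ticks to seconds needs the invariant TSC frequency, which is omitted here):


#include <intrin.h>   // __rdtscp on MSVC (GCC/Clang: <x86intrin.h>)

// rdtscp waits for prior instructions to complete before reading the
// TSC ("half-serializing"), so the returned timestamp refers to a
// well-defined point in the instruction stream.
unsigned long long ReadTimestampSerialized()
{
    unsigned int aux;   // receives IA32_TSC_AUX (identifies the core)
    return __rdtscp(&aux);
}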

This is nonsense, and QPC is not implemented "incorrectly" just because it can be used incorrectly or inappropriately. The behavior you're requesting from QPC would be an extremely unwelcome and poorly considered addition to the implementation.

The use of rdtsc to provide ground-truth timing data is perfectly valid on single-socket systems providing an invariant TSC, which is nearly all of them now. On systems that don't, or systems that are multi-processor, it won't be correct, and so that's not the code that QPC will actually run. Multi-core is not multi-processor.

The "need" for synchronization only occurs when trying to time tight instruction sequences on an out-of-order execution unit, which is not a vaguely relevant use case here. If we were talking about correct implementation of micro-benchmarks, then maybe that would be useful information. It is absolutely not significant, useful, necessary, or appropriate to forcibly sync the instruction stream to get accurate game timing. QPC certainly shouldn't imply an instruction-stream sync, as micro-benchmarking instruction execution is not its purpose in the first place.

If you really need to support dead-consistent, accurate timing on dual-core Athlons from 2006 that haven't been properly patched for a buggy BIOS, then use interlocked instructions to publish current timing values to your threads. Or just do the architecturally sane thing and give times/intervals as inputs to functions, rather than have random threads/tasks reading timestamps at random points.
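
The second suggestion might look like this (a sketch; the types and names are illustrative):


// One place reads the clock; everything downstream receives the
// interval as a parameter, so no task samples timestamps at random
// points and all consumers see a consistent time.
struct FrameContext { double deltaSeconds; double totalSeconds; };

void UpdateLogic(const FrameContext&)   { /* hypothetical */ }
void UpdatePhysics(const FrameContext&) { /* hypothetical */ }

void Tick(double deltaSeconds, double totalSeconds)
{
    FrameContext frame{ deltaSeconds, totalSeconds };
    UpdateLogic(frame);
    UpdatePhysics(frame);
}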

I've got my loop posted here; I'm not really sure where there could be a bug. I get the delta before the update and set the last time after the update.

Edit: OK, so this is just out of curiosity. This is the structure of my framework: WindowsMain->GameEngineUpdate()->ApplicationUpdate()->QueryPerformanceCounter(). If I do this, then the QPC delta is nonzero and everything is fine. But I don't want each app to track its own time; I would like to have the engine handle it. But when I do WindowsMain->GameEngineUpdate()->QueryPerformanceCounter() and then ApplicationUpdate(), I again get a QPC delta of zero. It seems like the QPC calls have to be in the same compiled code object, probably because of the issues mentioned earlier. If you still insist it works and I'm just doing it wrong, perhaps a little example of the proper way that you use it in your engine could help?

Ok I missed this earlier:


while(...)
{
           QueryPerformanceCounter(&currentTime);
           Timer.deltaTime = float( (currentTime.QuadPart - Timer.lastTime.QuadPart) / Timer.frequency.QuadPart);
           DoLotsOfWork();
           QueryPerformanceCounter(&Timer.lastTime);
}

That's equivalent to:


while(...)
{
           QueryPerformanceCounter(&Timer.lastTime);
           QueryPerformanceCounter(&currentTime);
           Timer.deltaTime = float( (currentTime.QuadPart - Timer.lastTime.QuadPart) / Timer.frequency.QuadPart);
           DoLotsOfWork();
}

Notice how you're not timing how long DoLotsOfWork takes; you're just getting the time twice in a row.
You want something more like:


while(...)
{
           QueryPerformanceCounter(&currentTime);
           Timer.deltaTime = float( (currentTime.QuadPart - Timer.lastTime.QuadPart) / Timer.frequency.QuadPart);
           Timer.lastTime = currentTime;
           DoLotsOfWork();
}

Also, this code is actually doing integer division, not floating-point division, which is why you get 0 as a result!
Instead of:
float( (currentTime.QuadPart - Timer.lastTime.QuadPart) / Timer.frequency.QuadPart);
Try:
float( (currentTime.QuadPart - Timer.lastTime.QuadPart) / (double)Timer.frequency.QuadPart);
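
To see why the integer version reads zero, plug in some plausible numbers (a hypothetical 10MHz QPC frequency and one 16ms frame):


#include <cstdio>

int main()
{
    long long diff = 160000;            // ticks elapsed in one 16ms frame
    long long freq = 10000000;          // hypothetical 10MHz QPC frequency
    float wrong = float(diff / freq);   // integer division truncates: 0.0f
    float right = float(diff / (double)freq);   // 0.016f, i.e. 16ms
    std::printf("%f %f\n", wrong, right);
}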

As I tried to explain, there is a problem in 2017, and the problem is that QueryPerformanceCounter is implemented incorrectly.
If I break in the debugger and single-step instructions on my machine, I get this:
(snip)


And that is absolute nonsense.

If you read the documentation, you will see a set of conditions is given under which QPC may be implemented differently on different machines.

All that you're proving is that you've determined the way that QPC is implemented on your own machine - nothing else.

Wait a second, I'm a bit confused: the code you have times the DoLotsOfWork function, not the elapsed time between calls of the function. I thought the point was to see how much time has passed since the last draw call; otherwise I'm just timing how long my raster operations take. And I am casting the numerator to float first; I think the first sample code was incorrect. I reposted the right one later.

Edit: Also, that's just one branch of the while loop; there is another branch with message handling. That's why I assumed that even though the call to SetLastTime was at the end of the while loop, it's not a given that it will hit the top of the loop right after; maybe Windows will switch something internally, or the window will handle messages, and that's what I was timing. But the code you have doesn't time any of that. Why?
