Main loop timing

14 comments, last by Aliii 10 years, 5 months ago

Let's say that at the beginning of the main loop I realize that for the next few milliseconds (let's say 7 ms) I don't need any render() or physics().

What do I do?

-I can't call sleep(7), or even sleep(5), because it seems to be totally unreliable.

-If I let the main loop run without entering the render() or physics() functions for the next 7 ms, then I basically get a while(true){} loop, which results in very high CPU usage.

-If I call render() in every iteration, then the CPU usage goes down, but the FPS goes up, obviously (and so does the GPU usage).

A second question: since I switched from WinXP to Win7, the system limits the FPS to 60. That's very kind, but how does it do that? The only system functions I call are for swapping the buffers and checking the system messages.

I recommend you read Gaffer's Fix Your Timestep article. In the article, Gaffer recommends you render as many times as possible, but only update your simulation at a fixed rate. Interpolation can be used in between updates to give a smooth feel.
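As a rough sketch of the loop structure that article describes (not its actual code): accumulate the real elapsed time, consume it in fixed-size simulation steps, and pass the leftover fraction to the renderer as an interpolation factor. The name `FixedStepper`, its methods, and the 0.25 s clamp are all made up for illustration.

```cpp
#include <cassert>

// Hypothetical helper: turns variable frame times into a whole number of
// fixed-size simulation steps plus a leftover fraction for interpolation.
struct FixedStepper {
    double step;               // fixed logical step in seconds, e.g. 1.0 / 30.0
    double accumulator = 0.0;  // real time not yet consumed by the simulation

    explicit FixedStepper(double stepSeconds) : step(stepSeconds) {
        assert(stepSeconds > 0.0);
    }

    // Call once per rendered frame with the measured frame time.
    // Returns how many logical updates to run before rendering.
    int advance(double frameSeconds) {
        // Clamp huge frame times (debugger pause, window drag) so the
        // simulation does not spiral trying to catch up. 0.25 s is arbitrary.
        if (frameSeconds > 0.25) frameSeconds = 0.25;
        accumulator += frameSeconds;
        int steps = 0;
        while (accumulator >= step) {
            accumulator -= step;
            ++steps;
        }
        return steps;
    }

    // Fraction of a step elapsed since the last logical update, in [0, 1).
    // The renderer blends the previous and current state with this value.
    double alpha() const { return accumulator / step; }
};
```

In a game loop this would be used as: measure the frame time, call `advance()`, run physics() that many times with the fixed step, then call render() with `alpha()`.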


A second question: since I switched from WinXP to Win7, the system limits the FPS to 60. That's very kind, but how does it do that?

Vsync, usually. It could be that your video drivers have enabled vsync by default. Try checking your video driver's control panel.

On a related note, you can sleep(1) to yield your process's timeslice to the operating system. This can help reduce CPU and GPU usage. Be careful, though: it may take anywhere from 8 to 20 ms (roughly 48%-120% of a 60 fps frame) before your program regains control...
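If you want to see how far off a short sleep is on your own machine, a small measurement like the following is enough. This is only a sketch using the standard C++ sleep rather than the Windows Sleep() call, and `measureSleep` is a made-up name; the numbers you get depend entirely on the OS and its current timer resolution.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Requests a sleep of `requested` and returns how long the call really took.
// The difference between the two is the scheduler's overshoot.
std::chrono::microseconds measureSleep(std::chrono::milliseconds requested) {
    assert(requested.count() >= 0);
    using clock = std::chrono::steady_clock;
    const auto before = clock::now();
    std::this_thread::sleep_for(requested);
    const auto after = clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(after - before);
}
```

Calling `measureSleep(std::chrono::milliseconds(1))` a few hundred times and comparing the worst result against the 16.67 ms budget of a 60 fps frame shows whether sleeping is safe in your loop.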

I recommend you read Gaffer's Fix Your Timestep article. In the article, Gaffer recommends you render as many times as possible, but only update your simulation at a fixed rate. Interpolation can be used in between updates to give a smooth feel.

I would recommend against rendering as fast as you can, actually. I usually cap my loops to render at most as many frames as the monitor supports by refresh rate. This didn't used to be such a big deal, but in the modern world of laptops and smart devices it's crucial to not waste battery power rendering frames the user cannot physically ever see. Likewise interpolation is usually unneeded and I would advise against it. At best you'll be adding a lot of complexity to your renderer to give your game that slightly strange effect that 120hz interpolating tvs give to broadcast shows. At worst it does nothing, as most LCDs are still 60hz and running at the same rate as your update loop anyway.
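A manual cap of the kind described here might look roughly like the sketch below. `FrameLimiter` is a made-up name, and the target rate is passed in by hand; a real program would have to find out the display's refresh rate, which this sketch does not attempt. Without vsync it can only approximate the monitor's timing.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical frame limiter: call wait() once per loop iteration and it
// blocks until the next frame slot on a fixed schedule.
class FrameLimiter {
public:
    using clock = std::chrono::steady_clock;

    explicit FrameLimiter(double framesPerSecond)
        : period_(toPeriod(framesPerSecond)),
          next_(clock::now() + period_) {}

    void wait() {
        std::this_thread::sleep_until(next_);
        const auto now = clock::now();
        // Schedule relative to the previous slot to avoid drift, but if we
        // are already late, restart from now instead of bursting frames.
        next_ += period_;
        if (next_ < now) next_ = now + period_;
    }

private:
    static clock::duration toPeriod(double framesPerSecond) {
        assert(framesPerSecond > 0.0);
        return std::chrono::duration_cast<clock::duration>(
            std::chrono::duration<double>(1.0 / framesPerSecond));
    }

    clock::duration period_;
    clock::time_point next_;
};
```

The limiter sleeps rather than spins, so its accuracy is bounded by how precisely the OS wakes the thread.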

Also, you should put the CPU to sleep whenever possible. In practice it rarely takes 8 ms to get the CPU back, and you can adjust the scheduler timing on Windows with timeBeginPeriod and timeEndPeriod. It's general practice in games to call timeBeginPeriod(1) at the start of your program to minimize latency from sleep(). This gives you the best of both worlds: you get back out of sleep quickly, but you also don't waste CPU cycles and power spinning in a wait loop.

http://msdn.microsoft.com/en-us/library/windows/desktop/dd757624%28v=vs.85%29.aspx
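The Windows calls in question are timeBeginPeriod and timeEndPeriod from winmm. A minimal sketch of pairing them safely is below; the wrapper class `ScopedTimerResolution` is hypothetical, and on non-Windows builds it deliberately does nothing.

```cpp
#include <cassert>
#ifdef _WIN32
#include <windows.h>
#include <mmsystem.h>  // timeBeginPeriod / timeEndPeriod; link with winmm.lib
#endif

// Hypothetical RAII wrapper: requests a finer Windows timer resolution for
// the lifetime of the object and releases the request afterwards. On other
// platforms it is a no-op so the surrounding code stays portable.
class ScopedTimerResolution {
public:
    explicit ScopedTimerResolution(unsigned periodMs) : periodMs_(periodMs) {
        assert(periodMs > 0);
#ifdef _WIN32
        active_ = (timeBeginPeriod(periodMs_) == TIMERR_NOERROR);
#endif
    }

    ~ScopedTimerResolution() {
#ifdef _WIN32
        // Every successful timeBeginPeriod must be matched by a
        // timeEndPeriod call with the same value.
        if (active_) timeEndPeriod(periodMs_);
#endif
    }

    ScopedTimerResolution(const ScopedTimerResolution&) = delete;
    ScopedTimerResolution& operator=(const ScopedTimerResolution&) = delete;

    bool active() const { return active_; }
    unsigned periodMs() const { return periodMs_; }

private:
    unsigned periodMs_;
    bool active_ = false;
};
```

Creating one `ScopedTimerResolution` at the top of main() keeps the request alive for the whole run and guarantees the matching timeEndPeriod on exit.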

Otherwise his timestep technique still works very well in general.

I would recommend against rendering as fast as you can, actually. I usually cap my loops to render at most as many frames as the monitor supports by refresh rate.

I counter-recommend for it.
Provide players with the option of v-sync and do not at all ever try to manually match the refresh rate of the device. You will never match it as well as v-sync does and you will get either jitter or a single constant tear at a somewhat consistent area of the screen, caused by being a roughly constant time value off from the actual v-sync flip.

This didn't used to be such a big deal, but in the modern world of laptops and smart devices it's crucial to not waste battery power rendering frames the user cannot physically ever see.

Laptops: Allow the user v-sync, and even enable v-sync automatically if you detect a mobile CPU.
Smartphones: All of them force v-sync on, which means doing it manually as a “feature” of your engine is both useless and will only cause jitter.

Likewise interpolation is usually unneeded and I would advise against it. At best you'll be adding a lot of complexity to your renderer to give your game that slightly strange effect that 120hz interpolating tvs give to broadcast shows. At worst it does nothing, as most LCDs are still 60hz and running at the same rate as your update loop anyway.

I counter-advise for it.
It is important when your logical update rate is in the typical range of around 30 FPS. To say, “it does nothing because most LCDs are running at the same rate as your update loop,” is false in the vast majority of cases, and will necessarily be wrong for at least one of any 2 people whose monitors run at different rates.
One of the main reasons for decoupling your framerate and logical updates (the #1 being simulation stability) is to use fewer resources on the CPU side, executing AI and physics less often.
As such, the vast majority of cases will have an “update” (logical update) rate around 30 FPS, often even lower. In fact the general rule of thumb is to keep it as low as possible while the simulation remains playable, so it is not rare for it to be in the 20s. Conversely, racing games may crank it up into the multiple hundreds.

An “update loop” is a very vague term, but since he is talking about interpolation we know there is some connection to logical updates, and since there are many terms to define a full game loop, such as “game loop” or even just “loop”, I have to assume he is talking about logical updates here, and the idea that a logical update will happen at the same rate as the monitor’s refresh is false in 99% of all cases (in decoupled loops).

Also, you should put the CPU to sleep whenever possible. In practice it rarely takes 8 ms to get the CPU back, and you can adjust the scheduler timing on Windows with timeBeginPeriod and timeEndPeriod. It's general practice in games to call timeBeginPeriod(1) at the start of your program to minimize latency from sleep(). This gives you the best of both worlds: you get back out of sleep quickly, but you also don't waste CPU cycles and power spinning in a wait loop.

http://msdn.microsoft.com/en-us/library/windows/desktop/dd757624(v=vs.85).aspx

Otherwise his timestep technique still works very well in general.

Once again I have to recommend this advice be regretfully ignored.
Firstly, timeBeginPeriod() is a Windows®-wide global function, and can interfere with other processes on your machine, as well as degrade overall performance.
Secondly, it consumes more PC resources and battery life, which goes exactly against what was mentioned before.
Thirdly, it has no effect on QueryPerformanceCounter() and friends, which is what you should be using for timing in your game. This is not related to Sleep(), but…
Fourthly, Sleep() is not even the proper way to give resources back to the machine, and if you increase the system resolution you are giving fewer resources back anyway, defeating the purpose. The proper way to give resources back is to actually wait for timed system events, such as v-sync. On iOS you have CADisplayLink, on all consoles you have a v-sync notification, and on desktops you have a simple setting that can be enabled to allow automatic v-sync waiting.

In other words you have nothing to gain and everything to lose with timeBeginPeriod().



Everything fastcall22 said was already correct and should sufficiently answer your question, except for his note on sleep(1), which is a semantic issue but 2 things should be noted:

#1: Semantically speaking, on Windows®, Sleep( 0 ) yields its time slice and on other platforms yield() does.

#2: But “yielding” your thread does not necessarily mean another thread will run. In both cases it only lets another thread run if there is one waiting of equal or greater priority.
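The difference between the two points above can be observed with a small experiment. The sketch below uses the portable std::this_thread::yield() and sleep_for() as stand-ins for Sleep( 0 ) and Sleep( 1 ); the function names are made up, and the exact counts depend on the machine.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Waits out `window` by repeatedly yielding. If no other runnable thread
// wants the core, yield() returns immediately, so this is effectively a
// busy loop. Returns the number of loop iterations.
long waitByYielding(std::chrono::milliseconds window) {
    assert(window.count() > 0);
    const auto deadline = std::chrono::steady_clock::now() + window;
    long iterations = 0;
    while (std::chrono::steady_clock::now() < deadline) {
        std::this_thread::yield();
        ++iterations;
    }
    return iterations;
}

// Waits out the same window in 1 ms sleeps, so the thread really blocks
// between checks. Returns the number of loop iterations.
long waitBySleeping(std::chrono::milliseconds window) {
    assert(window.count() > 0);
    const auto deadline = std::chrono::steady_clock::now() + window;
    long iterations = 0;
    while (std::chrono::steady_clock::now() < deadline) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
        ++iterations;
    }
    return iterations;
}
```

On an otherwise idle machine the yielding version typically runs a very large number of iterations and keeps a core busy, while the sleeping version runs only a handful.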

Semantics aside, Sleep(1) is still preferred over Sleep( 0 ), but I’d go with other ways of halting threads personally anyway, largely in part because of the thread mentioned by fastcall22.

I have also written on the subject:

Fixed-Time-Step Implementation

L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

All valid counterarguments, for very good reasons. I don't pretend that my suggestions are one size fits all, but I do still strongly disagree with "render as fast as you can" in most cases, especially over the limit of your updates. I didn't really consider the possibility of having your physics running at a slower rate, which is not uncommon, but input, actor updates, etc. "usually" run faster, if they are fixed at all. Unreal3, for example, uses a variable timestep for updates and works pretty well (not sure how UE4 does it).

However, could you expand on my comments to this?

kuroioranda, on 23 Oct 2013 - 2:18 PM, said:

Also, you should put the CPU to sleep whenever possible. In practice it rarely takes 8 ms to get the CPU back, and you can adjust the scheduler timing on Windows with timeBeginPeriod and timeEndPeriod. It's general practice in games to call timeBeginPeriod(1) at the start of your program to minimize latency from sleep(). This gives you the best of both worlds: you get back out of sleep quickly, but you also don't waste CPU cycles and power spinning in a wait loop.

http://msdn.microsoft.com/en-us/library/windows/desktop/dd757624(v=vs.85).aspx

Otherwise his timestep technique still works very well in general.

Once again I have to recommend this advice be regretfully ignored.
Firstly, timeBeginPeriod() is a Windows®-wide global function, and can interfere with other processes on your machine, as well as degrade overall performance.
Secondly, it consumes more PC resources and battery life, which goes exactly against what was mentioned before.
Thirdly, it has no effect on QueryPerformanceCounter() and friends, which is what you should be using for timing in your game. This is not related to Sleep(), but…
Fourthly, Sleep() is not even the proper way to give resources back to the machine, and if you increase the system resolution you are giving fewer resources back anyway, defeating the purpose. The proper way to give resources back is to actually wait for timed system events, such as v-sync. On iOS you have CADisplayLink, on all consoles you have a v-sync notification, and on desktops you have a simple setting that can be enabled to allow automatic v-sync waiting.

I gave that as an example of a way around it on Windows specifically. No, it doesn't affect QueryPerformanceCounter, but the global timer resolution and high performance timing are orthogonal concerns. If I had mentioned timeGetTime(), that would have been one thing, but he was specifically asking for ways to not spin in a loop and blow CPU cycles waiting for a set time to elapse. Lowering the scheduler resolution to 1 does consume slightly more resources than having it higher, but it does NOT increase resource usage over never letting the CPU go idle. If he wants to let it spin at 100% CPU time, then sure, put it in a loop and keep checking QueryPerformanceCounter, but in his original post he said that he wants to put the CPU to sleep when he doesn't have anything for the game to do. You can lower the impact by using a slightly higher value (timeBeginPeriod(3), for example) that is still low enough to keep hiccups from appearing, but as Windows before Win8 is a ticked kernel you have to accept the trade of slightly more CPU time spent handling CPU wakeups if you want the much bigger power win of idling the CPU down when it's not used.

And yes, timeBeginPeriod is global, but it's designed that way. It's how Windows works. Every program tells the OS what resolution it needs in the kernel and then the OS uses the smallest requested resolution (for Windows before Win8). If your program isn't prepared to have the timer resolution lowered on it, you're in serious trouble, because even Microsoft uses it. So no, I disagree that you have nothing to gain and everything to lose with timeBeginPeriod(). It exists for very good reasons, and used for those reasons it is not incorrect.

Specifically, do you know of another way to sleep your threads on Windows that allows you the time granularity needed for games? (Serious question, I'd love to know because I'd rather not have to waste any extra CPU time with a lower timer resolution myself.)

I read somewhere you should use WaitForSingleObject (or similar) and not sleep, as that signals to the scheduler that you do want to do some processing shortly even though you are waiting for something. With sleep it would assume it's some very non-time-critical program and could put the CPU into a very low power state, screwing up your timing?

I read somewhere you should use WaitForSingleObject (or similar) and not sleep, as that signals to the scheduler that you do want to do some processing shortly even though you are waiting for something. With sleep it would assume it's some very non-time-critical program and could put the CPU into a very low power state, screwing up your timing?

I hadn't heard of that used for game timing (although I've seen it in callstacks before), but sadly WaitForSingleObject appears to also be dependent on the timer resolution specified by timeBeginPeriod().

I do still strongly disagree with "render as fast as you can" in most cases, especially over the limit of your updates.

I don’t know what you mean by the limit of your updates. It may help if we all use a consistent set of terms: Render Update, Logical Update, and Game Loop. I assume you mean a logical update.
#1: I disagree with “strongly disagreeing to it”. I agree with “giving the player options”, and what you suggest is “taking away options”. In fact, game engineers typically try to max out the CPU’s performance, using all available resources for whatever they can. And if you are playing a game, the performance of the rest of the applications on your machine doesn’t really matter unless the game is minimized, in which case yes, I do wait 10 milliseconds between game loops to give back the CPU power. As far as real-time caps go there are 2 things to consider:
-> #a: Too much power can overheat the system and fry parts. So the motivation for a cap is not related to refresh rates or starving other applications etc.; it is about not frying the system.
-> #b: Therefore any cap at all should be based on getting the maximum performance out of the CPU without physically killing it, which is extremely rare these days, and there are often system-wide settings the user can enable to prevent this. Do not force a cap on the user unless it is in the multiple hundreds of FPS, such that no human eye can detect the difference. There are plenty of things people can do themselves, if and only if necessary, without you forcing it on them.


Specifically, do you know of another way to sleep your threads on Windows that allows you the time granularity needed for games? (Serious question, I'd love to know because I'd rather not have to waste any extra CPU time with a lower timer resolution myself.)

Your question is deceptively broad, so there are many things to say in reply.

The first thing that needs to be made very clear is that there are 2 types of waiting:
#1: Waiting for a given amount of time. This is called “sleeping”.
#2: Waiting for an event to happen. This is called…“waiting”.

These are 2 distinct states for a thread. Waiting actually does put the thread in the most efficient state for CPU usage, while sleeping still uses cycles based on the granularity of the timing system. Additionally, waiting has (almost) exactly the “granularity” of whatever event it is waiting for, while sleeping has only the granularity of the system timer.
In short, you always want to wait when possible, not sleep.
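To make the distinction concrete, here is a sketch of “waiting” using a condition variable as a portable stand-in for an OS event object (on Windows the equivalent would be an event handle and WaitForSingleObject). The `Event` class, `WaitResult`, and `timeWait` are made-up names. The waiting thread is woken by the signal itself, not by a timer tick; only the timeout depends on timer granularity.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical one-shot event, a portable stand-in for an OS event object.
class Event {
public:
    void signal() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            signaled_ = true;
        }
        cv_.notify_all();
    }

    // Blocks until signal() is called or `timeout` elapses.
    // Returns true if the event fired, false on timeout.
    bool waitFor(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        return cv_.wait_for(lock, timeout, [this] { return signaled_; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool signaled_ = false;
};

struct WaitResult {
    bool fired;
    std::chrono::milliseconds elapsed;
};

// Starts a thread that fires the event after `delay` (standing in for
// something like a v-sync notification), then waits on the event with a
// generous timeout and reports how long the wait took.
WaitResult timeWait(std::chrono::milliseconds delay) {
    assert(delay.count() >= 0);
    Event event;
    const auto start = std::chrono::steady_clock::now();
    std::thread producer([&event, delay] {
        std::this_thread::sleep_for(delay);
        event.signal();
    });
    const bool fired = event.waitFor(std::chrono::seconds(5));
    producer.join();
    const auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - start);
    return WaitResult{fired, elapsed};
}
```

The producer thread here uses a sleep only to simulate an external event source; in a real game loop the signal would come from the system.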


With that made clear, and then to restate your question as, “Is there a better way to sleep for a more accurate time?”, the answer is No. That makes it easy to draw the wrong conclusion: it would be easy to misunderstand and decide, “Then I guess that’s that, so increase the timer resolution and sleep.” To draw the correct conclusion, we need to keep deducing.

Sleeping does not offer a reliable granularity of down-time. Thus it should never be used in a system that needs fine granularity.
The game loop is one of the more core features of the engine/game, and anything you do there has a cascade effect down to all of the other parts of the engine. The only systems that should ever sleep (and yes, it does have its place) are systems in which granularity is not so important: background threads loading resources, the sound thread (as a wake-up call when it has not been awakened by the game thread for too long and sound buffers need to be updated), etc. These things aren’t thrown off by the timer granularity, and as a general rule of thumb: “If you need to call timeBeginPeriod(), you are doing it wrong.”

The game loop requires much finer granularity and reliability, thus waiting is the correct solution.
As wintertime mentioned, one related function is WaitForSingleObject().
However, for what object would you wait? You don’t have access to the object that triggers an event on every v-sync, but that would obviously be your choice of object for waiting.
Luckily, it just so happens that with v-sync enabled it will automatically wait on that object for you.
I already said this in my previous post, but iOS has CADisplayLink, consoles have v-sync events you can register for, and Windows simply has an internal event/trigger that you can’t access directly, but the waiting will be done for you if you simply enable v-sync.



The reason I did not just tell the original poster to Sleep( 1 ) or such is because I am sure the original poster knows that he wants to give resources back to the CPU (though even that is debatable), but he likely does not understand “sleeping” and “waiting”, and has mistakenly made the assumption he needs to sleep to accomplish this.
As far as the main game loop is concerned, the correct way to give resources back is to wait for v-sync.


sadly WaitForSingleObject appears to also be dependent on the timer resolution specified by timeBeginPeriod().

No, only the time-out period is.


L. Spiro



I don’t know what you mean by the limit of your updates. It may help if we all use a consistent set of terms: Render Update, Logical Update, and Game Loop. I assume you mean a logical update.

Yes, logical update. Rendering more frames than your logical update rate gives the user no additional information. Sure, you can make it smoother (to the limits of max(refresh rate, human eye perception), unless vsync is off and then you are racing tears), but if your logical update is 30 and you are rendering at 60, the user can't actually influence the world and the actors won't update themselves any faster than 30.


#1: I disagree with “strongly disagreeing to it”. I agree with “giving the player options”, and what you suggest is “taking away options”. In fact, game engineers typically try to max out the CPU’s performance, using all available resources for whatever they can. And if you are playing a game, the performance of the rest of the applications on your machine doesn’t really matter unless the game is minimized, in which case yes, I do wait 10 milliseconds between game loops to give back the CPU power. As far as real-time caps go there are 2 things to consider:
-> #a: Too much power can overheat the system and fry parts. So the motivation for a cap is not related to refresh rates or starving other applications etc.; it is about not frying the system.
-> #b: Therefore any cap at all should be based on getting the maximum performance out of the CPU without physically killing it, which is extremely rare these days, and there are often system-wide settings the user can enable to prevent this. Do not force a cap on the user unless it is in the multiple hundreds of FPS, such that no human eye can detect the difference. There are plenty of things people can do themselves, if and only if necessary, without you forcing it on them.

All valid points; the problem is that I think you are forgetting that the OP's intent is to lower his CPU usage. He's already told us he has plenty of CPU time left over between updates, and he doesn't want to peg the CPU. For bleeding-edge, AAA, barely-running-on-a-platform games, everything that you suggested applies in force. I don't think that's what we're dealing with here though. For smaller, less CPU-intensive games of the usual hobbyist or indie flavor (which I assume is what we are looking at here), getting things looking good while not destroying the laptop batteries of your casual audience is very important to user experience. I also love choices, but a lot of users don't understand the implications of those choices. In this case, I see very little benefit to allowing hundreds of fps if the updates are fixed and the display has a cap on what it can show the user anyway. It doesn't matter how fast your eye is if the transmitting medium is only feeding it so fast. Tweaking your settings to get the best possible performance out of your latest big game is great and an important tool, but when the performance of your game already fits nicely in modern machines with cycles to spare, those settings are not nearly as important.

(I am myself a gameplay and systems engineer professionally, btw, not a hobbyist).


Your question is deceptively broad, so there are many things to say in reply.


My apologies, I have asked it poorly then. I did intend to ask "Is there a better way to sleep for a more accurate time?", as you hit on.


With that made clear, and then to restate your question as, “Is there a better way to sleep for a more accurate time?”, the answer is No. That makes it easy to draw the wrong conclusion: it would be easy to misunderstand and decide, “Then I guess that’s that, so increase the timer resolution and sleep.” To draw the correct conclusion, we need to keep deducing.


And here I must disagree again, because your conclusion is correct in the wrong circumstances. For contemporary AAA games, yes, you probably have the CPU saturated and sleeping is a moot point, waiting is much better. For smaller games, however, as I said above, this is not the case, and other factors become equally if not more important to consider than raw FPS.

However, I would argue this counterpoint even in the AAA case. Today's big-budget, CPU-hogging games are tomorrow's throw-on-your-portable-and-go titles. I would always code a game to sleep when you have oodles of spare time and just wait otherwise. This way you get the best of both worlds. Today, you get performance, but tomorrow, you get to take Quake2 with you on the plane and catch a few games without destroying your battery at a rapid rate.


The game loop requires much finer granularity and reliability, thus waiting is the correct solution.


Current consoles do around 30 FPS in most games overall, but frame variability is insane. Sometimes your frames come in under 16 ms because nothing interesting is happening on screen, sometimes you need to calculate a path or init an expensive AI behavior and you'll spike 150 ms+ for a frame. This is also true on PC, although the numbers tend to be tighter because the hardware is so much better. This is neither granular nor terribly reliable, but it is the reality. Really the only thing that matters is that any variability is imperceptible to the user, and a tick granularity of 1 ms is well, well below that. If you get a few ms ahead here or a few ms behind there, unless you are running hundreds of FPS in a super twitchy game with a monitor that can actually display those frames, none of this is noticeable by the user.

Yes, logical update. Rendering more frames than your logical update rate gives the user no additional information. Sure, you can make it smoother (to the limits of max(refresh rate, human eye perception), unless vsync is off and then you are racing tears), but if your logical update is 30 and you are rendering at 60, the user can't actually influence the world and the actors won't update themselves any faster than 30.

It makes the game appear to run smoothly. Again you are crossing wires.
#1: The purpose of decoupling logic and rendering is to provide a stable simulation. The logical updates (I repeat) should be as infrequent as possible while still keeping the simulation stable. Thus we introduce a problem: If the simulation only updates at 30 FPS, nothing will change until the next logical update, so the same frame will be drawn over and over, reducing the visible FPS and creating a jerky environment for the player.
#2: Fix it via interpolation. Graphics interpolation. Nothing more.

It isn’t about how fast the user can press keys—and smooth input handling is a discussion just as large as this one—etc. The rate of logical updates is the rate of the game, and is set to the minimum that provides a steady simulation while being responsive enough to the player yet still not hogging CPU resources.

Arguing that rendering beyond the logical update rate is useless because it provides no extra information to the player is simply unrelated. It is there to solve a different problem entirely.
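As a minimal illustration of what “graphics interpolation” means here: the renderer keeps the previous and current logical states and blends between them by how far the current step has progressed. The `State` struct and `interpolate` function are hypothetical, and a real engine would blend full transforms rather than a 2D position.

```cpp
#include <cassert>

// Hypothetical 2D state of one object as produced by the logical update.
struct State {
    double x;
    double y;
};

// Linear blend between the previous and current logical states.
// alpha = 0 gives `previous`, alpha = 1 gives `current`.
State interpolate(const State& previous, const State& current, double alpha) {
    assert(alpha >= 0.0 && alpha <= 1.0);
    return State{previous.x + (current.x - previous.x) * alpha,
                 previous.y + (current.y - previous.y) * alpha};
}
```

With logic at 30 updates per second and rendering at 60 frames per second, every second rendered frame falls halfway between two logical states (alpha = 0.5), so it shows a new in-between position instead of repeating the previous image.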


All valid points; the problem is that I think you are forgetting that the OP's intent is to lower his CPU usage.

No, I’m not, hence:

The reason I did not just tell the original poster to Sleep( 1 ) or such is because I am sure the original poster knows that he wants to give resources back to the CPU (though even that is debatable), but he likely does not understand “sleeping” and “waiting”, and has mistakenly made the assumption he needs to sleep to accomplish this.



(I am myself a gameplay and systems engineer professionally, btw, not a hobbyist).

That’s fine but you seem to be using a lot of 3rd-party engines to do a lot of the low-level heavy lifting for you.
My job is to make those low-level engines do the heavy lifting for others.
I am an R&D programmer at tri-Ace, where I work on this engine:


Keeping this in mind, I can hardly believe you even said the following:

However, I would argue this counterpoint even in the AAA case. Today's big-budget, CPU-hogging games are tomorrow's throw-on-your-portable-and-go titles. I would always code a game to sleep when you have oodles of spare time and just wait otherwise. This way you get the best of both worlds. Today, you get performance, but tomorrow, you get to take Quake2 with you on the plane and catch a few games without destroying your battery at a rapid rate.

…and a tick granularity of 1ms is well, well below that. …


As a core engine developer at tri-Ace my primary function is optimizations, and my current task is to optimize this:


In the video you will see a gigantic monster with guns on his back.
http://www.4gamer.net/games/232/G023268/SS/007.jpg
With its raw amount of skinning, this thing makes my job a pain in the ass. I spend many hours daily trying to make the whole game run faster and especially the scenes with this thing in it.

I had the same task on ???????W, where the bottleneck was all the rays they were shooting out around the characters to perform AI.

After all my efforts, if I were to then discover that some idiot wasted a whole millisecond of a frame sleeping for no reason, I kid you not I would punch him or her in the face. Twice.

What you just wrote makes my head asplode.

No sir. We do not do that to our AAA games, and I have never heard of that mentality ever before.


L. Spiro


This topic is closed to new replies.
