Sleep less than 1ms?

Started by
61 comments, last by Bozebo 13 years, 9 months ago
I have a frame timing system (C++, Windows) which limits the rate of a loop to a capped fps value (or leaves it uncapped). The problem is that in capped mode it uses 100% CPU all the time, because it has a loop which checks QueryPerformanceCounter to decide when to continue to the next frame. I tried putting Sleep(1) within that loop so it doesn't waste so much time, but doing so severely limits the achievable frame rates and causes inaccuracies, because 1ms can be a significant proportion of each frame.
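To put a number on "significant proportion", here is a small sketch of the arithmetic. The helper name sleepShareOfFrame is made up for illustration, and it assumes the sleep lasts exactly as long as requested, which Sleep does not promise.

```cpp
#include <cassert>

// Hypothetical helper: what fraction of one frame a sleep of sleepMs takes up
// when the loop is capped at the given fps.
// Assumption: the sleep lasts exactly sleepMs (in practice it can be longer).
double sleepShareOfFrame(int fps, double sleepMs){
  assert(fps > 0);
  const double frameMs = 1000.0 / fps; // length of one frame in milliseconds
  return sleepMs / frameMs;
}
```

At a cap of 250fps a frame is 4ms long, so a single 1ms sleep is already a quarter of the frame; at 500fps it is half.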

Here is the culprit method in my class.
void frameController::capFps(){
  if(targetFps == 0) return; //do nothing if fps is uncapped
  while(true){
    //find the current tick of the high resolution timer
    QueryPerformanceCounter(&hpt);
    //if done waiting
    if(hpt.QuadPart >= frameStartTicks + maxTicksPerFrame)
      break; //exit the while loop
    //Sleep(1); //yield to cpu
  }
}


The whole system works incredibly well overall, but I just don't like the way the thread will process the loop in that function either constantly or at 1ms intervals. Is there a way to sleep for less than 1ms? Or is there a particular solution I should be using to get the job done? Effectively I want it to yield the CPU more often. Or is there no real problem where I am perceiving one? Sure enough it is using 100% of the core, but that isn't causing an actual problem apart from being ugly.

Here is the whole implementation of my fps limiter (should compile nicely):
#include <cstdlib>
#include <iostream>
#include <Windows.h>
using namespace std;

//very basic abs function
int abs(int in){
  if(in < 0){
    return -in;
  }
  return in;
}

class frameController{
  private:
    LARGE_INTEGER hpt; //64 bit result holder for the high performance timer
    LONGLONG ticksPerSecond, //resolution of the high performance timer
    frameStartTicks, //ticks at the start of the previous frame
    frameNowTicks, //ticks at the time step is called
    initTicks, //ticks when the frame controller was initialised
    ticksForFrame, //ticks that the frame took to process
    totalFrameTicks, //ticks the frame took including capped idle time
    maxTicksPerFrame; //maximum number of ticks a frame should take

    //calculates the animation factor
    void calcAnimFactor();
    //calculates how many ticks the frame took to compute
    void calcTicksForFrame();
    //calculates the fps
    void calcFps();
    //caps the frame rate
    void capFps();

  public:
    unsigned long frameCount; //frame counter
    int targetFps, //desired frame rate
    fps; //current frame rate
    float animFactor, //animation scale
    timeScale, //time scale
    floatFps; //float version of the frame rate

    //constructor
    frameController(){
      QueryPerformanceFrequency(&hpt); //gather timer resolution
      ticksPerSecond = hpt.QuadPart; //store resolution (will not change)
      cout << "ticksPerSecond: " << ticksPerSecond << endl;
      QueryPerformanceCounter(&hpt); //gather tick count
      initTicks = hpt.QuadPart; //store tick count at init
      cout << "initTicks: " << initTicks << endl;
      timeScale = 1; //default timescale is 1
    }

    //called at the start of every frame to process the frame control
    void step();

    //setters
    //target frame rate
    void setTargetFps(unsigned int sTargetFps);
    //frame start ticks
    void setFrameStartTicks(LONGLONG sFrameStartTicks);
};

void frameController::setTargetFps(unsigned int sTargetFps = 0){
  targetFps = sTargetFps; //set the target fps
  cout << "targetFps: " << targetFps << endl;

  //calculate ticks per frame at the desired frame rate
  if(targetFps != 0)
    maxTicksPerFrame = ticksPerSecond / targetFps;
  cout << "maxTicksPerFrame: " << maxTicksPerFrame << endl;
}

void frameController::setFrameStartTicks(LONGLONG sFrameStartTicks){
  frameStartTicks = sFrameStartTicks;
}

void frameController::step(){
  //find the current tick of the high resolution timer
  QueryPerformanceCounter(&hpt);
  //extract current tick count
  frameNowTicks = hpt.QuadPart;
  //calculate how many ticks the frame took
  calcTicksForFrame();
  //increment frame counter
  frameCount ++;
  //cap the fps
  capFps();
  //calculate the fps
  calcFps();
  //calculate animation factor
  //calcAnimFactor();
  //find the current tick of the high resolution timer
  QueryPerformanceCounter(&hpt);
  //remember the tick count
  frameStartTicks = hpt.QuadPart;
}

void frameController::calcTicksForFrame(){
  ticksForFrame = frameNowTicks - frameStartTicks;
  //clamp results in case the hpt returned an erroneous value
  if(ticksForFrame < 1){
    //clamp to 1
    ticksForFrame = 1;
    //if the frame took too long
  } else if(ticksForFrame > maxTicksPerFrame * 3){
    //clamp it to maxTicksPerFrame
    ticksForFrame = maxTicksPerFrame;
  }
}

void frameController::calcFps(){
  //find the current tick of the high resolution timer
  QueryPerformanceCounter(&hpt);
  //record the entire number of ticks which passed for the whole frame
  totalFrameTicks = hpt.QuadPart - frameStartTicks;
  //calculate the float fps (divide as floating point so the fraction is kept)
  floatFps = float(double(ticksPerSecond)/double(totalFrameTicks));
  //calculate the integer fps
  fps = int(floatFps);
}

void frameController::calcAnimFactor(){
  //calculate the animation factor
  animFactor = totalFrameTicks/double(ticksPerSecond/floatFps);
  cout << "animFactor: " << animFactor << endl;
  //don't let the animation factor be too low
  if(animFactor <= 0)
    //smallest positive normalised value for 32 bit float
    animFactor = 1.175494351e-38F;
  //apply time scale to animation factor
  //animFactor *= timeScale;
}

void frameController::capFps(){
  if(targetFps == 0) return; //do nothing if fps is uncapped
  while(true){
    //find the current tick of the high resolution timer
    QueryPerformanceCounter(&hpt);
    //if done waiting
    if(hpt.QuadPart >= frameStartTicks + maxTicksPerFrame)
      break; //exit the while loop
    Sleep(0); //yield to cpu
  }
}

//make a frame control object
frameController fpsHandler;

//application entry point
int main(){
  //remain on a single core for timing (QueryPerformanceCounter)
  SetThreadAffinityMask(GetCurrentThread(),1);

  //high performance timer (hpt2 to avoid confusion with frameController member)
  LARGE_INTEGER hpt2;

  QueryPerformanceFrequency(&hpt2); //gather timer resolution
  LONGLONG hptFreq = hpt2.QuadPart, //hpt resolution
  startSec, //tick to start counting a second from
  nowTime; //current tick at a point in the logic

  int frameOffset, //offset frame gap between seconds (shows inaccuracies)
  lastFrameOffset = 0, //offset in the previous frame (starts at 0 so the first comparison is defined)
  fpsDifference, //gap between actual fps and intended fps
  syncInRow = 0, //how many offsets have been the same in a row (big = accurate)
  frameProc, //processing as an integer count, to do each frame
  baseFrameProc = 200000, //base amount of processing each frame
  lastFrameProc, //processing done in the previous frame
  secondsRun = 0; //how many seconds have been passed
  float procDeviation = 0.1; //multiple of baseFrameProc to add as random load

  fpsHandler.setTargetFps(100); //choose the target frame rate
  cout << "targetFps: " << fpsHandler.targetFps << endl;

  cout << "Frame loop will begin in 1 second\n";
  Sleep(1000); //wait a bit before proceeding

  //keep running
  bool run = true;
  //print frame information spam
  bool printFrameInfo = false;
  //do any extra frame load processing
  bool doFrameProc = true;

  if(doFrameProc){
    //make a starting value so the first frame isn't wrongly considered abnormal
    lastFrameProc = baseFrameProc + rand()%int(baseFrameProc*procDeviation);
  }

  //find the current tick of the high resolution timer (just before main loop)
  QueryPerformanceCounter(&hpt2);
  startSec = hpt2.QuadPart; //remember when to start counting the second
  fpsHandler.setFrameStartTicks(startSec);

  while(run){ //main loop
      fpsHandler.step(); //step the frame rate controller first
      //other engine singletons etc

      if(printFrameInfo){
        //cout << "frameProc: " << frameProc << endl;
        cout << "frame: " << fpsHandler.frameCount << endl;
        cout << "fps: " << fpsHandler.fps << endl;
      }

      //find the current tick of the high resolution timer
      QueryPerformanceCounter(&hpt2);
      nowTime = hpt2.QuadPart; //the current tick
      //if a second has passed
      if(nowTime >= startSec + hptFreq){
        secondsRun ++;
        system("CLS");
        cout << "run for " << secondsRun << "s\n";

        startSec = nowTime; //remember when to start counting the second

        //if fps is capped at a value
        if(fpsHandler.targetFps != 0){
          //find the offset of the frame from the intended fps
          frameOffset = fpsHandler.frameCount % fpsHandler.targetFps;

          //if this frame offset is the same as the last
          if(frameOffset == lastFrameOffset){
            //since last second, frames have been in sync
            syncInRow ++; //note that another frame has been in sync
            cout << "frames are in sync\n";
            cout << "syncInRow: " << syncInRow << endl;
          } else {
            syncInRow = 0; //reset frames in sync in a row
            cout << "frames are out of sync!\n";
            cout << "frameOffset: " << frameOffset << endl;
            cout << "lastFrameOffset: " << lastFrameOffset << endl;
          }
          lastFrameOffset = frameOffset; //remember the frame offset

          //find the difference between target and actual fps
          fpsDifference = abs(fpsHandler.fps - fpsHandler.targetFps);
          //if the fps is the target fps
          if(fpsDifference == 0){
            cout << "fps is on target: " << fpsHandler.targetFps << endl;
          } else {
            cout << "fps is " << fpsDifference << " away from target\n";
          }
        } else {
          //show the fps
          cout << "fps: " << fpsHandler.fps << endl;
        }
      }

      //if frame process loading is on
      if(doFrameProc){
        //10% deviation in frame load
        frameProc = baseFrameProc + rand()%int(baseFrameProc*procDeviation);
        //if the deviation was more than deviation expectation
        if(abs(frameProc - lastFrameProc) > baseFrameProc*procDeviation){
          cout << "frameProc: " << frameProc << endl;
          cout << "lastFrameProc: " << lastFrameProc << endl;
        }
        lastFrameProc = frameProc;
        for(int i = 0;i < frameProc; i ++){} //busy up the cpu to test fps
      }
      //handle other threads: AI, physics input, networking, rendering etc

      //any particular frame-end specific logic (endStep events possibly)
  }

  //correctly exit the application
  system("PAUSE");
  return EXIT_SUCCESS;
}

edit:
The code used to work flawlessly and keep 100fps tests completely in sync, but since I browsed the web a bit and looked up some things it has magically broken a little, and 10 or so frames in a row will be reported by its self test as being a little bit out of sync. Also the fps should be changeable on the fly but I haven't tested that yet. I have yet to tie in the timescale or animation modifications, though they may end up handled by an independent system.
edit 2:
after adding line 214: system("CLS");
the sync seems to be fixed. The problems I encountered are likely issues outwith my control and are probably to be expected (after all, that is why I made it self test; I could not expect a perfect frame timing system, could I?)

--------

Also while you are here, what do you think of my class overall? I am quite proud of it but I don't want to go and use it as a basis for complicated work if it is going to be a problem down the line.
Is it worth making an alternative implementation for systems which have no QueryPerformanceCounter? Or are those days long gone (most of the articles I've read were written around 2000)? Also what are the chances of QueryPerformanceCounter returning a crazy value on a modern system? And would any capping overheads (I have implemented a couple) be an issue in the overall accuracy of frame timing?

[Edited by - Bozebo on July 3, 2010 11:28:26 AM]
May I ask why you want to cap your fps?
If V-sync is enabled, fps will be capped. If not, the reason is to leave the fps uncapped.
So I say cap fps with V-sync.
If you want to cap because of the game logic, I suggest you switch to a fixed timestep.
Quote:Original post by szecs
May I ask why you want to cap your fps?
If V-sync is enabled, fps will be capped. If not, the reason is to leave the fps uncapped.
So I say cap fps with V-sync.
If you want to cap because of the game logic, I suggest you switch to a fixed timestep.


I do not want vsync; this is intended for PC games where vsync causes inappropriate lag. Vsync will be an option for users, and in that case the frame timer will be set to uncapped mode and OpenGL will automatically result in the system sitting at 60fps. Essentially the system does act as a fixed timestep when the fps is capped: if the computer it is running on is capable of 300fps and it is capped to 100, it is exactly the same as having a fixed timestep of 10ms. Except with the issues I am trying to resolve here.

I am pretty much trying to replicate the system used by the Source engine, and I want the fps to be adjustable in-engine for various different purposes.
There isn't a way to sleep for 1ms, or less than 1ms, or any particularly precise time interval at all. Sleep tells the system that you want to be left alone for at least that long, but the actual time could be anything.
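A quick way to see this on your own machine is to time a sleep request. The sketch below is not from the thread: it assumes C++11 and uses std::this_thread::sleep_for with std::chrono::steady_clock as stand-ins for Sleep and QueryPerformanceCounter, and the name measureSleepMicros is made up for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper: request a sleep of requestedMicros microseconds and
// return how long the thread was actually away, in microseconds.
// Assumption: sleep_for / steady_clock stand in for Sleep / QueryPerformanceCounter.
long long measureSleepMicros(long long requestedMicros){
  assert(requestedMicros >= 0);
  typedef std::chrono::steady_clock clock_type;
  const clock_type::time_point before = clock_type::now();
  std::this_thread::sleep_for(std::chrono::microseconds(requestedMicros));
  const clock_type::time_point after = clock_type::now();
  return std::chrono::duration_cast<std::chrono::microseconds>(after - before).count();
}
```

Calling measureSleepMicros(1000) a few hundred times and printing the largest result should show how far past 1ms the scheduler can overshoot; the exact figures depend on the OS, its timer resolution and the load at the time.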
Quote:Original post by Bozebo
Quote:Original post by szecs
May I ask why you want to cap your fps?
If V-sync is enabled, fps will be capped. If not, the reason is to leave the fps uncapped.
So I say cap fps with V-sync.
If you want to cap because of the game logic, I suggest you switch to a fixed timestep.


I do not want vsync; this is intended for PC games where vsync causes inappropriate lag. Vsync will be an option for users, and in that case the frame timer will be set to uncapped mode and OpenGL will automatically result in the system sitting at 60fps. Essentially the system does act as a fixed timestep when the fps is capped: if the computer it is running on is capable of 300fps and it is capped to 100, it is exactly the same as having a fixed timestep of 10ms. Except with the issues I am trying to resolve here.

I am pretty much trying to replicate the system used by the Source engine, and I want the fps to be adjustable in-engine for various different purposes.


The only way to do that is by using a busy loop.

Sleeping should only be used to reduce CPU usage, not as a method to control the speed your software runs at.
I don't suffer from insanity, I'm enjoying every minute of it.
The voices in my head may not be real, but they have some good ideas!
Is there a way to sleep for less than 1ms?

Not that I'm aware of.

Chapter 6: Processes, Threads, and Jobs, Inside Microsoft® Windows® 2000, Third Edition

Quote:
...
When a thread is selected to run, it runs for an amount of time called a quantum. A quantum is the length of time a thread is allowed to run before Windows 2000 interrupts the thread to find out whether another thread at the same priority level or higher is waiting to run or whether the thread's priority needs to be reduced.
...
Each thread has a quantum value that represents how long the thread can run until its quantum expires. This value isn't a time length but rather an integer value, which we'll call quantum units.

By default, threads start with a quantum value of 6 on Windows 2000 Professional and 36 on Windows 2000 Server. (We'll explain how you can change these values later.) The rationale for the longer default value on Windows 2000 Server is to minimize context switching. By having a longer quantum, server applications that wake up as the result of a client request have a better chance of completing the request and going back into a wait state before their quantum ends.

Each time the clock interrupts, the clock-interrupt routine deducts a fixed value (3) from the thread quantum. If there is no remaining thread quantum, the quantum end processing is triggered and another thread might be selected to run. On Windows 2000 Professional, because 3 is deducted each time the clock interrupt fires, by default a thread runs for 2 clock intervals; on Windows 2000 Server, by default a thread runs for 12 clock intervals.

Even if the system were at DPC/dispatch level or above (for example, if a DPC or an interrupt service routine was executing) when the clock interrupt occurred, the current thread would still have its quantum decremented, even if it hadn't been running for a full clock interval. If this was not done and device interrupts or DPCs occurred right before the clock interval timer interrupts, threads might not ever get their quantum reduced.

The length of the clock interval varies according to the hardware platform. The frequency of the clock interrupts is up to the HAL, not the kernel. For example, the clock interval for most x86 uniprocessors is 10 milliseconds, and for most x86 multiprocessors, 15 milliseconds.
...


Although that describes W2K, something similar probably holds for newer versions of Windows.
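To make the quoted numbers concrete, here is a small sketch of the arithmetic. The function names are made up for illustration, and the figures only restate what the book says about Windows 2000 (3 quantum units deducted per clock interrupt); nothing here is measured on a newer system.

```cpp
#include <cassert>

// Hypothetical helpers restating the quoted Windows 2000 figures.

// How many clock intervals a thread runs before its quantum is used up.
int clockIntervalsPerQuantum(int quantumUnits, int unitsPerTick = 3){
  assert(unitsPerTick > 0);
  return quantumUnits / unitsPerTick;
}

// The same quantum expressed in milliseconds for a given clock interval.
int quantumMillis(int quantumUnits, int clockIntervalMs){
  return clockIntervalsPerQuantum(quantumUnits) * clockIntervalMs;
}
```

With those figures a default quantum of 6 on a 10ms clock comes to 20ms, and 30ms on a 15ms clock, which is why a thread that gives up the CPU cannot count on getting it back within 1ms if other threads of the same priority are ready to run.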

For additional details, see google:thread+quantum.

And for more on various ways to manage time, see Fix Your Timestep!
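For anyone who hasn't read that article, the core idea can be sketched roughly as below. This is a minimal illustration and not code from the article or from the class earlier in the thread; FixedStepper and its members are made-up names.

```cpp
#include <cassert>

// Hypothetical helper: collects real elapsed time and reports how many
// fixed-size simulation steps are due, whatever the render frame rate is.
struct FixedStepper {
  double stepSeconds; // fixed simulation step, e.g. 0.01 for 100 updates per second
  double accumulator; // real time not yet consumed by simulation steps

  explicit FixedStepper(double step) : stepSeconds(step), accumulator(0.0) {
    assert(step > 0.0);
  }

  // Add the real time the last frame took; return how many steps to run now.
  int advance(double frameSeconds){
    accumulator += frameSeconds;
    int steps = 0;
    while(accumulator >= stepSeconds){
      accumulator -= stepSeconds;
      ++steps;
    }
    return steps;
  }

  // Fraction of a step left over (0 to 1), usable for blending rendered state.
  double alpha() const { return accumulator / stepSeconds; }
};
```

Each frame you would call advance() with the measured frame time, run the game logic that many times with a constant stepSeconds, then render. The frame rate can then be capped, uncapped or vsynced without changing how fast the simulation runs.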


"I thought what I'd do was, I'd pretend I was one of those deaf-mutes." - the Laughing Man
Quote:Original post by Promit
There isn't a way to sleep for 1ms, or less than 1ms, or any particularly precise time interval at all. Sleep tells the system that you want to be left alone for at least that long, but the actual time could be anything.


Well leaving the thread alone for at least x time is fine, if the time is less than 1ms - for this implementation. So, do most games use a fixed timestep?

V-sync should never be forced on PC gamers. Unreal seems to be capable of variable fps or vsync-capped fps (from my experience playing with games' settings). Source allows capped, variable or vsync-limited fps. So there must be some way to cap the fps at a value without vsync and without processing logic at an attempted constant rate (Source does not use 100% of any core, or one core's worth of processing power spread across cores, when capping fps).

One solution could be a dirty trick with winsock, but do I really want to go there in frame timing situations?


In the end, is it fine to let the thread keep processing and let other threads which need time simply take away from its time slice? The application will likely be running threads for other aspects such as input, networking and rendering - if these threads need the time, they can take it from the core thread.

Another way would be to calculate the fixed-timestep based on the call to capFps... I didn't think about that before, and it could be a good technique.
Quote:Original post by Bozebo
Quote:Original post by Promit
There isn't a way to sleep for 1ms, or less than 1ms, or any particularly precise time interval at all. Sleep tells the system that you want to be left alone for at least that long, but the actual time could be anything.


Well leaving the thread alone for at least x time is fine, if the time is less than 1ms - for this implementation. So, do most games use a fixed timestep?


Even if you could specify a time less than 1ms, you are not guaranteed that your thread will resume in that time. It is only the minimum.

Most big commercial games have either vsync on or off which either caps or uncaps the framerate.

The easiest way would be to sit in a while loop and poll a performance timer until the correct amount of time has passed, which is what you have done. I don't think any other method that yields the CPU will be reliable.
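A middle ground that is sometimes used, offered here only as a sketch and not as anything from the original class, is to sleep while the deadline is still far away and poll only for the last stretch. It assumes C++11 and uses std::chrono::steady_clock in place of QueryPerformanceCounter; waitUntil and spinMargin are made-up names, and the margin is a tuning guess you would need to measure per system.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

typedef std::chrono::steady_clock frame_clock;

// Hypothetical helper: wait until `deadline`, sleeping while more than
// `spinMargin` remains and polling the clock for the final stretch.
// Assumption: spinMargin (e.g. 2ms) is larger than the worst sleep overshoot
// you see on the target machine; if it is not, frames will run late.
void waitUntil(frame_clock::time_point deadline, frame_clock::duration spinMargin){
  assert(spinMargin >= frame_clock::duration::zero());
  for(;;){
    const frame_clock::time_point now = frame_clock::now();
    if(now >= deadline) return; // done waiting
    if(deadline - now > spinMargin){
      // plenty of time left: hand the core back to the scheduler
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
    } else {
      // close to the deadline: give up the slice if anyone wants it, keep polling
      std::this_thread::yield();
    }
  }
}
```

The function never returns early, since it only exits once the clock has reached the deadline. How late it returns, and how much CPU it saves, depends entirely on how well spinMargin matches the real sleep overshoot.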
Quote:Original post by szecs
May I ask why you want to cap your fps?
If V-sync is enabled, fps will be capped. If not, the reason is to leave the fps uncapped.
So I say cap fps with V-sync.
If you want to cap because of the game logic, I suggest you switch to a fixed timestep.


QFE.

It is largely impossible to predict how long anything will take on a variety of PCs. Decouple rendering from game logic, interpolate to smooth out the results and you're far more flexible in terms of target hardware.
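The interpolation step can be as small as a linear blend between the last two simulated states. The sketch below is only an illustration with made-up names; it assumes the game logic keeps both the previous and the current value of whatever is being drawn, and that alpha is the fraction of a logic step that has elapsed since the current state was computed.

```cpp
#include <cassert>

// Hypothetical helper: blend between the previous and current simulated value.
// alpha = 0 gives the previous state, alpha = 1 gives the current state.
double interpolate(double previous, double current, double alpha){
  assert(alpha >= 0.0 && alpha <= 1.0);
  return previous + (current - previous) * alpha;
}
```

For example, an object that was at x = 10 one logic step ago and is at x = 20 now would be drawn at 12.5 when a quarter of the next step has elapsed, so rendering stays smooth at any frame rate while the logic runs at its own fixed rate.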
Quote:One solution could be a dirty trick with winsock, but do I really want to go there in frame timing situations?


If you mean (ab)using the select call (i.e. no socket sets at all, only the timeout interval), that is not really a dirty trick. I have used that on various systems to implement 'timers' of less than 1 second, although I am not sure about the precision since that was never that important.
Ron AF Greve

This topic is closed to new replies.
