xor

High-resolution performance counters


Hi. I've been having a little trouble coming up with the best way to handle time-dependent (CPU-independent, FPS-independent) processing. First I thought about GetTickCount, but alas it has very low precision. At best I got 16 milliseconds of precision, and 1000 milliseconds (1 second) divided by 16 gives me just over 60 somewhat precise updates per second. The code I'm working with has to be flexible and may have to run as fast as the frames are displayed; since frame rates can reach several hundred per second, GetTickCount won't do.

So now I'm messing around with high-resolution performance counters (HRPC), using QueryPerformanceFrequency and QueryPerformanceCounter. I can figure out the frequency at which the HRPC updates, but the precision is now just too damned high [smile]. So high that the frequency is returned in a LARGE_INTEGER (two DWORDs). My CPU clock runs at 3.2 GHz, so I get a frequency of around 3200000000, which barely fits in the low DWORD. Worse, while I could safely assume I won't need to check the high DWORD of the frequency (at least for a while), I can't assume that for the counter, so this is going to get ugly. Some pseudo-code:
init()
{
	/*So I calculate the frequency of the HRPC and
	I'm going to assume the frequency can fit in the lowdword for now*/
	frequency = get_frequency().lowdword;
}

mainloop()
{
	/*Here I subtract lasttime from currenttime to get the time slice,
	which remains in the two dwords*/
	timeslice.lowdword = currenttime.lowdword - lasttime.lowdword;
	timeslice.highdword = currenttime.highdword - lasttime.highdword;
	/*In case of an underflow, borrow one from the high dword*/
	if(currenttime.lowdword < lasttime.lowdword)
		--timeslice.highdword;
	
	/*Here I calculate the fraction of a second that has passed
	and move the timeslice from the two dwords to a double,
	because otherwise I'd just get zeros (int) or imprecise results (float),
	because I'll probably end up dividing 1 by 3200000000, which is very small.
	The high dword counts units of 2^32 = 4294967296 ticks*/
	double_precision_timeslice = (double)timeslice.lowdword/frequency;
	double_precision_timeslice += ((double)timeslice.highdword/frequency)*4294967296.0;
	
	/*Perform whatever based on the amount of time passed since last update*/
}


Let's see if it works. Let's consider a common case, an update of 1/60 of a second on my CPU (3.2 GHz), where lasttime is 123:1234567890 (DWORD:DWORD) and currenttime is 123:1287901223 (53333333 ticks later):
frequency = 3 200 000 000;

timeslice.lowdword = 1287901223 - 1234567890 = 53333333; This is our 1/60 of a second
timeslice.highdword = 123 - 123 = 0; Nothing to see here, move along [smile]
	
double_precision_timeslice = 53333333/frequency = 0.0166666665625;
double_precision_timeslice += (0/frequency)*4294967296 = 0.0166666665625 + 0, which is about 1/60;
Now something weirder. A 2.5 second update (3200000000*2.5 = 8000000000 ticks) on the same CPU, with lasttime = 123:3456789012 and currenttime = 125:2866854420:
frequency = 3 200 000 000;

/*-589934592 as an unsigned DWORD is 3705032704*/
timeslice.lowdword = 2866854420 - 3456789012 = -589934592 = 3705032704;
timeslice.highdword = 125 - 123 - 1 = 1; -1 from the underflow
	
double_precision_timeslice = 3705032704/frequency = 1.15782272;
double_precision_timeslice += (1/frequency)*4294967296 = 1.15782272 + 1.34217728 = 2.5;
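The two-DWORD subtraction can also be checked with a small portable sketch. This is only an illustration under assumptions: `split_ticks`, `split_sub` and `split_to_seconds` are hypothetical names, not Windows API, and the counter readings are passed in by hand rather than read from QueryPerformanceCounter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two halves of one counter reading. */
typedef struct {
    uint32_t low;
    uint32_t high;
} split_ticks;

/* Subtract two split readings, propagating the borrow by hand. */
static split_ticks split_sub(split_ticks now, split_ticks last)
{
    split_ticks d;
    d.low = now.low - last.low; /* unsigned, so it wraps modulo 2^32 */
    d.high = now.high - last.high - (now.low < last.low ? 1u : 0u);
    return d;
}

/* Convert a split tick count to seconds: (high * 2^32 + low) / frequency. */
static double split_to_seconds(split_ticks d, uint32_t frequency)
{
    assert(frequency != 0);
    return ((double)d.high * 4294967296.0 + (double)d.low) / (double)frequency;
}
```

For a start reading of 123:3456789012 and an end reading 8000000000 ticks later (2.5 s at 3.2 GHz, which is 125:2866854420), `split_sub` yields 1:3705032704 and `split_to_seconds` returns 2.5.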
Ok, this seems to work. The problem is that this is supposed to run in a critical part of the code, so I would be much happier if I could avoid the double precision, the divisions and the excessive memory accesses. So how would you guys do it? [Edited by - xor on July 28, 2005 1:11:02 AM]

Guest Anonymous Poster
Take a look at the docs and you'll notice that LARGE_INTEGER is a union of two doublewords (LowPart and HighPart) with one quadword (QuadPart), meaning you can do direct arithmetic using QuadPart and store the results in an __int64.
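As a rough sketch of that idea, the snippet below uses a hypothetical stand-in union, `large_int`, that mirrors the layout described above (it is not the real LARGE_INTEGER from windows.h, and it assumes a little-endian target such as x86). `elapsed_ticks` and `ticks_to_microseconds` are illustrative names. The conversion sticks to integer arithmetic, which may be one way to avoid doubles in a hot path.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in: two 32-bit halves overlaid on one 64-bit value.
   Assumes little-endian byte order; not the real Windows type. */
typedef union {
    struct {
        uint32_t LowPart;
        int32_t HighPart;
    } u;
    int64_t QuadPart;
} large_int;

/* Elapsed ticks between two readings as a single 64-bit subtraction. */
static int64_t elapsed_ticks(large_int now, large_int last)
{
    return now.QuadPart - last.QuadPart;
}

/* Convert ticks to whole microseconds with integer arithmetic only.
   Splitting off the whole seconds first keeps the multiplication by
   1000000 from overflowing for any realistic counter frequency. */
static int64_t ticks_to_microseconds(int64_t ticks, int64_t frequency)
{
    assert(frequency > 0);
    int64_t whole = ticks / frequency;
    int64_t rest = ticks % frequency;
    return whole * 1000000 + (rest * 1000000) / frequency;
}
```

With a 3.2 GHz counter, 8000000000 elapsed ticks convert to 2500000 microseconds and 53333333 ticks convert to 16666 microseconds (the result is truncated, not rounded).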

Also, GetTickCount depends on the resolution of the system timer, which varies among Windows versions, but you can control the system timer resolution via the timeBeginPeriod function, after calling timeGetDevCaps to get the timer's minimum resolution. You could also try timeGetTime or timeGetSystemTime. And not all systems have performance counter capabilities.

I faced the exact same problem, so I wrote these little functions which use the RDTSC assembler instruction (which is way faster than the Windows performance counter).

You need to initialize the counter when your program starts with "initMilliCounter()". After that, you can call "getTickCountPrecise()" to get the tick count in milliseconds (which is far more precise than GetTickCount). Also, the code can be modified very easily to be more or less precise.

Here it is:


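#ifndef GETTICKCOUNTPRECISE_H
#define GETTICKCOUNTPRECISE_H 1

#include <windows.h>

/* Estimated TSC ticks per second, set by initMilliCounter(). */
unsigned int Hz;

/* Read the time-stamp counter: *d receives the high 32 bits (EDX),
   *a the low 32 bits (EAX). 32-bit MSVC inline assembly. */
void getTickCount(unsigned int *d, unsigned int *a) {
    __asm {
        rdtsc
        mov ebx, d
        mov ecx, a
        mov [ebx], edx
        mov [ecx], eax
    }
}

/* Count TSC ticks over a 100 ms busy-wait and scale to ticks per second. */
unsigned int getTickLaps() {
    unsigned int d1, a1, d2, a2;

    getTickCount(&d1, &a1);

    // work at 100% for 100 ms
    unsigned int temp = GetTickCount();
    while (GetTickCount() - temp < 100);

    getTickCount(&d2, &a2);

    // Unsigned subtraction wraps modulo 2^32, so the low words alone give
    // the right difference as long as fewer than 2^32 ticks elapsed.
    // The scaled result only fits in 32 bits below roughly 4.29 GHz.
    return (a2 - a1) * 10;
}

/* Repeat the measurement until two consecutive estimates agree to the MHz. */
void initMilliCounter() {
    unsigned int temp;
    unsigned int before = 10000000, after = 0;
    while (before / (1000 * 1000) != after / (1000 * 1000)) {
        before = getTickLaps();

        // work at 100% for 50 ms
        temp = GetTickCount();
        while (GetTickCount() - temp < 50);

        after = getTickLaps();
    }
    Hz = after;
}

/* Milliseconds derived from the TSC. Call initMilliCounter() first,
   otherwise Hz is 0 and the division below fails. */
unsigned int getTickCountPrecise() {
    unsigned int d1, a1;
    unsigned __int64 total;
    getTickCount(&d1, &a1);

    // Combine the halves with integer shifts instead of floating-point pow().
    total = ((unsigned __int64)d1 << 32) | a1;
    return (unsigned int)(total / (Hz / 1000));
}
#endif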
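The arithmetic behind the calibration and the millisecond conversion can be sketched portably, without the inline assembly. This is only an illustration: `join_halves`, `ticks_per_second` and `ticks_to_ms` are hypothetical helper names, and the tick counts are supplied by the caller instead of being read with RDTSC.

```c
#include <assert.h>
#include <stdint.h>

/* Join the EDX (high) and EAX (low) halves of one counter reading. */
static uint64_t join_halves(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | low;
}

/* Scale a tick count measured over sample_ms milliseconds
   to an estimate of ticks per second. */
static uint64_t ticks_per_second(uint64_t ticks, uint32_t sample_ms)
{
    assert(sample_ms != 0);
    return ticks * 1000u / sample_ms;
}

/* Convert a raw tick count to whole milliseconds. */
static uint64_t ticks_to_ms(uint64_t ticks, uint64_t hz)
{
    assert(hz >= 1000);
    return ticks / (hz / 1000);
}
```

Keeping the frequency in 64 bits sidesteps the 32-bit limit of roughly 4.29 GHz. For example, 320000000 ticks counted over 100 ms give 3200000000 ticks per second, and 8000000000 ticks at that rate convert to 2500 ms.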




I would probably go with an assembly solution (like what Crucifier posted or just something to quickly deal with the LARGE_INTEGER types that QPC and QPF use). There are some zesty instructions for dealing with that stuff.

Completely off topic: Oh yes, you gotta love the floating point instruction that loads PI. Don't need a #define or anything.
