

IFooBar

'rdtsc' opcode and QueryPerformanceCounter()



Hello. I'm trying to figure out what's up with all these timing functions that are provided. There are three main ones: timeGetTime(), GetTickCount(), and QueryPerformanceInterface(). Now I read over here that QueryPerformanceInterface() had the best execution speed and gave more precise timing info. However, I used this piece of asm from Intel to test things out:
    
union ticksInt
{
	__int32 i32[2];
	__int64 i64;
};

__int64 getTicks()
{
	ticksInt a;
	__asm
	{
		rdtsc              ; read time-stamp counter into edx:eax
		mov a.i32[0], eax  ; low 32 bits (inline-asm [n] is a byte offset)
		mov a.i32[4], edx  ; high 32 bits
	}
	return a.i64;
}
    
And this is the program I used to test all the timing functions...
  
      
#include <iostream>
#include <windows.h>
#include <mmsystem.h>   // timeGetTime(); link with winmm.lib
using namespace std;

int main()
{
	__int64 start, end, dif;
	const int testLoops = 10000;

	// overhead of one getTicks() call (two back-to-back reads)
	start = getTicks();
	cout << "getTicks " << (dif = getTicks() - start) << endl;

	start = getTicks();
	for( int i = 0; i < testLoops; i++ )
	{
		GetTickCount();
	}
	end = getTicks();
	cout << "GetTickCount " << (end - start - dif*2)/testLoops << endl;

	start = getTicks();
	for( int i = 0; i < testLoops; i++ )
	{
		timeGetTime();
	}
	end = getTicks();
	cout << "timeGetTime " << (end - start - dif*2)/testLoops << endl;

	LARGE_INTEGER li;
	start = getTicks();
	for( int i = 0; i < testLoops; i++ )
	{
		QueryPerformanceCounter(&li);
	}
	end = getTicks();
	cout << "QueryPerformanceCounter " << (end - start - dif*2)/testLoops << endl;

	return 0;
}
    
It turns out that QueryPerformanceInterface is the slowest of the bunch; am I doing something wrong? Also, about that piece of asm code:

1: Will it work on all x86 CPUs? I guess it will only work on Pentium or higher since it needs a 64-bit value? How about AMD and Cyrix?
2: Does the asm take just 2 clock cycles to execute? I'm not sure how to test how long the getTicks() function takes to execute.
3: The value returned by getTicks() is different than the value returned by QueryPerformanceInterface. Shouldn't they be the same, since they both get their data from the high performance timer?
4: How do I find out how many ticks there are in a second when using the getTicks() function?
5: What's the compatibility status of that getTicks() function? i.e.: is it safe to assume that this function would work on all/most systems? If not, then how do I go about this?

I know those are a lot of questions, but I've just started learning asm, and I always used to think that QueryPerformanceInterface was the best way to go, but now I'm getting doubts... thanks for any input.
::: Al:::
[Triple Buffer V2.0] - Resource leak imminent. Memory status: Fragile

QueryPerformanceCounter could be implemented with rdtsc, but most probably isn't, because there are some problems with that.

First of all, rdtsc is a counter that is incremented on every processor tick. That would be fine if the CPU speed were stable, but unfortunately it isn't.

Some mobile computers change the CPU speed on the fly. Some motherboards allow you to change the CPU speed inside Windows, so that's a possibility to cheat.

I have heard that future motherboards will have a high performance timer built in, so that's good news for game programmers and gamers.

Guest Anonymous Poster
There is no such thing as QueryPerformanceInterface; the function is called QueryPerformanceCounter.

> It turns out that QueryPerformanceInterface is the slowest of the bunch; am I doing something wrong?
QueryPerformanceCounter uses either the PIT (1.193 MHz) or the PCI timer (3.580 MHz); both methods take several µs per call.
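For reference, here is a minimal sketch of converting QueryPerformanceCounter readings into seconds with QueryPerformanceFrequency; the Sleep(100) just stands in for whatever is being timed:

#include <windows.h>
#include <iostream>

int main()
{
	LARGE_INTEGER freq, t0, t1;
	QueryPerformanceFrequency(&freq);   // counts per second
	QueryPerformanceCounter(&t0);
	Sleep(100);                         // the work being timed
	QueryPerformanceCounter(&t1);
	double seconds = double(t1.QuadPart - t0.QuadPart) / double(freq.QuadPart);
	std::cout << seconds << " s" << std::endl;
	return 0;
}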

quote:
2: Does the asm take just 2 clock cycles to execute? I'm not sure how to test how long the getTicks() function takes to execute.

~50 clocks on an Athlon (you need a serializing instruction such as cpuid to make sure the rdtsc instruction is executed when it should be).
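A minimal sketch of such a serialized read, assuming MSVC 32-bit inline asm and the ticksInt union from the first post (cpuid clobbers eax through edx, which MSVC inline asm permits without saving):

__int64 getTicksSerialized()
{
	ticksInt a;
	__asm
	{
		xor eax, eax       ; cpuid leaf 0
		cpuid              ; serialize: all earlier instructions retire first
		rdtsc
		mov a.i32[0], eax  ; low 32 bits
		mov a.i32[4], edx  ; high 32 bits
	}
	return a.i64;
}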

quote:
3: The value returned by getTicks() is different than the value returned by QueryPerformanceInterface. Shouldn't they be the same, since they both get their data from the high performance timer?

Nope. You can get the performance counter frequency with QueryPerformanceFrequency; the TSC is a CPU clock counter, so the two count at different rates.

quote:
4: How do I find out how many ticks there are in a second when using the getTicks() function?

__int64 t1 = getTicks();
Sleep(55);                      /* wait e.g. 55 ms */
__int64 t2 = getTicks();
double ticks_per_sec = (t2 - t1) * (1000.0 / 55.0);

quote:
1: Will it work on all x86 CPUs? I guess it will only work on Pentium or higher since it needs a 64-bit value? How about AMD and Cyrix?

quote:
5: What's the compatibility status of that getTicks() function? i.e.: is it safe to assume that this function would work on all/most systems? If not, then how do I go about this?

/* make sure CPUID itself is supported first - either SEH, or toggle eflags.ID */
int tsc_supported;
__asm
{
	mov eax, 1            ; CPUID leaf 1: feature flags
	cpuid
	shr edx, 4            ; TSC flag is edx bit 4
	and edx, 1
	mov tsc_supported, edx
}

Then again, it's a pretty safe assumption to make: rdtsc has been there for 10 years now.
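As for the "either SEH, or toggle eflags.ID" part, a minimal sketch of the EFLAGS.ID toggle (on CPUs where bit 21 cannot be flipped, CPUID, and hence rdtsc, is absent):

int cpuidSupported()
{
	int supported;
	__asm
	{
		pushfd                ; save EFLAGS
		pop eax
		mov ecx, eax
		xor eax, 200000h      ; try to flip the ID bit (bit 21)
		push eax
		popfd
		pushfd                ; read EFLAGS back
		pop eax
		xor eax, ecx          ; bit 21 set iff it actually changed
		shr eax, 21
		and eax, 1
		mov supported, eax
		push ecx              ; restore original EFLAGS
		popfd
	}
	return supported;
}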


> Some mobile computers change the CPU speed on the fly. Some motherboards allow you to change the CPU speed inside Windows, so that's a possibility to cheat. <
On top of that, the CPU clock jitters a bit, and you have problems on multiprocessor systems (the CPUs' clocks may differ).
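One common workaround for the multiprocessor case is to pin the timing thread to a single CPU so it always reads the same TSC; a minimal sketch using SetThreadAffinityMask:

#include <windows.h>

void pinTimingThread()
{
	// Bit 0 of the affinity mask = first logical processor
	// (assumption: processor 0 exists and is usable).
	SetThreadAffinityMask(GetCurrentThread(), 1);
}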


> I have heard that future motherboards will have a high performance timer built in, so that's good news for game programmers and gamers <
There is already one with a resolution of ~838 ns.


The lowdown:
Fastest timing method: read the 10 ms counter at 0x7ffe0000
Most accurate: rdtsc
Easiest to use: GetTickCount

if(need high res)
	if(single x86 desktop cpu)
		rdtsc
	else if(win only)
		QPC
	else if(pmode)
		gettimeofday
	else
		PIT
else if(med. res)
	GetTickCount / glutGet(GLUT_ELAPSED_TIME) / SDL_GetTicks


HTH
Jan


I would expect the same source clock to drive all the oscillators on the motherboard. Whatever the dynamic range of the PCI bus is, the CPU clock ought to be identical.

Cheap VCXOs are good to 25 ppm, though I'm not sure what the stability of the CPU clock is. I would expect it to be 25 ppm or less.

I thought QueryPerformanceCounter was extremely slow; I got 5 µs in my timings as well.

You can consider just using rdtsc for your timings. I think it works on all Pentium-class machines and greater (though it may be Pentium II or greater). Very few gamers are using Pentiums or K6-2s today, and fewer still are using 486s (dare I say zero).

If you want actual seconds from rdtsc, then you time the CPU using something like getTick, and spin like mad waiting for it to change. Once it changes, snap the rdtsc, then sleep for a little while, say 100 ms. Then spin like mad waiting for getTick to change again, and snap rdtsc. This way you know your timings started right at the millisecond switch-over, and not somewhere in between. Now you have an rdtsc difference for a known number of milliseconds (calculate it from the values returned by getTick), and voilà, an accurate timer.
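A minimal sketch of that edge-synchronized calibration, assuming GetTickCount as the millisecond reference and the getTicks() rdtsc wrapper from the first post:

__int64 ticksPerMs()
{
	// Spin until the ms counter changes, so we start right on an edge.
	DWORD msStart = GetTickCount();
	while (GetTickCount() == msStart) {}
	msStart = GetTickCount();
	__int64 tscStart = getTicks();          // snap rdtsc at the edge

	Sleep(100);                             // wait roughly 100 ms

	// Spin until the next edge, then snap rdtsc again.
	DWORD msEnd = GetTickCount();
	while (GetTickCount() == msEnd) {}
	msEnd = GetTickCount();
	__int64 tscEnd = getTicks();

	return (tscEnd - tscStart) / (msEnd - msStart);
}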

The test is flawed. It rests on an invalid supposition: that you would select the method of measuring time based upon the performance of the call. Two of those methods have such low resolution that to get to +/- 1% accuracy in the measurement you could only call them ten times a second. If a timer has a resolution of 0.001 s, then your accuracy is only +/- 100% when measuring a 0.001 second interval. You could have taken a reading at 0.0001 seconds and again at 0.0019, or at 0.0009 and again at 0.0011 seconds. One is a 0.0018 second interval and the other is 0.0002, but both register as 0.001. If you read the value until it changes, to get more or less in sync, then keep reading until it changes again and measure that interval, you will most likely find it varies: it may change after 0.0015 seconds one time and 0.0005 the next. Such a timer is extremely accurate over the long run, in that it may only be off by one second in a year, i.e. 1 in 31.5 million seconds, but it is not nearly as precise on an individual reading.

The point is that if you are calling it no more than ten times a second, then how long the call takes makes virtually no difference. What, perhaps a microsecond? So what would that be, 0.001% of your overall processing time? Who cares? That is one fundamental problem with the test: it provides you with trivia of no practical value. You do not know any better how to select the right timer for the task you are attempting after the test than you did before.

One fundamental point you missed is that two back-to-back calls of a timer tell you how long ONE call takes. The time is read in the middle of the routine; the start of the measured interval is one such read and the end is the next. The interval does not start when the caller begins setting up the call, nor end when the caller stores the return value. So if you want a timer routine to measure itself, make two back-to-back calls: the difference is the cost of one call, middle to middle.

A more valuable test with QueryPerformanceCounter and your getTicks would be whether they can measure one another and themselves. rdtsc certainly can. Do back-to-back calls to QueryPerformanceCounter always return different values? If so, then for both of them, how much does the difference vary? If not, then how many times can you call QueryPerformanceCounter before it changes? Just use a sequence of query, query, rdtsc, rdtsc, query, rdtsc. Record the query-to-query intervals and the rdtsc-to-rdtsc intervals, then repeat the test and record the results for each run.

I think that will do a lot more toward providing you knowledge that is actually useful. Once you have a baseline with QueryPerformanceCounter and rdtsc, you can use them to measure the other two. You don't need a loop: it doesn't take longer to read the value after it has changed than to read the same value twice, and the loop is overhead just like the calls to the timers. With 10k iterations the loop is greater overhead than the calls to the timers. Overall, I would say make fewer assumptions and take more measurements. You seem to have assumed the loop is trivial and the call to the timers isn't; I could be wrong, but my guess is that if you actually measure them you will find it is the other way around.
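A minimal sketch of that interleaved sequence, assuming the getTicks() wrapper from the first post; run it several times and compare the recorded intervals:

#include <windows.h>
#include <iostream>

extern __int64 getTicks();   // the rdtsc wrapper from the first post

int main()
{
	LARGE_INTEGER q1, q2, q3;
	__int64 t1, t2, t3;

	// query, query, rdtsc, rdtsc, query, rdtsc
	QueryPerformanceCounter(&q1);
	QueryPerformanceCounter(&q2);
	t1 = getTicks();
	t2 = getTicks();
	QueryPerformanceCounter(&q3);
	t3 = getTicks();

	std::cout << "QPC->QPC:                    " << (q2.QuadPart - q1.QuadPart) << "\n"
	          << "rdtsc->rdtsc:                " << (t2 - t1) << "\n"
	          << "QPC->QPC (spanning rdtsc):   " << (q3.QuadPart - q2.QuadPart) << "\n"
	          << "rdtsc->rdtsc (spanning QPC): " << (t3 - t2) << std::endl;
	return 0;
}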

quote:
Do back-to-back calls to QueryPerformanceCounter always return different values? If so, then for both of them, how much does the difference vary?


Back-to-back calls of QueryPerformanceCounter return different values, with a difference of about ~5.

I tried doing something like

start = getTicks();
// wait for a second
end = getTicks();

tickspersec = end - start;


but tickspersec varies a lot at different times. So either I'm not doing the second count right, or rdtsc does have its problems. I'm using GetTickCount() to check for a second; since GetTickCount() returns milliseconds, I guess that's OK, right?

OK, and I got rid of all the for loops, and QueryPerformanceCounter still takes the longest (in clock ticks) to execute: about 3000 clock ticks in debug mode, while GetTickCount() takes ~55 clock ticks and timeGetTime() takes ~2000 clock ticks. So AFAIK GetTickCount() has the least overhead, but is also the least accurate (~1 ms accuracy). timeGetTime() is supposed to have a solid accuracy of 1 ms, and QueryPerformanceCounter() depends on the system, so you'd have to use QueryPerformanceFrequency() to find that out.

quote:

The lowdown:
Fastest timing method: read the 10 ms counter at 0x7ffe0000
Most accurate: rdtsc
Easiest to use: GetTickCount


OK, now this is confusing me. I thought "the CPU clock jitters a bit", so that would make rdtsc not accurate. And what is the "10 ms counter"? Do you mean a 10 millisecond counter of some sort? If so, then 10 ms is too big a timing gap to use to measure function calls.

quote:
If you want actual seconds from rdtsc, then you time the CPU using something like getTick, and spin like mad waiting for it to change. Once it changes [...] a number of milliseconds (calculate it from the values returned by getTick), and voilà, an accurate timer.


Do you mean I have to enter a loop and wait for getTicks() to report a 0 count, and then start timing for a few millisecs? What do you mean by 'snap rdtsc'? Alf is confused, people!

thanks for the replies, a few things are still uncertain though...


:::Al:::
[Triple Buffer V2.0] - Resource leak imminent. Memory status: Fragile

Magmai Kai Holmlor:

> I would expect the same source clock to drive all the oscillators on the mother board
They did it that way on the original PC; I don't think that's the case anymore.

rdtsc is available on Pentiums and K6-2 as well.

His getTick reads the TSC.


IFooBar:

> but tickspersec varies a lot at different times.
Task switching and improperly measuring the delay time will mess up your clock count.

quote:
OK, now this is confusing me. I thought "the CPU clock jitters a bit", so that would make rdtsc not accurate. And what is the "10 ms counter"?

The length of a clock may drift, but that'll be dwarfed by the high resolution, as long as you're timing stuff that takes >> 1 clock.
Windows maps the tick count into your app's memory at 0x7ffe0000 (note: in units of 10 ms). Have a look at the GetTickCount code.
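For illustration only, a heavily hedged sketch of reading it directly; the shared-page layout is undocumented and version-dependent, so the offset-0 location of the raw tick count is an assumption about the NT kernels of that era:

// Undocumented and version-dependent: the first DWORD of the shared user
// page held the raw tick count (~10 ms units) on NT kernels of that era.
volatile DWORD* rawTicks = (volatile DWORD*)0x7FFE0000;
DWORD t = *rawTicks;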

quote:
Do you mean I have to enter a loop and wait for getTicks() to report a 0 count, and then start timing for a few millisecs? What do you mean by 'snap rdtsc'?

He means you should wait until the start of a (Windows system clock) tick when getting your time reference; your call to GetTickCount might come just after / before the next tick, throwing you off by 10 ms in the worst case.


It's good to about 0.1%.


  
#include <conio.h>
#include <windows.h>
#include <mmsystem.h>   // timeGetTime(); link with winmm.lib
#include <iostream>
using namespace std;

#pragma warning(push)
#pragma warning(disable: 4035) // "no return value": the result is left in edx:eax

__int64 cpu_ticks()
{
	_asm
	{
		cpuid   ; serialize so rdtsc isn't reordered
		rdtsc   ; result in edx:eax, which is how MSVC returns an __int64
	}
}
#pragma warning(pop)

double CPU_Speed_MHz()
{
	DWORD dwInit_ms, dwStart_ms, dwEnd_ms;
	volatile __int64 i64Start, i64Stop;
	volatile __int64 i64Overhead, i64Tmp;

	// Measure the cost of one cpu_ticks() read (repeat to warm the cache).
	i64Tmp = cpu_ticks();
	i64Overhead = cpu_ticks() - i64Tmp;
	i64Tmp = cpu_ticks();
	i64Overhead = cpu_ticks() - i64Tmp;
	i64Tmp = cpu_ticks();
	i64Overhead = cpu_ticks() - i64Tmp;

	// Ensure we start on a fresh ms.
	dwInit_ms = dwStart_ms = timeGetTime();
	while(dwInit_ms == dwStart_ms)
	{
		dwStart_ms = timeGetTime();
		i64Start = cpu_ticks();
	}

	while((timeGetTime() - dwStart_ms) < 1000)
	{
		Sleep(10);
	}

	// Ensure we end on a fresh ms.
	dwInit_ms = dwEnd_ms = timeGetTime();
	while(dwInit_ms == dwEnd_ms)
	{
		dwEnd_ms = timeGetTime();
		i64Stop = cpu_ticks();
	}

	// Rough correction: remove the cost of one timer read.
	__int64 i64Ticks = i64Stop - i64Start - i64Overhead;
	double dElapsed_ms = dwEnd_ms - dwStart_ms;
	return i64Ticks / dElapsed_ms / 1000.0;   // ticks/ms / 1000 = MHz
}

int main(int argc, char* argv[])
{
	// Minimize task-switching disturbance while calibrating.
	SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_TIME_CRITICAL);
	SetPriorityClass(GetCurrentProcess(), REALTIME_PRIORITY_CLASS);

	double Speed;
	double Sum = 0;
	const int iterations = 10;
	for(int i = 0; i < iterations; i++)
	{
		Speed = CPU_Speed_MHz();
		Sum += Speed;
		cout << "CPU Speed: " << Speed << " MHz" << endl;
	}
	cout << "Average: " << Sum/iterations << endl;
	getch();
	return 0;
}

