BSXrider

QueryPerformanceCounter: WOW !


I've been using GetTickCount, which in my current project would return a difference of 0 ms for 90% of my game loops and 10 ms for the other 10%. QueryPerformanceCounter is SO much better. Its frequency seems to be 3579545 ticks per second?!

- seb

Guest Anonymous Poster
It's a CPU cycle counter, nothing more than the RDTSC instruction. The resolution is machine dependent, e.g. on a 1.4 GHz machine you'll get roughly 1.4 billion ticks per second.

Use QueryPerformanceFrequency to get the frequency. Divide the result of QueryPerformanceCounter by the value returned from QueryPerformanceFrequency and you get the time in fractions of a second.
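
In code that's roughly something like this (just a rough sketch with error checking omitted; the Sleep() call stands in for whatever you actually want to time):

    #include <windows.h>
    #include <stdio.h>

    int main()
    {
        LARGE_INTEGER freq, start, end;

        // Ticks per second of the performance counter.
        QueryPerformanceFrequency(&freq);

        QueryPerformanceCounter(&start);
        Sleep(100);                        // stand-in for the work being timed
        QueryPerformanceCounter(&end);

        double seconds = (double)(end.QuadPart - start.QuadPart) / (double)freq.QuadPart;
        printf("elapsed: %.6f s\n", seconds);
        return 0;
    }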

One thing you should be aware of is that the higher resolution of QueryPerformanceCounter comes at a cost: the function is considerably slower than many of the alternatives. Check out this paper on NVIDIA's site, and try the program it comes with.

I've found that timeGetTime() works well as a compromise. It's between GetTickCount and QueryPerformanceCounter performance-wise, and has a resolution of 1 ms, which is good enough for most purposes.
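
For a per-frame delta, a minimal sketch might look like the following (the loop bounds and the Sleep(5) are just stand-ins for the real update/render work, and you need to link winmm.lib):

    #include <windows.h>
    #include <mmsystem.h>   // timeGetTime(); link with winmm.lib
    #include <stdio.h>

    int main()
    {
        DWORD last = timeGetTime();
        for (int frame = 0; frame < 100; ++frame)   // stand-in for the real game loop
        {
            Sleep(5);                               // stand-in for update + render work
            DWORD now = timeGetTime();
            DWORD elapsedMs = now - last;           // unsigned subtraction copes with wrap-around
            last = now;
            printf("frame time: %lu ms\n", elapsedMs);
        }
        return 0;
    }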

Guest Anonymous Poster
Just note that the AP who said QueryPerformanceCounter stubs RDTSC is wrong; it does not. This has been discussed on the DirectX mailing list and confirmed.

Guest Anonymous Poster
Of course it uses RDTSC. It certainly does many other things as well, which is why it's not as efficient as a raw RDTSC. But the hardware timestamp counter is the only device in a PC capable of returning this kind of high-resolution timing, so at some point QPC will execute RDTSC. That's why there was a Microsoft-confirmed issue with the QPC call returning bogus results on some very exotic CPU/chipset combinations. It was a failure of the hardware timestamp counter.

So QueryPerformanceCounter doesn't work on 486s then? (I find that hard to believe.)

QueryPerformanceCounter may use RDTSC in its implementation, but it doesn't have to. And the raw RDTSC count is certainly not the value that it returns.

It takes about 50 us to execute (timed with RDTSC) on my old PC (a 600 MHz machine). Low enough overhead for once a frame, imo.

The problem with using RDTSC directly is that you also need to accurately determine the CPU speed, which is a somewhat tricky thing to do - I always measure some amount of variance...
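
For what it's worth, one common way to estimate the CPU speed is to calibrate RDTSC against another timer at startup - roughly like this sketch, which assumes the __rdtsc() intrinsic from <intrin.h> (newer MSVC; older compilers would need inline asm instead). As noted, the result still wobbles a little from run to run:

    #include <windows.h>
    #include <intrin.h>     // __rdtsc(); older compilers need inline asm instead
    #include <stdio.h>

    // Estimate CPU ticks per second by counting RDTSC ticks across a QPC-timed interval.
    unsigned __int64 EstimateCpuHz()
    {
        LARGE_INTEGER freq, t0, t1;
        QueryPerformanceFrequency(&freq);

        QueryPerformanceCounter(&t0);
        unsigned __int64 c0 = __rdtsc();
        Sleep(500);                               // a longer interval gives less variance
        unsigned __int64 c1 = __rdtsc();
        QueryPerformanceCounter(&t1);

        double seconds = (double)(t1.QuadPart - t0.QuadPart) / (double)freq.QuadPart;
        return (unsigned __int64)((c1 - c0) / seconds);
    }

    int main()
    {
        printf("estimated CPU frequency: %I64u Hz\n", EstimateCpuHz());
        return 0;
    }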


Edited by - Magmai Kai Holmlor on February 5, 2002 8:49:47 PM

quote:

Of course it uses RDTSC. It certainly does many other things as well, which is why it's not as efficient as a raw RDTSC. But the hardware timestamp counter is the only device in a PC capable of returning this kind of high-resolution timing, so at some point QPC will execute RDTSC. That's why there was a Microsoft-confirmed issue with the QPC call returning bogus results on some very exotic CPU/chipset combinations. It was a failure of the hardware timestamp counter.



Not sure where you get your info from, AP, but...


1) The "known issue" has nothing to do with a failure of the HW timestamp counter. The relevant Knowledge Base article:
http://support.microsoft.com/directory/article.asp?ID=KB;EN-US;Q274323


2) Executing an RDTSC is one thing QPC() could do, but definitely *NOT* what it *always* does.

a. On multiprocessor systems, there is no guarantee that the RDTSC values are synchronised on all CPUs - the OS does not use RDTSC for QPC() on SMP systems where possible.

b. Laptops often use CPUs which *vary* the CPU speed (e.g. Intel SpeedStep). RDTSC counts CPU cycles - when the CPU speed is varying, RDTSC *cannot* be used to accurately measure time. In this case, once again, the OS does not use RDTSC for QPC() where possible.

c. If there is a *hardware* timer of high enough resolution available (not CPU!), QPC() will use that.

d. The implementation differs between the NT kernel and the 9x kernel, and service pack version, but the above has been stated by different people from Microsoft in various online forums.


--
Simon O'Connor
Creative Asylum Ltd
www.creative-asylum.com

Guest Anonymous Poster
OK, let's put it like this: QPC uses RDTSC 99% of the time on standard user systems. And yes, it obviously has a fallback option for systems without RDTSC. But on Win9x and on uniprocessor Win2K and WinNT kernels it uses RDTSC. I disassembled code using QPC on various Windows versions, because of a very weird effect we had with it. I don't know about WinXP, but I highly suspect it uses RDTSC as well.

I'm not aware of any other hardware timer that comes *anywhere* near the accuracy of RDTSC.

> It takes about 50 us to execute (timed with RDTSC) on my old PC (a 600 MHz machine). Low enough overhead for once a frame, imo.

This is enormous. I didn't know that QPC was so slow.

It seems like many people are ignoring Myopic Rhino's post about the NVIDIA study (source + exe included) on this exact topic (or maybe they're just arguing other points now).

I think it is a great example that runs many tests on each option and allows you to see for yourself which is fastest.

I know this article has made me decide which one I will use in future projects.

Cray

quote:
Original post by Anonymous Poster
This is enormous. I didn't know that QPC was so slow.

IIRC, it's a kernel-mode switch and back; what were you expecting?
You definitely wouldn't want to pepper your code with them, but once a frame for an accurate and reliable elapsed-time measurement seems acceptable to me.

The timer program gave me about 800ns on my new computer (it's about 3 times faster though).


Edited by - Magmai Kai Holmlor on February 6, 2002 3:34:12 AM

The performance counter rules... timeGetTime() may be fast, but the values it produces cannot be trusted. The times returned are incorrect, making your animations jumpy.

I logged values in my game loop because of these problems and I noticed this:

10 ms
10 ms
10 ms
20 ms
10 ms
10 ms
10 ms
20 ms
10 ms
...

In the end the total time is correct, but in between the individual times jump around... Damn my English... Can't explain it better...

QPC always returns roughly the same amount...

Surely just calling QPC() on two consecutive lines of code should give you a reasonable gauge of how long it takes to execute?

Worked out to 1.7 microseconds on mine (the difference returned was typically 6 with a frequency of 3.5 million).
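
Something along these lines, for example - the same back-to-back idea, just averaged over many calls to get a steadier number (rough sketch only):

    #include <windows.h>
    #include <stdio.h>

    int main()
    {
        LARGE_INTEGER freq, a, b;
        QueryPerformanceFrequency(&freq);

        const int N = 10000;
        QueryPerformanceCounter(&a);
        for (int i = 0; i < N; ++i)
        {
            LARGE_INTEGER dummy;
            QueryPerformanceCounter(&dummy);   // the call being measured
        }
        QueryPerformanceCounter(&b);

        double usPerCall = (double)(b.QuadPart - a.QuadPart) * 1000000.0
                           / (double)freq.QuadPart / (double)N;
        printf("QueryPerformanceCounter: ~%.2f us per call\n", usPerCall);
        return 0;
    }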

- seb

The frequency varies (sometimes strongly) on different machines. The performance counter is SUPPOSED to run at around 3.19 MHz. I can imagine the frequency would be higher on faster machines, but it doesn't accurately represent your clock speed.

----------------------------
"Now comes the mystery."
Last words of Henry Ward Beecher (I have no idea who he was, but it's a cool quote)

!!! WARNING INCOMING EGO-TRIP !!!
Check out Asteroidz, it rules !!!

quote:
Original post by BSXrider
What you're trying to say, granat, is that it doesn't have a high enough resolution.

That timer program didn't run for me!

- seb


I guess so...

Even though its return value is in milliseconds, it does not appear to have a true 1 ms resolution...


EDDIE MURPHY: "MERRY NEW YEAR !"
BAD GUY: "IN THIS COUNTRY WE SAY HAPPY NEW YEAR !!!"
EDDIE MURPHY: "THANK YOU FOR CORRECTING MY ENGLISH !"
(Trading Places)




Edited by - granat on February 6, 2002 8:25:48 AM

My machine (single processor, ACPI, Win2K SP2) does not use rdtsc for QueryPerformanceCounter. It uses a 1.1927 MHz clock instead.

One reason for this is probably that QueryPerformanceFrequency's result is defined as being constant whilst a system remains booted.

Given that machines exist where this is not the case, then some counter other than that in the processor must be used.

You can increase the resolution of timeGetTime() using timeBeginPeriod()/timeEndPeriod(). I've successfully changed timeGetTime()'s resolution to 1 ms on my Win2K system. You only need to do this on NT-based systems, since on 9x it has a default resolution of 1 ms already. You never really need QPC() unless you're timing code...
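
Roughly like this, if I understand the docs correctly (a minimal sketch - Sleep(50) stands in for the code being timed, and winmm.lib is required):

    #include <windows.h>
    #include <mmsystem.h>   // timeBeginPeriod/timeEndPeriod/timeGetTime; link with winmm.lib
    #include <stdio.h>

    int main()
    {
        // Request 1 ms timer resolution; every timeBeginPeriod must be paired with timeEndPeriod.
        if (timeBeginPeriod(1) != TIMERR_NOERROR)
            return 1;

        DWORD start = timeGetTime();
        Sleep(50);                       // stand-in for the code being timed
        DWORD elapsedMs = timeGetTime() - start;
        printf("elapsed: %lu ms\n", elapsedMs);

        timeEndPeriod(1);                // restore the previous resolution
        return 0;
    }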

This stuff is in the documentation.


--
Eric

Just a side note on timeBeginPeriod/timeEndPeriod: the MSDN docs state you should call timeBeginPeriod "immediately before using timer services" and call timeEndPeriod "immediately after you are finished using the timer services".

I assume they mean you shouldn't call timeBeginPeriod/timeEndPeriod at the start and end of your program, respectively, but rather just before you start using timeGetTime and right after you finish using it. Keep this in mind if you are reading NVIDIA's document on function times, because it doesn't take into consideration the time it takes to call the timeBeginPeriod/timeEndPeriod pair.


- Houdini


quote:
Original post by ekenslow
You can increase the resolution of timeGetTime() using timeBeginPeriod()/timeEndPeriod(). I've successfully changed timeGetTime()'s resolution to 1 ms on my Win2K system. You only need to do this on NT-based systems, since on 9x it has a default resolution of 1 ms already. You never really need QPC() unless you're timing code...

This stuff is in the documentation.


--
Eric


Well, my current project pulls 1000+ fps, so 1 ms accuracy is definitely not enough.

And given that the timer, whichever one you choose, is only called once per loop, it seems to me anyone would be silly not to use the best you can get. My computer appeared to take 1.7 us to make a QPC call. At 100 fps that's 0.017% of your available processing time.

- seb

quote:
Original post by DrPizza
One reason for this is probably that QueryPerformanceFrequency's result is defined as being constant whilst a system remains booted.


I'm almost certain that changed between NT4 and 2000, and that this is no longer the case with 2000 or XP.

timeBeginPeriod/timeEndPeriod also affect context-switching time - i.e. if you call timeBeginPeriod(1), your thread time slices will be 1 ms long, not 10 ms (or 16 ms, which seems to be the popular default on Compaqs).
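
You can see the resolution change easily enough with something like the sketch below. It measures how long Sleep(1) actually takes before and after timeBeginPeriod(1) rather than the time slices themselves, but it's the same underlying timer period being changed (or so I understand):

    #include <windows.h>
    #include <mmsystem.h>   // timeBeginPeriod/timeEndPeriod; link with winmm.lib
    #include <stdio.h>

    // Average how long Sleep(1) actually takes, in milliseconds.
    double AverageSleep1Ms(int iterations)
    {
        LARGE_INTEGER freq, t0, t1;
        QueryPerformanceFrequency(&freq);
        QueryPerformanceCounter(&t0);
        for (int i = 0; i < iterations; ++i)
            Sleep(1);
        QueryPerformanceCounter(&t1);
        return (double)(t1.QuadPart - t0.QuadPart) * 1000.0
               / (double)freq.QuadPart / (double)iterations;
    }

    int main()
    {
        printf("Sleep(1) before timeBeginPeriod(1): %.2f ms\n", AverageSleep1Ms(50));

        timeBeginPeriod(1);
        printf("Sleep(1) after  timeBeginPeriod(1): %.2f ms\n", AverageSleep1Ms(50));
        timeEndPeriod(1);

        return 0;
    }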

Trying to maintain 1000 fps is just silly. If your game runs at 1000 fps then - at best - for every 10 frames it draws, 1 is displayed to the user. That's being generous - below a 100 Hz refresh rate you'll draw and discard even more frames for every one actually displayed.

If you want to use QueryPerformanceCounter() because it makes you feel good, then go right ahead - it's your code. But don't try to say that timeGetTime() at 1 ms resolution is too slow for your framerate, because that's BS.


--
Eric
