Best, most accurate clock?


Ok, I've done a little bit of digging on timing in C++, and a report I read said that the multimedia clock (it must be in mmsystem.h) is the most accurate. Does anyone have any input? It's not as if I need super-accurate timing for my crappy 2D engine... but I was hoping to create a class and save it for later, once I'm better at D3D and C++.
-Dan

Yes, I realize I'm a n00b... don't yell.
-fel

[edited by - felisandria on December 12, 2003 2:22:23 PM]

They are probably referring to QueryPerformanceCounter (although I don't think that one needs winmm; timeGetTime does).

Regardless, on an Intel system:

unsigned int time;
__asm {
cpuid ; force all previous instructions to complete
rdtsc ; read time stamp counter into EDX:EAX
mov time, eax ; move the low 32 bits of the counter into the variable
}

should give you a reading as accurate as possible. It is sometimes advised to do this reading twice, and you will need to look up your processor's manual to see how many clock ticks the rdtsc instruction takes (about 30, from memory).
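For illustration, here is how that might look wrapped up and used to time a block of code - a minimal sketch, assuming MSVC-style x86 inline asm (read_tsc_lo is my name for it, not anything standard):

#include <iostream>

// Sketch: serialize with CPUID, then read the low 32 bits of the TSC.
// CPUID clobbers EBX, which is callee-saved, so preserve it explicitly.
static unsigned int read_tsc_lo()
{
    unsigned int lo;
    __asm {
        push ebx      ; cpuid clobbers ebx
        xor eax, eax
        cpuid         ; force all previous instructions to complete
        rdtsc         ; read time stamp counter into EDX:EAX
        mov lo, eax   ; keep the low 32 bits
        pop ebx
    }
    return lo;
}

int main()
{
    unsigned int t0 = read_tsc_lo();
    // ... code being timed ...
    unsigned int t1 = read_tsc_lo();
    // unsigned subtraction stays valid even if the low word wraps once
    std::cout << "elapsed: " << (t1 - t0) << " ticks\n";
}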

quote:
Original post by aboeing
... you will need to look up your processor's manual to see how many clock ticks the rdtsc instruction takes (about 30, from memory).


Shouldn't matter how many it takes, because if it takes 30, it will ALWAYS be behind by 30; your program will see the start time 30 ticks after it really is, and likewise the end time, so it will always give the same time-change value!

If it goes from 0 -> 600, or 30 -> 630 if you add the 30 in, the time difference is still identical, so there's no need to compensate!

If the counter returns the timestamp from BEFORE the instruction executed, you have the overhead of the instruction before your code runs. If it returns the timestamp from AFTER it executed (sounds strange), you have the overhead when reading the time after your code has run.
Either way, you pay the overhead of one timestamp read...

-- EDIT --
This is if you're using it for performance measurement, obviously (I wouldn't reckon you'd use the timestamp counter for anything else).

[edited by - BiTwhise on December 12, 2003 10:18:18 AM]

timeGetTime is probably fine for your needs, but it's not the best. It has a resolution of 1 ms on 9x platforms, and also on NT if you call timeBeginPeriod; note that doing so increases system load. The problem is, it's basically just a tick counter (incremented every timer interrupt, as reported by GetSystemTimeAdjustment), and it drifts over time (the interrupt isn't always delivered on time) - this is bad if you're relying on this clock for multiplayer.
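For reference, a minimal usage sketch (it assumes winmm.lib is linked; the 1 ms period is the NT case described above):

#include <windows.h>
#include <mmsystem.h>
#pragma comment(lib, "winmm.lib")

// Sketch: request 1 ms tick granularity, time something, then restore
// the old period - every timeBeginPeriod needs a matching timeEndPeriod.
void time_with_tgt()
{
    timeBeginPeriod(1);            // increases system load, as noted above
    DWORD start = timeGetTime();   // milliseconds since Windows started
    // ... code being timed ...
    DWORD elapsed_ms = timeGetTime() - start;
    timeEndPeriod(1);
}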

QueryPerformanceCounter has better resolution (depending on HAL: ~1 µs if using the PIT, ~0.3 µs for the PM timer, better than 100 ns if HPET is available, or the CPU clock period if using the TSC), but a raft of problems:
[comments from timer source]
// problems:
// - multiprocessor systems: may be inconsistent across CPUs;
//   setting thread affinity is too much work.
// - if implemented with TSC: same problems as above.
//   (we check if TSC/QPC freqs differ by more than 10% -
//   can't assume PIT / PMT freq values, because the new HPET
//   timer's frequency is unspecified)
// - Q274323: jumps several seconds under heavy PCI bus load.
//   readings are checked against system time and discarded if invalid.
// - "System clock problem can inflate benchmark scores":
//   invalid value if not polled every 4.5 seconds? solved
//   by calibration thread, which reads timer every second anyway.
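For completeness, basic QPC usage looks like this (a sketch; the failure case of a missing counter is ignored):

#include <windows.h>
#include <cstdio>

// Sketch: QueryPerformanceFrequency reports counts per second for
// whichever timer the HAL chose (PIT, PMT, HPET or TSC), so the delta
// divided by it gives seconds.
void time_with_qpc()
{
    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&t0);
    // ... code being timed ...
    QueryPerformanceCounter(&t1);
    double seconds = (double)(t1.QuadPart - t0.QuadPart) / (double)freq.QuadPart;
    printf("%f s\n", seconds);
}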

While we're at it, problems with rdtsc / the CPU timestamp counter:
// problems:
// - multiprocessor systems: may be inconsistent across CPUs;
//   setting thread affinity is too much work.
// - deep sleep modes: TSC may not be advanced.
// - SpeedStep/'gearshift' CPUs: frequency may change.
//   this happens on notebooks now, but eventually desktop systems
//   will do this as well (if not to save power, for heat reasons).
//
// detect's tsc_is_safe check tries to ensure the above won't happen,
// but I don't think it's possible to determine beforehand if the CPU
// does SpeedStep (currently, it assumes so iff running on a laptop).
// to be safe, wsdl disables the TSC when an APM message is received.

What I do (to emulate gettimeofday on Windows; Linux does something similar already) is choose one of three high-resolution timers (QPC, TSC, timeGetTime), depending on which has the fewest problems, and lock it to the system time. Still working on it, but it looks good already.
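As a very rough sketch of the "lock it to the system time" idea (this leaves out the timer selection and the calibration thread, and the names are mine, not the actual library's):

#include <windows.h>

// Sketch: capture system time and QPC together once; afterwards report
// base time plus elapsed QPC ticks. A real version must re-read the
// system time periodically to correct drift and catch QPC glitches.
static ULONGLONG base_100ns;             // FILETIME units at startup
static LARGE_INTEGER base_qpc, qpc_freq;

void clock_init()
{
    FILETIME ft;
    GetSystemTimeAsFileTime(&ft);
    base_100ns = ((ULONGLONG)ft.dwHighDateTime << 32) | ft.dwLowDateTime;
    QueryPerformanceFrequency(&qpc_freq);
    QueryPerformanceCounter(&base_qpc);
}

ULONGLONG clock_now_100ns()   // 100 ns units since 1601 (FILETIME epoch)
{
    LARGE_INTEGER now;
    QueryPerformanceCounter(&now);
    ULONGLONG dt = (ULONGLONG)(now.QuadPart - base_qpc.QuadPart);
    return base_100ns + dt * 10000000ULL / (ULONGLONG)qpc_freq.QuadPart;
}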

aboeing:
In addition to the above problems, be sure to issue several CPUID instructions before first use of the timer - the first few take longer to execute. BTW, the CPU clock isn't terribly accurate.

I think the standard C function clock works well. To use it, I include:
#include <ctime>
#include <cstdlib>

In school I program on Windows and submit my homework on NetBSD, and one of them doesn't have CLOCKS_PER_SEC defined (I'm not sure which constant is the standard one), so I keep this at the top:

#ifndef CLOCKS_PER_SEC
#define CLOCKS_PER_SEC CLK_TCK
#endif

To time something:

std::clock_t start_time, end_time;
start_time = std::clock();
// do something here
end_time = std::clock();
double elapsed_time_in_clock_ticks = (double)(end_time - start_time);
double elapsed_time_in_seconds = elapsed_time_in_clock_ticks / (double)CLOCKS_PER_SEC;

And unlike those Windows functions, if you ever want to port your code, this code is platform independent :-)

Unfortunately, clock() only has a resolution of 10-15 ms on Windows (it's implemented with GetSystemTimeAsFileTime).

> And unlike those windows functions, if you ever want to port your code, this code is platform independent
Yes - which is why I emulate gettimeofday and clock_gettime on Windows, and use those in real code.
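For reference, the POSIX side being emulated looks like this (a sketch; CLOCK_MONOTONIC support varies by platform):

#include <time.h>
#include <stdio.h>

// Sketch: clock_gettime reports seconds plus nanoseconds; the actual
// resolution is up to the OS (see clock_getres).
void time_with_clock_gettime()
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    /* ... code being timed ... */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double seconds = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%f s\n", seconds);
}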

quote:
Original post by Ready4Dis
If it goes from 0 -> 600, or 30 -> 630 if you add the 30 in, the time difference is still identical, so there's no need to compensate!

Yes, but that's only because I'm an idiot.
I really should stop posting late at night.
quote:

be sure to issue several CPUID instructions before first use of the timer - the first few take longer to execute.
BTW, the CPU clock isn't terribly accurate.

Yeah, this is basically what AMD suggested doing to count ticks. What other method would give you a better reading?
I thought rdtsc was the most accurate reading possible, and the only reason you get inaccuracies is multitasking?

Read the cedar.intel.com documentation for rdtsc.

The second poster was correct.

What they do is time their own function after three warm-up passes.

RDTSC with CPUID is about 20-something clock ticks.

Just subtract it from your time differences if that's what you are after (as in a timer) - same as any other method's overhead.


As a note: QueryPerformanceCounter is about 674 clock ticks.

That's for a straight
// start
LARGE_INTEGER starttime, endtime;
QueryPerformanceCounter(&starttime);
QueryPerformanceCounter(&endtime);
LONGLONG totaltime = endtime.QuadPart - starttime.QuadPart;
// end

For overhead, use __declspec(naked) if you know what you are doing. You won't get any extra overhead that way.
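A sketch of what that might look like (MSVC x86 assumed, and not necessarily the poster's code; note that CPUID clobbers EBX, which is callee-saved):

// __declspec(naked) omits the compiler's prologue/epilogue, so the only
// overhead is the instructions written here. EDX:EAX is already the
// return-register pair for a 64-bit value.
__declspec(naked) unsigned __int64 read_tsc()
{
    __asm {
        push ebx      ; preserve the callee-saved register CPUID clobbers
        xor eax, eax
        cpuid         ; serialize the pipeline
        rdtsc         ; counter -> EDX:EAX
        pop ebx
        ret
    }
}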


Btw Dan, don't feel like a "n00b" because you asked for help on a topic like this. You did your research, and it's one of the most reasonably debated topics around here. It's also a pretty important one for some applications.

It kind of comes down to preference. I still like using QueryPerformanceCounter for things like frame rate and for testing large blocks of code, as it saves the inline assembly. For smaller pieces (testing which is faster, etc., for release builds) and optimisation, I use the RDTSC code.

Oh, as a side note: CPUID isn't really necessary on the newer processors, but use it anyway, unless that extra 15 or so clock ticks is that important to you. Better safe than sorry.

> I thought rdtsc was the most accurate reading possible, the only reason you get inaccuracies is because of multitasking?
I'm saying it's not as accurate as it would appear - the crystal that drives the CPU clock isn't the best. The TSC is the best you've got for measuring short intervals, but you're screwed if the processor decides to sleep or change frequency. I think the OP wanted a wall-clock reference - in that case, using the TSC is a lot of work.

> As a note: QueryPerformanceCounter is about 674 clock ticks.
Wow, that's fast - sounds like a switch to kernel mode, rdtsc, a little bit of correction, and that's it. The other QPC implementations (PIT, PMT, HPET) require port I/O, so they end up taking several µs. What OS and HAL is this on?

Is it safe to assume that any x86 CPU with MMX or newer has the CPUID / RDTSC functionality built in? Intel, AMD and all the others alike?

Don't we all still remember and miss the day we plotted our first pixel?

Yes, but why should you? Just issue CPUID, get the feature bits, and test whether TSC is set. Side note: I believe late 486 models supported CPUID, and possibly the TSC.

; make sure CPUID is supported (size opt.)
pushfd
or byte ptr [esp+2], 32 ; try to set bit 21 (the ID flag)
popfd
pushfd
pop eax
shr eax, 22 ; shifts bit 21 into CF - did it stick?
jnc no_cpuid

; get vendor string
xor eax, eax
cpuid
mov edi, offset cpu_vendor
xchg eax, ebx
stosd
xchg eax, edx
stosd
xchg eax, ecx
stosd
; (already 0-terminated)

; get CPU signature and std feature bits
push 1
pop eax
cpuid
mov [cpu_caps], edx

quote:
Original post by Jan Wassenberg
I'm saying it's not as accurate as it would appear - the crystal that drives the CPU clock isn't the best. The TSC is the best you've got for measuring short intervals, but you're screwed if the processor decides to sleep or change frequency. ...
Wow, that's fast - sounds like a switch to kernel mode, rdtsc, a little bit of correction, and that's it. The other QPC implementations (PIT, PMT, HPET) require port I/O, so they end up taking several µs. What OS and HAL is this on?

First point: that's why you call CPUID for smaller measurements - it forces the CPU to flush everything, like calling glFlush before a glFinish call.
You should ALSO set your timed function to the highest thread priority to stop anything multitasking over it. It makes Winamp lag if you're timing 200 ms functions, though.

Second point:

674 clock ticks is pretty slow for a high-performance counter.

Considering that RDTSC takes 20 ticks and can measure to half a clock tick (scientifically speaking - if you've studied it you will know what I mean, if you haven't then ignore it; i.e., it can differentiate), while QueryPerformanceCounter() counts to the nearest 300 clock ticks, calls it 1, and takes so long, it isn't the best counter. It is also very dependent on everything else in the system. RDTSC (when used correctly) is immune to that.

Oh, regarding your setup question: that was running at the highest priority, using only one CPU on the motherboard.
CPU was an Athlon 1 GHz,
a couple of external megs of cache (forgot the speed),
512 KB internal,
50 ns, 768 MB RAM on board,
WinXP Pro.

It doesn't matter anyway, though, as on a single chip it will be the same no matter the setup, using the timing functions that were written to test it.

quote:
Original post by Bad_Maniac
Is it safe to assume that any x86 CPU with MMX or newer has the CPUID / RDTSC functionality built in? Intel, AMD and all the others alike?

...and the following poster's 486 comment.

CPUID, I think, was on the very late 486s, as the second poster said.
RDTSC is only on Pentium-class chips and above.
I am NOT sure if they were on the Cyrix and AMD 80586 and 80686 chips (I believe they were, as they are supposed to share instruction sets).

CPUID is used on the first two runs of the Pentiums to make sure the pipelines don't intermingle, etc. - it keeps them in line. Not needed on the later chips, but it only takes a couple of clock ticks extra, and it's safer that way if you change machines to an earlier model.

quote:
Original post by Jan Wassenberg
Yes, but why should you? Just issue CPUID, get the feature bits, and test whether TSC is set. Side note: I believe late 486 models supported CPUID, and possibly the TSC.

Because I'm a lazy bastard, and I already check for the presence of MMX anyway, since my software blitter uses it. I just thought I could then skip checking whether RDTSC is present.

But, to be on the safe side, since you never know with Cyrix and other odd CPUs: which register, and which bit returned by CPUID, is set if RDTSC is present?


Don't rely on (just) RDTSC for any long-term timing, or you'll be screwed on systems that have SpeedStep or other similar technologies that can alter the CPU's clock frequency at whim. You'll wind up with code that works well on most desktops but acts insane on many laptops.

I've asked questions along this line before, but here's a new one:

I'm confused as to why there's such a big problem with making accurate and consistent timers available to programmers.

There's a crystal-managed hardware timer on all (?) motherboards that provides approximately nanosecond accuracy, and a means for the hardware to read the state of the timer whenever it desires. It also runs with the power turned off.

What's the big issue in making a driver/system call that basically blurts the entire value of the hardware clock (nanosecond fraction and seconds since a start time (1970?)) to the calling function, and does it quickly enough that it can be done thousands of times a second with no performance loss?

Is it that requesting the clock data from the motherboard is difficult / slow? That the resolution is variable and not accessible? I just don't get it.

quote:
first point: That's why you call CPUID for smaller measurements - it forces the CPU to flush everything, like calling glFlush before a glFinish call

Right, but how does that relate to what I said?

quote:
You should ALSO set your timed function to the highest thread priority to stop anything multitasking over it.

Yes, that's a good idea in general when timing stuff, but I don't think you understood what I was trying to say. Apart from task-switching problems (which bite you no matter what timer you're using), the TSC isn't as accurate as its amazing resolution suggests, due to poor crystal quality. Also, you have to watch out for APM sleep/frequency changes.

Can you explain why RDTSC can differentiate half a clock tick? I apparently haven't studied it, but I'm interested.

> 674 clock ticks is pretty slow for a high-performance counter
> .. QueryPerformanceCounter(), which counts to the nearest 300 clock ticks and calls it 1
Given your 1 GHz CPU clock, it sounds like your QPC uses the PM timer (3.57 MHz). Even though it's only one register to read (vs. the PIT, which requires a latch command plus two 8-bit reads), I'm surprised it only takes 0.6 µs.

> It is also very dependent on everything else in the system. RDTSC (when used correctly) is immune to that.
Are you referring to the 'PMT jumps sometimes with heavy bus traffic' issue? I would have said it the other way around: with the TSC, you can only hope the CPU doesn't pull the rug out from under you.

quote:
I am NOT sure if they were on the Cyrix and AMD 80586 and 80686 chips (I believe they were, as they are supposed to share instruction sets).

I used to run a K6-III, and I'm pretty sure it had rdtsc.


Bad Maniac:
#define BIT(n) (1u << (n))
enum
{
TSC = BIT(4),
// ..
MMX = BIT(23)
};
Those are the standard feature flags (CPUID function 1 returns them in EDX). But c'mon, that's what the manual is for
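A minimal sketch of the check itself (MSVC x86 inline asm; it assumes CPUID support has already been verified, as in the earlier post):

// Sketch: CPUID function 1 returns the standard feature flags in EDX;
// TSC is bit 4, MMX is bit 23.
bool has_tsc_and_mmx()
{
    unsigned int caps;
    __asm {
        push ebx      ; cpuid clobbers ebx (callee-saved)
        mov eax, 1
        cpuid
        mov caps, edx
        pop ebx
    }
    return (caps & (1u << 4)) != 0 && (caps & (1u << 23)) != 0;
}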


Krylloan:
Good question - ask Microsoft.

quote:
There's a crystal-managed hardware timer on all (?) motherboards that provides approximately nanosecond accuracy, and a means for the hardware to read the state of the timer whenever it desires. It also runs with the power turned off.

Are you sure? I know of the RTC, which is battery-backed, but it's clocked at 32.768 kHz. IIRC, it can only be used to generate periodic interrupts anyway, not to return a timestamp. The PIT and PMT run at 1.19 and 3.57 MHz respectively; I'm not sure if they run in deep sleep. The new HPET Microsoft was begging for (irony: they don't use it yet as far as I can tell, but Linux does) is at least 10 MHz, and *accurate* to 1 ns - is that the one you mean? Unfortunately, it's not required to run in power-saving mode. *sigh*

quote:
What's the big issue in making a driver/system call that basically blurts the entire value of the hardware clock (nanosecond fraction and seconds since a start time (1970?)) to the calling function, and does it quickly enough that it can be done thousands of times a second with no performance loss?

That would be nice. Since none of the timers are non-volatile, the OS would have to help, either 1) adding the date to the timer at startup or whenever it changes, or 2) using the timestamp to enhance its own timekeeping. The PIT is quite slow to read, and the PMT has that annoying jump issue, so that leaves the HPET. Bonus: it's memory-mapped, so no port I/O slowdown.
2) can be done yourself (I do, and it ended up at 600 lines), but this really should be done by the OS (which has access to the PIT, and more knowledge of system/APM events, etc.).

> Is it that requesting the clock data from the motherboard is difficult / slow?
> That the resolution is variable and not accessible? I just don't get it.
It currently requires port I/O, but even that isn't horribly slow (OK, a few µs). Resolution and frequency are known.
I don't get it either - it's a #)(*&% disgrace. The one high-resolution timer API we have is even crippled on multiprocessor systems, because it's documented to sometimes fail to keep results consistent between CPUs. FFS! Linux manages somehow.. (gettimeofday)

The first-point thing was a clarification regarding your point, not proving you wrong.

The differentiation to half a clock tick is an engineering and science thing.

Basically, if you have a scale with increments of, let's say, 0, 1, 2, 3, 4, ..., n,

you can safely differentiate to 0, 0.5, 1, 1.5, ..., n, and it is assumed correct: as long as you know a value is between 0 and 1 and roughly midway, it is technically correct to say it's 0.5. It ISN'T technically correct to call it 0.33, 0.66, or 0.25 unless there is an accurate way to measure that; even if you can see it's spot on, it isn't considered accurate.

Same goes for timers.

Let's say you have:

int a;
unsigned hyper clocktime;
startclock(clocktime);
a < 1024;
a < 1024;
endclock(clocktime);
Edit1->Text = AnsiString(clocktime);

It will return 2.

BUT

int a;
unsigned hyper clocktime;
startclock(clocktime);
a ^ 1024;
a ^ 1024;
endclock(clocktime);
Edit1->Text = AnsiString(clocktime);

will return 0.

Since nothing can take 0 time, you could say it's 0.5. I just say it's close to 0, but you can only say that a ^ 1024 takes either 0 or 0.5 ticks.

Basically, that's how you measure anything anywhere to technical specifications - ISO, for instance. You can only measure to HALF the smallest increment.

In reality, in the above examples, a < 1024 is one clock tick (very close to 1) and a ^ 1024 is so close to 0 that if you had

startclock(clocktime);
for (register int a = 0; a < 1024; a++);
endclock(clocktime);
Edit1->Text = AnsiString(clocktime);

and

startclock(clocktime);
for (register int a = 0; a ^ 1024; a++);
endclock(clocktime);
Edit1->Text = AnsiString(clocktime);

the second one will be about 1000 to 1500 clock ticks faster.

Oh, that's one of those spiffy bitwise operations that can really help performance.
In the above loop, that's roughly a 20% increase in performance with full optimisation.

Anyway, if you do engineering or science, you get the hang of this measurement stuff. I try to take averages to get a closer result, but from a single measurement you only get (in the case of this timer) 0 or 1, and, if you can take a good guess, 0.5.
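In code, the averaging idea amounts to something like this (a sketch; read_tsc_lo is a CPUID+RDTSC wrapper like the one earlier in the thread, and loop overhead is ignored):

// Sketch: time N back-to-back calls and divide; the average can resolve
// fractions of a tick that a single reading cannot.
const int N = 1000;
unsigned int t0 = read_tsc_lo();
for (int i = 0; i < N; ++i)
    func();                      // the code being measured
unsigned int t1 = read_tsc_lo();
double ticks_per_call = (double)(t1 - t0) / N;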

That's how you measure stuff more accurately than you normally would, without getting fired.

Oh, don't go and measure things like cars and say "I have 55.5 cars on this ship". Some things are just kind of stupid.

I thought of a better example:



startclock(&clock);
func1();
func1();
func1();
func1();
func1();
endclock(&clock);
Edit1->Text=AnsiString(clock);

returns 5




startclock(&clock);
func2();
func2();
func2();
func2();
func2();
endclock(&clock);
Edit1->Text=AnsiString(clock);

returns 2






startclock(&clock);
func3();
func3();
func3();
func3();
func3();
endclock(&clock);
Edit1->Text=AnsiString(clock);

returns 1




Example a: safe to assume that func1() takes 1 clock tick per call.
Example b: safe to assume that func2() takes 0.5 clock ticks per call.
Example c: safe to assume that func3() takes 0 clock ticks per call.

That's just taking averages and then rounding to the nearest half-increment: 0, 0.5 or 1.


Hope you get the drift.

Happy programming.

Hmm, OK. I'm still not convinced about your magical ability to be confident in measurements that yield .5, and not (say) .33.
In this case, the uncertainty of where in the tick you start/stop kills you (it may actually have been +/- 2 ticks) - no statement can be made about the times, due to insufficient resolution. If we increase the number of samples to avoid this problem and get 3000 calls in 1000 ticks, I'll be damned if I'll call that 0.5 ticks per call.

I think what he's trying to show is that 3000 calls / 1000 ticks is closer to 0.5 ticks per call than it is to 1 tick. Why on earth you'd want sub-clock-tick timing, when it's hard to find a timer that's even remotely that accurate, is beyond me though :D



Something like this can be tested easily:

rdtsctimer();
x < y;
x < y;
x < y;
x < y;
rdtsctimer();
outputresult();

rdtsctimer();
x ^ y;
x ^ y;
x ^ y;
x ^ y;
rdtsctimer();
outputresult();

Big difference: 4 clock ticks. From that you can assume that x ^ y takes less than half a clock tick (even four of them take less than half a clock tick).

As I said, if you haven't spent a few years studying it, ignore it. If you have studied it, you will know why you use it.

For short intervals rdtsc is dead accurate; for long measurements it becomes less accurate.
QueryPerformanceCounter is the other way around.

edit: < to &lt;

[edited by - Dredge-Master on December 17, 2003 9:09:27 PM]

quote:
Original post by Dredge-Master
rdtsctimer();
x < y;
x < y;
x < y;
x < y;
rdtsctimer();
outputresult();
...

If those are built-in types, wouldn't those statements get optimised away, since they have no side effects and the result isn't used?

quote:
As I said, if you haven't spent a few years studying it, ignore it. If you have studied it, you will know why you use it.

With your experience, wouldn't you normally profile something a large number of times, rather than just once?

quote:
I think what he's trying to show is that 3000 calls / 1000 ticks is closer to 0.5 ticks per call than it is to 1 tick. Why on earth you'd want sub-clock-tick timing, when it's hard to find a timer that's even remotely that accurate, is beyond me though :D

If that's all he's saying, I agree.
CPU clock accuracy is not the issue: it may be bad (cheap crystals have something like 200 PPM frequency tolerance, which works out to about 17 seconds of drift per day), but the 'benchmark' is still counting clocks, no matter how long they are.

A more important question: how is an instruction going to take less than a clock?!
> even four of them [xor] take less than half a clock tick.
Oh, come on.

quote:
For short intervals rdtsc is dead accurate; for long measurements it becomes less accurate. QueryPerformanceCounter is the other way around.

What is your definition of accurate?

> With your experience, wouldn't you normally profile something a large number of times, rather than just once?
I guess in this case it's all right - with such a short piece of code, you will quickly notice if you got preempted (no more 4-clock time difference). The only other change would be warming the cache, and that's not an issue here either.
