
Archived

This topic is now archived and is closed to further replies.

Ademan555

Best, most accurate clock?

This topic is 5364 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


Recommended Posts

Is it safe to assume that any x86 CPU with MMX or newer has the CPUID / RDTSC functionality built in? Intel, AMD and all the others alike?

Don't we all still remember and miss the day we plotted our first pixel?

Yes, but why should you? Just issue CPUID, get the feature bits, and test if TSC is set. Side note: I believe late 486 models supported CPUID and possibly the TSC.

; make sure CPUID is supported (size opt.)
pushfd
or byte ptr [esp+2], 32    ; set EFLAGS.ID (bit 21)
popfd
pushfd
pop eax
shr eax, 22                ; CF = bit 21; stays clear if CPUID is absent
jnc no_cpuid

; get vendor string
xor eax, eax
cpuid
mov edi, offset cpu_vendor
xchg eax, ebx
stosd
xchg eax, edx
stosd
xchg eax, ecx
stosd
; (already 0 terminated)

; get CPU signature and std feature bits
push 1
pop eax
cpuid
mov [cpu_caps], edx

quote:
Original post by Jan Wassenberg
> I thought rdtsc was the most accurate reading possible, the only reason you get inaccuracies is because of multitasking?
I'm saying it's not as accurate as it would appear - the crystal that drives the CPU clock isn't the best. The TSC is the best you've got for measuring short intervals, but you're screwed if the processor decides to sleep or change frequency. I think the OP wanted a wall-clock reference - in that case, using the TSC is a lot of work.

> As a note: QueryPerformanceCounter is about 674 clock ticks.
Wow, that's fast - sounds like a switch to kernel mode, rdtsc, a little bit of correction, and that's it. The other QPC implementations (PIT, PMT, HPET) require port I/O, so it ends up taking several µs. What OS and HAL is this on?




first point: That's why you call CPUID for smaller measurements - it forces the CPU to flush everything, like calling glFlush before a glFinish call.
You should ALSO have your timed function set to the highest thread priority, to stop anything multitasking over it. It makes Winamp lag if you're timing 200ms functions, though.


second point:

674 clock ticks is pretty slow for a high-performance counter.

Considering RDTSC takes about 20 ticks, and can measure to half a clock tick (scientifically speaking - if you've studied it you'll know what I mean; if you haven't, ignore it), i.e. it can differentiate, whereas QueryPerformanceCounter() counts to the nearest 300 clock ticks and calls it 1, and takes so long, it isn't the best counter. It is also very dependent on everything else in the system. RDTSC (when used correctly) is immune to all that.


Oh, regarding your setup question: that was running at the highest thread priority, using only one CPU on the motherboard.
CPU was an Athlon 1GHz.
A couple of external MB of cache (forgot the speed),
512KB internal,
768MB of 50ns RAM on board,
WinXP Pro.


It doesn't matter anyway, though, as on a single chip it will be the same no matter the setup, using the timing functions that were written to test it.

quote:
Original post by Bad_Maniac
Is it safe to assume that any x86 cpu with MMX or newer has the CPUID / RDTSC functionality built in? Intel, AMD and all the others alike?

Don't we all still remember and miss the day we plotted our first pixel?


and regarding the following poster's 486 comment.

CPUID, I think, was on the very late 486s, as the second poster said.
RDTSC is only on Pentium-class chips and above.
I am NOT sure whether they were on the Cyrix and AMD 80586 and 80686 chips (I believe they were, as they are supposed to share instruction sets).

CPUID is used on the first two runs of the Pentiums to make sure the pipelines don't intermingle, etc. It keeps them in line. It's not needed on the later chips, but it only takes a couple of clock ticks extra and is safer that way if you change machines to an earlier model.

quote:
Original post by Jan Wassenberg
Yes, but why should you? Just issue CPUID, get the feature bits, and test if TSC is set. Side note: I believe late 486 models supported CPUID and possibly the TSC.



Because I'm a lazy bastard, and I already check for the presence of MMX anyway, since my software blitter uses it. I just thought I could then skip checking whether RDTSC is present.

But to be on the safe side, since you never know with Cyrix and other odd CPUs: what register, and what bit returned from CPUID, is set if RDTSC is present?


Don't rely on (just) RDTSC for any long-term timing or you'll be screwed on systems that have SpeedStep or other similar technologies that can alter the CPU's clock frequency at whim. You'll wind up with code that works well on most desktops but acts insane on many laptops.

I've asked questions along this line before, but here's a new one:

I'm confused as to why there's such a big problem with making accurate and consistent timers available to programmers.

There's a crystal-managed hardware timer on all (?) motherboards that provides approximately nanosecond accuracy, and a means for the hardware to read the state of the timer whenever it desires to. It also runs with the power turned off.

What's the big issue in making a driver/system call that basically blurts the entire value of the hardware clock (nanosecond fraction and seconds since a start time (1970?)) to the calling function, and does it quickly enough that it can be done thousands of times a second with no performance loss?

Is it that requesting the clock data from the motherboard is difficult / slow? That the resolution is variable and not accessible? I just don't get it.

quote:
first point: That's why you call CPUID for smaller measurements - it forces the CPU to flush everything, like calling glFlush before a glFinish call

Right, but how does that relate to what I said?

quote:
You should ALSO have your timed function set to the highest thread priority to stop anything multitasking over it. Makes Winamp lag if you're timing 200ms functions though.

Yes, that's a good idea in general when timing stuff, but I don't think you understood what I was trying to say. Apart from task-switching problems (which bite you no matter what timer you're using), the TSC isn't as accurate as the amazing resolution suggests, due to the poor crystal quality. Also, you have to watch out for APM sleep/frequency changes.


Can you explain why RDTSC can differentiate half a clock tick? I apparently haven't studied it, but I'm interested.

> 674 clock ticks is pretty slow for a high performance counter
> .. queryperformancecounter() which counts to the nearest 300 clock ticks and calls it 1
Given your 1 GHz CPU clock, it sounds like your QPC uses the PM timer (3.57 MHz). Even though it's only 1 register to read (vs. the PIT, which requires a latch command + two 8-bit reads), I'm surprised it only takes 0.6 µs.

> It is also very dependant on anything else in the system. RDTSC (when used correctly) is immune to them.
Are you referring to the 'PMT jumps sometimes with heavy bus traffic' issue? I would have said it the other way around: with the TSC, you can only hope the CPU doesn't pull the rug out from under you.

quote:
I am NOT sure if they were on (I believe they were, as they are supposed to share instruction sets) the Cyrix and AMD 80586 and 80686 chips.

I used to run a K6-III, and I'm pretty sure it had rdtsc.


Bad Maniac:
enum
{
TSC = BIT(4),
..
MMX = BIT(23),
..
};
Those are the standard feature flags. But c'mon, that's what the manual is for


Krylloan:
good question, ask Microsoft

quote:
There's a crystal-managed hardware timer on all (?) motherboards that provides approximately nanosecond accuracy, and a means for the hardware to read the state of the timer whenever it desires to. It also runs with the power turned off.

Are you sure? I know of the RTC, which is battery-backed, but clocked at 32 kHz. IIRC, it can only be used to generate periodic interrupts anyway, not to return a timestamp. The PIT and PMT are 1.19 and 3.57 MHz respectively; I'm not sure if they run in deep sleep. The new HPET Microsoft was begging for (irony: they don't use it yet as far as I can tell, but Linux does) is at least 10 MHz, and *accurate* to 1 ns - is that the one you mean? Unfortunately, it's not required to run in power-saving mode. *sigh*

quote:
What's the big issue in making a driver/system call that basically blurts the entire value of the hardware clock (nanosecond fraction and seconds since a start time (1970?)) to the calling function, and does it quickly enough that it can be done thousands of times a second with no performance loss?

That would be nice. Since none of the timers are non-volatile, the OS would have to help, either 1) adding the date to the timer at startup or whenever it changes, or 2) using the timestamp to enhance its own timekeeping. The PIT is quite slow to read, and the PMT has this annoying jump issue, so that leaves the HPET. Bonus: it's memory-mapped, so no port I/O slowdown.
2) can be done yourself (I do, and it ended up at 600 lines), but this really should be done by the OS (which has access to the PIT, and more knowledge of the system/APM events, etc.).

> Is it that requesting the clock data from the motherboard is difficult / slow?
> That the resolution is variable and not accessible? I just don't get it.
It currently requires port I/O, but even that isn't horribly slow (ok, a few µs). Resolution and frequency are known.
I don't get it either - it's a #)(*&% disgrace. The one high-resolution timer API we have is even crippled on multiprocessor systems, because it's documented to sometimes fail to keep results consistent between CPUs. FFS! Linux manages somehow.. (gettimeofday)

the first point thingy was regarding your point, but as a clarification, not proving you wrong.

the differentiation to half a clock tick is an engineering and science thing.


basically, if you have a scale with increments of, let's say, 0, 1, 2, 3, 4, ..., n,

you can safely differentiate to 0, 0.5, 1, 1.5, ..., n and it is assumed correct: as long as you know the value is between 0 and 1 and roughly midway, it is technically correct to say it's 0.5. It ISN'T technically correct to call it 0.33, 0.66, or 0.25 unless there is an accurate way to measure that. Even if you can see it's spot on, it isn't considered accurate.

Same goes for timers.


Let's say you have

int a;
unsigned hyper clocktime;
startclock(clocktime);
a<1024;
a<1024;
endclock(clocktime);
Edit1->Text=AnsiString(clocktime);

it will return 2;


BUT
int a;
unsigned hyper clocktime;
startclock(clocktime);
a^1024;
a^1024;
endclock(clocktime);
Edit1->Text=AnsiString(clocktime);

will return 0;


Since nothing can take 0 time, you could say it's 0.5. I just say it's close to 0, but you can only say that a^1024 takes either 0 or 0.5.


Basically, that's how you measure anything anywhere to technical specifications - ISO, for instance. You can only measure to HALF the smallest increment.

In reality, in the above examples, a<1024 is one clock tick (very close to 1) and a^1024 is so close to 0 that if you had

startclock(clocktime);
for(register int a=0;a<1024;a++);
endclock(clocktime);
Edit1->Text=AnsiString(clocktime);


and


startclock(clocktime);
for(register int a=0;a^1024;a++);
endclock(clocktime);
Edit1->Text=AnsiString(clocktime);

the second one will be about 1000 to 1500 clock ticks faster.

Oh, that's one of those spiffy bitwise operations that can really help performance.
In the above loop, that's roughly a 20% increase in performance with full optimisation.



Anyway, if you do engineering or science you get the hang of this measurement stuff. I try to take averages to get a closer result, but from a single measurement you only get (in the case of this timer) 0 or 1 - and, if you can make a good guess, 0.5.

That's how you measure stuff more accurately than you normally would, without getting fired.

Oh, don't go and measure things like numbers of cars and say "I have 55.5 cars in this ship". Some things are just kind of stupid.

thought of a better example



startclock(&clock);
func1();
func1();
func1();
func1();
func1();
endclock(&clock);
Edit1->Text=AnsiString(clock);

returns 5




startclock(&clock);
func2();
func2();
func2();
func2();
func2();
endclock(&clock);
Edit1->Text=AnsiString(clock);

returns 2






startclock(&clock);
func3();
func3();
func3();
func3();
func3();
endclock(&clock);
Edit1->Text=AnsiString(clock);

returns 1




example a: safe to assume that func1() takes 1 clock tick per call
example b: safe to assume that func2() takes 0.5 clock ticks per call
example c: safe to assume that func3() takes 0 clock ticks per call

That's just averages, then rounded to the nearest half increment - that being 0, 0.5 and 1.


Hope you get the drift.

Happy programming.

