
This topic is now archived and is closed to further replies.

Ademan555

Best, most accurate clock?


Dredge-Master    175
try it

One cycle can contain more than one instruction (on newer CPUs anyway; I've never used high-performance timers on pre-Pentium-class chips, or on the Solaris machine). It just depends on which instructions.

That's why you use x^1023 instead of x<1023. C compilers will not optimise this, because in the case of

for (i=0;i<1023;a?i+=2:++i)func(a,i);

for (i=0;i^1023;a?i+=2:++i)func(a,i);

the second becomes unpredictable: when a is true, i steps by 2 from 0, never equals 1023, and so the loop does not stop there.
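The two loop conditions are not equivalent, which is why a compiler cannot swap one for the other. The sketch below uses hypothetical helper names, replaces `func(a,i)` and the `a ? i+=2 : ++i` step with a fixed `step`, and adds an iteration cap so the runaway case returns. It only demonstrates the semantic difference, not any speed difference.

```cpp
// Hypothetical helpers: both loops start at i = 0 and add `step` each pass.
// `cap` bounds the iteration count so a loop that would otherwise keep
// going returns instead (and i stays far from overflow for small caps).

// Condition i < 1023: stops as soon as i reaches or passes 1023.
int iterations_less_than(int step, int cap)
{
    int n = 0;
    for (int i = 0; i < 1023 && n < cap; i += step)
        ++n;
    return n;
}

// Condition i ^ 1023: nonzero exactly when i != 1023, so this loop only
// stops if i lands exactly on 1023.
int iterations_xor(int step, int cap)
{
    int n = 0;
    for (int i = 0; (i ^ 1023) && n < cap; i += step)
        ++n;
    return n;
}
```

With a step of 1 both loops run 1023 times. With a step of 2 the `<` loop runs 512 times, while the `^` loop runs until it hits the cap, because an even `i` never equals 1023.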

It's those little optimisations that can make small but sometimes useful differences (try it when comparing very large quantities of small fixed-point data: it's a lot faster).

It's a bitwise comparison, not a numeric one; hence the manual optimisation.

It's like a/(x|1).
Figure that one out. Compilers don't optimise that into their code either.
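One reading of the a/(x|1) trick (the post leaves it as a puzzle): setting the lowest bit makes the divisor odd, so it can never be zero. The helper name below is hypothetical; the sketch just shows what the expression computes.

```cpp
// Hypothetical helper illustrating a / (x | 1).
// x | 1 forces the lowest bit on, so the divisor is always odd and can
// never be zero. Odd x is unchanged; a non-negative even x becomes x + 1.
// Because the result differs from a / x for even x, a compiler cannot
// treat the two expressions as interchangeable.
int div_or_one(int a, int x)
{
    return a / (x | 1);
}
```

For example, `div_or_one(12, 4)` divides by 5 and gives 2, where `12 / 4` would give 3, and `div_or_one(10, 0)` gives 10 instead of dividing by zero.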



As for the accuracy, it's because of interference from other processes: even at the highest thread priorities they still share resources very occasionally. That, and the damned caching of some variables, means certain small variables won't be timed correctly.

I thought it would be accurate all the time, but the Intel documentation for RDTSC said otherwise. I tried it, and the documentation was right and I was wrong. It shows that documentation can sometimes be helpful.

Dredge-Master    175
BTW, regarding profiling multiple times:
For testing, I profile each function either 200 times (for smaller code) or 100 times (for slower code).

When not testing, each timed function is logged every time it is used.
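A minimal sketch of repeat-profiling a function, assuming C++11 and using `std::chrono::steady_clock` as a portable stand-in for RDTSC or QueryPerformanceCounter (the post uses those, not this). The names `profile` and `ProfileResult` are hypothetical.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>

// Hypothetical result type: fastest single run and average over all runs.
struct ProfileResult {
    double min_ns;
    double mean_ns;
};

// Run `fn` `runs` times, timing each call separately with steady_clock
// (a stand-in for RDTSC/QueryPerformanceCounter in this sketch).
ProfileResult profile(const std::function<void()>& fn, std::size_t runs)
{
    using clock = std::chrono::steady_clock;
    double total = 0.0;
    double best = 0.0;
    for (std::size_t i = 0; i < runs; ++i) {
        const auto t0 = clock::now();
        fn();
        const auto t1 = clock::now();
        const double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
        total += ns;
        if (i == 0 || ns < best)
            best = ns;  // keep the least-disturbed run
    }
    return {best, runs ? total / runs : 0.0};
}
```

Call it as `profile(work, 200)` for small code or `profile(work, 100)` for slower code, where `work` is whatever is being measured. Reporting the minimum alongside the mean is a design choice: runs disturbed by other processes inflate the mean, while the fastest run is less affected.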

For code measurement I use RDTSC. For frame and cycle frequency I use QueryPerformanceCounter.

Jan Wassenberg    999
> one cycle can contain more than one instruction (on newer cpu's anyway ..
Right, but 4 instructions in half a clock is a bit unrealistic. Max issue rate is 3 instructions/clock on my Athlon.
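As a quick check of the numbers in the post, 4 instructions in half a clock would mean an issue rate of:

```latex
\frac{4\ \text{instructions}}{0.5\ \text{clock}} = 8\ \text{instructions/clock} > 3\ \text{instructions/clock}
```

That is well above the maximum issue rate quoted for the Athlon.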

quote:

> What is your definition of accurate?
For the accuracy it's because of interference from other processes - even in the highest thread states they still share resources very occasionally.

You mean your thread is preempted more often when calling Windows APIs, i.e. there's a Reschedule call in there somewhere? Hmm, that could be. Supporting evidence: I've noticed ReadFileEx callbacks are sometimes delivered from within an API call.

quote:
I thought it would be accurate all the time, but the Intel documentation for RDTSC said otherwise. I tried it and it was right, and I was wrong. It shows that sometimes documentation can be helpful.

?

(at home, high latency)

Guest Anonymous Poster   
The whole "measure to 0.5 ticks" thing is kind of correct. In science and engineering you normally read off a scale and say the value is either near a mark or between two marks, which lets you judge at best to half a scale division (as long as there are no other factors). That doesn't really apply here, because one call gives you a whole number of ticks.

It is right to say that with a larger set of measurements you can state the result more accurately, but things get complicated when you have to estimate the error in the readings. It also depends on whether you measure the whole lot in one lump or measure them one by one and add them together.
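To make the "whole number of ticks" point concrete, here is a hypothetical model, not a real timer: time is counted in sub-units, and the clock only reports whole ticks of 10 sub-units by truncation. It assumes individually timed events start at phases spread evenly across a tick, which real code does not guarantee. All names are illustrative.

```cpp
// Hypothetical clock: one tick = TICK sub-units; readings truncate.
constexpr long long TICK = 10;

// Whole ticks a truncating counter reports for an interval starting at
// `start` and lasting `duration` (both in sub-units, non-negative).
long long ticks_observed(long long start, long long duration)
{
    return (start + duration) / TICK - start / TICK;
}

// Time one event individually at every phase relative to the tick
// boundary (assumed evenly spread) and return the mean reading in ticks.
double mean_of_individual(long long duration)
{
    long long sum = 0;
    for (long long phase = 0; phase < TICK; ++phase)
        sum += ticks_observed(phase, duration);
    return static_cast<double>(sum) / TICK;
}

// Time n back-to-back events as one lump starting on a tick boundary,
// then divide the reading by n.
double lump_per_event(long long n, long long duration)
{
    return static_cast<double>(ticks_observed(0, n * duration)) / n;
}
```

For an event lasting 3 sub-units (0.3 ticks), a single reading is always 0 or 1 tick. Averaged over all ten phases the readings give 0.3; timing 100 events as one lump also gives 0.3, while a lump of only 7 events gives 2/7 (about 0.29), because the truncation error of up to one tick is divided by only 7.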

Jan Wassenberg    999
AP: agreed. It gets even better when you expect the results to be good to n units, yet your method of getting the timestamp takes several multiples of n.
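One way to see how much the timestamp call itself costs is to read the clock twice back to back and keep the smallest gap. A minimal sketch, assuming C++11 and using `std::chrono::steady_clock` rather than RDTSC or QueryPerformanceCounter; the function name is hypothetical.

```cpp
#include <chrono>
#include <cstddef>

// Estimate the cost of one steady_clock::now() call as the smallest gap
// seen between two back-to-back reads over `samples` tries.
// Returns -1 if samples is 0. A result of 0 means both reads fell in the
// same clock tick, i.e. the call is cheaper than the clock's resolution.
long long min_now_gap_ns(std::size_t samples)
{
    using clock = std::chrono::steady_clock;
    long long best = -1;
    for (std::size_t i = 0; i < samples; ++i) {
        const auto a = clock::now();
        const auto b = clock::now();
        const long long gap =
            std::chrono::duration_cast<std::chrono::nanoseconds>(b - a).count();
        if (best < 0 || gap < best)
            best = gap;
    }
    return best;
}
```

If this gap is several times the resolution you need, timing many iterations in one lump and dividing spreads that overhead out.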

Oh bother. I wish I had found that while researching my timer; it would have saved some time. I wrote up another article about this, which also covers timer peculiarities but looks at their hardware implementations to see why.

Guest Anonymous Poster   
Hello all,

The following program reports that 536 clock cycles elapse between the first rdtsc and the second rdtsc. That looks like a lot of cycles to me...

I ran the test on a 2.4 GHz Pentium 4 under SuSE Linux 8.0 with gcc 3.3. Results are the same for -O1 and -O2; -O0 returns 576 and -O3 messes up...

The Intel 7.0 compiler for Linux reports 560 cycles at -O0.
At -O1, -O2 and -O3 the Intel compiler returns garbage.
Am I doing something wrong here?

- Kurt

P.S. I tried warming up by calling the rdtsc/cpuid sequence multiple times, but that didn't make any difference...
----
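#include <iostream>

// RDTSC leaves the 64-bit counter in EDX:EAX. Read both halves explicitly:
// "=A" only gives the EDX:EAX pair for a 64-bit variable on 32-bit x86,
// and a plain 32-bit unsigned cannot hold the full counter anyway.
#define rdtscll(x) do { \
    unsigned int lo_, hi_; \
    __asm__ __volatile__ ("rdtsc" : "=a" (lo_), "=d" (hi_)); \
    (x) = ((unsigned long long)hi_ << 32) | lo_; \
} while (0)

// CPUID is used only to serialize execution so RDTSC is not reordered with
// earlier instructions. EAX selects the leaf, so set it explicitly.
// (The original macro was also missing its closing parenthesis.)
#define cpuid do { \
    unsigned int eax_ = 0; \
    __asm__ __volatile__ ("cpuid" : "+a" (eax_) : : "ebx", "ecx", "edx", "memory"); \
} while (0)

int main()
{
    unsigned long long cycles1 = 0, cycles2 = 0;

    cpuid;
    rdtscll(cycles1);
    cpuid;
    rdtscll(cycles2);

    // The measured interval includes one CPUID, itself a slow instruction.
    std::cout << cycles2 - cycles1 << "\n";
    return 0;
}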
