Hmm, ok. I'm still not convinced about your magical ability to be confident in measurements that yield .5, and not (say) .33.
In this case, the uncertainty about where in the tick you start and stop kills you (it may actually have been +/- 2 ticks): no statement can be made about the times, due to insufficient resolution. If we increase the number of samples to avoid this problem and get 3000 calls in 1000 ticks, I'll be damned if I'll call that 0.5 ticks per call.
Best, most accurate clock?
I think what he's trying to show is that 3000 calls / 1000 ms is closer to 0.5 ticks per call than to 1 tick. Why on earth you'd want sub-clock-tick timing, when it's hard to find a timer that's even remotely that accurate, is beyond me though :D
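To put numbers on the disagreement, here is a minimal sketch that turns "N calls in T ticks, give or take a couple of ticks at the ends" into a per-call range. The struct and function names are made up for illustration, and the +/- 2 ticks is just the figure guessed at above, not a measured value.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper type: lowest, central and highest per-call cost. */
typedef struct { double low, mid, high; } per_call_estimate;

/* `calls` calls were observed over `ticks` timer ticks; the start and stop
   points together may be off by up to `edge_ticks` ticks. */
per_call_estimate estimate_per_call(long ticks, long calls, long edge_ticks)
{
    assert(calls > 0 && ticks >= edge_ticks);
    per_call_estimate e;
    e.low  = (double)(ticks - edge_ticks) / (double)calls;
    e.mid  = (double)ticks / (double)calls;
    e.high = (double)(ticks + edge_ticks) / (double)calls;
    return e;
}

/* Print the estimate in a forum-friendly form. */
void print_estimate(long ticks, long calls, long edge_ticks)
{
    per_call_estimate e = estimate_per_call(ticks, calls, edge_ticks);
    printf("%ld calls in %ld ticks: %.4f ticks/call (%.4f to %.4f)\n",
           calls, ticks, e.mid, e.low, e.high);
}
```

With 3000 calls in 1000 ticks the range is tight around 0.33 ticks per call. With something like 6 calls in 3 ticks the same +/- 2 ticks spreads the answer from roughly 0.17 to 0.83, which is the "insufficient resolution" complaint in a nutshell.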
Something like this can be tested easily
rdtsctimer();
x<y;
x<y;
x<y;
x<y;
rdtsctimer();
outputresult();
rdtsctimer();
x^y;
x^y;
x^y;
x^y;
rdtsctimer();
outputresult();
Big difference: 4 clock ticks. From one of those timings you can assume that x^y takes less than half a clock tick (even four of them take less than half a clock tick).
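For anyone who wants to try this, below is a rough, compilable sketch of the test. Nothing here is the poster's actual code: `read_ticks`, `time_four_less_than` and `time_four_xor` are made-up names, the `__rdtsc()` intrinsic is assumed to come from GCC/Clang's `x86intrin.h`, and non-x86 builds fall back to a nanosecond clock. The operands are `volatile` so the compiler cannot drop the comparisons. Bear in mind that RDTSC does not serialize execution and the two counter reads have their own cost, so differences of a few ticks on such short sequences should be read with caution.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>              /* GCC/Clang header providing __rdtsc() */
static uint64_t read_ticks(void) { return __rdtsc(); }
#else
/* Assumed fallback for non-x86 builds: nanoseconds from the C11 clock. */
static uint64_t read_ticks(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}
#endif

/* volatile forces real loads of x and y and a real store of each result */
static volatile unsigned x = 12345u, y = 1023u, sink;

/* Ticks spent on four x < y comparisons (plus timer overhead). */
uint64_t time_four_less_than(void)
{
    uint64_t start = read_ticks();
    sink = x < y; sink = x < y; sink = x < y; sink = x < y;
    return read_ticks() - start;
}

/* Ticks spent on four x ^ y operations (plus timer overhead). */
uint64_t time_four_xor(void)
{
    uint64_t start = read_ticks();
    sink = x ^ y; sink = x ^ y; sink = x ^ y; sink = x ^ y;
    return read_ticks() - start;
}

/* Print both timings side by side. */
void report(void)
{
    assert(read_ticks() > 0);
    printf("x<y: %llu ticks, x^y: %llu ticks\n",
           (unsigned long long)time_four_less_than(),
           (unsigned long long)time_four_xor());
}
```

Calling `report()` several times in a row gives a feel for how stable the numbers are on a given machine.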
As I said, if you haven't spent a few years studying it, ignore it. If you have studied it you will know why you use it.
For short intervals RDTSC is dead accurate; over long measurements it becomes less accurate.
QueryPerformanceCounter is the other way around.
edit: < to &lt;
[edited by - Dredge-Master on December 17, 2003 9:09:27 PM]
quote:Original post by Dredge-Master
rdtsctimer();
x<y;
x<y;
x<y;
x<y;
rdtsctimer();
outputresult();
rdtsctimer();
x^y;
x^y;
x^y;
x^y;
rdtsctimer();
outputresult();
If those are built-in types, wouldn't they get optimised away, as they have no side effects and the return value isn't used?
quote:As I said, if you haven't spent a few years studying it, ignore it. If you have studied it you will know why you use it
With your experience, wouldn't you normally profile something a large number of times, rather than just once?
quote:I think what he's trying to show is that 3000 calls / 1000 ms is closer to 0.5 ticks per call than to 1 tick. Why on earth you'd want sub-clock-tick timing, when it's hard to find a timer that's even remotely that accurate, is beyond me though :D
If that's all he's saying, I agree.
CPU clock accuracy is not the issue: it may be bad (cheap crystals have something like 200 PPM frequency tolerance), but the 'benchmark' is still counting clocks, no matter how long they are.
A more important question: how is an instruction going to take less than a clock?!
> even four of them [xor] is less than half a clock tick.
oh come on
quote:For short intervals RDTSC is dead accurate; over long measurements it becomes less accurate.
QueryPerformanceCounter is the other way around.
What is your definition of accurate?
> With your experience, wouldn't you normally profile something a large number of times, rather than just once?
I guess in this case it's alright - with such a short piece of code, you will quickly notice if you got preempted (the time difference will no longer be 4 clocks). The only other change would be warming the cache, and that's not an issue here either.
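If you do repeat the measurement, spotting the preempted runs can be automated. The sketch below is only one way to do it, with made-up names (`sample_summary`, `summarize_samples`) and an arbitrary `slack` threshold chosen by the caller: it takes the smallest sample as the least disturbed run and counts how many samples sit well above it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t best;      /* smallest sample: the least disturbed run */
    size_t   disturbed; /* samples more than `slack` ticks above best */
} sample_summary;

/* Summarise `count` elapsed-tick samples taken from the same piece of code. */
sample_summary summarize_samples(const uint64_t *samples, size_t count,
                                 uint64_t slack)
{
    assert(samples != NULL && count > 0);
    sample_summary s = { samples[0], 0 };
    /* First pass: find the fastest run. */
    for (size_t i = 1; i < count; ++i)
        if (samples[i] < s.best)
            s.best = samples[i];
    /* Second pass: count runs that look preempted or otherwise disturbed. */
    for (size_t i = 0; i < count; ++i)
        if (samples[i] - s.best > slack)
            ++s.disturbed;
    return s;
}
```

For samples like 44, 40, 40, 5200, 41, 40 with a slack of 8 ticks, this reports a best time of 40 and one disturbed run, which matches the "you will quickly notice" intuition.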
try it
One cycle can contain more than one instruction (on newer CPUs anyway - I've never used high-performance timers on pre-Pentium class chips, or on the Solaris machine) - it just depends which ones.
That's why you use x^1023 instead of x<1023. C compilers will not optimise this, because in a case such as
for (i=0;i<1023;a?i+=2:++i)func(a,i);
for (i=0;i^1023;a?i+=2:++i)func(a,i);
the second will become unpredictable.
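One way to read "unpredictable": as a loop condition, `i ^ 1023` is an inequality test (it behaves like `i != 1023`), not a less-than test, so the two loops only agree when `i` is guaranteed to land exactly on 1023. The sketch below counts iterations of both forms with a safety cap; the function names and the cap are invented for the demonstration, and `unsigned` is used so that running past 1023 is well defined.

```c
#include <assert.h>

/* Iterations of: for (i = 0; i < 1023; a ? i += 2 : ++i), capped at `cap`. */
unsigned count_less_than(int a, unsigned cap)
{
    unsigned n = 0;
    for (unsigned i = 0; i < 1023u && n < cap; a ? (i += 2) : ++i)
        ++n;
    return n;
}

/* Iterations of: for (i = 0; i ^ 1023; a ? i += 2 : ++i), capped at `cap`. */
unsigned count_xor(int a, unsigned cap)
{
    unsigned n = 0;
    for (unsigned i = 0; (i ^ 1023u) && n < cap; a ? (i += 2) : ++i)
        ++n;
    return n;
}

/* Returns 1 when both loop forms run the same number of times. */
int loops_agree(int a, unsigned cap)
{
    assert(cap > 0);
    return count_less_than(a, cap) == count_xor(a, cap);
}
```

With a step of 1 both forms run 1023 times. With a step of 2, `i` jumps from 1022 to 1024, the less-than loop stops after 512 iterations, and the XOR loop keeps going until the cap.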
It's those antsy little optimisations that can make small but sometimes useful changes (try it when comparing very large quantities of small fixed-point data - a lot faster).
It's a bitwise comparison, not a numeric one. Hence the manual optimisation.
It's like a/(x|1).
Figure that one out. Compilers don't optimise that into their code either.
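A possible reading of the a/(x|1) riddle, offered as a guess rather than the poster's stated answer: OR-ing with 1 forces the lowest bit on, so the divisor can never be zero and no divide-by-zero check is needed. The price is that every even x is bumped up by one, which changes the result, and that is why a compiler cannot make this substitution on its own. The function name below is made up.

```c
#include <assert.h>

/* Divide a by x with x's lowest bit forced on, so the divisor is never 0.
   Odd x is unchanged; even x (including 0) becomes x + 1. */
unsigned divide_by_odd(unsigned a, unsigned x)
{
    unsigned divisor = x | 1u;
    assert(divisor != 0);   /* always holds: bit 0 is set */
    return a / divisor;
}
```

So `divide_by_odd(100, 5)` is a plain 100 / 5, `divide_by_odd(100, 0)` divides by 1 instead of trapping, and `divide_by_odd(100, 4)` quietly divides by 5.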
For the accuracy it's because of interference from other processes - even in the highest thread states they still share resources very occasionally. That and the damned caching of some variables, so certain small variables won't be timed correctly.
I thought it would be accurate all the time, but the Intel documentation for RDTSC said otherwise. I tried it and it was right, and I was wrong. It shows that sometimes documentation can be helpful.
btw - regarding profiling multiple times:
When testing, I profile each function either 200 times (for smaller code) or 100 times (for slower code).
When not testing, each timed function is logged every time it is used.
For code measurement I use RDTSC. For frame and cycle frequency I use QueryPerformanceCounter.
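Logging every call does not have to mean storing every sample. The following is a minimal sketch of a running log per timed function, assuming the elapsed ticks come from whatever timer you use (RDTSC or otherwise); `timing_log`, `timing_log_add` and `timing_log_mean` are illustrative names, not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t calls;     /* how many timings were logged */
    uint64_t total;     /* sum of all elapsed ticks */
    uint64_t shortest;  /* fastest run seen so far */
    uint64_t longest;   /* slowest run seen so far */
} timing_log;

/* Fold one elapsed-tick measurement into the log. */
void timing_log_add(timing_log *log, uint64_t elapsed)
{
    assert(log != NULL);
    if (log->calls == 0 || elapsed < log->shortest)
        log->shortest = elapsed;
    if (elapsed > log->longest)
        log->longest = elapsed;
    log->total += elapsed;
    log->calls += 1;
}

/* Average ticks per logged call. */
double timing_log_mean(const timing_log *log)
{
    assert(log != NULL && log->calls > 0);
    return (double)log->total / (double)log->calls;
}
```

Start from a zero-initialised `timing_log` and call `timing_log_add` after each timed run; comparing `shortest` with the mean shows how much the occasional disturbed run drags the average up.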
> One cycle can contain more than one instruction (on newer CPUs anyway ..
Right, but 4 instructions in half a clock is a bit unrealistic. Max issue rate is 3 instructions/clock on my Athlon.
quote:
> What is your definition of accurate?
For the accuracy it's because of interference from other processes - even in the highest thread states they still share resources very occasionally.
You mean your thread is preempted more often when calling Windows APIs, i.e. there's a Reschedule call in there somewhere? Hmm, that could be. Supporting evidence: I've noticed ReadFileEx callbacks are sometimes delivered from within an API call.
quote:I thought it would be accurate all the time, but the Intel documentation for RDTSC said otherwise. I tried it and it was right, and I was wrong. It shows that sometimes documentation can be helpful.
?
(at home, high latency)
The whole "measure 0.5 ticks" thing is kind of correct. In science and engineering you normally read off a scale and say the value is either near a mark or between two marks, which lets you judge to at best half a scale division (as long as there are no other factors). This does not really apply here, because a single call always yields a whole number of ticks.
It is right to say that with a larger set of measurements you can state the result more accurately, but things get complicated when you have to estimate the error in the readings. It also depends on whether you measure the whole lot in one lump or measure the calls one by one and add the results together.
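The whole-number-of-ticks point can be shown with a toy model. This is a simulation under made-up assumptions, not a measurement: one tick is split into 3000 units, a call takes exactly 1000 units (a third of a tick), and individual calls start at evenly spread positions within the tick. All names and constants are invented for the sketch.

```c
#include <assert.h>

enum { UNITS_PER_TICK = 3000 };   /* resolution of the toy model */

/* Whole ticks a timer reports for an interval that starts `start_units`
   into the timeline and lasts `duration_units`. */
long ticks_observed(long start_units, long duration_units)
{
    assert(start_units >= 0 && duration_units >= 0);
    return (start_units + duration_units) / UNITS_PER_TICK
         - start_units / UNITS_PER_TICK;
}

/* Time `calls` calls one by one, each starting at a different phase within
   the tick, and average the whole-tick readings. */
double mean_ticks_per_call(long call_units, long calls, long phase_step)
{
    assert(calls > 0);
    long total = 0;
    for (long i = 0; i < calls; ++i) {
        long start = (i * phase_step) % UNITS_PER_TICK;
        total += ticks_observed(start, call_units);
    }
    return (double)total / (double)calls;
}
```

Each single reading is 0 or 1 tick, never a third, yet 3000 readings with a phase step of 7 average out to a third of a tick. Timing all 3000 calls in one lump, `ticks_observed(p, 3000L * 1000)`, gives 1000 ticks for any starting phase `p` below one tick, so in this idealised model both routes agree; with real timers the one-by-one route also pays the timer overhead on every call.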
Someone else has already done most of the research here:
http://www.geisswerks.com/ryan/FAQS/timing.html
Okee?