Which timer has the least overhead?

Started by
9 comments, last by littlekid 13 years, 4 months ago
Hi, I will likely be calling the timer function many times in a single frame, and I was wondering which timer I should use. It only needs average accuracy (it doesn't need to be very accurate), but it should have a very low overhead cost.

I was thinking either:
RDTSC or
QueryPerformanceCounter/clock_gettime


The problem I am trying to solve is measuring the time taken by each task, so as to build a histogram of average times, which is needed for load balancing. (Maybe there are better ways to do this?)

1. As each task runs, record its average time (possibly at microsecond accuracy, since some tasks execute quickly).

2. After a certain amount of time (say 5 min), use the average time of each task to rebalance the tasks across the threads, so that each thread gets higher throughput.
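
The two steps above can be sketched roughly as follows. This is only an illustration under assumptions: `TaskStats` and `rebalance` are hypothetical names, the running mean is one of several ways to keep an average, and the greedy "longest task first" assignment is a swapped-in heuristic, not something the original post specifies.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-task statistics; the names are illustrative only.
struct TaskStats {
    double avg_us = 0.0;      // running mean of measured durations, in microseconds
    std::size_t samples = 0;  // number of measurements folded into avg_us

    // Incremental mean: updates the average without storing every sample.
    void add_sample(double duration_us) {
        ++samples;
        avg_us += (duration_us - avg_us) / static_cast<double>(samples);
    }
};

// Greedy "longest task first" assignment (an assumed heuristic): visit tasks
// in order of decreasing average cost and give each one to the thread that
// currently has the smallest total load.
// Returns, for each task index, the index of the thread it was assigned to.
std::vector<std::size_t> rebalance(const std::vector<TaskStats>& tasks,
                                   std::size_t thread_count) {
    assert(thread_count > 0);

    // Task indices sorted by descending average time.
    std::vector<std::size_t> order(tasks.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
    std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return tasks[a].avg_us > tasks[b].avg_us;
    });

    std::vector<double> load(thread_count, 0.0);
    std::vector<std::size_t> assignment(tasks.size(), 0);
    for (std::size_t task : order) {
        const std::size_t target = static_cast<std::size_t>(
            std::min_element(load.begin(), load.end()) - load.begin());
        assignment[task] = target;
        load[target] += tasks[task].avg_us;
    }
    return assignment;
}
```

Each task would call `add_sample` with its measured duration after it runs, and `rebalance` would be called once per rebalancing interval (the 5 min mentioned above) with the collected statistics.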

regards
Quote:Original post by littlekid
(maybe there are better ways to do?)


In many cases a profiler is a non-invasive and more accurate way to measure. Even though it takes more time to learn than "in code timers", in the longer term you'll save a lot of time.

If you're measuring such small tasks, QPC is probably your only decent option, though it has higher overhead than RDTSC. On modern multi-core systems, however, calling RDTSC isn't always safe, and values will sometimes jump backwards and forwards in time (I've seen the same happen with QPC too, but it's generally much more stable). QPC can add an overhead of one or two milliseconds if you're calling it tens of thousands of times per frame, so try not to profile that many things at once (e.g. don't call it from deep inside multiple nested loops, or the overhead from QPC itself can distort your results).
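
To put a number on the per-call overhead discussed above, one option is to time a tight loop of clock reads. The sketch below is only an assumption-laden illustration: `timer_overhead_ns` is a hypothetical helper, and `std::chrono::steady_clock` stands in for the platform timer (it is commonly backed by QueryPerformanceCounter on Windows and clock_gettime on Linux, but that depends on the standard library).

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Estimates the average cost of one clock read, in nanoseconds, by calling
// Clock::now() `calls` times in a loop and dividing the elapsed time by the
// number of calls. The loop overhead itself is included in the estimate.
template <typename Clock = std::chrono::steady_clock>
double timer_overhead_ns(std::size_t calls) {
    assert(calls > 0);
    typename Clock::time_point last{};
    const auto start = Clock::now();
    for (std::size_t i = 0; i < calls; ++i) {
        last = Clock::now();
    }
    // The final reading taken inside the loop doubles as the end time.
    const std::chrono::duration<double, std::nano> elapsed = last - start;
    return elapsed.count() / static_cast<double>(calls);
}
```

Multiplying the result of, say, `timer_overhead_ns(1000000)` by the number of timer calls expected per frame gives a rough estimate of the per-frame cost on a given machine.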
A profiler for dynamic load balancing? Sounds complex, if it is even possible. I think the internal timer system is simpler, without a huge loss in accuracy.

@littlekid - the easiest way to discover this is to ... use a profiler [smile]. As in, profile both timers and see which one has lower overhead. A little bit of forward thinking might allow you to quickly change between the two timers (e.g. encapsulate timing inside a class).
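
A minimal sketch of the "encapsulate timing inside a class" idea, assuming a C++11 compiler. `ScopedTimer` and `TaskTimer` are hypothetical names, and `std::chrono::steady_clock` is just a placeholder for whichever time source ends up being chosen.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical wrapper: the clock is a template parameter, so call sites do
// not change when a different time source is swapped in.
template <typename Clock = std::chrono::steady_clock>
class ScopedTimer {
public:
    // Writes the elapsed time in microseconds into `out_us` when the scope ends.
    explicit ScopedTimer(double& out_us) : out_us_(out_us), start_(Clock::now()) {}

    ~ScopedTimer() {
        const std::chrono::duration<double, std::micro> elapsed = Clock::now() - start_;
        // Debug-only check: a well-behaved clock should not run backwards.
        assert(elapsed.count() >= 0.0);
        out_us_ = elapsed.count();
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    double& out_us_;
    typename Clock::time_point start_;
};

// A single alias selects the time source for the whole program.
using TaskTimer = ScopedTimer<std::chrono::steady_clock>;
```

A task body would then be wrapped as `{ TaskTimer t(elapsed_us); run_task(); }`, where `run_task` is whatever work is being measured. Trying a different timer means writing a type with the same `now()` and `time_point` interface and changing only the alias.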

If many of your tasks complete quickly - maybe you can figure a way to batch them? Lots of short tasks can cause trouble getting good throughput for some scheduler algorithms.
How accurate is "not very accurate"? Within 1ms? Within 10ms? Within 30ms?

If you really don't care about accuracy, GetTickCount() is pretty simple, but its resolution is coarse (typically in the 10 to 16ms range). timeGetTime() can get to about 1ms accuracy, and for anything finer you want to use QueryPerformanceCounter(). Using RDTSC is a very bad idea unless you really need to count individual CPU clock cycles.
Quote:Original post by rip-off
A profiler for dynamic load balancing? Sounds complex, if it is even possible. I think the internal timer system is simpler, without a huge loss in accuracy.


Oops, I've read the whole post. The exception being the words "load" and "balancing" ...

He did say "might be in micro sec accuracy", which is why I didn't bother with GetTickCount() or timeGetTime(). But if there is a mixture of slow and fast tasks, you could use a mixture of timing methods. From what I remember, timeGetTime() does have a much lower overhead than QPC on most machines.

As for profilers, they're great for finding hotspots when running the program in a controlled environment, but the OP was after run-time load balancing. Afaik most profilers don't offer any sort of SDK that lets you embed them in your app, do they? Of course, maybe the run-time aspect isn't really necessary and was just some "blue-sky" thinking, in which case yeah, go for a profiler first. :)
QueryPerformanceCounter is pretty fast, you'd probably have to call it thousands upon thousands of times per frame to put even the slightest dent in performance. It's very accurate.

The other timers might be faster, but I'd think something is seriously wrong if you manage to cause a performance problem with it. I don't know what is "a lot" for you but you will have bigger things to worry about.
I'd recommend reading This.
Yep that's not a bad article. One modern timer that isn't mentioned in that article is the "Invariant TSC", which is available on new Intel and AMD processors (query for support using the cpuid instruction). It's used by just calling the rdtsc instruction, and in terms of overhead should be similar to traditional rdtsc calls (i.e. much faster than QPC), but unlike the old rdtsc it should always increment at a fixed rate.

(I have come across some machines that claim to support the invariant TSC and yet return rdtsc values that jump backwards and forwards in time quite drastically, just as I have seen with QPC, which should also be invariant. So whatever method you use, you might want to put some sanity checks in your code.)
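
One possible shape for such sanity checks is sketched below. `TickGuard` is a hypothetical helper, and the policy it uses (report zero for a backward jump or an implausibly large forward jump, then resynchronize) is an assumption. Other policies, such as reusing the previous delta, would work too.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical guard around a raw tick source (rdtsc, QPC, ...).
// Readings that go backwards, or that jump further ahead than the caller
// considers plausible, are counted as rejected and reported as 0 ticks.
class TickGuard {
public:
    explicit TickGuard(std::uint64_t max_plausible_delta)
        : max_delta_(max_plausible_delta) {
        assert(max_plausible_delta > 0);
    }

    // Feeds a new raw reading and returns the ticks elapsed since the
    // previous reading, or 0 for the first reading and for bogus readings.
    std::uint64_t advance(std::uint64_t raw_ticks) {
        if (!has_last_) {
            has_last_ = true;
            last_ = raw_ticks;
            return 0;
        }
        if (raw_ticks < last_) {  // timer went backwards
            ++rejected_;
            last_ = raw_ticks;    // resynchronize to the new timeline
            return 0;
        }
        const std::uint64_t delta = raw_ticks - last_;
        last_ = raw_ticks;
        if (delta > max_delta_) {  // timer jumped too far ahead
            ++rejected_;
            return 0;
        }
        return delta;
    }

    // Number of readings rejected so far.
    std::uint64_t rejected() const { return rejected_; }

private:
    std::uint64_t max_delta_;
    std::uint64_t last_ = 0;
    std::uint64_t rejected_ = 0;
    bool has_last_ = false;
};
```

The threshold passed to the constructor would be chosen per application, for example the tick count corresponding to a few frames. A rising `rejected()` count is a hint that the time source is misbehaving on that machine.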

