Darragh

Counting clock cycles ?

Hi everyone.

I'm currently learning C, having previously coded in Java, and I'm interested in finding the most accurate way possible of timing how fast (or not) my code is. I've managed to use the timing functions from the Win32 API with success, but I've read elsewhere that a more accurate measurement can be obtained by using a built-in instruction in modern processors which keeps track of the number of clock cycles accumulated.

I found some inline assembly code which uses this instruction, but unfortunately it didn't work under my compiler. I changed the code around to get it compiling, and it appears to be working somewhat, though I say that quite tentatively... I just want to know whether this code is completely correct. Forgive me, but I have yet to learn assembly language (I will at a later stage); I just want a fairly accurate means of measuring my code. If you have any corrections or know of any problems then please let me know.

Thanks,

Darragh

Here is the code which obtains the timestamp (in clock cycles) from the processor. The value in clock cycles is stored in the variable 'clockTime':

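unsigned long long clockTime = 0;

void getClockTime ( void )
{
    unsigned int lo, hi;

    //==========================================================================
    // GET TIME STAMP FROM PROCESSOR
    //==========================================================================

    // USE RDTSC INSTRUCTION FOR INTEL PENTIUM AND MODERN AMD PROCESSORS
    // RDTSC RETURNS A 64-BIT COUNT IN EDX:EAX, SO BOTH HALVES ARE NAMED AS OUTPUTS

    __asm__ __volatile__ ( "rdtsc" : "=a"(lo), "=d"(hi) );

    clockTime = ((unsigned long long)hi << 32) | lo;

}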


Quote:
Original post by bakery2k1
Quote:
Original post by Darragh
...it didn't work under my compiler...


What do you mean "it didn't work"?


Err, well it didn't compile! : ) The linker spat out a bunch of messages telling me about undefined external references or something- basically that my function to retrieve the time couldn't be found. I had to rearrange the code I found in order to get it compiling.

Just out of interest- here is the code from the article which refused to compile:



#include <stdio.h>

int timeGetTime(void);

#pragma aux timeGetTime = ".586" "rdtsc" value [eax] modify [edx];

void DoFpMult(void)
{
    int i;
    float val2 = (float)1.00007;
    float val1 = (float)1.2;
    int startTime;

    startTime = timeGetTime();

    for (i = 0; i < 1000000; i++)
    {
        val1 *= val2;
    }

    printf("Took %d clocks result:%f\n",timeGetTime()-startTime,val1);
}




Basically, all I want to know is whether what I'm doing is correct, that's all.

IIRC, it looks something like this if you're using MSVC (the syntax you posted is GCC syntax):


__declspec( naked ) unsigned __int64 readclocks() {
    __asm {
        rdtsc   // result is left in EDX:EAX, where 32-bit MSVC returns an __int64
        ret
    }
}


This returns the number of clock cycles since the current CPU was last reset, which may or may not be useful: the counter's rate varies with SpeedStep, and if you have a multi-CPU system with multiple SpeedStep CPUs, the counters won't actually be synchronized with each other. Also, this reads an instantaneous value; it doesn't force any out-of-order instructions to retire, so it's only usefully accurate on larger chunks of code (say, 3000 cycles and up).
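
To illustrate the out-of-order point, here is a minimal sketch of a fenced read for GCC or Clang on x86. It assumes a CPU with SSE2 (needed for LFENCE), and the name read_tsc_fenced is made up for this example; the older trick of executing CPUID before RDTSC serves a similar purpose. It does nothing about SpeedStep or unsynchronized counters.

```c
#include <stdint.h>

/* Assumes GCC or Clang targeting x86 or x86-64 with SSE2 available. */
uint64_t read_tsc_fenced(void)
{
    uint32_t lo, hi;

    __asm__ __volatile__ (
        "lfence\n\t"   /* hold RDTSC back until earlier instructions have completed */
        "rdtsc"        /* EDX:EAX = time-stamp counter */
        : "=a"(lo), "=d"(hi)
        :
        : "memory");   /* also stops the compiler moving memory accesses across the read */

    return ((uint64_t)hi << 32) | lo;
}
```

Take one reading before and one after the code being measured and subtract the first from the second.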

A better choice is QueryPerformanceCounter()/QueryPerformanceFrequency(), which works on most multi-CPU systems, BUT which has bugs on a number of Pentium III motherboards.
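
For comparison, a minimal sketch of timing with this pair of calls. The helper name now_seconds is hypothetical, and the non-Windows branch is only a stand-in (it uses the standard clock(), which measures CPU time) so that the sketch compiles elsewhere.

```c
#ifdef _WIN32
#include <windows.h>

/* Seconds since an arbitrary starting point, from the performance counter. */
double now_seconds(void)
{
    LARGE_INTEGER freq, count;

    QueryPerformanceFrequency(&freq);   /* counter ticks per second */
    QueryPerformanceCounter(&count);    /* current tick count */
    return (double)count.QuadPart / (double)freq.QuadPart;
}
#else
#include <time.h>

/* Stand-in for other systems: CPU time used by this process, in seconds. */
double now_seconds(void)
{
    return (double)clock() / (double)CLOCKS_PER_SEC;
}
#endif
```

Call it before and after the code being measured and subtract the two values. In real code the frequency only needs to be queried once, since it is fixed while the system is running.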

Guest Anonymous Poster
Quote:
Original post by hplus0603
A better choice is QueryPerformanceCounter()/QueryPerformanceFrequency(), which works on most multi-CPU systems, BUT which has bugs on a number of Pentium III motherboards.

What sort of problems, and where can I read more about them?

I also wanted some good performance timing, so I stumbled on rdtsc 2 years ago (when I didn't have much to do and I was learning assembly) and developed this cool macro. Note that this is Intel-syntax assembly and not GAS assembly (AT&T syntax), so you might want to convert it first.


//Performance Timing (CPU-level, cycles)
//Only EAX, the low 32 bits of the counter, is kept, so X should be a 32-bit int
//X - stores starting time
#define START(X) __asm{rdtsc} __asm{mov [X], eax}

//X - uses prev. time and then holds result timing
#define END(X) __asm{rdtsc} __asm{mov ecx, [X]} __asm{sub eax, ecx} __asm{mov [X], eax}



Usage (example):

int a;
START(a);

... do something ...

END(a);

printf("Time taken : %d\n", a - 11);



The 11 that I subtract is the overhead of the timing code itself. To find the value for your machine, run START and END with nothing in between, without subtracting 11, and see what it returns.
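
The same calibration can be sketched without inline assembly, assuming a compiler that provides the __rdtsc() intrinsic (MSVC, GCC and Clang do on x86). The function name tsc_overhead is made up here; it reports the smallest gap seen between two back-to-back reads, which is the figure to subtract instead of a hard-coded 11.

```c
#include <stdint.h>
#ifdef _MSC_VER
#include <intrin.h>      /* __rdtsc on MSVC */
#else
#include <x86intrin.h>   /* __rdtsc on GCC and Clang */
#endif

/* Smallest cycle count observed between two back-to-back counter reads. */
uint64_t tsc_overhead(int trials)
{
    uint64_t best = (uint64_t)-1;   /* start at the largest possible value */
    int i;

    for (i = 0; i < trials; i++) {
        uint64_t start = __rdtsc();
        uint64_t end = __rdtsc();

        if (end - start < best)
            best = end - start;
    }
    return best;
}
```

Taking the minimum over many trials filters out runs that were disturbed by interrupts or task switches.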

Thanks everyone. I presume my code is OK then. It exhibits the expected behaviour anyhow, in that the difference between timestamps increases as I add more commands, etc. It also shows some degree of variation in its results on slower / longer blocks of code, the sort you would expect in a multitasking environment such as Windows.

Last Attacker: using a macro instead of a function call is a good idea, I think I'll try that instead.

Lol, so there it is- my first piece of assembly code! : )

RDTSC might cause problems on dualcore/ht machines when
determining time deltas (what might happen is cpu0 takes
the timestamp, the process switches to cpu1, and its counter
might be at a completely different value - so the deltas
are trash).

Google for "Timer Pitfalls" for more info - the article
should actually be somewhere here on gamedev.net

QueryPerformanceCounter has the same problem (as far
as I know). Note also that QueryPerformanceCounter
can "leap" forward a few seconds under heavy PCI bus load.

timeGetTime is dualcore safe (but it only has a 1ms resolution,
best case).

You can make rdtsc dualcore-safe (actually I'm writing a small
article on it - but it will take some time before I finish it).
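
One common way to do this on Windows (not necessarily the approach meant here) is to pin the timing thread to a single core, so that both timestamps come from the same counter. A minimal sketch, with a made-up function name and a placeholder branch so that it compiles on other systems:

```c
#ifdef _WIN32
#include <windows.h>
#endif

/* Restrict the calling thread to logical processor 0.
   Returns 1 on success, 0 on failure or when not built for Windows. */
int pin_current_thread_to_cpu0(void)
{
#ifdef _WIN32
    /* Bit 0 of the affinity mask selects logical processor 0;
       the call returns the previous mask, or 0 on failure. */
    return SetThreadAffinityMask(GetCurrentThread(), (DWORD_PTR)1) != 0;
#else
    return 0;   /* placeholder: no pinning is attempted on other systems */
#endif
}
```

Call it once before taking the first timestamp. Pinning does not help with SpeedStep changing the counter's rate.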

If you want to use it for profiling code, you might also consider
boosting the current process to realtime - put it back to 'normal'
immediately after profiling, or you might have _very_ slow response
from windows (or even lockups).
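
A minimal sketch of that boost-and-restore pattern using the Win32 priority calls. The names run_boosted, count_call and call_count are made up for the example, and on other systems the function simply runs the work unchanged.

```c
#ifdef _WIN32
#include <windows.h>
#endif

int call_count = 0;

/* Example workload: stands in for the code being profiled. */
void count_call(void)
{
    call_count++;
}

/* Run work() with the process priority raised, then put it back. */
void run_boosted(void (*work)(void))
{
#ifdef _WIN32
    HANDLE self = GetCurrentProcess();
    DWORD previous = GetPriorityClass(self);   /* 0 means the query failed */

    SetPriorityClass(self, REALTIME_PRIORITY_CLASS);
    work();
    if (previous != 0)
        SetPriorityClass(self, previous);      /* restore the original class */
#else
    work();   /* no priority change outside Windows in this sketch */
#endif
}
```

Usage would be run_boosted(count_call); keeping the boosted region short is what avoids the slow response mentioned above.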

btw, for converting the rdtsc clock ticks to something more useful
(like seconds), go here:
http://www.intel.com/cd/ids/developer/asmo-na/eng/dc/centrino/knowledgebase/81711.htm
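
The arithmetic itself is simple once the counter's rate is known. A minimal sketch, with hypothetical function names, assuming the rate is estimated by reading the counter at both ends of an interval timed with another clock (and that the rate stays constant, which SpeedStep can break):

```c
#include <stdint.h>

/* Counter rate in ticks per second, from a calibration interval. */
double tsc_rate_hz(uint64_t ticks_elapsed, double seconds_elapsed)
{
    return (double)ticks_elapsed / seconds_elapsed;
}

/* Convert a tick delta to seconds using a previously estimated rate. */
double ticks_to_seconds(uint64_t ticks, double rate_hz)
{
    return (double)ticks / rate_hz;
}
```

For example, 1,500,000,000 ticks counted over half a second gives a rate of 3 GHz, at which 3,000,000 ticks is one millisecond.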

Regards

Quote:
Original post by Kitt3n
RDTSC might cause problems on dualcore/ht machines when
determining time deltas (what might happen is cpu0 takes
the timestamp, the process switches to cpu1, and its counter
might be at a completely different value - so the deltas
are trash).


Fortunately I don't have to worry about that for the time being: my P4 doesn't have hyperthreading and my system doesn't have multiple CPUs. It is worrying though; it's becoming harder and harder to properly time code on modern computers.

Quote:
Original post by Kitt3n
Google for "Timer Pitfalls" for more info - the article
should actually be somewhere here on gamedev.net


Thanks, I'll save that article for future reference.

Quote:
Original post by Kitt3n
If you want to use it for profiling code, you might also consider
boosting the current process to realtime - put it back to 'normal'
immediately after profiling, or you might have _very_ slow response
from windows (or even lockups).


Indeed. Increasing the process's priority does seem to make the timing more consistent.
