Counting clock cycles?

Started by
9 comments, last by Darragh 18 years, 5 months ago
Hi everyone.

I'm currently learning C, having previously coded in Java, and I'm interested in finding the most accurate way possible of timing how fast (or not) my code is. I've managed to use the timing functions from the Win32 API with success, but I've read elsewhere that a more accurate measurement can be obtained by using a built-in instruction in modern processors which keeps track of the number of clock cycles accumulated.

I found some inline assembly code which uses this instruction, but unfortunately it didn't work under my compiler. I changed the code around to get it compiling, and it appears to be working somewhat, though I'll say that quite tentatively... I just want to know if this code is completely correct. Forgive me, but I have yet to learn assembly language (I will at a later stage); I just want a fairly accurate means of measuring my code. If you have any corrections or know of any problems then please let me know.

Thanks,
Darragh

Here is the code which obtains the timestamp (in clock cycles) from the processor. The value in clock cycles is stored in the variable 'clockTime':


unsigned long clockTime = 0;

void getClockTime ( void )
{

    //==========================================================================
    // GET TIME STAMP FROM PROCESSOR
    //==========================================================================
    
    // USE RDTSC INSTRUCTION FOR INTEL PENTIUM AND MODERN AMD PROCESSORS
    
    asm ( "rdtsc" : "=A"(clockTime)  );

}
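
A note for readers building with GCC: the "=A" constraint names the EDX:EAX pair only when the operand is a double-word value on 32-bit x86. With a 32-bit unsigned long the compiler is free to pick either register, so the result may be only half of the counter. What follows is a minimal sketch that sidesteps this by reading the two halves separately. The name read_tsc and the fallback for non-x86 targets are illustrative assumptions, not part of the original post.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: returns the full 64-bit time stamp counter
   when built with GCC or Clang for x86. */
uint64_t read_tsc(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
    uint32_t lo, hi;
    /* RDTSC leaves the low 32 bits in EAX and the high 32 bits in EDX. */
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#else
    /* Assumed stand-in for other targets: wall-clock nanoseconds,
       which are NOT clock cycles. */
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}
```

Typical use would be to call read_tsc() before and after the code being timed and subtract the two values.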


Quote:Original post by Darragh
...it didn't work under my compiler...


What do you mean "it didn't work"?

Quote:Original post by bakery2k1
Quote:Original post by Darragh
...it didn't work under my compiler...


What do you mean "it didn't work"?


Err, well it didn't compile! : ) The linker spat out a bunch of messages telling me about undefined external references or something- basically that my function to retrieve the time couldn't be found. I had to rearrange the code I found in order to get it compiling.

Just out of interest- here is the code from the article which refused to compile:

#include <stdio.h>

int timeGetTime(void);
#pragma aux timeGetTime = ".586" "rdtsc" value [eax] modify [edx];

void DoFpMult(void)
{
    int i;
    float val2 = (float)1.00007;
    float val1 = (float)1.2;
    int startTime;

    startTime = timeGetTime();
    for (i = 0; i < 1000000; i++)
    {
        val1 *= val2;
    }
    printf("Took %d clocks result:%f\n", timeGetTime() - startTime, val1);
}


Basically all I want to know is that what I'm doing is correct, that's all..

IIRC, it looks something like this if you're using MSVC (the syntax you posted is GCC syntax):

__declspec( naked ) unsigned __int64 readclocks()
{
  __asm {
    rdtsc   // 64-bit result is returned in EDX:EAX
    ret
  }
}


This will return the number of clock cycles since start for the current CPU, which may or may not be useful -- the speed of it varies with SpeedStep, and if you have a multi-CPU system with multiple SpeedStep CPUs, they won't actually be synchronized to each other. Also, this reads an instantaneous value, it doesn't force any out-of-order instructions to retire, so it's only usefully accurate on larger chunks of code (say, 3000 cycles and up).
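
On the point about out-of-order execution: one commonly described workaround is to issue a serializing instruction such as CPUID immediately before RDTSC, so that earlier instructions have completed when the counter is sampled. The following is only a sketch under the assumption of GCC or Clang on x86; the name read_tsc_serialized and the non-x86 fallback are hypothetical. CPUID itself costs cycles, so it adds to the measurement overhead.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: CPUID is a serializing instruction, so it is
   executed first and RDTSC is sampled only afterwards. */
uint64_t read_tsc_serialized(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
    uint32_t eax = 0, ebx, ecx = 0, edx;
    /* CPUID leaf 0; the register contents are discarded. */
    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                         :
                         : "memory");
    (void)ebx;
    (void)ecx;
    /* Volatile asm statements keep their relative order. */
    __asm__ __volatile__("rdtsc" : "=a"(eax), "=d"(edx));
    return ((uint64_t)edx << 32) | eax;
#else
    /* Assumed stand-in for other targets: wall-clock nanoseconds. */
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}
```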

A better choice is QueryPerformanceCounter()/QueryPerformanceFrequency(), which works on most multi-CPU systems, BUT which has bugs on a number of Pentium III motherboards.
enum Bool { True, False, FileNotFound };
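
For reference, the QueryPerformanceCounter()/QueryPerformanceFrequency() pair mentioned above is typically used by dividing the tick count by the tick rate. This is a minimal sketch only; the function name now_seconds is hypothetical, and the non-Windows branch is an assumed wall-clock stand-in so the example builds elsewhere.

```c
#ifdef _WIN32
#include <windows.h>
#else
#include <time.h>
#endif

/* Hypothetical helper: current time in seconds from a
   high-resolution timer. */
double now_seconds(void)
{
#ifdef _WIN32
    LARGE_INTEGER freq, count;
    QueryPerformanceFrequency(&freq);  /* ticks per second */
    QueryPerformanceCounter(&count);   /* current tick count */
    return (double)count.QuadPart / (double)freq.QuadPart;
#else
    /* Assumed stand-in on non-Windows systems (wall-clock time). */
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
#endif
}
```

Elapsed time is then the difference between two now_seconds() calls taken around the code of interest.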
Quote:Original post by hplus0603
A better choice is QueryPerformanceCounter()/QueryPerformanceFrequency(), which works on most multi-CPU systems, BUT which has bugs on a number of Pentium III motherboards.

What sort of problems; and where can I read more about them?
See Timing Pitfalls and Solutions.
E8 17 00 42 CE DC D2 DC E4 EA C4 40 CA DA C2 D8 CC 40 CA D0 E8 40E0 CA CA 96 5B B0 16 50 D7 D4 02 B2 02 86 E2 CD 21 58 48 79 F2 C3
I also wanted some good performance timing, so I stumbled on rdtsc 2 years ago (when I didn't have much to do and I was learning assembly) and developed this cool macro. Note that this is Intel-syntax assembly and not GAS assembly (AT&T syntax), so you might want to convert it first.

	//Performance Timing (CPU-level, cycles)

	//X - stores starting time
	#define START(X)	__asm{rdtsc} \
				__asm{mov [X], eax}

	//X - uses prev. time and then holds result timing
	#define END(X)		__asm{rdtsc} \
				__asm{mov ecx, [X]} \
				__asm{sub eax, ecx} \
				__asm{mov [X], eax}


Usage (example):
int a;
START(a);
... do something ...
END(a);
printf("Time taken : %d\n", a - 11);


The 11 that I've subtracted is the overhead of the timing code itself (as measured on my machine). Try running without subtracting 11 and with nothing between START & END, and see what it returns on yours.
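
Rather than hard-coding a constant such as 11, the overhead can be measured at run time by timing an empty interval several times and keeping the smallest result. This is a sketch under assumptions: read_ticks and measure_overhead are hypothetical names, the x86 branch assumes GCC or Clang, and the fallback counts nanoseconds rather than cycles.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical tick source: the TSC on x86, wall-clock nanoseconds
   elsewhere (assumed stand-in). */
uint64_t read_ticks(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#else
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}

/* Hypothetical helper: smallest difference between two back-to-back
   reads over `runs` attempts. Returns UINT64_MAX if runs <= 0. */
uint64_t measure_overhead(int runs)
{
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < runs; i++) {
        uint64_t start = read_ticks();
        uint64_t end = read_ticks();
        uint64_t delta = end - start;
        if (delta < best)
            best = delta;   /* keep the least-disturbed sample */
    }
    return best;
}
```

Taking the minimum rather than the average discards samples inflated by interrupts or task switches.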
"Take delight in the Lord and He will give you your heart's desires" - Psalm 37:4My Blog
Thanks everyone. I presume my code is OK then. It exhibits the expected behaviour anyhow, in that the difference between timestamps increases as I add more commands etc. It also shows some degree of variation in its results on slower / longer blocks of code, the sort you would expect in a multitasking environment such as Windows.

Last Attacker: using a macro instead of a function call is a good idea, I think I'll try that instead.

Lol, so there it is- my first piece of assembly code! : )

Note that:

__asm{
    rdtsc
    mov ecx, [X]
    sub eax, ecx
    mov [X], eax
}


has an unnecessary move, it could be written as:

__asm{
    rdtsc
    sub eax, [X]
    mov [X], eax
}

RDTSC might cause problems on dual-core/HT machines when
determining time deltas (what might happen is cpu0 takes
the timestamp, the process switches to cpu1, and cpu1's
counter might hold a completely different value, so the
deltas are trash).

Google for "Timer Pitfalls" for more info - the article
should actually be somewhere here on gamedev.net

QueryPerformanceCounter has the same problem (as far
as I know). Note also that QueryPerformanceCounter
can "leap" forward a few seconds under heavy PCI bus load.

timeGetTime is dualcore safe (but it only has a 1ms resolution,
best case).

You can make rdtsc dualcore-safe (actually I'm writing a small
article on it - but it will take some time before I finish it).

If you want to use it for profiling code, you might also consider
boosting the current process to realtime - put it back to 'normal'
immediately after profiling, or you might have _very_ slow response
from windows (or even lockups).

btw, for converting the rdtsc clock ticks to something more useful
(like seconds), go here:
http://www.intel.com/cd/ids/developer/asmo-na/eng/dc/centrino/knowledgebase/81711.htm

Regards
visit my website at www.kalmiya.com
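
The conversion from rdtsc ticks to seconds mentioned above comes down to dividing by the counter frequency. A minimal sketch follows, assuming the frequency has already been measured separately; ticks_to_seconds and cpu_hz are hypothetical names. As noted earlier in the thread, the result is only meaningful if the frequency stayed constant during the measurement, which is not guaranteed with SpeedStep.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: cpu_hz is an assumed, separately measured
   counter frequency in ticks per second. */
double ticks_to_seconds(uint64_t ticks, double cpu_hz)
{
    assert(cpu_hz > 0.0);
    return (double)ticks / cpu_hz;
}
```

For example, 3,000,000 ticks at an assumed 3.0 GHz corresponds to about one millisecond.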
