# Counting clock cycles?



Hi everyone. I'm currently learning C, having previously coded in Java, and I'm interested in finding the most accurate way possible of timing how fast (or not) my code is. I've used the timing functions from the Win32 API successfully, but I've read elsewhere that a more accurate measurement can be obtained with a built-in instruction in modern processors that keeps track of the number of clock cycles accumulated.

I found some inline assembly code that uses this instruction, but unfortunately it didn't work under my compiler. I rearranged the code to get it compiling, and it appears to be working, somewhat (I say that quite tentatively). I just want to know: is this code completely correct? Forgive me, but I have yet to learn assembly language (I will at a later stage); I just want a fairly accurate means of measuring my code. If you have any corrections or know of any problems, please let me know.

Thanks,
Darragh

Here is the code that obtains the time stamp (in clock cycles) from the processor. The value is stored in the variable `clockTime`:

```c
unsigned long clockTime = 0;

void getClockTime ( void )
{
    //======================================================================
    // GET TIME STAMP FROM PROCESSOR
    //======================================================================

    // USE RDTSC INSTRUCTION FOR INTEL PENTIUM AND MODERN AMD PROCESSORS
    asm ( "rdtsc" : "=A"(clockTime) );
}
```
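For reference, here is a minimal sketch of how the read could look with GCC-style inline assembly, assuming GCC or Clang on x86. Two things in the code above are worth changing: RDTSC returns a 64-bit value in EDX:EAX, but `unsigned long` is only 32 bits on Windows, and GCC's documentation notes that with `"=A"` a single-word variable may be allocated to either EAX or EDX (on x86-64 even a 64-bit variable gets only one of the two). The function name `read_tsc` is hypothetical.

```c
#include <stdint.h>

/* Sketch for GCC/Clang on x86 or x86-64 (hypothetical helper name).
   RDTSC writes the 64-bit time-stamp counter to EDX:EAX. Binding the two
   halves separately with "=a" and "=d" works on both 32- and 64-bit x86,
   unlike "=A", which only means the EDX:EAX pair for a 64-bit value on
   32-bit x86. */
static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    /* volatile: keep the compiler from merging or moving repeated reads */
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}
```

Reading it twice and subtracting gives the elapsed cycles, e.g. `uint64_t t = read_tsc(); /* code */ t = read_tsc() - t;`.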



---
> Original post by Darragh
> ...it didn't work under my compiler...

What do you mean "it didn't work"?

---
> Original post by bakery2k1
>> Original post by Darragh
>> ...it didn't work under my compiler...
>
> What do you mean "it didn't work"?

Err, well, it didn't compile! :) Strictly speaking it failed at link time: the linker spat out a bunch of messages about undefined external references, basically saying that my function to retrieve the time couldn't be found. I had to rearrange the code I found to get it building.

Just out of interest, here is the code from the article that refused to compile:

```c
#include <stdio.h>

/* Open Watcom C syntax: #pragma aux defines timeGetTime as inline
   machine code (a Pentium-class RDTSC returning EAX, clobbering EDX). */
int timeGetTime(void);
#pragma aux timeGetTime = ".586" "rdtsc" value [eax] modify [edx];

void DoFpMult(void)
{
    int i;
    float val2 = (float)1.00007;
    float val1 = (float)1.2;
    int startTime;

    startTime = timeGetTime();
    for (i = 0; i < 1000000; i++)
    {
        val1 *= val2;
    }
    printf("Took %d clocks result:%f\n", timeGetTime() - startTime, val1);
}
```
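The article's code relies on Watcom's `#pragma aux`, which other compilers ignore, so the call to `timeGetTime` has no definition to link against. A possible port for GCC/Clang on x86, assuming the compiler's `__rdtsc()` intrinsic from `<x86intrin.h>` (MSVC declares the same intrinsic in `<intrin.h>`), might look like this. Returning the cycle count is my own addition so the result can be checked.

```c
#include <stdio.h>
#include <stdint.h>
#include <x86intrin.h>   /* __rdtsc() on GCC/Clang; MSVC: <intrin.h> */

/* Port of the article's DoFpMult using the compiler intrinsic instead of
   Watcom's #pragma aux. Returns the elapsed cycle count. */
uint64_t DoFpMult(void)
{
    /* volatile forces a load every iteration, so the loop isn't folded away */
    volatile float val2 = 1.00007f;
    float val1 = 1.2f;
    uint64_t start = __rdtsc();

    for (int i = 0; i < 1000000; i++) {
        val1 *= val2;
    }

    uint64_t elapsed = __rdtsc() - start;
    printf("Took %llu clocks result:%f\n", (unsigned long long)elapsed, val1);
    return elapsed;
}
```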

Basically, all I want to know is whether what I'm doing is correct.

---
IIRC, it looks something like this if you're using MSVC (the syntax you posted is GCC syntax):

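```c
/* MSVC, 32-bit x86 only (x64 MSVC has no inline assembly; use the
   __rdtsc() intrinsic from <intrin.h> there instead). */
__declspec(naked) unsigned __int64 readclocks(void)
{
    __asm {
        rdtsc   ; EDX:EAX = time-stamp counter, which is also where
        ret     ; a 64-bit return value is expected
    }
}
```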

This will return the number of clock cycles since start for the current CPU, which may or may not be useful -- the speed of it varies with SpeedStep, and if you have a multi-CPU system with multiple SpeedStep CPUs, they won't actually be synchronized to each other. Also, this reads an instantaneous value, it doesn't force any out-of-order instructions to retire, so it's only usefully accurate on larger chunks of code (say, 3000 cycles and up).
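Because RDTSC is not serializing, a common mitigation is to put LFENCE on both sides of the read. The sketch below assumes GCC/Clang on x86 with SSE2 (32-bit builds need `-msse2`) and uses the hypothetical name `tsc_fenced`. Intel's manuals describe this use of LFENCE; on AMD processors the effect depends on the model and OS configuration, and CPUID is the traditional (slower) serializing alternative. It does nothing about the SpeedStep or multi-CPU issues.

```c
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc, _mm_lfence (GCC/Clang, x86 with SSE2) */

/* Fenced time-stamp read (hypothetical helper). The LFENCE before the read
   makes it wait until earlier instructions have completed; the LFENCE after
   it keeps later instructions from starting before the read is done. */
static inline uint64_t tsc_fenced(void)
{
    _mm_lfence();
    uint64_t t = __rdtsc();
    _mm_lfence();
    return t;
}
```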

A better choice is QueryPerformanceCounter()/QueryPerformanceFrequency(), which works on most multi-CPU systems, BUT which has bugs on a number of Pentium III motherboards.
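A minimal sketch of the QueryPerformanceCounter()/QueryPerformanceFrequency() approach. The function name `now_seconds` is hypothetical, and the non-Windows branch swaps in POSIX `clock_gettime(CLOCK_MONOTONIC)` (not mentioned in the thread) purely so the sketch also builds off Windows.

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 199309L   /* declares clock_gettime under strict C */
#endif
#ifdef _WIN32
#include <windows.h>
#else
#include <time.h>
#endif

/* Current time in seconds from a high-resolution counter (hypothetical
   helper). Only differences between two calls are meaningful. */
double now_seconds(void)
{
#ifdef _WIN32
    static LARGE_INTEGER freq;   /* counts per second, fixed at boot */
    LARGE_INTEGER count;
    if (freq.QuadPart == 0)
        QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&count);
    return (double)count.QuadPart / (double)freq.QuadPart;
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
#endif
}
```

Call it before and after the code being measured and subtract the two results.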

---
> Original post by hplus0603
> A better choice is QueryPerformanceCounter()/QueryPerformanceFrequency(), which works on most multi-CPU systems, BUT which has bugs on a number of Pentium III motherboards.

What sort of problems, and where can I read more about them?

---
I also wanted good performance timing, so I stumbled on RDTSC two years ago (when I didn't have much to do and was learning assembly) and developed this macro. Note that this is Intel-syntax (MSVC-style) inline assembly, not GAS (AT&T) syntax, so you might need to convert it first.

```c
// Performance timing (CPU level, in cycles)

// X - stores the starting time
#define START(X) __asm{rdtsc} \
                 __asm{mov [X], eax}

// X - uses the previous time and then holds the resulting timing
#define END(X)   __asm{rdtsc} \
                 __asm{mov ecx, [X]} \
                 __asm{sub eax, ecx} \
                 __asm{mov [X], eax}
```

Usage (example):
```c
int a;
START(a);
/* ... do something ... */
END(a);
printf("Time taken : %d\n", a - 11);
```

The 11 I've subtracted is roughly the overhead of the timing code itself. To see what it is on your machine, run START and END with nothing in between, without subtracting 11, and look at the result.
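Rather than hard-coding 11, the overhead can be measured at run time. This sketch assumes GCC/Clang on x86 with the `__rdtsc()` intrinsic, and `tsc_overhead` is a hypothetical name. It times an empty region many times and keeps the minimum, since interrupts and task switches can only make a sample longer.

```c
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc (GCC/Clang) */

/* Smallest observed cost of two back-to-back reads (hypothetical helper).
   Returns UINT64_MAX if runs <= 0. */
uint64_t tsc_overhead(int runs)
{
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < runs; i++) {
        uint64_t start = __rdtsc();
        uint64_t end = __rdtsc();
        if (end - start < best)
            best = end - start;
    }
    return best;
}
```

The result would then be subtracted from each measured delta, the way `a - 11` does above.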

---
Thanks everyone. I presume my code is OK then. It behaves as expected, anyhow: the value between time stamps increases as I add more statements, and so on. It also shows some variation in its results on slower / longer blocks of code, the sort you'd expect in a multitasking environment such as Windows.

Last Attacker: using a macro instead of a function call is a good idea; I think I'll try that instead.

Lol, so there it is: my first piece of assembly code! :)

---
Note that:

```c
__asm {
    rdtsc
    mov ecx, [X]
    sub eax, ecx
    mov [X], eax
}
```

has an unnecessary `mov`; it could be written as:

```c
__asm {
    rdtsc
    sub eax, [X]
    mov [X], eax
}
```
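Both versions keep only EAX, the low 32 bits of the counter. The subtraction still gives the right answer if that half wraps once between START and END, because 32-bit unsigned subtraction is modulo 2^32. The same arithmetic in C is shown below (`tsc_delta32` is a hypothetical helper). An interval of 2^32 cycles or more (about 1.4 s on a 3 GHz CPU) cannot be measured this way.

```c
#include <stdint.h>

/* What `sub eax, [X]` computes, written in C (hypothetical helper).
   Unsigned 32-bit subtraction is modulo 2^32, so a single wrap of the
   low half between the two reads is handled; multiple wraps are not. */
uint32_t tsc_delta32(uint32_t start, uint32_t end)
{
    return end - start;
}
```

For example, `tsc_delta32(0xFFFFFFF0u, 0x10u)` is `0x20`, even though the end value is numerically smaller.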

---
RDTSC might cause problems on dual-core/HT machines when determining time deltas: CPU 0 may take the first time stamp, then the process switches to CPU 1, whose counter might hold a completely different value, so the deltas are trash.

There should be a thread about this somewhere here on gamedev.net.

QueryPerformanceCounter has the same problem (as far as I know). Note also that QueryPerformanceCounter can "leap" forward a few seconds under heavy PCI bus load.

timeGetTime is dual-core safe, but it only has 1 ms resolution at best.

You can make RDTSC dual-core-safe (I'm actually writing a small article on it, but it will take some time before I finish it).
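The approach that article will take isn't described here. One way to at least detect the migration problem, assuming GCC/Clang on an x86 CPU that supports RDTSCP, is to check the processor identifier RDTSCP returns alongside the counter. It comes from the IA32_TSC_AUX register, which the OS loads with a per-CPU value (Linux encodes the CPU number there), so samples whose two reads came from different CPUs can be discarded. A thread that migrates and comes back to the same CPU between the reads is not caught; pinning the thread to one CPU (e.g. with `SetThreadAffinityMask` on Windows) is the other common option. The names `tsc_mark`, `tsc_begin` and `tsc_end` are hypothetical.

```c
#include <stdint.h>
#include <x86intrin.h>  /* __rdtscp (GCC/Clang; CPU must support RDTSCP) */

/* Start of a measurement: counter value plus the CPU identifier it came from. */
typedef struct {
    uint64_t start;
    unsigned int start_cpu;
} tsc_mark;

static inline tsc_mark tsc_begin(void)
{
    tsc_mark m;
    m.start = __rdtscp(&m.start_cpu);
    return m;
}

/* Returns 1 and stores the elapsed cycles if both reads report the same
   CPU identifier; returns 0 (sample should be discarded) otherwise. */
static inline int tsc_end(tsc_mark m, uint64_t *cycles)
{
    unsigned int end_cpu;
    uint64_t end = __rdtscp(&end_cpu);
    if (end_cpu != m.start_cpu)
        return 0;
    *cycles = end - m.start;
    return 1;
}
```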

If you want to use it for profiling code, you might also consider boosting the current process to realtime priority. Put it back to 'normal' immediately after profiling, or you might get _very_ slow responses from Windows (or even lockups).
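A sketch of the boost-and-restore pattern, with hypothetical names `prio_state`, `boost_priority` and `restore_priority`. The Windows branch uses `SetPriorityClass` with `REALTIME_PRIORITY_CLASS` as suggested above. The non-Windows branch is my own assumption: it uses the most favourable nice value as a rough analogue (it is not realtime scheduling), which normally fails without privileges and then leaves the priority unchanged.

```c
#ifdef _WIN32
#include <windows.h>
#else
#include <errno.h>
#include <sys/resource.h>
#endif

typedef struct {
    int boosted;        /* 1 if the priority was actually raised */
#ifdef _WIN32
    DWORD old_class;    /* previous priority class */
#else
    int old_nice;       /* previous nice value */
#endif
} prio_state;

/* Raise the process priority; keep the boosted region short. */
prio_state boost_priority(void)
{
    prio_state s = {0};
#ifdef _WIN32
    s.old_class = GetPriorityClass(GetCurrentProcess());
    s.boosted = s.old_class != 0 &&
        SetPriorityClass(GetCurrentProcess(), REALTIME_PRIORITY_CLASS) != 0;
#else
    errno = 0;
    s.old_nice = getpriority(PRIO_PROCESS, 0);  /* -1 can be a valid value */
    if (errno == 0)
        s.boosted = setpriority(PRIO_PROCESS, 0, -20) == 0;
#endif
    return s;
}

/* Put the old priority back; returns 0 on success or if nothing changed. */
int restore_priority(prio_state s)
{
    if (!s.boosted)
        return 0;
#ifdef _WIN32
    return SetPriorityClass(GetCurrentProcess(), s.old_class) ? 0 : -1;
#else
    return setpriority(PRIO_PROCESS, 0, s.old_nice);
#endif
}
```

The idea is to call `boost_priority()` just before the measured code and `restore_priority()` right after it.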

By the way, for converting RDTSC clock ticks to something more useful (like seconds), see:
http://www.intel.com/cd/ids/developer/asmo-na/eng/dc/centrino/knowledgebase/81711.htm
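Independently of the linked article, the basic idea is seconds = cycles / (ticks per second), where the tick rate can be estimated by counting TSC ticks over a known wall-clock interval. This sketch assumes GCC/Clang on x86, C11's `timespec_get`, and a TSC that ticks at a constant rate (on CPUs whose counter follows SpeedStep, as noted earlier in the thread, the estimate drifts with clock speed). `estimate_tsc_hz` and `cycles_to_seconds` are hypothetical names.

```c
#include <stdint.h>
#include <time.h>        /* timespec_get, TIME_UTC (C11) */
#include <x86intrin.h>   /* __rdtsc (GCC/Clang) */

/* Wall-clock time in seconds (C11); only differences are used. */
static double wall_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Estimate TSC ticks per second by busy-waiting about `interval` seconds
   and counting how many ticks elapsed. Assumes a constant-rate TSC. */
double estimate_tsc_hz(double interval)
{
    double t0 = wall_seconds();
    uint64_t c0 = __rdtsc();
    double t1;
    do {
        t1 = wall_seconds();
    } while (t1 - t0 < interval);
    uint64_t c1 = __rdtsc();
    return (double)(c1 - c0) / (t1 - t0);
}

/* Convert a cycle count to seconds, given the tick rate in Hz. */
double cycles_to_seconds(uint64_t cycles, double hz)
{
    return (double)cycles / hz;
}
```

For example, 3,000,000,000 cycles at 3 GHz is exactly 1 second.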

Regards
