From megahurtz to speed
Just thinking. What does the processor speed actually mean? Let's suppose I have a program that contains only one kind of instruction, say assignment. That is, it is an infinite loop of the form
while (true) a = 2;
Is there a way to calculate, by knowing the Hertz rate of my processor, how many times per second this instruction will get executed? Would I have to know other things? Could a similar calculation be made for more complicated expressions, like function calls?
You need to know things like:
- how long it takes to execute each assembly instruction that the compiler generates - which means knowing what output the compiler generates for that 'simple' line of C code [smile];
- how long it takes to access the memory (is a already in a register? in the cache? in main memory? paged out?) - which requires knowing how fast your memory bus is and what the memory latency, cache speed, line size and associativity are... or whether you have a special memory architecture (e.g. vector machines);
- what other applications are contending for CPU time with your loop (the operating system is also a program, after all), and how the operating system schedules them (scheduling policy, process priority, multi-CPU systems...);
- the length of your CPU pipeline, whether it does speculative execution, branch prediction or out-of-order execution (watch for data hazards), and how many functional units it has (and which unit executes which kind of instruction).
Not all of this is relevant to your precise example, but it does show that there's much more involved than raw MHz when dealing with computer speed. The clock speed is just the 'base unit' when talking about the execution speed of a single instruction, outside of any context - the time it spends occupying one (or more) of the CPU's functional units (e.g. "On CPU foo, operation bar takes k clock cycles").
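To make that concrete with made-up numbers: if operation bar takes 2 clock cycles and CPU foo runs at 1 GHz (a billion cycles per second), each cycle lasts 1 ns, so bar takes 2 ns and you could execute at most 500 million of them per second - and that's only the raw upper bound, before memory, scheduling and pipeline effects get involved.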
With optimizations disabled, after the setup code in main(), my compiler spits out:
.L2:
        movl $2, -4(%ebp)   # store 2 into a's stack slot
        jmp .L2             # and loop forever
(with optimizations, the movl is completely removed, since the value is never used)
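In other words, the optimized loop presumably collapses to nothing but the jump:
.L2:
        jmp .L2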
I don't, unfortunately, have a table indicating how long each instruction takes. You could probably figure it out by using a profiler (particularly a profiler provided by your CPU manufacturer, which would be aware of individual operation costs).
Barring that, taking a computer architecture class would probably answer all your questions.
For information on instruction cycle counts (that is, how long each instruction takes), check out the AMD and Intel developer web sites. They have PDFs containing the full specs for their CPUs freely available.
And just to mention something that fruny didn't mention explicitly, your CPU can be executing more than one instruction at the same time. I'm sure you can imagine just how much this complicates determining the 'speed' of a piece of code on its own, and that's before taking into account things like cache hits/misses. These days there really isn't much point in trying to measure the number of cycles a particular block of code will take to execute.
Quote:Original post by Fruny
out-of-order execution (watch for data hazards)
AFAIK this shouldn't be a problem: if reordering the instructions could affect their results, the CPU doesn't do it. E.g. the following is guaranteed to be executed in-order:
mov ebx, ecx    ; writes ebx
add eax, ebx    ; reads ebx, so it cannot be moved ahead of the mov
OK, so you are saying that it is essentially hopeless to try to figure out which of two pieces of code is faster just by knowing the Hertz rate. On the other hand, for a single piece of code, doubling the Hertz halves the execution time if the CPU speed is the limiting factor, right?
Quote:Original post by King of Men
OK, so you are saying that it is essentially hopeless to try to figure out which of two pieces of code is faster just by knowing the Hertz rate. On the other hand, for a single piece of code, doubling the Hertz halves the execution time if the CPU speed is the limiting factor, right?
Yes. If the CPU speed is the limiting factor, as you said.
You can use tools called code profilers to figure out if the CPU might be a limiting factor. These tools tell you where code bottlenecks might be in your application. There are several freeware ones depending on which language you're using.
Thanks, but I don't actually have performance problems - it was just something I was curious about.
For Windows programming there's a couple of functions, QueryPerformanceCounter() and QueryPerformanceFrequency(). QueryPerformanceCounter() tells you how many "ticks" have gone by, and QueryPerformanceFrequency() tells you how many ticks there are per second (so each "tick" is worth the reciprocal of that). I think the timer it uses is fine-grained enough to measure instruction cycles, but I'm not sure. Of course, these functions also take time to run and would probably throw off the result a bit.
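A minimal sketch of how that might look (Windows-only; error checking omitted, and the iteration count is an arbitrary placeholder):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    LARGE_INTEGER freq, start, end;
    volatile int a = 0;   /* volatile so the assignment isn't optimized away */
    long i;
    double seconds;

    QueryPerformanceFrequency(&freq);  /* ticks per second */
    QueryPerformanceCounter(&start);

    for (i = 0; i < 100000000L; ++i)
        a = 2;                         /* the statement we want to time */

    QueryPerformanceCounter(&end);

    seconds = (double)(end.QuadPart - start.QuadPart) / (double)freq.QuadPart;
    printf("%f seconds total, ~%f ns per iteration\n",
           seconds, seconds * 1e9 / 100000000.0);
    return 0;
}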
If I remember this right, there's a high-resolution timer on all (or most) processors that increments (almost) every clock cycle. This timer can be accessed using assembly.
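On x86 that's the time stamp counter, read with the rdtsc instruction. A rough sketch using GCC inline assembly (note it counts clock cycles, not instructions, and out-of-order execution makes the raw numbers approximate):

#include <stdio.h>

/* Read the x86 time stamp counter: rdtsc puts the low 32 bits in EAX
   and the high 32 bits in EDX. */
static unsigned long long read_tsc(void)
{
    unsigned int lo, hi;
    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((unsigned long long)hi << 32) | lo;
}

int main(void)
{
    volatile int a = 0;   /* volatile so the loop isn't optimized away */
    unsigned long long start, end;
    long i;

    start = read_tsc();
    for (i = 0; i < 1000000L; ++i)
        a = 2;
    end = read_tsc();

    printf("~%f cycles per iteration\n", (double)(end - start) / 1000000.0);
    return 0;
}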
But the question was whether there was a way to *calculate* this, not measure it by running the program... :)