From megahurtz to speed

Started by
8 comments, last by Spoonbender 19 years, 5 months ago
Just thinking. What does the processor speed actually mean? Let's suppose I have a program that contains only one kind of instruction, say assignment. That is, it is an infinite loop of the form while (true) a = 2; Is there a way to calculate, by knowing the Hertz rate of my processor, how many times per second this instruction will get executed? Would I have to know other things? Could a similar calculation be made for more complicated expressions, like function calls?
To win one hundred victories in one hundred battles is not the acme of skill. To subdue the enemy without fighting is the acme of skill.
Advertisement
You need to know things like, how long does it take to execute each assembly instruction that the compiler generates - which means knowing what output the compiler generates for that 'simple' line of C code[smile]; how long does it take to access the memory (is a already in a register? in the cache? in main memory? paged out?) - which requires knowing how fast your memory bus is and what is the memory latency, cache speed, line size, associativity... or if you have a special memory architecture (e.g. vector machines); It also requires you knowing what other applications are contending for CPU time with your loop (the operating system is also a program, after all), and how the operating system schedules them (scheduling policy, process priority, multi-CPU systems...); it also requires knowing the length of your CPU pipeline, whether it does speculative execution, branch prediction, out-of-order execution (watch for data hazards), and how many functional units it has (and to know which unit executes which kind of instruction).

Not all here is relevant to your precise example, but it does show that there's much more involved than raw MHz when dealing with computer speed. The clock speed is just the 'base unit' when talking about the execution speed of a single instruction, outside of any context - the time it spends locking one (or more) of the CPU's functional units (e.g. "On CPU foo, operation bar takes k clock cycles").

When disabling optimizations, after the setup code in main(), my compiler spits out:

.L2:    movl    $2, -4(%ebp)    jmp .L2


(with optimizations, the movl is completely removed, since the value is never used)

I don't, unfortunately have a table indicating how long each instruction takes. You could probably figure it out by using a profiler (particularly a profiler provided by your CPU manufacturer, which would be aware of individual operation costs).

Barring that, taking a computer architecture class would probably answer all your questions.
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." — Brian W. Kernighan
For information on instruction cycle counts (that is, how long each instruction takes) check out the AMD and Intel developer web sites. They have PDF containing the full specs for their CPU's freely available.

And just to mention something that fruny didn't mention explicitly, your CPU can be executing more than one instruction at the same time. I'm sure you can imagine just how much this will complicate determing the 'speed' of a piece of code on it's own, and that's before taking into account things like cache hits/misses. These days there really isn't much point in trying the measure the number of cycles a particular block of code will take to execute.


Quote:Original post by Fruny
out-of-order execution (watch for data hazards)

AFAIK this should be a problem, if the reordering of instructions could affect their results then it doesn't do it. Eg. the following is guarentteed to be executed in-order
mov ebx,ecxadd eax,ebx

"Voilà! In view, a humble vaudevillian veteran, cast vicariously as both victim and villain by the vicissitudes of Fate. This visage, no mere veneer of vanity, is a vestige of the vox populi, now vacant, vanished. However, this valorous visitation of a bygone vexation stands vivified, and has vowed to vanquish these venal and virulent vermin vanguarding vice and vouchsafing the violently vicious and voracious violation of volition. The only verdict is vengeance; a vendetta held as a votive, not in vain, for the value and veracity of such shall one day vindicate the vigilant and the virtuous. Verily, this vichyssoise of verbiage veers most verbose, so let me simply add that it's my very good honor to meet you and you may call me V.".....V
Yeah. It's not as simple as it used to be.

Oh. And just for giggles:

OK, so you are saying that it is essentially hopeless to try to figure which of two pieces of code is faster by knowing the Hertz rate. On the other hand, for a single piece of code, doubling the Hertz halves the execution time if the CPU speed is the limiting factor, right?
To win one hundred victories in one hundred battles is not the acme of skill. To subdue the enemy without fighting is the acme of skill.
Quote:Original post by King of Men
OK, so you are saying that it is essentially hopeless to try to figure which of two pieces of code is faster by knowing the Hertz rate. On the other hand, for a single piece of code, doubling the Hertz halves the execution time if the CPU speed is the limiting factor, right?
Yes. If the CPU speed is the limiting factor, as you said.
You can use tools called code profilers to figure out if the CPU might be a limiting factor. These tools tell you where code bottlenecks might be in your application. There are several freeware ones depending on which language you're using.
- k2"Choose a job you love, and you'll never have to work a day in your life." — Confucius"Logic will get you from A to B. Imagination will get you everywhere." — Albert Einstein"Money is the most egalitarian force in society. It confers power on whoever holds it." — Roger Starr{General Programming Forum FAQ} | {Blog/Journal} | {[email=kkaitan at gmail dot com]e-mail me[/email]} | {excellent webhosting}
Thanks, but I don't actually have performance problems - it was just something I was curious about.
To win one hundred victories in one hundred battles is not the acme of skill. To subdue the enemy without fighting is the acme of skill.
For Windows programming there's a couple functions QueryPerformanceCounter() and QueryPerformanceFrequency(). I think QueryPerformanceCounter() can tell you how many "ticks" have gone by and QueryPerformanceFrequency() can tell you how much time each "tick" is worth. I think the timer it uses is fine enough to measure instruction cycles, but I'm not sure. Of course, these functions also take time to run and would probably throw off the result a bit.

If I remember this right, there's a high resolution timer(s) on all (or most) processors that incriments itself (almost) every instruction cycle. This timer can be accessed using assembly.
But the question was whether there was a way to *calculate* this, not measure it by running the program... :)

This topic is closed to new replies.

Advertisement