
How to measure the time it takes to evaluate an expression




Say I have a boolean expression

a<b

How can I measure the time it takes for my computer to evaluate a single expression?

Would the result be inaccurate because it is such a small difference?

And how can I obtain the true time it takes to evaluate just the expression, and not other things such as assigning the value to a variable?

Normally when I just write 1==2; the compiler sees that the code has no effect and discards it.
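One common way to keep the compiler from throwing the expression away is to route the operands and the result through volatile variables, so the comparison has an observable effect. This is only a minimal sketch of that idea (the variable names are made up for illustration), not something prescribed in this thread:

    #include <cstdio>

    int main()
    {
        // volatile forces the compiler to actually load the operands and
        // store the result, so the comparison cannot be discarded as dead code.
        volatile int a = 1;
        volatile int b = 2;

        volatile bool result = a < b;   // the expression we want to keep alive

        std::printf("%d\n", int(result));
        return 0;
    }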


Would the result be inaccurate because it is such a small difference?


There are high-precision timers (on Windows they are called "Multimedia Timers") that can give you, I believe, microsecond resolution (maybe even nanosecond, though at that level it would probably be inaccurate). Why do you need such precise timing for only a single operation? If you are trying to figure out the clock speed of the user's processor, there are better ways.
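For reference, here is a minimal sketch of reading a high-resolution timer on Windows with QueryPerformanceCounter (the same call a later reply mentions); QueryPerformanceFrequency reports how many counts the timer makes per second:

    #include <windows.h>
    #include <cstdio>

    int main()
    {
        LARGE_INTEGER freq, start, stop;

        QueryPerformanceFrequency(&freq);   // timer counts per second
        QueryPerformanceCounter(&start);

        // ... the code you want to time goes here ...

        QueryPerformanceCounter(&stop);

        double seconds = double(stop.QuadPart - start.QuadPart) / double(freq.QuadPart);
        std::printf("elapsed: %.9f seconds\n", seconds);
        return 0;
    }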

I'm doing an experiment.

There are high-precision timers (on Windows they are called "Multimedia Timers") that can give you, I believe, microsecond resolution (maybe even nanosecond, though at that level it would probably be inaccurate).

Are you saying it would still be inaccurate at the nanosecond level?


Are you saying it would still be inaccurate at the nanosecond level?

Note that at the nanosecond level, even calling your timing functions will show up as noise. Just calling QueryPerformanceCounter, storing the result in a variable/register, and then calling it again takes some time. So you have to remember that the time you get back includes both your operation and your calls to QueryPerformanceCounter.

To help minimize this inaccuracy, you'll want to repeat your operation until it takes a significant amount of time compared to the noise introduced by your timing calls. But like alvaro said, modern CPUs are incredibly complicated, and the timings you get back in your testing setup may not match the average results in a real application.
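As a rough sketch of that approach (the iteration count, variable names, and the volatile trick that keeps the comparison alive are all illustrative assumptions, not anything prescribed in this thread): estimate the overhead of the clock reads, then time the operation repeated many times and divide.

    #include <chrono>
    #include <cstdio>

    int main()
    {
        using Clock = std::chrono::steady_clock;
        const long long iterations = 100000000;     // arbitrary; pick something large enough

        volatile int a = 1, b = 2;
        volatile bool sink = false;

        // Overhead of just reading the clock twice, back to back.
        auto t0 = Clock::now();
        auto t1 = Clock::now();
        auto overhead = t1 - t0;

        // Time the comparison repeated many times.
        auto start = Clock::now();
        for (long long i = 0; i < iterations; ++i)
            sink = a < b;                           // volatile keeps this from being optimized away
        auto stop = Clock::now();

        auto total = (stop - start) - overhead;
        double nsPerIteration =
            std::chrono::duration<double, std::nano>(total).count() / double(iterations);

        std::printf("~%.3f ns per iteration (includes loop overhead)\n", nsPerIteration);
        return 0;
    }

Keep in mind that the loop counter increment and branch are still part of what gets measured, so the per-iteration number is an upper bound on the comparison alone.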

To be more accurate, you'd read the generated assembly, look up the specs for your CPU, and calculate by hand how much time an average expression takes to evaluate. But even then there are all sorts of things the CPU can do that make your calculated time just a guideline and not a fixed absolute.
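A different, related trick that is not mentioned in this thread: instead of converting to wall-clock time at all, you can read the CPU's timestamp counter with the __rdtsc intrinsic and think in cycles. A sketch, assuming an x86 compiler that provides the intrinsic (MSVC via intrin.h, GCC/Clang via x86intrin.h); note that on modern CPUs the TSC ticks at a constant reference rate, which is not necessarily the core clock:

    #include <cstdio>
    #if defined(_MSC_VER)
    #  include <intrin.h>       // __rdtsc on MSVC
    #else
    #  include <x86intrin.h>    // __rdtsc on GCC/Clang (x86 only)
    #endif

    int main()
    {
        const long long iterations = 1000000;   // repeat so the count rises above the noise
        volatile int a = 1, b = 2;
        volatile bool sink = false;

        unsigned long long begin = __rdtsc();
        for (long long i = 0; i < iterations; ++i)
            sink = a < b;
        unsigned long long cycles = __rdtsc() - begin;

        std::printf("~%.2f reference cycles per iteration\n",
                    double(cycles) / double(iterations));
        return 0;
    }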

So I guess it all depends on how accurate you want to be.


Are you saying it would still be inaccurate at the nanosecond level?

Yes.


Again, consider the modern CPU architecture.

First, let's consider the number of stages inside a CPU. The early x86-based processors had only a single stage; you could look up in the processor manual how many cycles each instruction would take. Then, around the time of the 486, a pipeline was introduced to the x86 series so some operations could be completed in parallel with other operations. Simple integer comparisons are much faster than multiplication or division, so there was no point in delaying one when it didn't impact the other. The Pentium III had 10 stages in its pipeline. The Willamette core had 20. Prescott had 31 stages. And so on.

Let's look at just a few of those stages.

First the instruction must be decoded. The CPU's decoder converts instructions into micro-operations; exactly how many it can handle at once varies depending on the CPU in use. Some instructions are big, and the CPU can decode only a single instruction per clock cycle. Other instructions, such as the simple integer comparison "a<b" you presented, are small; aligned correctly, your CPU may decode one, two, three, four, or maybe eight instructions at once. This will vary based on the CPU core and what happens to be next to it in the compiled code.

Addresses are calculated, registers are processed, data must be marked as dirty, etc. All of this takes a different number of CPU cycles depending on the processor microarchitecture and the code that happens to be around it.

Eventually all those micro-operations make their way to the out-of-order (OOO) core. The OOO core can reorder operations and will perform each one as soon as possible. Again, some operations are fast: many take only a fraction of a CPU cycle, so the core might perform three or four comparison micro-ops in a single cycle, whereas a more complex division micro-op may take five or ten cycles. Each CPU core actually has a half dozen or more execution units inside it, depending on the microarchitecture. For example, it may have 3 integer add/subtract/compare units each capable of 4 micro-ops per cycle, 4 general-purpose integer units, and 3 FPU/SIMD units, all of them pulling from the same OOO core. A microarchitecture may be able to perform 10 or 20 or 30 or even 50+ micro-operations per clock cycle, all depending on the CPU version, what is adjacent to the operation in question, and whatever else is sitting inside the OOO core.

Then those micro-ops must be put back in order and retired. Again, this step takes a varying amount of time depending on the microarchitecture.


If your single comparison operation happens to be next to a bunch of other lightweight instructions, it may pass through the pipeline in 10 or 20 CPU cycles; assuming a 3.4 GHz processor, that might be around 5 ns. Or, if it happens to be next to several instruction branches, floating-point math, and other heavy instructions, it may pass through the pipeline in 100 CPU cycles, which would be around 30 ns. Note that your individual comparison operation didn't change, only the adjacent code, yet that dramatically changed the time it took to leave the CPU.



Considering all of that, the time required for a single integer comparison will come out as just random noise. The actual time it takes is so vanishingly small that measuring it is useless.
