Are you saying it would still be inaccurate at nanosecond level?
Again, consider the modern CPU architecture.
First, let's consider the number of stages inside a CPU. The early x86 processors were effectively single-stage: you could look up in the processor manual how many cycles each instruction would take. Then, around the time of the 486, a pipeline was introduced to the x86 series so that some operations could be completed in parallel with others. Simple integer comparisons are much faster than multiplication or division, so there was no point in delaying one when it didn't affect the other. The Pentium III had 10 stages in its pipeline. The Willamette core had 20. Prescott had 31 stages. And so on.
Let's look at just a few of those stages.
First, the instruction must be decoded. The CPU's decoder can convert multiple instructions into micro-operations at once; exactly how many varies depending on the CPU in use. Some instructions are big, and the CPU can decode only one of them per clock cycle. Other instructions, such as the simple integer comparison "a<b" you presented, are small. Aligned correctly, your CPU may decode one, two, three, four, or maybe eight such instructions at once; this will vary based on the CPU core and on what happens to be next to the instruction in the compiled code.
Addresses are calculated, registers are processed, data must be marked as dirty, etc. All of this takes a different number of CPU cycles depending on the processor microarchitecture and the code that happens to be around it.
Eventually all those micro-operations make their way to the out-of-order (OOO) core. The OOO core can reorder operations and will perform each one as soon as possible. Again, some operations are fast: many effectively take a fraction of a CPU cycle, so the core might perform three or four comparison micro-ops in a single cycle, whereas a more complex division micro-op may take five or ten cycles. Each CPU core actually has a half dozen or more (depending on the microarchitecture) execution units, small processors in their own right, inside it. For example, it may have 3 integer add/subtract/compare units each capable of 4 micro-ops per cycle, 4 general purpose integer units, and 3 FPU/SIMD units, all of them pulling from the same OOO core. A microarchitecture may be able to perform 10 or 20 or 30 or even 50+ micro-operations per clock cycle, all depending on the CPU version, what is adjacent to the operation in question, and whatever else is sitting inside the OOO core.
Then those micro-ops must be put back in order and retired. Again, this takes a varying amount of time depending on the microarchitecture.
If your single comparison operation happened to be next to a bunch of other lightweight instructions, it may pass through the pipeline in 10 or 20 CPU cycles. Assuming a 3.4 GHz processor, that is roughly 3 to 6 ns. If instead it happens to be next to several branch instructions, floating point math, and other heavy instructions, it may take 100 CPU cycles to pass through the pipeline, which is around 30 ns. Note that your individual comparison operation didn't change, only the adjacent code, but that dramatically changed the time it took to leave the CPU.
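To make the arithmetic behind those figures explicit, here is a minimal sketch in Python. The function name cycles_to_ns is just an illustrative helper, and the cycle counts are the rough examples from above, not measurements of any particular CPU.

```python
def cycles_to_ns(cycles, clock_ghz):
    """Convert a cycle count to nanoseconds at the given clock rate in GHz."""
    # 1 GHz means one cycle per nanosecond, so ns = cycles / GHz.
    return cycles / clock_ghz


CLOCK_GHZ = 3.4  # the clock rate assumed in the post

# Illustrative cycle counts from the post, not measured values.
for cycles in (10, 20, 100):
    ns = cycles_to_ns(cycles, CLOCK_GHZ)
    print(f"{cycles:3d} cycles at {CLOCK_GHZ} GHz = {ns:.1f} ns")
```

The same comparison instruction lands anywhere in that range depending only on its neighbours, which is why a single number for "how long a compare takes" doesn't mean much.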
Considering all of that, the time required for a single integer compare operation will come out as just random noise. The actual time it takes is so vanishingly small that measurement is useless.
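If you want to see the noise for yourself, a rough sketch like the following may help. It is written in Python purely for convenience, so interpreter overhead is far larger than the compare itself; the assumption here is only that it illustrates the general problem, namely that two back-to-back clock reads already cost more, and jitter more, than the operation being timed. The function name timer_deltas and the sample count are made up for this example.

```python
import statistics
import time


def timer_deltas(n, with_compare):
    """Collect n deltas (in ns) between two back-to-back clock reads."""
    a, b = 3, 7
    deltas = []
    for _ in range(n):
        start = time.perf_counter_ns()
        if with_compare:
            _ = a < b  # the operation we are trying to time
        end = time.perf_counter_ns()
        deltas.append(end - start)
    return deltas


# Hypothetical sample size; large enough to show the spread.
empty = timer_deltas(100_000, with_compare=False)
compare = timer_deltas(100_000, with_compare=True)

for name, d in (("empty", empty), ("compare", compare)):
    print(f"{name:8s} min={min(d)} median={statistics.median(d)} max={max(d)} ns")
```

On most machines the spread between min and max within either run should dwarf any difference between the two runs, though the exact numbers depend on the OS, the clock source, and whatever else the machine is doing.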
Edited by frob, 05 October 2012 - 09:31 AM.