Quote:Original post by swiftcoder
You partially missed my point, which was along the same lines as agi_shi's point. Since the compiler can immediately reduce float maxVal = maxDefine(0.1f, 0.2f); to float maxVal = 0.2f;, it can then deduce that the loop doesn't do anything, and remove it entirely.
This means that you are comparing zero invocations of the maxDefine against one million invocations of std::max, which tells you nothing - of course zero calls is faster than many calls! However, if your loop does something non-trivial, it may not be optimised out, at which point you can check the relative performance. You really have to check the resulting assembly code to make sure that your loops haven't disappeared completely.
I'd checked the assembly, and as far as I can tell everything for the loops is still there. The timing also supports that the entire loop still runs, so that's not the problem: increasing the number of iterations tenfold made the results take ten times longer.
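For anyone who wants to rule out the vanishing-loop problem without reading assembly, here is a minimal sketch of a timing loop the optimizer can't fold away. The body of maxDefine is my assumption (the macro definition isn't shown in this post), and timeLoop and sink are hypothetical names made up for the example.

```cpp
#include <chrono>
#include <cstddef>

// Assumed definition of the macro discussed in the thread.
#define maxDefine(a, b) (((a) > (b)) ? (a) : (b))

namespace Math {
    // Same signature as the function shown in the disassembly.
    inline float Max(const float& value1, const float& value2) {
        return (value1 > value2) ? value1 : value2;
    }
}

// Hypothetical helper: runs `op` `iterations` times and returns elapsed seconds.
// The inputs are read through volatile variables, so the compiler cannot
// treat them as compile-time constants, and the accumulated result is
// written to `sink`, so the loop body cannot be discarded as dead code.
template <typename F>
double timeLoop(std::size_t iterations, F op, float& sink) {
    volatile float a = 0.01f;
    volatile float b = 0.02f;
    float acc = 0.0f;

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        const float x = a;  // volatile load, repeated every iteration
        const float y = b;
        acc += op(x, y);
    }
    const auto end = std::chrono::steady_clock::now();

    sink = acc;  // make the result observable to the caller
    return std::chrono::duration<double>(end - start).count();
}
```

You'd call it once with a lambda wrapping Math::Max and once with a lambda wrapping maxDefine, e.g. timeLoop(1000000, [](float x, float y) { return Math::Max(x, y); }, sink), and then print sink along with the times. The volatile loads cost something, but they cost the same in both variants, so the comparison stays fair.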
Quote:Original post by Spoonbender
And what does the function version get compiled to?
I'd expect that to be the same. The compiler should be able to optimize everything down to a single assignment in both the macro and function cases. (Although your timing results hint that that's probably not the case)
The call to Math::Max disassembly:
float maxVal = Math::Max(0.01f, 0.02f);
0040D652 fld dword ptr [__real@3ca3d70a (40F958h)]
0040D658 fstp dword ptr [ebp-1BCh]
0040D65E fld dword ptr [__real@3c23d70a (40F954h)]
0040D664 fstp dword ptr [ebp-1C0h]
0040D66A lea eax,[ebp-1BCh]
0040D670 push eax
0040D671 lea ecx,[ebp-1C0h]
0040D677 push ecx
0040D678 call Math::Max (401040h)
0040D67D add esp,8
0040D680 fstp dword ptr [maxVal]
And the Math::Max function disassembly:
inline float Max(const float& value1, const float& value2){return ((value1 > value2) ? value1 : value2);}
00401040 push ebp
00401041 mov ebp,esp
00401043 push ecx
00401044 mov eax,dword ptr [value1]
00401047 fld dword ptr [eax]
00401049 mov ecx,dword ptr [value2]
0040104C fld dword ptr [ecx]
0040104E fcompp
00401050 fnstsw ax
00401052 test ah,5
00401055 jp Math::Max+21h (401061h)
00401057 mov edx,dword ptr [value1]
0040105A fld dword ptr [edx]
0040105C fstp dword ptr [ebp-4]
0040105F jmp Math::Max+29h (401069h)
00401061 mov eax,dword ptr [value2]
00401064 fld dword ptr [eax]
00401066 fstp dword ptr [ebp-4]
00401069 fld dword ptr [ebp-4]
0040106C mov esp,ebp
0040106E pop ebp
0040106F ret
A lot of extra work by the look of it, a far cry from the 2 instructions the macro did :(. The fact that it's doing a call is rather confusing though. As I've said, I'm not very familiar with assembly, so I'm not sure whether it's supposed to be doing that if it inlines the function.
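One thing worth keeping in mind here: the inline keyword is only a request, and a call instruction at the call site means that particular call was not inlined. Why that happened isn't established anywhere in this thread (a build configuration with inlining switched off would be one possible cause, but that's a guess). Below is a rough sketch of how one could ask the compiler more forcefully. FORCE_INLINE and the MathSketch namespace are hypothetical names; __forceinline (MSVC) and __attribute__((always_inline)) (GCC/Clang) are the real compiler-specific spellings.

```cpp
// Hypothetical portability macro: picks the compiler-specific
// "please really inline this" spelling, falling back to plain inline.
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline
#endif

namespace MathSketch {
    // Same body and signature as Math::Max from the post; only the
    // inlining request differs.
    FORCE_INLINE float Max(const float& value1, const float& value2) {
        return (value1 > value2) ? value1 : value2;
    }
}
```

Even with this, the only way to be sure is to look at the call site in the disassembly again and check whether the call instruction is gone.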
Quote:Original post by Spoonbender
Also, what is the time unit it prints out? 0.00483 seconds?
Yes, the results are in seconds. For reference, I'm running an Intel Q9550 (45nm Core 2 Quad) overclocked to 3.4 GHz (8.5 x 400 MHz).
Quote:Original post by Spoonbender
As said above, your test isn't worth much though, because the entire loop can be optimized away in both cases. You're not testing a million max calls, you're testing *one*.
Answered in response to swiftcoder.
Quote:Original post by Spoonbender
Apart from that, I can think of two possible sources for the slowdown in the function case.
One is that you're passing the arguments by reference, which is generally a waste of time with small POD datatypes. (On the other hand, I'd expect the compiler to be able to optimize that away in such a simple function), and the second might be the floating-point precision which causes extra float<->double casts in the function case.
That should be visible if you take a look at the assembly output though.
I'm not too familiar with assembly, but I posted it earlier in this post if you want to take a look. I also tried passing by value instead of by reference, but that yielded slightly worse performance, though the difference was small enough that I can't be certain it's not just normal timing fluctuation from background processes.
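To make the by-value versus by-reference comparison concrete, this is roughly what the two variants look like side by side. The Sketch namespace and the names MaxByRef and MaxByValue are made up for this example; the function bodies are the same as Math::Max above.

```cpp
namespace Sketch {
    // Arguments arrive as addresses. If the call is not inlined, the
    // caller has to store both floats to memory and the function has
    // to load them back through the pointers.
    inline float MaxByRef(const float& value1, const float& value2) {
        return (value1 > value2) ? value1 : value2;
    }

    // Arguments arrive as copies of the values themselves, so there is
    // no indirection inside the function.
    inline float MaxByValue(float value1, float value2) {
        return (value1 > value2) ? value1 : value2;
    }
}
```

Both return the same result for the same inputs, so any timing difference comes purely from how the arguments are passed. If the compiler does inline the call, I'd expect the two to end up as the same code, but that needs to be checked in the disassembly rather than assumed.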