Values from 0.0 to 1.0

1. The range between 0 and 1 requires "fixed point", so a comparison with plain "integer" isn't totally fair anyway: although the majority of the maths can be treated as plain integer, at some point a scale is usually required, and the time taken for that occasional bit shift to move the decimal point should be accounted for.
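For illustration, a minimal sketch (my own, not from the original post) of that scale step, assuming a 16.16 fixed-point format where 65536 represents 1.0:

// Hypothetical 16.16 fixed-point value type.
typedef int fixed16;

fixed16 fixed_mul(fixed16 a, fixed16 b)
{
    // The raw product carries 32 fractional bits; shift right by 16 to move
    // the point back. This is the extra step plain integer maths skips.
    return (fixed16)(((long long)a * b) >> 16);
}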


2. Simple loop tests aren't at all accurate for benchmarking (I-cache warm-up, process/thread context switches in the middle of your loop, D-cache warm-up for the first access to the timer variables, etc.)...


3. ...Nor are they representative of what the compiler would generate for a real app. In a real app, if the compiler runs out of integer registers before it runs out of FP stack space, the results would skew in favour of the FP code.


4. The MSVC optimiser collapses those loops anyway, so you only get valid results by disabling optimisation of the loop (say, by making the iterator volatile) - which results in some pretty sub-optimal code generation.


5. The constant you were adding has no "f" suffix, so it's treated as a double.


6. Try the following slightly modified (and fairer) version of your code in both release and debug:
#include <stdlib.h>
#include <iostream.h>
#include <windows.h>

int main()
{
    unsigned int dt;

    unsigned int t1 = GetTickCount();
    float temp1 = 0.0f;
    for (volatile unsigned int i = 0; i < 4200000000U; ++i)
    {
        temp1 += .00013f;
    }
    unsigned int t2 = GetTickCount();
    dt = t2 - t1;
    cout << "Floating Point-\n   Total Time: " << dt << "ms\n";
    cout << "    Average Time: " << dt / 4200000000.0f << "ms\n\n";

    unsigned int t3 = GetTickCount();
    unsigned int temp2 = 0U;
    for (volatile unsigned int j = 0; j < 4200000000U; ++j)
    {
        temp2 += 13;
    }
    unsigned int t4 = GetTickCount();
    dt = t4 - t3;
    cout << "Integer-\n    Total Time: " << dt << "ms\n";
    cout << "    Average Time: " << dt / 4200000000.0f << "ms\n\n";

    return 0;
}



6a. On the (Intel CPU) machine I'm browsing on, with a DEBUG build I got the following results for two runs:
Floating Point-
   Total Time: 60046ms
    Average Time: 1.42967e-005ms

Integer-
    Total Time: 12789ms
    Average Time: 3.045e-006ms

-------------------------------------

Floating Point-
   Total Time: 59726ms
    Average Time: 1.42205e-005ms

Integer-
    Total Time: 12788ms
    Average Time: 3.04476e-006ms



6b. Now let's see the (MSVC6 compiler optimised) RELEASE version on the same machine:
Floating Point-
   Total Time: 19287ms
    Average Time: 4.59214e-006ms

Integer-
    Total Time: 21601ms
    Average Time: 5.1431e-006ms

----------------------------------------------

Floating Point-
   Total Time: 19568ms
    Average Time: 4.65905e-006ms

Integer-
    Total Time: 21592ms
    Average Time: 5.14095e-006ms



6c. See how much the surrounding code can skew your "benchmarking" results...


7. That's not to say MSVC6's optimisation of FP code is ideal - it isn't: it misses many good opportunities to use FP latency hiding and flushes temporaries to memory too often (though that could be for numerical consistency). MSVC6's integer optimisation is better.


8. And no, I've not got any suggestions as to why the release integer code ends up slower than the debug integer code.


[edited by - S1CA on October 12, 2003 7:05:52 PM]

Simon O'Connor | Technical Director (Newcastle) Lockwood Publishing | LinkedIn | Personal site

Use a typedef, make wrapper classes, decide later.

(So you can just use x==y in code, and implement your Float operator== using an epsilon).
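A minimal sketch of that suggestion (the class name Float comes from the post above, but the member set and the epsilon value are my own illustrative choices):

// Wrapper so call sites can keep writing x == y while the representation
// (float now, perhaps fixed point later) stays swappable behind a typedef.
#include <math.h>

typedef float value_type;

class Float
{
public:
    Float(value_type v = 0) : m_value(v) {}

    Float operator+(const Float& rhs) const { return Float(m_value + rhs.m_value); }

    // Epsilon comparison hidden behind the usual operator.
    bool operator==(const Float& rhs) const
    {
        const value_type epsilon = 0.0001f;   // tolerance picked arbitrarily for the sketch
        return fabs(m_value - rhs.m_value) < epsilon;
    }

private:
    value_type m_value;
};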

[edited by - Magmai Kai Holmlor on October 12, 2003 9:11:57 PM]
- The trade-off between price and quality does not exist in Japan. Rather, the idea that high quality brings on cost reduction is widely accepted. -- Tajima & Matsubara
Well actually, I will not be using floats. You see, all those optimizations and stuff you take into account are bogus in something that isn't constantly kept on the floating point stack (which mine won't be). Oh, and it kinda sounds like you are talking down to me; please don't. I know and have used assembly quite a bit; I've just heard different things about this and decided to ask. Anyway, neither of our benchmarks is really fair to using fuzzy logic because you never really have it on the FP stack waiting to be used/etc.

And BTW, it wasn't a double; note how it was a float.

EDIT: Just ran your version on my computer, and integers were actually still faster O____o.

[edited by - Puzzler183 on October 12, 2003 9:30:34 PM]
quote:
1. The range between 0 and 1 requires "fixed point", so a comparison with plain "integer" isn't totally fair anyway: although the majority of the maths can be treated as plain integer, at some point a scale is usually required, and the time taken for that occasional bit shift to move the decimal point should be accounted for.


2. Simple loop tests aren't at all accurate for benchmarking (I-cache warm-up, process/thread context switches in the middle of your loop, D-cache warm-up for the first access to the timer variables, etc.)...


3. ...Nor are they representative of what the compiler would generate for a real app. In a real app, if the compiler runs out of integer registers before it runs out of FP stack space, the results would skew in favour of the FP code.


BTW, your test was subject to these as well...
quote:And BTW, it wasn't a double; note how it was a float.


temp1 += .000000001; 


Note how you don't have an "f" at the end of that constant; THAT was what I was referring to. In response, the compiler produces code which fetches a double:

warning C4305: '+=' : truncation from 'const double' to 'float'
...
fadd QWORD PTR __real@3e112e0be826d695
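For comparison, a small illustration of my own (not from the original code) of what the suffix changes:

float temp1 = 0.0f;
temp1 += .000000001;    // double literal: MSVC warns (C4305) and the add fetches a QWORD constant
temp1 += .000000001f;   // float literal: stays single precision, no truncation warning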


quote:You see, all those optimizations and stuff you take into account are bogus in something that isn't constantly kept on the floating point stack (which mine won't be)


Likewise with integer code that won't fit into registers! My point there is that neither is guaranteed to be faster than the other, and simple loop tests are NOT valid for making performance decisions. That was in response to your post saying:

quote:So as three tests show, with 4.2 billion additions per test, integer addition is faster than floating point addition



quote:Oh, and it kinda sounds like you are talking down to me; please don't. I know and have used assembly quite a bit;


I wasn't. Likewise, but I've no interest in getting into a pissing contest.


quote:I've just heard different things about this and decided to ask.


Though TBH, your response comes across as if you've already made your mind up and don't want to listen to what anyone else has to suggest...


quote:Anyway, neither of our benchmarks is really fair to using fuzzy logic because you never really have it on the FP stack waiting to be used/etc.


Exactly. My modification of your benchmark was only to challenge the assertion that your benchmark proves floats are slower than integers. They're neither slower nor faster - it depends on the code where they're used, and on what the compiler produces.

Simon O'Connor | Technical Director (Newcastle) Lockwood Publishing | LinkedIn | Personal site

Frankly, the slowest part of your fuzzy controller is going to be evaluating all the if statements in your rule set anyway [unless you're using a complex output generation function]. So considering the amount of branching you'll be doing, I wouldn't worry too much about the tiny bit of performance lost from float addition as opposed to int addition.
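To make that concrete, here is a minimal sketch (the rule names, min/max operators and crude defuzzification are my own illustrative choices, not the poster's code) of the kind of per-rule branching being described:

// Hypothetical two-rule fuzzy step: the comparisons and branches per rule
// dominate the cost, not the individual float additions.
float fuzzy_and(float a, float b) { return (a < b) ? a : b; }   // min
float fuzzy_or (float a, float b) { return (a > b) ? a : b; }   // max

float evaluate_rules(float enemy_close, float health_low)
{
    float flee   = 0.0f;
    float attack = 0.0f;

    // Rule 1: IF enemy is close AND health is low THEN flee
    flee = fuzzy_or(flee, fuzzy_and(enemy_close, health_low));

    // Rule 2: IF enemy is close AND health is NOT low THEN attack
    attack = fuzzy_or(attack, fuzzy_and(enemy_close, 1.0f - health_low));

    // Crude defuzzification for the sketch: pick the stronger activation.
    return (flee > attack) ? flee : attack;
}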

Finally, fuzzy controllers can be executed beautifully in parallel, so in the end, if you're really hungry for speed, you would write it all in SSE2 code anyway.
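A rough sketch of that idea, assuming the SSE min/max intrinsics and a layout of four rule activations per register (both assumptions of mine, not from the post):

#include <xmmintrin.h>   // SSE intrinsics (present on any SSE2-class CPU)

// Evaluate four rules at once: fuzzy AND via per-lane min, then fold the
// four results into the running activations with per-lane max (fuzzy OR).
__m128 evaluate_four_rules(__m128 antecedent_a, __m128 antecedent_b, __m128 activations)
{
    __m128 fired = _mm_min_ps(antecedent_a, antecedent_b);
    return _mm_max_ps(activations, fired);
}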

I would personally be quite surprised if you found that your actual fuzzy logic code was making a noticeable difference to your game speed.
(assume arch = Athlon XP) Those benchmarks are silly - of course integer is faster: 3 integer units with 1-clock latency vs. 1 FADD pipe with 4-clock latency. The loops suck because you're doing back-to-back dependent adds (limited by mem access time). You can reach 2 int adds/clock throughput (well, ignoring mem access latency) by unrolling, but only 1 float add/clock.
Things change if using 3DNow!, of course.
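A hedged sketch (my own, with an arbitrary unroll factor of four) of what unrolling with independent accumulators looks like, so the adds are no longer one back-to-back dependent chain:

// Several independent accumulators let the CPU overlap the additions;
// the partial sums are combined once at the end.
unsigned int sum_unrolled(const unsigned int* data, unsigned int count)
{
    unsigned int s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    unsigned int i = 0;

    for (; i + 4 <= count; i += 4)
    {
        s0 += data[i + 0];
        s1 += data[i + 1];
        s2 += data[i + 2];
        s3 += data[i + 3];
    }
    for (; i < count; ++i)     // handle any leftover elements
        s0 += data[i];

    return s0 + s1 + s2 + s3;
}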
