my concept - the importance of using floating point

Started by
17 comments, last by Ravyne 12 years, 7 months ago
A computer game should take advantage of floating point numbers, because overusing integers may overload the CPU's integer units for math processing while the FPU portion sits idle and could be doing that work. Feedback please. What do you guys think? Let me start on why: a small computer game will never really see a performance hit, but all the new mathematics and physics used now are probably expected to go to a special module for processing. Maybe those modules are not really needed, because the processors already have a portion of that ability inside their MMX, SSE, and FPU units.

thanks.
General Studies A.S - College of Southern Nevada 2003 GPA 2.3
Daddy, I'm cold. Can you build me a campfire?
You are too concerned about trivial things like this. If you are pushing a machine that hard, there are more likely other ways to improve performance. Besides, any sort of gain would be negligible.

Feedback please.



You should rethink your whole strategy, dog.


Stefano Casillo
TWITTER: [twitter]KunosStefano[/twitter]
AssettoCorsa - netKar PRO - Kunos Simulazioni

MMX and SSE registers can be used for integer processing too, by the way, not just floating point. But your primary concern should be using the right data format for the right job, rather than premature optimisation.
Latest project: Sideways Racing on the iPad
This guy is kind of weird.

A computer game should take advantage of floating point numbers, because overusing integers may overload the CPU's integer units for math processing while the FPU portion sits idle and could be doing that work. Feedback please.


Absolutely correct! However, you have forgotten about mixing in the other variable types too. What you should be aiming for is a healthy mix of bits, bytes, shorts, ints, int64s, floats, doubles and strings (don't forget SSE4 has string instructions now!). The best way to extract maximum performance is to ensure you always mix different types in your calculations as much as possible. Let's start with a very basic example:

[source]
struct ivec2
{
    int x;
    int y;
};

ivec2 add(const ivec2& a, const ivec2& b)
{
    ivec2 c;
    c.x = a.x + b.x;
    c.y = a.y + b.y;
    return c;
}[/source]

Now, a seasoned professional developer will notice immediately that you are overloading the integer portion of the CPU in the add method. The best way to reduce this overhead is to insert some double-precision operations in there (the trig functions in math.h are particularly useful for this; doing maths on constant numbers will just get removed by compiler optimisations, so make sure the compiler can't remove any of the code and use variables). Anyhow, here is how a seasoned pro would go about optimising that method. First, let's balance the heavy integer load on the CPU with some nice cheap double ops:


[source]ivec2 add(const ivec2& a, const ivec2& b)
{
    ivec2 c;
    sin( (double)a.x );
    c.x = a.x + b.x;
    cos( (double)a.y );
    c.y = a.y + b.y;
    acos( (double)b.x );
    return c;
}[/source]

However, since the standard library trig functions only use the double portions of the CPU, the CPU usage is still very unbalanced. We can simply add in some calls to the float trig functions, which will help out somewhat! i.e.



[source]ivec2 add(const ivec2& a, const ivec2& b)
{
    ivec2 c;
    sin( (double)a.x );
    sinf( (float)a.y );
    c.x = a.x + b.x;
    cos( (double)b.x );
    sinf( (float)b.y );
    c.y = a.y + b.y;
    acos( (double)a.x );
    acosf( (float)a.y );
    return c;
}[/source]

Obviously, we still aren't using any boolean ops at this point, so a really handy trick is to combine the calls in a meaningless comparison. That way the boolean portion of the CPU becomes load balanced too!

[source]ivec2 add(const ivec2& a, const ivec2& b)
{
    short hack = 0; //< make use of shorts too! Just here so nothing gets optimised away....
    ivec2 c;
    if( sin( (double)a.x ) < sinf( (float)a.y ) ) //< resolves
        hack = 1;
    else //< always use an else statement. It makes sure the branch predictor is being load balanced as well as the double portions of the CPU.
        hack = 3;
    c.x = a.x + b.x;
    if( cos( (double)b.x ) < cosf( (float)b.y ) )
        hack = 2;
    else
        hack = 5;
    c.y = a.y + b.y;
    if( acos( (double)a.x ) < acosf( (float)a.y ) )
        hack = 3;
    else
        hack = 6;
    return c;
}[/source]

As a final pass, we can add in some really cheap string ops for the benefit of those people with SSE4.1 capable CPUs. i.e.

[source]ivec2 add(const ivec2& a, const ivec2& b)
{
    std::string tempString; //< using std::string everywhere is always a massive performance win. Try it and see!
    short hack = 0;
    ivec2 c;
    tempString = "This will ";
    if( sin( (double)a.x ) < sinf( (float)a.y ) )
        hack = 1;
    else
        hack = 4;
    c.x = a.x + b.x;
    tempString += " make things ";
    if( cos( (double)b.x ) < cosf( (float)b.y ) )
        hack = 2;
    else
        hack = 5;
    c.y = a.y + b.y;
    tempString += " really really fast! You'll be amazed at the net gain when you run this through a profiler!";
    if( acos( (double)a.x ) < acosf( (float)a.y ) )
        hack = 3;
    else
        hack = 6;
    return c;
}[/source]

I work in middleware, and I can assure you that we optimise every single method in exactly the same way. The biggest problem with this kind of optimisation though, is that it requires a super experienced programmer to make it work properly. An inexperienced programmer may naively attempt the above optimisations, but end up with something significantly slower than they started with. Normally therefore, it's recommended that you always leave optimisation work for a compiler to do.

A final word of warning: there is so much misinformation and so many downright lies in internet forums about optimisation techniques that, IMHO, it is best to assume every forum post is full of misinformation and lies. The only way to know whether something is true is to verify it in a profiler.

A final word of warning: there is so much misinformation and so many downright lies in internet forums about optimisation techniques that, IMHO, it is best to assume every forum post is full of misinformation and lies. The only way to know whether something is true is to verify it in a profiler.

Profiled it. You're right, the final method you proposed gave me a 200x performance boost. Thanks!
[ I was ninja'd 71 times before I stopped counting a long time ago ] [ f.k.a. MikeTacular ] [ My Blog ] [ SWFer: Gaplessly looped MP3s in your Flash games ]
This is true inasmuch as it's obviously a performance gain if you can keep all the various execution resources as busy as possible. One of the reasons that Quake, and later Unreal, were able to do things no one else was doing in software rendering was that they kept the Pentium's U and V pipes both pretty busy, along with MMX where available.

Conceptually, this is easy, but in practice it is very hard -- it requires a knowledge of instructions and compilers that most people don't have, and with out-of-order execution, wider superscalar architectures, operation-fusion techniques, and different issue/resolve latencies for each instruction, it's basically impossible for a human to accomplish at anything more finely grained than, say, the major execution units: load/store, integer, FPU (simple and complex), and SIMD.

Yes, highly performant code should be aware of these things, but no, this is no epiphany that you've bestowed upon the world.

throw table_exception("(? ???)? ? ???");

After reading RobTheBloke's post I'm a bit confused. I think he is joking, but is Ravyne joking? Is there any truth in what GoofProg.F says? It seems like no one takes him seriously.

This topic is closed to new replies.
