Finding that balance between optimization and legibility.

This topic is 726 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

Recommended Posts

So, there is this fairly trivial piece of code that had been bugging me for quite some time, and that I wen't back and forth between a rather black and white implementation and just couldn't find a balance that pleased me, until now. I  want to share this insignificant little problem with everyone and solicit some feedback regarding this approach.

The Problem:

So I have my own math library that I've written in c, and within it there is this one particular axis rotation function that gets invoked in my 3D game engine quite often. (This piece of code is essentially the Rodrigues' rotation implemntation, and it involves vectors.)  The two implementations of this particular function are as follows.

1. void v3rot( vec3 v, const vec3 axis, float radians ); //compute cosine & sine values for the given radians every time
2. void v3rot( vec3 v, const vec3 axis, float cosine, float sine ) //cosine and sine values are computed outside of the call

Usually, when this axis rotation function is invoked, it is called 3 times, once for each of the bases vectors x, y, z; The faster of the two implementation (2nd one) involves me defining float variables and calculating the sine and cosine values once in my my caller routine before invoking this function. However, this clutters my code more than I care for, as I would like my code to be more legible. With the solower implementation ( the first one), my code is less cluttered, and I don't have to track additional variables, but I am having to invoke the math function cosf() and sinf() 3 more times than normal, in mose cases:

My Solution:

Cache it. Create a function within my library called "get_sin_cos_cache( float radians, float *cos_val, float *sin_val )", which keeps a running a table of the last 4 or 8 calls to cosf() and sinf(), check against the existing table of stored cosine, sine and radian values, before calculating and updating these tables. So, I continue to use the first implementation in my code and avoid the ugly clutter, but under the hood, in my v3rot() function, i call the get_sin_cos_cache() routine to only calculate sin/cosine values when I need too.

Sauce:

void get_sin_cos_cache( float rads, float *si, float *co )
{
static int   idx = -1;
int          i;

if( -1 == idx )
{
for( i = 1; i < 8; i++ )
{
cosf_cache = cosf_cache[0];
sinf_cache = sinf_cache[0];
}
idx = 0;
}

for( i = 0; i < 8; i++ )
break;
if( i == 8 )
{
i = idx;
idx = ( idx + 1 ) % 8;
}

if( co )
*co = cosf_cache;
if( si )
*si = sinf_cache;
}

Conclusion:

After crunching some numbers, I calculated that my cache hit was approximately 70%, so I'm thinking about keeping this example in my code for now (if not for the performance gain, then at the very least as a reminder).

I know some (most) of you are thinking "...and? is that it?", and I get it, this is a very trivial example (some may even argue that the gains are not non-existent), but I feel like it serves as one example (good or bad) of how to code with a particular mindset, expressing awareness of what is exactly is going on in your code, not always keeping a back and white approach to what you are doing, rather be creative and look for the "good enough" approach to problem solving. I also understand that veteran programmers are already doing this on a much greater scale than this, and with greater intuition than I'm exhibiting. Perhaps this example could serve well for beginners, either way I would like to hear what y'all have to say.

Share on other sites

After crunching some numbers, I calculated that my cache hit was approximately 70%, so I'm thinking about keeping this example in my code for now (if not for the performance gain, then at the very least as a reminder).

I know some (most) of you are thinking "...and? is that it?", and I get it, this is a very trivial example (some may even argue that the gains are not non-existent), but I feel like it serves as one example (good or bad) of how to code with a particular mindset, expressing awareness of what is exactly is going on in your code, not always keeping a back and white approach to what you are doing, rather be creative and look for the "good enough" approach to problem solving. I also understand that veteran programmers are already doing this on a much greater scale than this, and with greater intuition than I'm exhibiting. Perhaps this example could serve well for beginners, either way I would like to hear what y'all have to say.

First, how much faster did your program get because of this? That's how you measure performance gain, not cache hits.
Second, no, not many veteran programmers I know appreciate functions that look like they should be pure functions, and unexpectedly read and write from static variables. These "creative" solutions are very annoying when you finally figure out what's causing the rare animation bug in a multi-threaded application. I hope no beginners copy this approach. If you want to make an optimization, hacks like this should be the last resort, ideally only made on a "release" version of your game, not on code you would continue to use in future products. You should be willing to refactor your code and actually change where the function calls are made, if the performance gain is meaningful.

Thank you for this solid feedback. You've highlighted key problems and pitfalls inherent to this approach. So while I was using this example not to highlight "cache gains", I wanted to bring the conversation to the table "I need to improve an aspect and of a program As cleanly as possible".

I think beginners can definitely stand to benefit from you comment and experience, especially the point regarding multi threading.

I also second that hacks like these absolutely must be a last resort. The current engine/game that I am working on, my entire objective is NOT to write fast code, but th right clean, legible code. Even my collision detection system operates in O(n^2) complexity, and if I reach my target performance upon completion, it will remain that way.

Do you think in your experience, this approach has merit when performed within the parameters that you have listed?

Share on other sites

EDIT: I think I might have misunderstood this completely. Ignore the following :)

Usually, when this axis rotation function is invoked, it is called 3 times, once for each of the bases vectors x, y, z; The faster of the two implementation (2nd one) involves me defining float variables and calculating the sine and cosine values once in my my caller routine before invoking this function. However, this clutters my code more than I care for, as I would like my code to be more legible. With the solower implementation ( the first one), my code is less cluttered, and I don't have to track additional variables, but I am having to invoke the math function cosf() and sinf() 3 more times than normal, in mose cases:

Create a function that does that for you, then.

float sin = sinf(rads);float cos = cosf(rads);v3rot(v, vx, cos, sin); //first axisv3rot(v, vy, cos, sin); //second axisv3rot(v, vz, cos, sin); //third axis
Just do
myMoreEfficientRot(v, vx, vy, vz, rads);
Where that function is defined as:
void myMoreEfficientRot(v, vx, vy, vz, rads){    float sin = sinf(rads);    float cos = cosf(rads);    v3rot(v, vx, cos, sin); //first axis    v3rot(v, vy, cos, sin); //second axis    v3rot(v, vz, cos, sin); //third axis}
Or am I missing something?
essentially yes, but I am rotating the axes them selves around a vector (such as the angular velocity vector). So it would be more like:

v3copy( vaxis, ang_vel );
float dtheta = v3norm( vaxis ) * dt;
v3rot( body.vlook, vaxis, dtheta );
v3rot( body.vup, vaxis, dtheta );
V3rot( body.vright, vaxis, dtheta );

What I want to avoid is changing all instances of this code to:
float co = cosf( dtheta );
float si = sinf( dtheta );
foat dtheta = v3norm( vaxis ) * dt;
v3rot( body.vlook, vaxis, co, si );
v3rot( body.vup, vaxis, co, so );
V3rot( body.vright, vaxis, co, si );

Or as you are suggest is to something like:

body_rotate( &body, vaxis, dtheta );

And that would work. But now the footprint for this change is much heavier.

By optimizing the rotate function, I've isolated the change to just one spot, thus minimize both my change and testing. And while my performance improvement isn't as good, it is still better.

Share on other sites
You should have a class that represents rotations. Whether it's implemented as three axis rotations, as quaternions or as something else should be fairly opaque to most of the code.

The "footprint" of the change shouldn't matter too much. Don't be lazy.

Share on other sites

You should have a class that represents rotations. Whether it's implemented as three axis rotations, as quaternions or as something else should be fairly opaque to most of the code.The "footprint" of the change shouldn't matter too much. Don't be lazy.

http://number-none.com/blow/john_carmack_on_inlined_code.html

I wonder what would Michael Abrash say...

Share on other sites

Number of times code will be executed: 100,000

Time saved per execution: 3ms

Total time saved: 300 seconds

Time spent with hand in shorts deciding how to save the most time: 1 hour

1. 1
Rutin
19
2. 2
3. 3
JoeJ
16
4. 4
5. 5

• 26
• 20
• 13
• 13
• 17
• Forum Statistics

• Total Topics
631700
• Total Posts
3001781
×