Finding that balance between optimization and legibility.

Started by
18 comments, last by _the_phantom_ 7 years, 9 months ago

So, there is this fairly trivial piece of code that had been bugging me for quite some time, and that I wen't back and forth between a rather black and white implementation and just couldn't find a balance that pleased me, until now. I want to share this insignificant little problem with everyone and solicit some feedback regarding this approach.

The Problem:

So I have my own math library that I've written in c, and within it there is this one particular axis rotation function that gets invoked in my 3D game engine quite often. (This piece of code is essentially the Rodrigues' rotation implemntation, and it involves vectors.) The two implementations of this particular function are as follows.

  1. void v3rot( vec3 v, const vec3 axis, float radians ); //compute cosine & sine values for the given radians every time
  2. void v3rot( vec3 v, const vec3 axis, float cosine, float sine ) //cosine and sine values are computed outside of the call

Usually, when this axis rotation function is invoked, it is called 3 times, once for each of the bases vectors x, y, z; The faster of the two implementation (2nd one) involves me defining float variables and calculating the sine and cosine values once in my my caller routine before invoking this function. However, this clutters my code more than I care for, as I would like my code to be more legible. With the solower implementation ( the first one), my code is less cluttered, and I don't have to track additional variables, but I am having to invoke the math function cosf() and sinf() 3 more times than normal, in mose cases:

My Solution:

Cache it. Create a function within my library called "get_sin_cos_cache( float radians, float *cos_val, float *sin_val )", which keeps a running a table of the last 4 or 8 calls to cosf() and sinf(), check against the existing table of stored cosine, sine and radian values, before calculating and updating these tables. So, I continue to use the first implementation in my code and avoid the ugly clutter, but under the hood, in my v3rot() function, i call the get_sin_cos_cache() routine to only calculate sin/cosine values when I need too.

Sauce:

void get_sin_cos_cache( float rads, float *si, float *co )
{
static int idx = -1;
static float rads_cache[8], cosf_cache[8], sinf_cache[8];
int i;
if( -1 == idx )
{
rads_cache[0] = 0.0f;
cosf_cache[0] = cosf( rads_cache[0] );
sinf_cache[0] = sinf( rads_cache[0] );
for( i = 1; i < 8; i++ )
{
rads_cache = rads_cache[0];
cosf_cache = cosf_cache[0];
sinf_cache = sinf_cache[0];
}
idx = 0;
}
for( i = 0; i < 8; i++ )
if( rads == rads_cache )
break;
if( i == 8 )
{
i = idx;
rads_cache = rads;
cosf_cache = cosf( rads );
sinf_cache = sinf( rads );
idx = ( idx + 1 ) % 8;
}
if( co )
*co = cosf_cache;
if( si )
*si = sinf_cache;
}

Conclusion:

After crunching some numbers, I calculated that my cache hit was approximately 70%, so I'm thinking about keeping this example in my code for now (if not for the performance gain, then at the very least as a reminder).

I know some (most) of you are thinking "...and? is that it?", and I get it, this is a very trivial example (some may even argue that the gains are not non-existent), but I feel like it serves as one example (good or bad) of how to code with a particular mindset, expressing awareness of what is exactly is going on in your code, not always keeping a back and white approach to what you are doing, rather be creative and look for the "good enough" approach to problem solving. I also understand that veteran programmers are already doing this on a much greater scale than this, and with greater intuition than I'm exhibiting. Perhaps this example could serve well for beginners, either way I would like to hear what y'all have to say.

Advertisement

After crunching some numbers, I calculated that my cache hit was approximately 70%, so I'm thinking about keeping this example in my code for now (if not for the performance gain, then at the very least as a reminder).

I know some (most) of you are thinking "...and? is that it?", and I get it, this is a very trivial example (some may even argue that the gains are not non-existent), but I feel like it serves as one example (good or bad) of how to code with a particular mindset, expressing awareness of what is exactly is going on in your code, not always keeping a back and white approach to what you are doing, rather be creative and look for the "good enough" approach to problem solving. I also understand that veteran programmers are already doing this on a much greater scale than this, and with greater intuition than I'm exhibiting. Perhaps this example could serve well for beginners, either way I would like to hear what y'all have to say.

First, how much faster did your program get because of this? That's how you measure performance gain, not cache hits.

Second, no, not many veteran programmers I know appreciate functions that look like they should be pure functions, and unexpectedly read and write from static variables. These "creative" solutions are very annoying when you finally figure out what's causing the rare animation bug in a multi-threaded application. I hope no beginners copy this approach. If you want to make an optimization, hacks like this should be the last resort, ideally only made on a "release" version of your game, not on code you would continue to use in future products. You should be willing to refactor your code and actually change where the function calls are made, if the performance gain is meaningful.

So regardless of how you calculated what your cache hit rate is supposed to be, did you actually measure it?

Second, your function has whole bunch of if-conditions, some of which are easy to predict (idx == -1), and other that aren't.


for( i = 0; i < 8; i++ )
    if( rads == rads_cache[i] )
      break;
  if( i == 8 )
  {
    i = idx;
    rads_cache[i] = rads;
    cosf_cache[i] = cosf( rads );
    sinf_cache[i] = sinf( rads );
    idx = ( idx + 1 ) % 8;
  }

While you need to measure this, at least "if( rads == rads_cache )" is going to be very hard if not impossible to predict. (the loop might be able to unroll and if not, be easier to predict but still no quarantee, as well as for the i == 8). So if your use case is really "cos/sin" being called exactly 3 times, than there has got to be a cleaner solution. If its really just random calls which sometimes share the same value, having an unpredictable branch will cost you more than calling sinf/cosf multiple times. As always, make sure to actually profile, but in general, the times where singular trigonometric functions are worth caching are over (especially at the cost of a branch). If you can cache a whole set of calcualtions/trig functions, sure, but if its just 3x sinf+cos vs. sinf+cos+branch, I'd say generally take the firmer (and don't forget to actually measure, to see what is actually faster, let alone if this even makes a measurable difference).

EDIT: I think I might have misunderstood this completely. Ignore the following :)

Usually, when this axis rotation function is invoked, it is called 3 times, once for each of the bases vectors x, y, z; The faster of the two implementation (2nd one) involves me defining float variables and calculating the sine and cosine values once in my my caller routine before invoking this function. However, this clutters my code more than I care for, as I would like my code to be more legible. With the solower implementation ( the first one), my code is less cluttered, and I don't have to track additional variables, but I am having to invoke the math function cosf() and sinf() 3 more times than normal, in mose cases:

Create a function that does that for you, then.

Instead of


float sin = sinf(rads);
float cos = cosf(rads);
v3rot(v, vx, cos, sin); //first axis
v3rot(v, vy, cos, sin); //second axis
v3rot(v, vz, cos, sin); //third axis

Just do


myMoreEfficientRot(v, vx, vy, vz, rads);

Where that function is defined as:


void myMoreEfficientRot(v, vx, vy, vz, rads)
{
    float sin = sinf(rads);
    float cos = cosf(rads);
    v3rot(v, vx, cos, sin); //first axis
    v3rot(v, vy, cos, sin); //second axis
    v3rot(v, vz, cos, sin); //third axis
}

Or am I missing something?

Hello to all my stalkers.

After crunching some numbers, I calculated that my cache hit was approximately 70%, so I'm thinking about keeping this example in my code for now (if not for the performance gain, then at the very least as a reminder).

I know some (most) of you are thinking "...and? is that it?", and I get it, this is a very trivial example (some may even argue that the gains are not non-existent), but I feel like it serves as one example (good or bad) of how to code with a particular mindset, expressing awareness of what is exactly is going on in your code, not always keeping a back and white approach to what you are doing, rather be creative and look for the "good enough" approach to problem solving. I also understand that veteran programmers are already doing this on a much greater scale than this, and with greater intuition than I'm exhibiting. Perhaps this example could serve well for beginners, either way I would like to hear what y'all have to say.


First, how much faster did your program get because of this? That's how you measure performance gain, not cache hits.
Second, no, not many veteran programmers I know appreciate functions that look like they should be pure functions, and unexpectedly read and write from static variables. These "creative" solutions are very annoying when you finally figure out what's causing the rare animation bug in a multi-threaded application. I hope no beginners copy this approach. If you want to make an optimization, hacks like this should be the last resort, ideally only made on a "release" version of your game, not on code you would continue to use in future products. You should be willing to refactor your code and actually change where the function calls are made, if the performance gain is meaningful.

Thank you for this solid feedback. You've highlighted key problems and pitfalls inherent to this approach. So while I was using this example not to highlight "cache gains", I wanted to bring the conversation to the table "I need to improve an aspect and of a program As cleanly as possible".

I think beginners can definitely stand to benefit from you comment and experience, especially the point regarding multi threading.

I also second that hacks like these absolutely must be a last resort. The current engine/game that I am working on, my entire objective is NOT to write fast code, but th right clean, legible code. Even my collision detection system operates in O(n^2) complexity, and if I reach my target performance upon completion, it will remain that way.

Do you think in your experience, this approach has merit when performed within the parameters that you have listed?

EDIT: I think I might have misunderstood this completely. Ignore the following :)



Usually, when this axis rotation function is invoked, it is called 3 times, once for each of the bases vectors x, y, z; The faster of the two implementation (2nd one) involves me defining float variables and calculating the sine and cosine values once in my my caller routine before invoking this function. However, this clutters my code more than I care for, as I would like my code to be more legible. With the solower implementation ( the first one), my code is less cluttered, and I don't have to track additional variables, but I am having to invoke the math function cosf() and sinf() 3 more times than normal, in mose cases:

Create a function that does that for you, then.

Instead of
float sin = sinf(rads);float cos = cosf(rads);v3rot(v, vx, cos, sin); //first axisv3rot(v, vy, cos, sin); //second axisv3rot(v, vz, cos, sin); //third axis
Just do
myMoreEfficientRot(v, vx, vy, vz, rads);
Where that function is defined as:
void myMoreEfficientRot(v, vx, vy, vz, rads){    float sin = sinf(rads);    float cos = cosf(rads);    v3rot(v, vx, cos, sin); //first axis    v3rot(v, vy, cos, sin); //second axis    v3rot(v, vz, cos, sin); //third axis}
Or am I missing something?
essentially yes, but I am rotating the axes them selves around a vector (such as the angular velocity vector). So it would be more like:

v3copy( vaxis, ang_vel );
float dtheta = v3norm( vaxis ) * dt;
v3rot( body.vlook, vaxis, dtheta );
v3rot( body.vup, vaxis, dtheta );
V3rot( body.vright, vaxis, dtheta );

What I want to avoid is changing all instances of this code to:
float co = cosf( dtheta );
float si = sinf( dtheta );
foat dtheta = v3norm( vaxis ) * dt;
v3rot( body.vlook, vaxis, co, si );
v3rot( body.vup, vaxis, co, so );
V3rot( body.vright, vaxis, co, si );

Or as you are suggest is to something like:

body_rotate( &body, vaxis, dtheta );

And that would work. But now the footprint for this change is much heavier.

By optimizing the rotate function, I've isolated the change to just one spot, thus minimize both my change and testing. And while my performance improvement isn't as good, it is still better.
You should have a class that represents rotations. Whether it's implemented as three axis rotations, as quaternions or as something else should be fairly opaque to most of the code.

The "footprint" of the change shouldn't matter too much. Don't be lazy.

You should have a class that represents rotations. Whether it's implemented as three axis rotations, as quaternions or as something else should be fairly opaque to most of the code.The "footprint" of the change shouldn't matter too much. Don't be lazy.


http://number-none.com/blow/john_carmack_on_inlined_code.html

I wonder what would Michael Abrash say...
I have no idea what Carmack or Abrash have to do with this thread. I also don't understand what those posts from Carmack have to do with my recommendation.

The supposed need to reach a balance between code speed and code readability is largely false. If you write clean code with a bit of common sense, chances are it will be fast enough.

Using quaternions or 3x3 orthogonal matrices to represent rotations, for instance, would get rid of practically every trigonometric function call in your code and make it run faster. And that's while making the code easier to read.

Number of times code will be executed: 100,000

Time saved per execution: 3ms

Total time saved: 300 seconds

Time spent with hand in shorts deciding how to save the most time: 1 hour

void hurrrrrrrr() {__asm sub [ebp+4],5;}

There are ten kinds of people in this world: those who understand binary and those who don't.

This topic is closed to new replies.

Advertisement