Interesting profiling result

8 comments, last by stefu 20 years, 10 months ago
I just gprof'd my game and got an interesting result:

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ts/call  Ts/call  name    
  9.36      1.09     1.09                             intersect_RayTriangle(vec3 const&, vec3 const&, vec3 const&, vec3 const&, vec3 const&, vec3*, float*)
  7.82      2.00     0.91                             vec3::vec3(float, float, float)
  7.47      2.87     0.87                             TQuadTreeNode::getHeightAt(float, float, float*, vec3*)
  7.22      3.71     0.84                             vec3::operator-(vec3 const&) const
  2.49      4.00     0.29                             vec3::crossProduct(vec3 const&, vec3 const&)
  2.41      4.28     0.28                             Plane::from3Points(vec3 const&, vec3 const&, vec3 const&)
  2.41      4.56     0.28                             
What I am worried about is the amount of time the vec3::vec3(float,float,float) constructor uses. That must be because of the way I am using vectors: return vec3(x+v.x,y+v.y,z+v.z); The constructor is called a lot, and so the stack gets used heavily too. Almost 8% for a vector constructor sounds really bad.
Is the constructor inlined?

You could avoid returning vec3s like that and instead pass in a reference to a vec3 and fill that in. It's a little ugly, I know, but it avoids the use of a temporary.
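A minimal sketch of that out-parameter style, assuming a bare-bones vec3 with public x, y, z members. The subtract helper and example_subtract are hypothetical names used only for illustration, not anything from the original code:

```cpp
#include <cassert>

// Minimal stand-in for the vec3 class discussed in the thread (assumed layout).
struct vec3 {
    float x, y, z;
    vec3() : x(0.0f), y(0.0f), z(0.0f) {}
    vec3(float x_, float y_, float z_) : x(x_), y(y_), z(z_) {}
};

// Hypothetical helper: writes a - b into 'out' instead of returning a new vec3.
// Each component is read before it is written, so 'out' may alias 'a' or 'b'.
inline void subtract(const vec3& a, const vec3& b, vec3& out) {
    out.x = a.x - b.x;
    out.y = a.y - b.y;
    out.z = a.z - b.z;
}

// Usage: the caller owns the destination, so no temporary vec3 is constructed.
inline vec3 example_subtract() {
    vec3 a(4.0f, 6.0f, 8.0f), b(1.0f, 2.0f, 3.0f), result;
    subtract(a, b, result);
    assert(result.x == 3.0f && result.y == 4.0f && result.z == 5.0f);
    return result;
}
```

Whether this is actually faster than returning by value depends on the compiler and on whether the operator gets inlined, so it is worth profiling both.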
quote:
7.82 2.00 0.91 vec3::vec3(float, float, float)
7.22 3.71 0.84 vec3::operator-(vec3 const&) const


That does seem excessive. Here is what I would look at:

1. Why are you converting from x,y,z to vec3 so much? Stick to one or the other and you won't need the constructor as much.
2. Inlining will probably help a lot. Also make sure you aren't profiling a debug build.
3. Perhaps your algorithm could be revised to call these two functions less.
John Bolton, Locomotive Games (THQ). Current Project: Destroy All Humans (Wii). IN STORES NOW!
I'm not sure if this will help, but does your constructor initialize x/y/z in an initializer list or assign them inside the body? I'm not sure about the speed increase, but I do know initialization is almost always preferred to assignment, i.e. ... vector(float x_val, float y_val, float z_val) : x(x_val), y(y_val), z(z_val)
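To make the two forms concrete, here is a small sketch. The struct names vec3_init and vec3_assign are made up for the comparison. For plain float members a compiler will typically emit the same stores for both, so this alone is unlikely to explain the profile; the initializer list matters more when members are class types that would otherwise be default-constructed first.

```cpp
#include <cassert>

// Form 1: members are initialized directly in the initializer list.
struct vec3_init {
    float x, y, z;
    vec3_init(float x_val, float y_val, float z_val)
        : x(x_val), y(y_val), z(z_val) {}
};

// Form 2: members are left uninitialized, then assigned in the body.
struct vec3_assign {
    float x, y, z;
    vec3_assign(float x_val, float y_val, float z_val) {
        x = x_val;
        y = y_val;
        z = z_val;
    }
};

// Usage: both forms end up holding the same values.
inline bool forms_agree() {
    vec3_init a(1.0f, 2.0f, 3.0f);
    vec3_assign b(1.0f, 2.0f, 3.0f);
    assert(a.x == b.x && a.y == b.y && a.z == b.z);
    return a.x == b.x && a.y == b.y && a.z == b.z;
}
```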
Which profiler's output is this, if I may ask?
Use pointers/references if you need to avoid too many temp variables/copy ctors
I am using gnu g++ and gprof.

Constructor:
class vec3 {
public:
    vec3(float _x, float _y, float _z) : x(_x), y(_y), z(_z) {}
    // (btw, is the inline keyword required here to make it inline?)
};



"1. Why are you converting from x,y,z to vec3 so much? Stick to one or the other and you won't need the constructor as much."

I use only vec3, not x, y, z at all outside the math library. But the constructor is used in other math functions, for example:

inline vec3 vec3::operator- (const vec3& v) const {
return vec3(x-v.x,y-v.y,z-v.z);
}

"3. Perhaps your algorithm could be revised to call these two functions less."
Yeah, for example collision testing, which is used heavily, could be written without vec3s at all, just writing everything out by hand. That could help a lot.

It's just so clean to write everything with primitive vec3 operators, but now it costs performance.


[edited by - stefu on June 4, 2003 2:17:07 AM]
Some compilers are quicker with this version of the code:
inline vec3 vec3::operator- (const vec3& v) const {vec3 temp(x-v.x,y-v.y,z-v.z);return temp;} 

than your version:
inline vec3 vec3::operator- (const vec3& v) const {return vec3(x-v.x,y-v.y,z-v.z);} 

Basically, the named variable sometimes allows the compiler to construct 'temp' at the same address as whatever variable will take the result of the function call, thus avoiding an extra copy. The compiler knows you're expecting a vec3 result, that you are asking for that vec3 to be put on the stack for the duration of the function, and that in all cases it is the same vec3 that will be returned to the calling function, so why not merge the two into one?

Do this in all your operator functions and maybe you'll see an improvement in performance.
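One way to check what your own compiler does is to count copy-constructor calls for both forms. The sketch below is an illustration only: g_copies, sub_unnamed, sub_named and report_copies are hypothetical names, and the vec3 here is a stripped-down stand-in. The counts depend on the compiler, language version and flags. Since C++17 the unnamed form is guaranteed not to copy, while eliding the copy of a named local is permitted but not required.

```cpp
#include <cassert>
#include <cstdio>

int g_copies = 0;  // counts copy-constructor calls (illustration only)

struct vec3 {
    float x, y, z;
    vec3(float x_, float y_, float z_) : x(x_), y(y_), z(z_) {}
    vec3(const vec3& o) : x(o.x), y(o.y), z(o.z) { ++g_copies; }
};

// Returns an unnamed temporary.
vec3 sub_unnamed(const vec3& a, const vec3& b) {
    return vec3(a.x - b.x, a.y - b.y, a.z - b.z);
}

// Returns a named local, as suggested in the post above.
vec3 sub_named(const vec3& a, const vec3& b) {
    vec3 temp(a.x - b.x, a.y - b.y, a.z - b.z);
    return temp;
}

// Prints how many copies each form needed with the current compiler settings.
void report_copies() {
    vec3 a(4.0f, 6.0f, 8.0f), b(1.0f, 2.0f, 3.0f);

    g_copies = 0;
    vec3 r1 = sub_unnamed(a, b);
    int unnamed_copies = g_copies;

    g_copies = 0;
    vec3 r2 = sub_named(a, b);
    int named_copies = g_copies;

    assert(r1.x == r2.x && r1.y == r2.y && r1.z == r2.z);
    std::printf("unnamed: %d copies, named: %d copies\n",
                unnamed_copies, named_copies);
}
```

If both counts come out the same, rewriting the operators in the named style will not change anything for that compiler.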

Refs:
gcc.gnu.org
long link at Informit.com
You might be surprised how many times that constructor is called. It could be in the tens of thousands, so optimising it could improve performance a lot. A vector library should inline as much as possible, and the constructors should be used wisely. The same goes for the const operators like +, -, /, *, which require temporary variables and copies. Try replacing them with a sequence of -=, +=, /=, *=, and call the default constructor (which should do nothing). Combining operators into one function would help too, like AddScaledVector(float k, const Vector& V), doing the arithmetic directly inside it. By changing the order of the operations inside, you could see some improvement over calling two operators.
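A rough sketch of what that could look like, assuming a simple Vector with public floats. Only AddScaledVector comes from the suggestion above; the rest of the class and the example_compound function are filled in for illustration:

```cpp
#include <cassert>

struct Vector {
    float x, y, z;

    Vector() {}  // deliberately does nothing; members are uninitialized
    Vector(float x_, float y_, float z_) : x(x_), y(y_), z(z_) {}

    // Compound operators modify *this in place, so no temporary is needed.
    Vector& operator+=(const Vector& v) { x += v.x; y += v.y; z += v.z; return *this; }
    Vector& operator-=(const Vector& v) { x -= v.x; y -= v.y; z -= v.z; return *this; }
    Vector& operator*=(float k)         { x *= k;   y *= k;   z *= k;   return *this; }

    // Combined operation: *this += k * V, without building k * V first.
    Vector& AddScaledVector(float k, const Vector& V) {
        x += k * V.x;
        y += k * V.y;
        z += k * V.z;
        return *this;
    }
};

// Usage: equivalent to p = (a - b) + 2 * c, written without operator temporaries.
inline Vector example_compound() {
    Vector a(4.0f, 6.0f, 8.0f), b(1.0f, 2.0f, 3.0f), c(1.0f, 1.0f, 1.0f);
    Vector p = a;               // one copy of the starting value
    p -= b;                     // p = (3, 4, 5)
    p.AddScaledVector(2.0f, c); // p = (5, 6, 7)
    assert(p.x == 5.0f && p.y == 6.0f && p.z == 7.0f);
    return p;
}
```

The price is readability: a chain of in-place updates is harder to follow than a single expression, so it probably only pays off in the hot paths the profiler points at.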

If you can look at the assembly code of the constructor and the operators, you'll see why it takes so much time. There are probably a lot of push, pop and mov instructions that you can avoid. Ultimately, write it in inline assembly, and use MMX or the latest flavour of the month. Having a fast math library helps the whole game run faster.

Everything is better with Metal.

Make sure the compiler options are set correctly so that your inline functions are actually being inlined. Also, make sure you are not profiling the debug version, which generally turns off all inlining options.

I would expect the compiler to optimize
    va = vb - vc; 

into the equivalent of
    va.x = vb.x - vc.x;
    va.y = vb.y - vc.y;
    va.z = vb.z - vc.z;


Both vec3::vec3() and vec3::operator-() have been optimized away here, so they should not show up in the profiler.
John Bolton, Locomotive Games (THQ). Current Project: Destroy All Humans (Wii). IN STORES NOW!
