Interesting profiling result

I just gprof'd my game and got an interesting result:
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ts/call  Ts/call  name    
  9.36      1.09     1.09                             intersect_RayTriangle(vec3 const&, vec3 const&, vec3 const&, vec3 const&, vec3 const&, vec3*, float*)
  7.82      2.00     0.91                             vec3::vec3(float, float, float)
  7.47      2.87     0.87                             TQuadTreeNode::getHeightAt(float, float, float*, vec3*)
  7.22      3.71     0.84                             vec3::operator-(vec3 const&) const
  2.49      4.00     0.29                             vec3::crossProduct(vec3 const&, vec3 const&)
  2.41      4.28     0.28                             Plane::from3Points(vec3 const&, vec3 const&, vec3 const&)
  2.41      4.56     0.28                             
What worries me is how much time the vec3::vec3(float, float, float) constructor takes. That must be because of the way I use vectors: return vec3(x+v.x,y+v.y,z+v.z); The constructor gets called a lot, and the stack gets used a lot too. Nearly 8% of the run time for a vector constructor sounds really bad.

Is the constructor inlined?

You could avoid returning vec3's like that and instead pass in a reference to a vec3 and fill that in. It's a little ugly, I know, but it avoids the use of a temporary.
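
A minimal sketch of the out-parameter idea, assuming a bare-bones vec3. The free function subtract() is a hypothetical name for illustration, not something from the original poster's library:

```cpp
// Minimal stand-in for the thread's vec3; only what this sketch needs.
struct vec3 {
    float x, y, z;
    vec3() : x(0.0f), y(0.0f), z(0.0f) {}
    vec3(float x_, float y_, float z_) : x(x_), y(y_), z(z_) {}
};

// Hypothetical helper: writes a - b into `out` instead of returning a new vec3.
// Works component by component, so `out` may safely alias `a` or `b`.
inline void subtract(const vec3& a, const vec3& b, vec3& out) {
    out.x = a.x - b.x;
    out.y = a.y - b.y;
    out.z = a.z - b.z;
}
```

The caller declares the result first (vec3 r; subtract(a, b, r);), which is the "little ugly" part. Whether it is actually faster than returning by value depends on the compiler and optimization level, so it is worth profiling both.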

quote:

7.82 2.00 0.91 vec3::vec3(float, float, float)
7.22 3.71 0.84 vec3::operator-(vec3 const&) const



That does seem excessive. Here is what I would look at:

1. Why are you converting from x,y,z to vec3 so much? Stick to one or the other and you won't need the constructor as much.
2. Inlining will probably help a lot. Also make sure you aren't profiling a debug build.
3. Perhaps your algorithm could be revised to call these two functions less.

Guest Anonymous Poster
I'm not sure if this will help, but does your constructor initialize x/y/z in the initializer list or assign them inside the body? I'm not sure of the speed increase, but I do know initialization is almost always preferred to assignment, i.e. vector(float x_val, float y_val, float z_val) : x(x_val), y(y_val), z(z_val)
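
To make the two styles concrete, here is a small sketch. The struct names vec3_init and vec3_assign are made up for this comparison:

```cpp
// Hypothetical names; both constructors leave the members with the same values.
struct vec3_init {
    float x, y, z;
    // Members are initialized directly in the member initializer list.
    vec3_init(float x_val, float y_val, float z_val)
        : x(x_val), y(y_val), z(z_val) {}
};

struct vec3_assign {
    float x, y, z;
    // Members start out uninitialized and are then assigned in the body.
    vec3_assign(float x_val, float y_val, float z_val) {
        x = x_val;
        y = y_val;
        z = z_val;
    }
};
```

For plain float members an optimizing compiler will most likely generate the same code for both, so this alone probably does not explain the profile. The initializer list matters more when the members have constructors of their own.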

Guest Anonymous Poster
Which profiler's output is this, if I may ask?

I am using GNU g++ and gprof.

Constructor:
class vec3 {
public:
    float x, y, z;  // members implied by the initializer list
    vec3(float _x,float _y,float _z) : x(_x), y(_y), z(_z) {}
};
(By the way, is the inline keyword required here to make it inline?)
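
On the inline question: in C++ a member function defined inside the class body is implicitly inline, so the keyword is not needed there. It is needed for a member defined outside the class in a header. A small sketch, with the float members assumed from the initializer list:

```cpp
class vec3 {
public:
    float x, y, z;
    // Defined inside the class body: implicitly inline, no keyword needed.
    vec3(float _x, float _y, float _z) : x(_x), y(_y), z(_z) {}
    // Only declared here; the definition below lives outside the class.
    vec3 operator-(const vec3& v) const;
};

// Defined outside the class (typically in a header): `inline` is needed to
// avoid multiple-definition errors when several source files include it.
inline vec3 vec3::operator-(const vec3& v) const {
    return vec3(x - v.x, y - v.y, z - v.z);
}
```

Note that inline only permits the compiler to inline the call. Whether it actually does depends on the optimization settings, which is why an unoptimized build can still show the constructor in the profile.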



"1. Why are you converting from x,y,z to vec3 so much? Stick to one or the other and you won't need the constructor as much."

I use only vec3, not x, y, z at all outside the math library. But the constructor is used in other math functions, for example:

inline vec3 vec3::operator- (const vec3& v) const {
    return vec3(x-v.x,y-v.y,z-v.z);
}

"3. Perhaps your algorithm could revised to call these two function less."
Yeah. For example, collision testing, which is used heavily, could be written without vec3s at all, with all the arithmetic written out by hand. That could help a lot.

It's just so clean to write everything with primitive vec3 operators, but it now costs performance.


[edited by - stefu on June 4, 2003 2:17:07 AM]

Some compilers are quicker with this version of the code:

inline vec3 vec3::operator- (const vec3& v) const {
    vec3 temp(x-v.x,y-v.y,z-v.z);
    return temp;
}

than your version:

inline vec3 vec3::operator- (const vec3& v) const {
    return vec3(x-v.x,y-v.y,z-v.z);
}

Basically, the named variable sometimes allows the compiler to construct 'temp' at the same address as whatever variable will receive the result of the function call, thus avoiding an extra copy. The compiler knows that you expect a vec3 result, that you are asking for a vec3 on the stack for the duration of the function, and that in every case it is that same vec3 that is returned to the caller, so it can merge the two into one.

Do this in all your operator functions and maybe you'll see an improvement in performance.
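
Rather than guessing which form a given compiler elides, you can count the copies. This is only a sketch: the static copies counter and the free functions sub_named() and sub_unnamed() are instrumentation invented for this example, not part of anyone's library.

```cpp
struct vec3 {
    float x, y, z;
    static int copies;  // counts copy-constructor calls (sketch-only instrumentation)
    vec3(float x_, float y_, float z_) : x(x_), y(y_), z(z_) {}
    vec3(const vec3& o) : x(o.x), y(o.y), z(o.z) { ++copies; }
};
int vec3::copies = 0;

// Named local: the compiler may build `temp` directly in the caller's result.
inline vec3 sub_named(const vec3& a, const vec3& b) {
    vec3 temp(a.x - b.x, a.y - b.y, a.z - b.z);
    return temp;
}

// Unnamed temporary: the compiler may build the result directly in the caller.
inline vec3 sub_unnamed(const vec3& a, const vec3& b) {
    return vec3(a.x - b.x, a.y - b.y, a.z - b.z);
}
```

Reset vec3::copies to 0, call one of the functions, and read the counter afterwards. The count depends on the compiler, its version and its flags. One thing that is fixed by the language: from C++17 on, the unnamed-temporary form is guaranteed not to copy, while eliding the named local remains an optional optimization.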

Refs:
gcc.gnu.org
long link at Informit.com

You might be surprised how many times that constructor is called. It could be in the tens of thousands, so optimising it drastically would improve performance a lot. A vector library should inline as much as possible, and the constructors should be used wisely. The same goes for the const operators like +, -, / and *, which require temporary variables and copies. Try replacing them with a sequence of -=, +=, /= and *=, and call the default constructor (which should do nothing). Combining operators into one function would help too, like AddScaledVector(float k, const Vector& V), which does the arithmetic directly. By changing the order of the operations inside, you could see some improvement over calling two operators.

If you can look at the assembly code of the constructor and the operators, you'll see why they take so much time. There must be a lot of push, pop and mov instructions that you can avoid. Ultimately, write it in inline assembly and use MMX or the latest flavour of the month. Having a fast math library helps the whole game run faster.
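
A rough sketch of the compound operators and the combined function suggested above, assuming a plain three-float vec3. AddScaledVector follows the name in the post, but its body here is a guess at what was meant:

```cpp
struct vec3 {
    float x, y, z;
    vec3(float x_, float y_, float z_) : x(x_), y(y_), z(z_) {}

    // Compound operators modify *this in place; no temporary vec3 is created.
    vec3& operator+=(const vec3& v) { x += v.x; y += v.y; z += v.z; return *this; }
    vec3& operator*=(float k) { x *= k; y *= k; z *= k; return *this; }

    // Combined operation: *this += k * v, done component-wise in one pass
    // instead of calling operator* and operator+ separately.
    vec3& AddScaledVector(float k, const vec3& v) {
        x += k * v.x;
        y += k * v.y;
        z += k * v.z;
        return *this;
    }
};
```

So instead of p = p + v * k; you would write p.AddScaledVector(k, v);. With inlining enabled a good compiler may produce the same code for both, so measure before rewriting everything.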

Make sure the compiler options are set correctly so that your inline functions are actually being inlined. Also, make sure you are not profiling the debug version, which generally turns off all inlining options.

I would expect the compiler to optimize
    va = vb - vc; 

into the equivalent of
    va.x = vb.x - vc.x;
    va.y = vb.y - vc.y;
    va.z = vb.z - vc.z;


Both vec3::vec3() and vec3::operator-() have been optimized away here, so they should not show up in the profiler.
