glPushMatrix performance


Started by RegularKid November 11, 2008 08:25 PM

14 comments, last by V-man 15 years, 5 months ago

139

Author

November 11, 2008 08:25 PM

Hi! I have a bunch of 2D objects to draw, say a bunch of circles and a bunch of rectangles. Currently each shape is its own class. I blow through the list of all my shapes and call "draw()" on each one. The draw does something like this:


glPushMatrix();
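glTranslatef( x, y, 0.0f );          // glTranslatef takes three components; z stays 0 for 2D
glScalef( xScale, yScale, 1.0f );    // glScalef takes three factors; leave z unscaled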
glRotatef( rot, 0.0f, 0.0f, 1.0f );

// Draw all the verts in my shape using local coordinates about the origin
// Draw all my child shapes too!

glPopMatrix();

The reason for the push / pop transformations is that all of my shapes live under a parent that has its own translation / scale / rotation, which lives under yet another parent with its own transformation info ( just a tree of parent -> child relationships ). So, this was the easiest way to get things to render properly. I'm wondering though if it might be more efficient to get rid of the pushing and popping for the transformation and simply calculate each shape vertex position based on the parent's transformation ( and of course its parent's transformation ). So, basically doing the math for each vertex rather than using push / pop and specifying my vertices using local coordinates. Which way would give better performance? Thanks!

jsderon

124

November 11, 2008 09:20 PM

Yes, the push, pop, translate, scale, and rotate commands are performance killers. Calculate and keep track of the matrices in your program. Call glLoadMatrixf() to change the modelview matrix just prior to rendering your shape.

If the child shape is static relative to the parent, you should set the child's vertex positions relative to the parent as you suggested. In this case, you don't even have to call glLoadMatrixf() for the child. If everything in your tree were static relative to each other, you would only need one glLoadMatrixf() call at the root.

If a child shape is moving relative to the parent, you would likely be better off calling glLoadMatrixf() rather than recomputing the vertices of the child.

Having said all this, always do performance testing. That is the only way you can be sure which way is better. There are so many CPU/GPU trade-offs that what is good in one OpenGL program might be bad in another.

Insight3D blog

RegularKid

139

Author

November 12, 2008 10:16 AM

Great advice! Thanks!

swiftcoder

18,997

November 12, 2008 10:42 AM

Quote:Original post by RegularKid
I'm wondering though if it might be more efficient to get rid of the pushing and popping for the transformation and simply calculate each shape vertex position based on the parent's transformation ( and of course its parent's transformation ). So, basically doing the math for each vertex rather than using push / pop and specifying my vertices using local coordinates.

It will never be faster to recalculate vertices on the CPU than on the GPU. Remember, the GPU has been specially developed to perform vertex transformation, and it is incredibly fast at it.

Quote:Original post by jsderon
Yes, the push, pop, translate, scale, and rotate commands are performance killers.

Huh? While they may not be the best-thought-out functions in the world, and they are soon to be deprecated, they are highly unlikely to be a deciding factor in performance. Push and pop effectively do a single memcpy each, while translate, rotate and scale each perform a single matrix multiplication (rotate also needs a couple of trig calls). None of that is going to be a problem, unless you are calling them many thousands of times per frame.
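To make the "single memcpy" point concrete, here is a small C sketch of how a matrix stack could work. It is only an illustration of the idea described above, not how any particular driver implements glPushMatrix(). `MatrixStack`, `STACK_DEPTH` and the `stack_*` functions are hypothetical names, and the depth is an arbitrary choice for the sketch.

```c
#include <string.h>

#define STACK_DEPTH 32   /* arbitrary depth chosen for this sketch */

typedef struct {
    float m[STACK_DEPTH][16];   /* column-major 4x4 matrices */
    int top;                    /* index of the current matrix */
} MatrixStack;

/* Start with a single identity matrix on the stack. */
void stack_init(MatrixStack *s)
{
    memset(s, 0, sizeof *s);
    s->m[0][0] = s->m[0][5] = s->m[0][10] = s->m[0][15] = 1.0f;
}

/* Push: duplicate the current matrix with one memcpy. Returns 0 on overflow. */
int stack_push(MatrixStack *s)
{
    if (s->top + 1 >= STACK_DEPTH)
        return 0;
    memcpy(s->m[s->top + 1], s->m[s->top], sizeof s->m[0]);
    ++s->top;
    return 1;
}

/* Pop: just step back to the previous matrix. Returns 0 on underflow. */
int stack_pop(MatrixStack *s)
{
    if (s->top == 0)
        return 0;
    --s->top;
    return 1;
}

/* Post-multiply the current matrix by a translation (current = current * T).
   Only the last column changes, so the full 4x4 product is not needed. */
void stack_translate(MatrixStack *s, float x, float y, float z)
{
    float *c = s->m[s->top];
    for (int row = 0; row < 4; ++row)
        c[12 + row] += x * c[row] + y * c[4 + row] + z * c[8 + row];
}
```

Each operation here touches a few dozen floats at most, which is why the per-call cost only matters once the call count gets very large.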

A quick tip on optimisation: unless you can prove that a given function is a performance bottleneck (i.e. with a profiler), it isn't worth worrying about.

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

RegularKid

139

Author

November 12, 2008 10:56 AM

Thanks, swiftcoder...

That's interesting that there are two differing views on the transformation performance. What I think I'm going to do is this:

1. Any shape that is static, pre-calculate its vert positions at start up and store them all in a large array that I can use for glDrawArrays. That way I'm not re-doing the math ( either myself or through the push / pop logic ) each frame and can just make a single call each frame to render them.

2. Any shape that is dynamic, I'll run some profiles on both methods ( recalculating vert positions myself and using push / pop ) and see which one works better for my particular solutions.

Thanks again for the help, guys!

NoKKiE

152

November 12, 2008 11:05 AM

I agree with swiftcoder. Those matrices just represent linear transformations, so the math is simply a few numbers being multiplied and added together. Rotate throws in a call to cos and sin, but that isn't going to make much of a difference. Especially if these things are moving, I really doubt you can make it any faster without using these functions.

mrbastard

1,577

November 12, 2008 11:06 AM

true, but...

given that updating all the vertices individually on the CPU then allows you to draw the whole lot in one draw call instead of having the GPU wait around between each bit of vertex data arriving, it's possible that it'll be faster, depending on the number of vertices and the frequency with which the transforms you apply to them change. If they don't change every frame, why do the math every frame?


swiftcoder

18,997

November 12, 2008 11:08 AM

Quote:Original post by RegularKid
1. Any shape that is static, pre-calculate its vert positions at start up and store them all in a large array that I can use for glDrawArrays. That way I'm not re-doing the math ( either myself or through the push / pop logic ) each frame and can just make a single call each frame to render them.

Fair enough, you are reducing both the number of draw calls and the number of state changes, so you should benefit from this change.

Quote:2. Any shape that is dynamic, I'll run some profiles on both methods ( recalculating vert positions myself and using push / pop ) and see which one works better for my particular solutions.

You can do this, but I can already tell you what the results will be:

Let's say you have a model with 5,000 vertices (not a particularly large model). Using your method, I must recalculate the position of 5,000 vertices (and their normals, if you need lighting). Otherwise, I have a single call to each of push, translate, rotate, scale and pop. That is 5 operations vs. 5,000 operations, so 5 clearly wins. The GPU has to transform the vertices whether or not you pre-transform them (since it still has to apply the view and perspective transformations, etc.), so in big-O terms the CPU-side cost of my method is O(1) and yours is O(n), where n is the number of vertices (constant time vs. linear time).
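To put rough numbers on that comparison, assume a naive implementation in which a 4x4 matrix product costs 64 multiplications and a 4x4 matrix times a 4-component vertex costs 16. These counts are illustrative assumptions about CPU-side work only, not measurements of any driver:

```latex
\begin{align*}
C_{\text{matrix calls}} &= 3 \times 64 = 192
  \quad \text{multiplications per shape, independent of } n \\
C_{\text{CPU transform}}(n) &= 16\,n
  \quad \Rightarrow \quad C_{\text{CPU transform}}(5000) = 80{,}000
\end{align*}
```

Here the factor of 3 stands for the translate, rotate and scale calls. Push and pop add copies but no multiplications.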

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

RegularKid

139

Author

November 12, 2008 11:11 AM

Makes sense...I'll stick with the push / pop logic then :)

swiftcoder

18,997

November 12, 2008 11:14 AM

Quote:Original post by mrbastard
given that updating all the vertices individually on the CPU then allows you to draw the whole lot in one draw call instead of having the GPU wait around between each bit of vertex data arriving

If you are worried about performance, then you are already using VBOs (Vertex Buffer Objects), right? And with a VBO, the GPU gets the data all in one chunk.

Quote:If they don't change every frame, why do the math every frame?

The GPU still does the math every frame, regardless. If you pre-transform vertices, then they get multiplied by an identity model matrix, but it still costs the same amount as any other vertex*matrix multiplication (don't forget that the view and perspective transformations must still be applied).
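The point above can be checked on the CPU with a small C sketch. It compares two paths: leaving vertices in local space and folding the model matrix into the combined matrix, versus pre-transforming the vertex and using an identity model matrix. All names here (`Mat4`, `Vec4`, `transform_with_model_matrix`, `transform_prebaked`) are hypothetical, and `view_proj` is just a stand-in for whatever view and projection matrices the program uses.

```c
typedef struct { float m[16]; } Mat4;   /* column-major, as in OpenGL */
typedef struct { float v[4]; } Vec4;

Mat4 mat4_identity(void)
{
    Mat4 r = {{0}};
    r.m[0] = r.m[5] = r.m[10] = r.m[15] = 1.0f;
    return r;
}

Mat4 mat4_translation(float x, float y, float z)
{
    Mat4 r = mat4_identity();
    r.m[12] = x; r.m[13] = y; r.m[14] = z;
    return r;
}

/* Returns a * b. */
Mat4 mat4_mul(const Mat4 *a, const Mat4 *b)
{
    Mat4 r;
    for (int col = 0; col < 4; ++col) {
        for (int row = 0; row < 4; ++row) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += a->m[k * 4 + row] * b->m[col * 4 + k];
            r.m[col * 4 + row] = sum;
        }
    }
    return r;
}

/* Returns a * p for a column vector p. */
Vec4 mat4_apply(const Mat4 *a, Vec4 p)
{
    Vec4 r;
    for (int row = 0; row < 4; ++row) {
        r.v[row] = 0.0f;
        for (int k = 0; k < 4; ++k)
            r.v[row] += a->m[k * 4 + row] * p.v[k];
    }
    return r;
}

/* Path 1: local-space vertex, model matrix folded into the combined matrix. */
Vec4 transform_with_model_matrix(const Mat4 *view_proj, const Mat4 *model, Vec4 local)
{
    Mat4 mvp = mat4_mul(view_proj, model);
    return mat4_apply(&mvp, local);          /* one matrix * vertex */
}

/* Path 2: vertex baked on the CPU, model matrix replaced by identity. */
Vec4 transform_prebaked(const Mat4 *view_proj, const Mat4 *model, Vec4 local)
{
    Vec4 baked = mat4_apply(model, local);   /* extra CPU-side work */
    Mat4 id = mat4_identity();
    Mat4 mvp = mat4_mul(view_proj, &id);
    return mat4_apply(&mvp, baked);          /* still one matrix * vertex */
}
```

Both paths end with exactly one matrix-times-vertex product per vertex and give the same result. The second path simply adds a CPU-side transform in front of it.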

Edit: I realise that you might be suggesting baking the entire scene and rendering it in a single draw call. Unfortunately, even on a crap Intel integrated GPU, I can push 500,000 triangles per frame at 60fps, and there is no way that a CPU could transform that many, that fast. GPUs were developed for a very good reason.

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]


This topic is closed to new replies.
