Instancing a good idea for speed in this case?
Members - Reputation: 122
Posted 26 April 2011 - 04:23 PM
I've run this through OpenGL Profiler and indeed, the calls to glDrawArrays (as I remember) take up the majority of the time.
Each pin is loaded into a VBO and then drawn. There are 60 x 49 pins in that image, and each one has 180 faces (MeshLab doesn't tell me the exact triangle count), but adding it up comes pretty close to the figure given by gDebugger.
In addition to the colour pass, there is also a pass for linear depth (in order to set up some SSAO). At the moment I'm getting around 12-15fps. I'd like to get it to 30.
I thought about switching to a non-linear depth buffer and reading the depth and colour buffers from the FBO in one go to save a pass, but commenting out the depth pass for now seemed to make little difference (oddly).
I tried 'pseudo instancing' (I think) by passing the transformation matrices to my vertex shader as a texture. This didn't give much of a speedup.
As OSX has limited support (annoyingly), the only method I can see to get more speed is to use GL_ARB_instanced_arrays somehow, but I'm not sure whether this will actually help. There may be something else I can do to get things a little faster, but I'm not sure what. I've gotten the triangles per pin down about as far as I can. Any thoughts chaps? Cheers
Members - Reputation: 657
Posted 26 April 2011 - 09:30 PM
Members - Reputation: 952
Posted 27 April 2011 - 05:40 AM
- 1 STATIC VBO for 1 pin, drawn with a different transformation 2940 times
- 1 STATIC VBO for 2940 pins, drawn once
- 1 STATIC VBO for 1 pin, plus an instance stream of 2940 matrix4x3 (alternatively only a position if you know you'll never need to rotate them)
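As a rough illustration of the third option, the per-instance stream can be built once on the CPU. The sketch below is only an assumption about how the pins are laid out: a regular 60 x 49 grid in the XZ plane with a hypothetical `spacing` parameter, and translation-only transforms stored as three rows of four floats. The names `Mat4x3` and `build_instance_stream` are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define GRID_COLS 60   /* pins per row, taken from the original post */
#define GRID_ROWS 49   /* rows of pins, taken from the original post */

/* One transform per instance: three rows of four floats, with the
   translation in the fourth element of each row. */
typedef struct { float m[12]; } Mat4x3;

/* Fill `out` (room for GRID_COLS * GRID_ROWS entries) with translation-only
   transforms on a regular grid. `spacing` is a hypothetical distance
   between neighbouring pins. Returns the number of instances written. */
static size_t build_instance_stream(Mat4x3 *out, float spacing)
{
    assert(out != NULL);
    size_t n = 0;
    for (int row = 0; row < GRID_ROWS; ++row) {
        for (int col = 0; col < GRID_COLS; ++col) {
            float *m = out[n].m;
            m[0] = 1.0f; m[1] = 0.0f; m[2]  = 0.0f; m[3]  = col * spacing; /* x */
            m[4] = 0.0f; m[5] = 1.0f; m[6]  = 0.0f; m[7]  = 0.0f;          /* y */
            m[8] = 0.0f; m[9] = 0.0f; m[10] = 1.0f; m[11] = row * spacing; /* z */
            ++n;
        }
    }
    return n;
}
```

With GL_ARB_instanced_arrays, an array like this could be uploaded once with `glBufferData` and bound as three `vec4` vertex attributes, each given a divisor of 1 via `glVertexAttribDivisorARB`, so that a single `glDrawArraysInstancedARB` call replaces the per-pin draw calls. At 48 bytes per instance the whole stream is about 141 KB.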
If each face is 2 triangles it's already > 1M triangles drawn.
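The estimate can be checked with the numbers from the first post (60 x 49 pins, 180 faces per pin). The helper below is just arithmetic; the two-triangles-per-face figure and the assumption that the depth pass redraws all the geometry are guesses, not measurements.

```c
#include <assert.h>

/* Triangles submitted in one pass over the whole grid of pins. */
static long long triangles_per_pass(long long cols, long long rows,
                                    long long faces_per_pin,
                                    long long tris_per_face)
{
    assert(cols >= 0 && rows >= 0 && faces_per_pin >= 0 && tris_per_face >= 0);
    return cols * rows * faces_per_pin * tris_per_face;
}

/* Triangles per second for a given number of passes and target frame rate. */
static long long triangles_per_second(long long per_pass, int passes, int fps)
{
    return per_pass * passes * fps;
}
```

With 60 x 49 pins, 180 faces and 2 triangles per face this gives 1,058,400 triangles per pass. If the colour and depth passes each draw everything, a 30fps target means roughly 63.5 million triangles per second.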
Given your geometry layout and point of view, you are hitting a pathologically bad case for the GPU: long, thin triangles that each contribute to only a few samples/pixels.
Members - Reputation: 122
Posted 28 April 2011 - 07:07 AM
So far I'm trying option 1. It's 1 VBO being drawn X number of times. I should have said that :S
I shall try the other two and see what we get.
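For reference, the second option (one static VBO holding every pin) boils down to copying the pin's vertices once per instance with the offset already applied. This is a minimal sketch, assuming position-only vertices and translation-only placement; `Vec3` and `bake_pins` are hypothetical names for this example.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z; } Vec3;

/* Copy the `pin_vertex_count` vertices of one pin into `out` once per
   instance, translated by that instance's offset. `out` must have room for
   pin_vertex_count * instance_count vertices. Returns vertices written. */
static size_t bake_pins(const Vec3 *pin, size_t pin_vertex_count,
                        const Vec3 *offsets, size_t instance_count,
                        Vec3 *out)
{
    assert(pin != NULL && offsets != NULL && out != NULL);
    size_t n = 0;
    for (size_t i = 0; i < instance_count; ++i) {
        for (size_t v = 0; v < pin_vertex_count; ++v) {
            out[n].x = pin[v].x + offsets[i].x;
            out[n].y = pin[v].y + offsets[i].y;
            out[n].z = pin[v].z + offsets[i].z;
            ++n;
        }
    }
    return n;
}
```

The baked array would be uploaded once with `glBufferData(GL_ARRAY_BUFFER, ..., GL_STATIC_DRAW)` and drawn with a single `glDrawArrays` call. The trade-off is memory: if each face really is two triangles, unindexed positions alone come to about 38 MB for 2940 pins, before normals or any other attributes.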
I also agree that this is a bad case for the GPU, as we are getting to the point where the triangles aren't really adding much to the scene. It may be time to rethink the approach, possibly with some kind of sprite or impostor. I'd still have thought we should be able to get more out of this, though.
Well, there is a possibility I could reduce the poly count, but it doesn't look great:
This is with 100 faces as opposed to 180. You can begin to see the polygon outlines, which is not nice. Also, you can see I've reduced the overall number of pins. This double view runs at 30fps. With the original number of pins, 180 faces vs 100 faces makes almost no difference in speed. They both run at around 10fps. It's almost as if there is a cutoff point beyond which you get no speedup or change at all.
Moderators - Reputation: 33417
Members - Reputation: 805
Posted 28 April 2011 - 07:59 AM
an open source GLU replacement library. Much more modern than GLU.
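float matrix[16] = {1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1}; /* 4x4 matrix, starting from identity */
float inverse_matrix[16];
glhTranslatef2(matrix, 0.0, 0.0, 5.0);
glhScalef2(matrix, 1.0, 1.0, -1.0);
/* inverse_matrix must be filled with the inverse of matrix before it is uploaded */
glUniformMatrix4fv(uniformLocation1, 1, GL_FALSE, matrix);
glUniformMatrix4fv(uniformLocation2, 1, GL_FALSE, inverse_matrix);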