VAO slower than not using it

I've installed the new driver from nVidia's site (191.07). My graphics card is a 9800 GTX. I am using the VAO functions through the GL_ARB_vertex_array_object extension.

In the scene I'm rendering there is a total of around 160,000 triangles, split into many objects (meshes), resulting in around 800 VBOs. All I did was wrap the calls that set up buffer bindings and data pointers so they are cached in VAOs, and the result is a frame-rate drop from 80fps to 55fps. I did check and make sure that the VAO setup only happens once per mesh, so from frame one onwards only glBindVertexArray() is being called. Here is a snippet from my code:

if (!meshVAOInit)
{
  //First frame: record buffer bindings and pointers into the VAO
  glGenVertexArrays( 1, &meshVAO );
  glBindVertexArray( meshVAO );
  bindBuffers();
  bindFormat( shader, format );
  meshVAOInit = true;
}
else
  glBindVertexArray( meshVAO );

//These are what the VAO replaces on subsequent frames:
//bindBuffers();
//bindFormat( shader, format );

material->begin();

//Walk material index groups
for (UintSize g=0; g<mesh->groups.size(); ++g)
{
  //Render current group
  TriMesh::IndexGroup &grp = mesh->groups[ g ];
  renderGroup( grp );
}

material->end();

//unbindFormat( shader, format );
//unbindBuffers();
glBindVertexArray( 0 );

Has anyone got experience with using VAOs on similar hardware? After an hour of googling I still haven't found any information on performance issues with VAOs.
No experience here, but a quick Google brought me to the OpenGL forums, where there's a thread in which pretty much everyone agrees that VAO is exactly the same speed-wise as not using it, making it a waste of time to use.
Well, I wouldn't care if there were no speed-up, but what seems strange is that it actually causes a performance regression; that's what bugs me. How on earth can calling one function instead of ten per each of the 800 meshes take more time? I'm even getting some uniform locations by string name in the bindFormat function, and that's about the slowest thing you would want to do each frame.
If things work the way they reasonably should, then you're certainly right: one function call can't be slower than a dozen of them, and the driver should be able to cache its state at least as well as you can.
My only guess would be that maybe some of your uniforms are exactly 0.5 or 1.0 or 2.0 by chance? And if that's the case, then maybe the driver tries to be "extra smart" by recompiling shaders for each set, optimizing out those special constants.
Some old, broken nVidia drivers were smart like this whenever you changed any uniform, which really sucked if you didn't know. Maybe a similar behaviour is built into VAO again, who except the driver writers could tell...?
So what you are saying is that the driver might be recompiling my shaders on-the-fly depending on the values of the uniform variables that I pass in per-frame? I don't see any sense in implementing such an "optimization" in a driver, as the cost of recompiling each frame can never be outweighed by the gain of treating a variable as a constant. Or can it?
Quote:Original post by ileben
So what you are saying is that the driver might be recompiling my shaders on-the-fly depending on the values of the uniform variables that I pass in per-frame?

I'm not saying that this is what is happening for you, and to my knowledge, recent drivers should not do that any more. I'm just trying to offer a guess at what might cause a performance degradation that doesn't make any sense and that actually cannot be happening. Obviously, this is just a guess; there's no way I could really know. You'd have to ask an nVidia driver developer.
However, recompiling shaders is certainly something that some old, broken nVidia drivers used to do (if, and only if, you supplied some special values like 0.5 or 1.0). This was extremely annoying, because first you didn't know about it, and then you'd eventually end up having your shaders recompiled several dozen times per frame, with no obvious reason why for fark's sake your frame rates sucked at one time and then everything worked just fine again, when you hadn't changed anything that matters (or so you thought!). The workaround was simply to change 0.5 to 0.50001 or 0.49999, but hey, you had to know that in the first place!
It is even legal for the driver to do that kind of thing (although, as you said, it is very disputable whether it makes any sense). The driver is only required to keep such behaviour externally invisible (so that, for example, the application won't crash).
Yep, I get all that. I was just pointing out that such an "optimization" seems extremely unlikely to have a case where it would be welcome at all.
Anyway, I played around with it a bit more and even changed all my shaders to accept vertex, normal and texture coordinates through generic vertex attributes (glVertexAttribPointer) rather than the built-in gl*Pointer() functions. I thought it might be that this new feature somehow doesn't interact well with the old fixed-function pipeline. What I found out was: nothing. It still almost halves my framerate.
Isn't 800 VBOs a bit high? Have you tried transforming everything on the CPU and using a single VBO?
800 sounds normal, imho. In my experience, in a scene of 350+350 VBOs (z-pass), VAOs were 5-10% slower on a C2D E8500 OC@3.8GHz with DDR3@1.6GHz and a GF8600GT/GTX275, on the 3.2 beta drivers. (Can't have many cache misses on this PC.)
Try with multithreading driver-optimizations disabled and enabled.
I seriously doubt shader recompilation has anything to do with this. It's probably just that nVidia haven't optimized VAOs yet: they had a bug getting them to work, so for now it's probably a slow-but-working version of their code that's shipping. Quite possibly it'll get optimized soonish.
Quote:Original post by raigan
Isn't 800 VBOs a bit high? Have you tried transforming everything on the CPU and using a single VBO?
Well, if I had everything in one VBO, then there would be no point in using VAOs anyway, since the purpose was to reduce the overhead of switching between VBOs (and gl*Pointer() calls) as much as possible.
I do wanna keep things in separate VBOs because of the different vertex formats of the meshes. I could merge all the meshes with the same format into a few VBOs (and in fact I've already tried that), but I still wanted to see if VAOs could bring the non-merged version's performance closer to the merged one. It's just a matter of trying to make the more comfy-to-use scenario faster.
This topic is closed to new replies.