ileben

VAO slower than not using it

This topic is 3168 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


I've installed the new driver from nVidia's site (191.07). My graphics card is a 9800 GTX. I am using the VAO functions through the GL_ARB_vertex_array_object extension. The scene that I'm rendering has a total of around 160,000 triangles, split into many objects (meshes), resulting in around 800 VBOs. All I did was wrap the calls that set up buffer bindings and data pointers so they get cached into VAOs, and the result is a frame rate drop from 80fps to 55fps. I did check and make sure that the VAO setup bit only happens once per mesh, so from frame one onwards only the glBindVertexArray() function is being called. Here is a snippet from my code:
    if (!meshVAOInit)
    {
      glGenVertexArrays( 1, &meshVAO );
      glBindVertexArray( meshVAO );
      bindBuffers();
      bindFormat( shader, format );
      meshVAOInit = true;
    }
    else
      glBindVertexArray( meshVAO );

    //bindBuffers();
    //bindFormat( shader, format );
    material->begin();

    //Walk material index groups
    for (UintSize g=0; g<mesh->groups.size(); ++g)
    {
      //Render current group
      TriMesh::IndexGroup &grp = mesh->groups[ g ];
      renderGroup( grp );
    }

    material->end();
    //unbindFormat( shader, format );
    //unbindBuffers();
    glBindVertexArray( 0 );

Has anyone got experience with using VAOs on similar hardware? After an hour of googling I still haven't found any information on the performance issues with VAOs.

No experience here, but a quick Google brought me to the OpenGL forums, where there's a thread with pretty much everyone agreeing that VAOs are exactly the same speed as not using them, which would make them a waste of time to use.

Well, I wouldn't mind if there were no speed-up, but what seems strange is that it actually causes a performance regression; that's what bugs me. How on earth can calling one function instead of ten, for each of the 800 meshes, take more time? I am even looking up some uniform locations by string name in the bindFormat function, and that's about the slowest thing you would want to do each frame.

If things work the way they reasonably should, then you're certainly right. One function call can't be slower than a dozen of them, and the driver should be able to cache its state at least as well as you can.

My only guess would be that maybe some of your uniforms happen to be exactly 0.5, 1.0 or 2.0? If that's the case, then maybe the driver tries to be "extra smart" by recompiling shaders for each set of values, optimizing those special constants out.

Some old, broken nVidia drivers were smart like this whenever you changed any uniform, which really sucked if you didn't know. Maybe a similar behaviour is built into VAO again, who except the driver writers could tell...?

So what you are saying is that the driver might be recompiling my shaders on the fly depending on the values of the uniform variables that I pass in per frame? I don't see any sense in implementing such an "optimization" in a driver, as the cost of recompiling every frame could never be outweighed by the gain of treating a variable as a constant. Or could it?

Quote:
Original post by ileben
So what you are saying is that the driver might be recompiling my shaders on-the-fly depending on the values of the uniform variables that I pass in per-frame?
I'm not saying that this is what is happening for you, and to my knowledge, recent drivers should not do that any more.
I'm just trying to guess at what might cause a performance degradation that doesn't make any sense and, by rights, shouldn't be possible. Obviously this is just a guess; there's no way I could really know. You'd have to ask an nVidia driver developer.

However, recompiling shaders is certainly something that some old, broken nVidia drivers used to do (if, and only if, you supplied certain special values like 0.5 or 1.0). This was extremely annoying: first you didn't know about it, then you'd eventually end up having your shaders recompiled several dozen times per frame, and there was no obvious reason why for fark's sake your frame rates sucked one moment and everything worked just fine the next, when you hadn't changed anything that mattered (or so you thought!). The workaround was simply to change 0.5 to 0.50001 or 0.49999, but hey, you had to know that in the first place!

It is even legal for the driver to do that kind of thing (although, as you said, it is very disputable whether it makes any sense). The driver is only required to ensure that none of this is externally visible (so the application won't crash).

Yep, I get all that. I was just pointing out that such an "optimization" seems extremely unlikely to ever have a case where it would be welcome.

Anyway, I was playing around with it a bit more, and even changed all my shaders to accept vertex, normal and texture coordinates through generic vertex attributes (glVertexAttribPointer) rather than the built-in gl*Pointer() functions. I thought it might be that this new feature somehow doesn't play well with the old fixed-function pipeline. What I found out was: nothing. It still almost halves my framerate.

Isn't 800 VBOs a bit high? Have you tried transforming everything on the CPU and using a single VBO?

800 sounds normal, imho. In my experience, in a scene of 350+350 VBOs (Z-pass), VAOs were 5-10% slower on a C2D E8500 OC@3.8GHz with DDR3@1.6GHz and a GF8600GT/GTX275, on the 3.2 beta drivers. (Can't have many cache misses on this PC.)
Try with the multithreaded driver optimizations disabled and enabled.
I seriously doubt shader recompilation has anything to do with this. It's probably just that nVidia haven't optimized VAOs yet: they had a bug in getting them to work at all, so what ships for now is probably a slow-but-working version of their code. Quite possibly it'll get optimized soonish.

Quote:
Original post by raigan
Isn't 800 VBOs a bit high? Have you tried transforming everything on the CPU and using a single VBO?


Well, if I had everything in one VBO, then there would be no point in using VAOs anyway, since the purpose was to reduce the overhead of switching between the VBOs (and gl*Pointer() calls) as much as possible.

I do want to keep things in separate VBOs because the meshes have different vertex formats. I could merge all the meshes with the same format into a handful of VBOs (and in fact I've tried that already), but I still wanted to see if VAOs could bring the performance of the non-merged version closer to the merged one. It's just a matter of trying to make the more comfortable-to-use scenario faster.
