WhiteChocolateMocha

Valve OpenGL Tips and Tricks


Recommended Posts


Is the "Vertex Attribute Object" they mention as being slow in this video the same thing as Vertex Array Objects (e.g. glGenVertexArrays),

and if so, what kinds of things would make them slower than rebinding the buffers/calling glVertexAttribPointer every draw?


Is the "Vertex Attribute Object" they mention as being slow in this video the same thing as Vertex Array Objects (e.g. glGenVertexArrays),

and if so, what kinds of things would make them slower than rebinding the buffers/calling glVertexAttribPointer every draw?

 

Yes, that's VAOs.

 

Really, all Valve have given us is their conclusions; we don't have their code, we don't have their test cases, we don't have their profiling data, and we don't know to what extent this was based on hardware vendor advice (it's interesting to note here that id Software don't use VAOs either - and in their case they have released their code so we can cross-check and confirm).

 

If you think about it, changing a VAO involves swapping out one huge chunk of state and swapping in another similarly huge chunk - the enabled arrays, their pointers and the buffers used for them. It's easy enough to conceive of scenarios where not using VAOs may be more efficient - maybe you just wanted to change one pointer but keep everything else the same, or maybe you wanted to change the buffers (and remember that they're not using GL 4.3, so no vertex attrib binding) but keep everything else, or maybe your GPU vendor just has a bad implementation of VAOs in their driver. Again, without the missing information from Valve it's hard to draw conclusions - we don't really know what kind of vertex formats they're using, how often they're changing them, and whether their usage patterns are consistent and sensible, or borderline insane.
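To make that "change one pointer" case concrete, here's a rough sketch of the two paths (assumes a current GL 3.x context; `vaoB`, `newPositionVbo` and the `Vertex` struct are placeholder names, not anything from Valve's code):

```c
/* VAO path: one call swaps the ENTIRE attribute setup, even if
   only one buffer actually differs between the two VAOs. */
glBindVertexArray(vaoB);

/* Manual path: touch only the one thing that changed. */
glBindBuffer(GL_ARRAY_BUFFER, newPositionVbo);
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex), (void *)0);
/* attribute 1's pointer, the enables, the element buffer, etc.
   are all left untouched */
```

Whether the full swap or the single targeted call wins depends entirely on how the driver implements VAO switches, which is exactly the information we don't have.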

 

Without all of that the only conclusion we can validly draw is that Valve have found one case where using VAOs is slower (and they're not giving us the full information we need to test and/or support their findings), but that doesn't necessarily hold true for all cases.  Profile your own code, form your own conclusions that are appropriate for your own program, and use whichever approach gives you the best performance for your own use cases.



it's interesting to note here that id Software don't use VAOs either

And neither does tri-Ace, and that’s because…

 


the only conclusion we can validly draw is that Valve have found one case where using VAOs is slower

…I’m responsible for performance inside of tri-Ace and I have yet to find a case at all in which a VAO is faster than manual switching, when manual switching is done properly.

 

It’s not that they found a few cases in which the performance was better for lack of VAOs; rather, their searches, id Software’s searches, and my own searches all yielded no results in the pursuit of a better-case scenario for VAOs.  VAOs simply do not offer better performance than you can get on your own via your own redundant state-tracking, and as Valve mentioned that is likely never to change, since doing better would require knowledge outside the driver’s scope.

 

I’m covering this in my upcoming book.

 

 

L. Spiro

In theory something like a VAO SHOULD be faster, because the driver can cache and validate upfront the various buffers bound and convert them to a sane format.

In reality, not all your streams/buffers are going to be static, AND bind-to-edit allowances mean they probably don't cache to such/any degree (bind VAO, bind new buffer, oops, edited the VAO...).

The GL 4.4 feature 'GL_ARB_multi_bind' will probably end up being the fastest way of doing it, as you can use one API call to set multiple streams at once which, assuming the API is sane, should let you set 'static' stream data and then bind in 'instance' data as needed afterwards.
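For reference, the multi_bind entry point for vertex streams looks roughly like this (sketch only; assumes a GL 4.4 context or ARB_multi_bind, and the buffer names and structs are illustrative):

```c
/* Rebind binding points 0 and 1 in a single call: a static mesh
   stream plus a per-instance stream. */
GLuint   buffers[2] = { staticMeshVbo, perInstanceVbo };
GLintptr offsets[2] = { 0, 0 };
GLsizei  strides[2] = { sizeof(Vertex), sizeof(InstanceData) };
glBindVertexBuffers(0, 2, buffers, offsets, strides);
```

One call, one round of driver validation, and the attribute formats themselves are never touched.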

As for everything else from Valve on this; given they are pushing a Linux/OpenGL based OS I'd take what they say with varying degrees of salt.


Just to be clear on this, I've also benchmarked VAOs as being slower, even in the case where you create and bind a single VAO during startup then write the rest of your code as if VAOs didn't exist.  I stopped short of saying "VAOs are always slower" because I can guarantee that somebody, somewhere, right now undoubtedly has a case where they are actually faster.

 

I haven't benchmarked VAOs combined with GL4.3 vertex attrib binding but I suspect that this may be a faster path than pre-GL4.3 usage because it can involve just swapping out buffer specifications, leaving the rest of the vertex format intact.  Valve of course aren't using that because they must target pre-GL4.3 hardware.
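The GL 4.3 split referred to here separates the (expensive to validate) format from the (cheap to swap) buffer binding. A sketch of the idea, assuming a GL 4.3 context and an illustrative `Vertex` struct:

```c
/* Set up once: attribute 0 is three floats, fed from binding point 0. */
glVertexAttribFormat(0, 3, GL_FLOAT, GL_FALSE, 0);
glVertexAttribBinding(0, 0);

/* Per draw: swap just the buffer behind binding point 0;
   the attribute format above is left completely intact. */
glBindVertexBuffer(0, meshVbo, 0, sizeof(Vertex));
```

Pre-4.3, the only way to change the buffer was glVertexAttribPointer, which respecifies the format too.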

 

Despite all of this I still use VAOs because I find them convenient for state management, the performance impact is not too high, and there are bigger bottlenecks in GL anyway (updating dynamic buffer objects, for example, although I'm hoping that GL4.4 buffer storage will resolve much of that).
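For what it's worth, the GL 4.4 buffer storage path mentioned there allows a buffer to be mapped once and kept mapped (sketch only; the caller must fence against the GPU itself, and a current GL 4.4 context is assumed):

```c
/* Immutable storage that may stay persistently mapped. */
glBufferStorage(GL_ARRAY_BUFFER, bufSize, NULL,
                GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT |
                GL_MAP_COHERENT_BIT);
void *ptr = glMapBufferRange(GL_ARRAY_BUFFER, 0, bufSize,
                             GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT |
                             GL_MAP_COHERENT_BIT);
/* ptr remains valid across draws: no per-frame map/unmap overhead */
```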


I'm surprised VAOs should be slower, because not only is it fewer API calls, and not only can the driver cache and validate upfront the various buffers, but it can also cache the validation. This is admittedly less expensive than in the case of an FBO (which is why it's faster to switch between 2 FBOs than to add/remove attachments to a single one), but it still necessarily means touching fewer objects spread out in memory, and thus fewer cache misses.

 

That said, it surprises me they're discouraging MapBuffer, too. In my experience, MapBufferRange is just about the same as CopyBufferSubData, with the difference that you can offload the copy to another thread. And if the GPU sync really bites you as they suggest, there's still MAP_UNSYNCHRONIZED_BIT which you can use as described by Hrabcak and Masserann in Cozzi/Riccio's book. That not only avoids synchronization and lets you offload the copy to another thread, but it also avoids having the driver perform memory allocation and reclamation work.
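The unsynchronized technique referred to looks roughly like this (sketch only; `writeOffset`/`size` bookkeeping, ring-buffer wrap-around, and fencing so the GPU is guaranteed not to be reading the region are all the caller's responsibility, and a bound buffer on a current context is assumed):

```c
/* Map only the region we own, telling the driver not to wait on the GPU. */
void *dst = glMapBufferRange(GL_ARRAY_BUFFER, writeOffset, size,
                             GL_MAP_WRITE_BIT |
                             GL_MAP_UNSYNCHRONIZED_BIT);
memcpy(dst, vertices, size);     /* this copy can run on another thread */
glUnmapBuffer(GL_ARRAY_BUFFER);
```

Get the "region not in use" guarantee wrong, though, and you're writing over data the GPU is still drawing from, which is why the technique needs the fencing discipline described in that chapter.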

Surely the Valve guys would know about that technique?


I'm surprised VAOs should be slower, because not only is it fewer API calls, and not only can the driver cache and validate upfront the various buffers, but it can also cache the validation. This is admittedly less expensive than in the case of an FBO (which is why it's faster to switch between 2 FBOs than to add/remove attachments to a single one), but it still necessarily means touching fewer objects spread out in memory, and thus fewer cache misses.

 

Depends on Valve's usage, to be honest.  E.g. a common enough scenario is to use the same vertex format and layout but to change the buffers; without GL_ARB_vertex_attrib_binding it's not possible to do this without respecifying the entire VAO, so there's not only no caching going on in this scenario, but also the extra overhead of VAO respecification and revalidation (at which point in time you may as well not be using VAOs at all).

 

I highly doubt that Valve are using GL_ARB_vertex_attrib_binding as many AMD cards, and all Intel cards, don't support it, and Valve's products must run on that hardware.

 

I'd also draw your attention to their earlier observation (in the same presentation) about GL being chatty but efficient, and not to judge a piece of code by number of calls. It's easy enough to conceive of a single API call that does a lot more work than multiple calls, so it really depends on the amount of work that each API call has to do. If - as I suspect - most vendors implement VAOs primarily as a user-mode software wrapper, with lazy state changes calling into kernel mode to flush changed VAO states to the hardware when a draw call is made, the API overhead of single call versus multiple calls should really be very minimal.

 

That said, it surprises me they're discouraging MapBuffer, too. In my experience, MapBufferRange is just about the same as CopyBufferSubData, with the difference that you can offload the copy to another thread. And if the GPU sync really bites you as they suggest, there's still MAP_UNSYNCHRONIZED_BIT which you can use as described by Hrabcak and Masserann in Cozzi/Riccio's book. That not only avoids synchronization and lets you offload the copy to another thread, but it also avoids having the driver perform memory allocation and reclamation work.

Surely the Valve guys would know about that technique?

 

Valve definitely know about this technique because it's the way D3D buffer updates work, so they've been using it in D3D for over 10 years now; it's very straightforward to port D3D discard/no-overwrite code to MapBufferRange (the API calls match up very well), so they must have another reason for not using MapBufferRange. Again, I'd suggest that this reason is that GL_ARB_map_buffer_range may not be available on all of their target hardware. Raw MapBuffer (i.e. without "Range") has several problems, so BufferSubData is definitely to be preferred over it in cases where MapBufferRange isn't available.
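The D3D-to-GL correspondence mentioned is roughly the following (sketch only; `bufSize`, `appendOffset` and `chunkSize` are illustrative, and a bound buffer on a current context is assumed):

```c
/* D3DLOCK_DISCARD equivalent: orphan the old storage, get a fresh
   region without waiting for the GPU to finish with the old one. */
glMapBufferRange(GL_ARRAY_BUFFER, 0, bufSize,
                 GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);

/* D3DLOCK_NOOVERWRITE equivalent: append into a region you promise
   the GPU isn't reading, with no synchronization at all. */
glMapBufferRange(GL_ARRAY_BUFFER, appendOffset, chunkSize,
                 GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT);
```

The usual pattern is no-overwrite appends until the buffer fills, then one discard to start a fresh buffer, exactly as in D3D dynamic vertex buffers.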
