VBO Pooling: Does it make sense?

8 comments, last by max343 11 years, 5 months ago
Hello all,

In the project I am currently working on, the objects on screen are very transitory. There are never more than a few dozen objects on screen at a time, but the objects change all the time, and every object has slightly different geometry. Obviously I could just create a new VBO and delete the old one every time new objects come in and old objects are removed. But would it make more sense to create a VBO pool? I would allocate a preset number of VBOs up front; any time one is needed, I pull an available VBO from the pool, and when an object is freed, its VBO is returned to the pool. Additionally, the pool would automatically resize if it got too big or ran out of available VBOs.
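For readers who want something concrete, the pool described above might look roughly like the sketch below. It is only an illustration: the class name `BufferPool`, the `maxIdle` limit, and the create/destroy callbacks are all made up for this example. In a real renderer the callbacks would presumably wrap glGenBuffers and glDeleteBuffers, which need a current GL context, so they are left abstract here.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical pool of buffer handles (not taken from any library).
// The callbacks are assumptions: in real code they would wrap
// glGenBuffers / glDeleteBuffers.
class BufferPool {
public:
    using Handle = unsigned int;  // GLuint is an unsigned int

    BufferPool(std::function<Handle()> create,
               std::function<void(Handle)> destroy,
               std::size_t initialCount, std::size_t maxIdle)
        : create_(std::move(create)),
          destroy_(std::move(destroy)),
          maxIdle_(maxIdle) {
        assert(create_ && destroy_);
        // Pre-allocate the preset number of handles.
        for (std::size_t i = 0; i < initialCount; ++i) idle_.push_back(create_());
    }

    // Destroy whatever is still idle; handles that are checked out
    // remain the caller's responsibility.
    ~BufferPool() {
        for (Handle h : idle_) destroy_(h);
    }

    BufferPool(const BufferPool&) = delete;
    BufferPool& operator=(const BufferPool&) = delete;

    // Hand out an idle handle, or grow by creating one if none is idle.
    Handle acquire() {
        if (idle_.empty()) return create_();
        Handle h = idle_.back();
        idle_.pop_back();
        return h;
    }

    // Take a handle back; shrink by destroying it if enough are idle already.
    void release(Handle h) {
        if (idle_.size() >= maxIdle_) destroy_(h);
        else idle_.push_back(h);
    }

    std::size_t idleCount() const { return idle_.size(); }

private:
    std::function<Handle()> create_;
    std::function<void(Handle)> destroy_;
    std::size_t maxIdle_;
    std::vector<Handle> idle_;
};
```

Note that a pool like this only recycles handles. It says nothing about the memory behind each buffer, which is a separate question.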

Does this sort of thing sound like a good idea, or a waste of time?

Thanks.

-Adam
It doesn’t make a lot of sense since you can’t control what OpenGL is doing inside the driver. If your goal is to avoid run-time allocations of VBOs, that means you allocate them all up-front.
If you end up not using them all, you have allocated unnecessarily which in itself could be a burden on the driver. You never know.
It’s fine enough to just allocate them when necessary and only when necessary.



L. Spiro


I wouldn't create and destroy VBOs at runtime - object creation and deletion is generally a quite expensive process.

For your case I'd look to see how much of your data can be kept absolutely static. OK, you've got a number of objects with different geometry, but you'll probably find that there are multiple instances of the same object type being used, just with a different transform, so that's a good candidate for making static. If other per-object properties differ you can pull them out as shader uniforms and just keep the common stuff in static VBOs; doing a little bit of extra shader work can frequently be substantially cheaper than having to constantly update VBO data.

If that doesn't apply then I'd go for a streaming buffer pattern - have a read of this page for further info on implementing that.
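As a rough illustration of one common streaming variant (append into a fixed-size buffer and "orphan" it when it fills up), the bookkeeping could look like the sketch below. This is an assumption about what is meant and not necessarily what the linked page describes. `StreamCursor` and `StreamSlot` are hypothetical names, and the GL calls appear only in comments because they need a live context.

```cpp
#include <cassert>
#include <cstddef>

// Where to write this batch of vertex data, and whether the caller should
// first request fresh storage for the buffer.
struct StreamSlot {
    std::size_t offset;  // byte offset to write at
    bool orphan;         // true: re-specify the buffer storage before writing
};

// Hypothetical cursor for an append-only streaming buffer.
class StreamCursor {
public:
    explicit StreamCursor(std::size_t capacity)
        : capacity_(capacity), cursor_(0) {}

    // Reserve `size` bytes. If they do not fit in the remaining space,
    // restart at offset 0 and ask the caller to orphan the buffer, e.g. with
    //   glBufferData(GL_ARRAY_BUFFER, capacity, nullptr, GL_STREAM_DRAW);
    // so the driver can hand back new storage while the GPU may still be
    // reading the old contents.
    StreamSlot reserve(std::size_t size) {
        assert(size <= capacity_);
        if (cursor_ + size > capacity_) {
            cursor_ = size;
            return StreamSlot{0, true};
        }
        StreamSlot slot{cursor_, false};
        cursor_ += size;
        return slot;
    }

private:
    std::size_t capacity_;
    std::size_t cursor_;
};
```

The caller would then upload at `slot.offset` (for example through glMapBufferRange) and draw from that offset. Alignment requirements are ignored here to keep the sketch short.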

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.


I wouldn't create and destroy VBOs at runtime - object creation and deletion is generally a quite expensive process.

So, I'm not clear that this is entirely true for OpenGL resource handles.

glGenBuffers() really doesn't do that much work - most of the cost is when you define the buffer with glBufferData(). In fact, before the call to glBufferData, OpenGL knows neither the size nor the type of memory to allocate, so very little actual work can be done during glGenBuffers(). Similarly, calling glDeleteBuffers() is hardly more expensive than the tear-down that happens when you re-load the buffer via glBufferData().

You are of course correct that it is key to reduce the number of buffer updates, but creation and destruction shouldn't be a performance issue.

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

Buffer objects are created when first bound (see the reference page, also this is among the tips in the "OpenGL Insights" book). The memory for backing the buffer object is allocated when you call glBufferData et al.

Thus, if you want to do this optimization, you should bind each buffer at least once, too. Still, this doesn't reserve memory (but that is something you probably can't, and shouldn't do upfront in any case).
Some ideas, most of which depend on there being at least some algorithmic relation between the objects:

  • Using indexed drawing. Maybe you can have a "big" predefined VBO, and simply need to update the indices.
  • Using glBufferSubData(). Maybe some of the vertex data is the same, and some is not.
  • Is it a setup similar to animation? Animation can be greatly improved by using bones. Vertices are then attached to bones, and you only have to move a few bones. The shader will then compute the new vertices from the bone data, which is a much smaller set. The technique can be extended to arbitrarily complex transformations, as long as the number of vertices is fairly constant and the vertices have a mathematical relation to a smaller subset of data.
  • Use a geometry shader to create the vertices you need.
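To make the bones idea a bit more concrete, here is a small CPU-side sketch of one common formulation, linear blend skinning, where each vertex is a weighted sum of its rest position transformed by a few bone matrices. This is an assumed formulation, not necessarily what any particular engine does, and the types (`Vec3`, `Mat3x4`, `Influence`) are hypothetical. In practice the same arithmetic would run in the vertex shader, with the bone matrices passed as uniforms.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Row-major 3x4 affine transform: 3x3 rotation/scale plus a translation
// in the last column.
using Mat3x4 = std::array<float, 12>;

Vec3 transformPoint(const Mat3x4& m, Vec3 p) {
    return Vec3{ m[0] * p.x + m[1] * p.y + m[2]  * p.z + m[3],
                 m[4] * p.x + m[5] * p.y + m[6]  * p.z + m[7],
                 m[8] * p.x + m[9] * p.y + m[10] * p.z + m[11] };
}

// One bone's pull on a vertex; weights for a vertex are expected to sum to 1.
struct Influence {
    std::size_t bone;
    float weight;
};

// Blend the rest position through every influencing bone.
Vec3 skin(Vec3 rest,
          const std::vector<Influence>& influences,
          const std::vector<Mat3x4>& bones) {
    Vec3 out{0.0f, 0.0f, 0.0f};
    for (const Influence& inf : influences) {
        assert(inf.bone < bones.size());
        Vec3 p = transformPoint(bones[inf.bone], rest);
        out.x += inf.weight * p.x;
        out.y += inf.weight * p.y;
        out.z += inf.weight * p.z;
    }
    return out;
}
```

Only the bone matrices change per frame. The rest positions, bone indices and weights can stay in a static VBO.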
Current project: Ephenation.
Sharing OpenGL experiences: http://ephenationopengl.blogspot.com/

Not everything is entirely accurate.

  • Indexed drawing is very useful, but you should be careful not to abuse it. In order for a pipeline stage to run efficiently most data should be cacheable. Abusing indexed drawing may actually backfire by causing a lot of cache misses in the vertex shader (obviously this will happen only for relatively large vertex buffers).
  • glBufferSubData is a double-edged sword. It's really nice to be able to update parts of the buffer with one API call, but in many cases it actually ruins performance, because many synchronizations have to occur (basically you'll be stalling the hardware). The right way to do partial buffer updates is more complicated, and it involves using glMapBufferRange with GL_MAP_UNSYNCHRONIZED_BIT, while manually managing fences and of course doing it on two threads. Basically, avoid partial buffer updates whenever you can, and when you can't, try to split your buffers so that you can. And if that fails then I don't envy you.
  • Bones are great. Use them whenever you can. They have their limitations, and some coding is required to do them right, but supporting them is practically a must.
  • If performance is your concern, avoid GS like the plague. There are some cases in which you won't lose performance by using GS. However, there are no cases in which it'll give you better performance than using the alternatives. This is mostly a hardware limitation, but (unfortunately) it's here to stay.
    One good thing to note about GS is that it makes your code neater, shorter and cleaner. So if you can't really be bothered by some annoying performance issues, GS is awesome.

max343 is more or less correct on everything, but the point about not abusing indexes needs elaboration.

That shouldn't be read as an all-out proscription on using indexed drawing; rather it's a caution to make sure that your verts and indexes are properly ordered so as to make the most optimal use of your hardware's vertex caches (in fact, indexed drawing is a requirement for your hardware's vertex cache to even activate, so if you're not using indexes you by definition do not have a vertex cache).

So if your indexes are randomly jumping around in your vertex buffer, there is a higher likelihood of an upcoming vertex not already being in the cache. There is also a higher likelihood that vertexes which would otherwise be reused from the cache get evicted before they can be reused.
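One way to get a feel for this without a GPU is to simulate a small cache over the index list and count the misses. The sketch below assumes a simple FIFO cache, which is only a model: real hardware may behave quite differently, so the numbers are good for comparing two index orderings, not for predicting actual performance. The function names are made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Count how many indices miss a FIFO cache holding `cacheSize` vertices.
// The FIFO policy is an assumption made for illustration.
std::size_t countCacheMisses(const std::vector<std::uint32_t>& indices,
                             std::size_t cacheSize) {
    assert(cacheSize > 0);
    std::deque<std::uint32_t> fifo;
    std::size_t misses = 0;
    for (std::uint32_t idx : indices) {
        if (std::find(fifo.begin(), fifo.end(), idx) != fifo.end())
            continue;  // hit: vertex is still in the cache
        ++misses;      // miss: vertex would have to be transformed again
        fifo.push_back(idx);
        if (fifo.size() > cacheSize) fifo.pop_front();
    }
    return misses;
}

// Average misses per triangle for a triangle list (lower is better).
double averageCacheMissRatio(const std::vector<std::uint32_t>& indices,
                             std::size_t cacheSize) {
    const std::size_t triangles = indices.size() / 3;
    if (triangles == 0) return 0.0;
    return static_cast<double>(countCacheMisses(indices, cacheSize)) /
           static_cast<double>(triangles);
}
```

Running this over the same mesh with two different index orders shows whether a reordering pass reduces misses under this model.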

It's also important to make sure that the index sizes and values you use are actually supported in hardware. Thankfully 32-bit support is now essentially ubiquitous (with the possible exception of some mobile devices) but one often sees GL tutorial code using GL_UNSIGNED_BYTE.... which brings me to the next point...
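As a small illustration of picking an index size, the sketch below chooses between 16-bit and 32-bit indices from the vertex count and never uses bytes. The names (`IndexType`, `chooseIndexType`, `narrowIndices`) are hypothetical. The enum values would map to GL_UNSIGNED_SHORT and GL_UNSIGNED_INT when calling glDrawElements. Keeping 0xFFFF unused, so it stays free as a primitive-restart index, is a conservative assumption and not a requirement.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Maps to GL_UNSIGNED_SHORT / GL_UNSIGNED_INT in a real draw call.
enum class IndexType { U16, U32 };

// Use 16-bit indices when every vertex is addressable with them.
// The largest index allowed is 65534, leaving 0xFFFF unused.
IndexType chooseIndexType(std::size_t vertexCount) {
    return vertexCount <= 65535 ? IndexType::U16 : IndexType::U32;
}

std::size_t indexSizeInBytes(IndexType t) {
    return t == IndexType::U16 ? 2 : 4;
}

// Convert 32-bit indices to 16-bit ones for upload.
std::vector<std::uint16_t> narrowIndices(const std::vector<std::uint32_t>& indices) {
    std::vector<std::uint16_t> out;
    out.reserve(indices.size());
    for (std::uint32_t i : indices) {
        assert(i < 0xFFFFu);  // must fit and must not collide with 0xFFFF
        out.push_back(static_cast<std::uint16_t>(i));
    }
    return out;
}
```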

It's tempting to see indexed drawing as being all about memory saving because that's something that's directly measurable by you in your own code, but it's really only a small part of the story. Getting more efficient vertex cache usage is where the real performance benefit lies, as well as the ability to stitch together multiple disjoint primitives (and mix fans and strips) without needing to invoke the Cthulhu that is degenerate triangles.



...That shouldn't be read as an all-out proscription on using indexed drawing; rather it's a caution to make sure that your verts and indexes are properly ordered so as to make the most optimal use of your hardware's vertex caches (in fact, indexed drawing is a requirement for your hardware's vertex cache to even activate, so if you're not using indexes you by definition do not have a vertex cache)...

Can anyone provide a useful link or results of any experiment that could confirm the story about vertex post-transform cache on post-Fermi cards?
I have read a lot about that (and implemented some schemes), and indeed there are benefits if applied on old cards, but I had no improvements on Fermi.
Also, it depends significantly on the driver and on the way vertices are distributed between multiple processing units. We would have to delve deeper into GPU architecture and driver design to get a correct answer; it is much simpler to carry out some experiments. That's why I ask for your results. Are there any benefits to optimizing indexing on modern GPUs?

Can anyone provide a useful link or results of any experiment that could confirm the story about vertex post-transform cache on post-Fermi cards?


I'd be interested in seeing that too, but until then it's reasonable to assume that one doesn't want to restrict one's target hardware to post-Fermi only.


This topic is closed to new replies.
