metsfan

VBO Pooling: Does it make sense?

Hello all,

In the project I am currently working on, the objects on screen are very transitory. There are never more than a few dozen objects on screen at a time, but the objects change constantly, and every object has slightly different geometry. Obviously I could just create a new VBO whenever a new object comes in and delete the old one whenever an object is removed, but would it make more sense to create a VBO pool? That is, I would allocate a preset number of VBOs up front; any time one is needed, I pull an available VBO from the pool, and when an object is freed, its VBO is returned to the pool. The pool would also automatically resize if it grew too large or ran out of available VBOs.
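To make it concrete, here is roughly what I have in mind - just a sketch, and VboPool and the sizes are made-up names rather than anything that exists:

[code]
// Sketch only: hand out GL buffer names from a pre-generated pool
// instead of calling glGenBuffers/glDeleteBuffers per object.
#include <vector>
#include <GL/glew.h> // or whatever GL loader is in use

class VboPool {
public:
    explicit VboPool(size_t initialCount = 64) { grow(initialCount); }

    ~VboPool() {
        if (!m_free.empty())
            glDeleteBuffers((GLsizei)m_free.size(), m_free.data());
        // (VBOs still in use elsewhere are not tracked in this sketch.)
    }

    // Take a VBO from the pool, growing it if we've run dry.
    GLuint acquire() {
        if (m_free.empty())
            grow(64);
        GLuint vbo = m_free.back();
        m_free.pop_back();
        return vbo;
    }

    // Give a VBO back instead of deleting it.
    void release(GLuint vbo) { m_free.push_back(vbo); }

private:
    void grow(size_t count) {
        size_t old = m_free.size();
        m_free.resize(old + count);
        glGenBuffers((GLsizei)count, m_free.data() + old);
    }

    std::vector<GLuint> m_free;
};
[/code]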

Does this sort of thing sound like a good idea, or a waste of time?

Thanks.

-Adam

It doesn’t make a lot of sense, since you can’t control what OpenGL is doing inside the driver. If your goal is to avoid run-time allocations of VBOs, that means you allocate them all up front.
If you end up not using them all, you have allocated unnecessarily, which in itself could be a burden on the driver. You never know.
It’s fine to just allocate them when necessary, and only when necessary.



L. Spiro

I wouldn't create and destroy VBOs at runtime - object creation and deletion are generally quite expensive operations.

For your case I'd look to see how much of your data can be kept absolutely static. OK, you've got a number of objects with different geometry, but you'll probably find that there are multiple instances of the same object type being used, just with a different transform, so that's a good candidate for making static. If other per-object properties differ you can pull them out as shader uniforms and just keep the common stuff in static VBOs; doing a little bit of extra shader work can frequently be substantially cheaper than having to constantly update VBO data.
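Roughly like this - the geometry lives in one static VBO and only a uniform changes per object (shader, uModelLoc, mesh and objects are placeholder names, not anything from a specific codebase):

[code]
// One static VBO shared by every instance of this mesh; per-object data
// goes through a uniform instead of rewriting vertex data each frame.
glUseProgram(shader);
glBindBuffer(GL_ARRAY_BUFFER, mesh.vbo);   // uploaded once with GL_STATIC_DRAW
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 3 * sizeof(float), (void*)0);
glEnableVertexAttribArray(0);

for (const Object& obj : objects) {
    // obj.transform is assumed to be a float[16] model matrix
    glUniformMatrix4fv(uModelLoc, 1, GL_FALSE, obj.transform);
    glDrawArrays(GL_TRIANGLES, 0, mesh.vertexCount);
}
[/code]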

If that doesn't apply then I'd go for a streaming buffer pattern - have a read of [url="http://www.opengl.org/wiki/Buffer_Object_Streaming"]this page[/url] for further info on implementing that.
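One of the techniques described on that page is buffer "orphaning" - re-specifying the data store each frame so the driver can hand you fresh memory instead of stalling on the previous frame's copy (streamVbo, bufferSize, bytesThisFrame and verts below are placeholders):

[code]
glBindBuffer(GL_ARRAY_BUFFER, streamVbo);
// "Orphan" the old storage; the driver keeps the old memory alive for any
// in-flight draws and hands back a new block for us to fill.
glBufferData(GL_ARRAY_BUFFER, bufferSize, NULL, GL_STREAM_DRAW);
glBufferSubData(GL_ARRAY_BUFFER, 0, bytesThisFrame, verts);
[/code]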

[quote name='mhagain' timestamp='1352428483' post='4999112']
I wouldn't create and destroy VBOs at runtime - object creation and deletion is generally a quite expensive process.[/quote]
So, I'm not convinced this is entirely true for OpenGL resource handles.

glGenBuffers() really doesn't do that much work - most of the cost is when you define the buffer with glBufferData(). In fact, before the call to glBufferData, OpenGL knows neither the size nor the type of memory to allocate, so very little actual work can be done during glGenBuffers(). Similarly, calling glDeleteBuffers() is hardly more expensive than the tear-down that happens when you re-load the buffer via glBufferData().

You are of course correct that it is key to reduce the number of buffer [b]updates[/b], but creation and destruction shouldn't be a performance issue. Edited by swiftcoder

Buffer objects are created when first bound (see the [url="http://www.opengl.org/sdk/docs/man3/xhtml/glGenBuffers.xml"]reference page[/url], also this is among the tips in the "OpenGL Insights" book). The memory for backing the buffer object is allocated when you call [font=courier new,courier,monospace]glBufferData [/font]et al.

Thus, if you want to do this optimization, you should bind each buffer at least once, too. Still, this doesn't reserve memory (but reserving memory up front is something you probably can't, and shouldn't, do anyway).
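In code, warming up a freshly generated pool would look something like this (poolSize is just a placeholder):

[code]
std::vector<GLuint> pool(poolSize);
glGenBuffers((GLsizei)pool.size(), pool.data()); // only reserves names
for (GLuint vbo : pool)
    glBindBuffer(GL_ARRAY_BUFFER, vbo);          // first bind actually creates the object
glBindBuffer(GL_ARRAY_BUFFER, 0);
// No storage is allocated until glBufferData is called on each buffer later.
[/code]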

Some ideas, most of which depend on there being at least some algorithmic relation between the objects:[list]
[*]Use indexed drawing. Maybe you can have a "big" predefined VBO and only need to update the indices (see the sketch after this list).
[*]Use glBufferSubData(). Maybe some of the vertex data is the same, and some is not.
[*]Is it a setup similar to animation? Animation can be greatly improved by using bones. Vertices are attached to bones, and you only have to move a few bones; the shader then computes the new vertex positions from the bone data, which is a much smaller set. The technique can be extended to arbitrarily complex transformations if the number of vertices is fairly constant but has a mathematical relation to a smaller subset of data.
[*]Use a geometry shader to create the vertices you need.
[/list]
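A minimal sketch of the first idea (bigVbo, ibo and newIndices are placeholders; attribute setup is omitted):

[code]
// The vertex data in bigVbo was uploaded once with GL_STATIC_DRAW.
// When the visible objects change, only the small index buffer is rewritten.
glBindBuffer(GL_ARRAY_BUFFER, bigVbo);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
glBufferData(GL_ELEMENT_ARRAY_BUFFER,
             newIndices.size() * sizeof(GLuint),
             newIndices.data(), GL_DYNAMIC_DRAW);
glDrawElements(GL_TRIANGLES, (GLsizei)newIndices.size(), GL_UNSIGNED_INT, (void*)0);
[/code]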

[quote name='larspensjo' timestamp='1352586911' post='4999740']
Some ideas, most of which depend on there being at least some algorithmic relation between the objects: indexed drawing, glBufferSubData(), bones, a geometry shader...
[/quote]

Not everything is entirely accurate.[list]
[*]Indexed drawing is very useful, but you should be careful not to abuse it. In order for a pipeline stage to run efficiently most data should be cacheable. Abusing indexed drawing may actually backfire by causing a lot of cache misses in the vertex shader (obviously this will happen only for relatively large vertex buffers).
[*]glBufferSubData is a double-edged sword. It's really nice to be able to update parts of the buffer with one API call, but in many cases it actually ruins performance, because many synchronizations have to occur (basically you'll be stalling the hardware). The right way to do partial buffer updates is more complicated, and it involves using glMapBufferRange with GL_MAP_UNSYNCHRONIZED_BIT while manually managing fences, and of course doing it on two threads (see the sketch after this list). Basically, avoid partial buffer updates whenever you can, and when you can't, try to split your buffers so you can. And if that fails then I don't envy you.
[*]Bones are great. Use them whenever you can. They have their limitations, and some coding is required to do them right, but supporting them is practically a must.
[*]If performance is your concern, avoid GS like the plague. There are some cases in which you won't lose performance by using GS. However, there are no cases in which it'll give you better performance than the alternatives. This is mostly a hardware limitation, but (unfortunately) it's here to stay.
One good thing to note about GS is that it makes your code neater, shorter and cleaner. So if you can't really be bothered by some annoying performance issues, GS is awesome.
[/list] Edited by max343
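A very rough, single-threaded sketch of that map/fence pattern - region bookkeeping, error handling and the second thread are all left out, and every name here is a placeholder:

[code]
#include <cstring>   // memcpy; assumes a GL loader (e.g. GLEW) is already included

struct Region { GLintptr offset; GLsizeiptr size; GLsync fence = 0; };

void WriteRegion(GLuint vbo, Region& r, const void* data)
{
    // Wait on our own fence so the GPU is done reading this region
    // before we overwrite it without any driver synchronization.
    if (r.fence) {
        glClientWaitSync(r.fence, GL_SYNC_FLUSH_COMMANDS_BIT, GLuint64(1000000000));
        glDeleteSync(r.fence);
        r.fence = 0;
    }

    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    void* dst = glMapBufferRange(GL_ARRAY_BUFFER, r.offset, r.size,
                                 GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT);
    std::memcpy(dst, data, (size_t)r.size);
    glUnmapBuffer(GL_ARRAY_BUFFER);
}

// After issuing the draw calls that read from the region:
//   r.fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
[/code]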

max343 is more or less correct on everything, but the point about not abusing indexes needs elaboration.

That shouldn't be read as an all-out proscription on using indexed drawing; rather it's a caution to make sure that your verts and indexes are properly ordered so as to make the most optimal use of your hardware's vertex caches (in fact, indexed drawing is a requirement for your hardware's vertex cache to even activate, so if you're not using indexes you by definition do not have a vertex cache).

So if your indexes are randomly jumping around in your vertex buffer then there is a higher likelihood that an upcoming vertex is not already in the cache, and a higher likelihood that vertexes which would otherwise be reused from the cache fail to be, because they get evicted sooner than they should.

It's also important to make sure that the index sizes and values you use are actually supported in hardware. Thankfully 32-bit support is now essentially ubiquitous (with the possible exception of some mobile devices), but one often sees GL tutorial code using GL_UNSIGNED_BYTE... which brings me to the next point...

It's tempting to see indexed drawing as being all about memory saving because that's something that's directly measurable by you in your own code, but it's really only a small part of the story. Getting more efficient vertex cache usage is where the real performance benefit lies, as well as the ability to stitch together multiple disjoint primitives (and mix fans and strips) without needing to invoke the Cthulhu that is degenerate triangles.
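To make the reuse concrete, here's the classic trivial example - a quad drawn as 4 vertices plus 6 indices rather than 6 full vertices, with the two shared corners coming straight from the post-transform cache on the second triangle (upload code omitted):

[code]
const float verts[] = {
    -1.f, -1.f,   1.f, -1.f,   1.f, 1.f,   -1.f, 1.f     // 4 unique corners
};
const GLushort indices[] = { 0, 1, 2,   0, 2, 3 };        // 16-bit indices work everywhere

// ...upload to GL_ARRAY_BUFFER / GL_ELEMENT_ARRAY_BUFFER and set up attributes as usual...
glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_SHORT, (void*)0);
[/code]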

[quote name='mhagain' timestamp='1352594741' post='4999777']
...That shouldn't be read as an all-out proscription on using indexed drawing; rather it's a caution to make sure that your verts and indexes are properly ordered so as to make the most optimal use of your hardware's vertex caches (in fact, indexed drawing is a requirement for your hardware's vertex cache to even activate, so if you're not using indexes you by definition do not have a vertex cache)...
[/quote]
Can anyone provide a useful link or results of any experiment that could confirm the story about vertex post-transform cache on post-Fermi cards?
I have read a lot about it (and implemented some of the schemes), and there are indeed benefits on older cards, but I saw no improvement on Fermi.
It also depends significantly on the driver and on the way vertices are distributed between multiple processing units. We would have to delve deep into GPU architecture and driver design to get a correct answer, and it is much simpler to carry out some experiments. That's why I'm asking for your results: are there any benefits to optimizing indexing on modern GPUs?

[quote name='Aks9' timestamp='1352641662' post='4999898']
Can anyone provide a useful link or results of any experiment that could confirm the story about vertex post-transform cache on post-Fermi cards?
[/quote]

I'd be interested in seeing that too, but until then it's reasonable to assume that one doesn't want to restrict one's target hardware to post-Fermi only.

[quote name='Aks9' timestamp='1352641662' post='4999898']
Can anyone provide a useful link or results of any experiment that could confirm the story about vertex post-transform cache on post-Fermi cards?
[/quote]

The biggest difference in the memory department in Fermi was that NVIDIA introduced an L1/L2 cache hierarchy: about 700 KB of L2 and 16 KB of L1 (in the default mode). This means you can now reason about how to use the cache better, or how to fool it (if you're really into that). Before Fermi, caching in NVIDIA's hardware was basically a black box and involved a lot of finger crossing.
Since the introduction of a normal cache architecture, the notion of VRAM transfer rate is not so interesting; now there are L2 misses instead. A rule of thumb is that a miss in some cache level costs roughly 10 times more cycles to fetch the same data from the next level up, and you always start from L1, where fetches are very cheap.

With larger buffers all you'll probably see is capacity misses, and they'll generally apply only to L1. These are not so bad, and this is what has been measured up until now, except that there was no L1/L2 on pre-Fermi hardware, so essentially you'd get something like an L2 miss.
On the other hand, triggering L2 misses won't go unnoticed. They are not that hard to trigger: keep in mind that the cache line size is 128 bytes (so there are roughly 6K lines), assume some associativity/eviction policy, and then use any of the widely known ways to create conflict misses for that configuration. Random jumps within the buffer won't give you the desired result here, as they'll just map uniformly onto the L2, but some simple patterns will do the trick quite well.