VBO Pooling: Does it make sense?

8 comments, last by max343 11 years, 5 months ago

Can anyone provide a useful link or results of any experiment that could confirm the story about vertex post-transform cache on post-Fermi cards?
I have read a lot about it (and implemented some of the schemes), and there were indeed benefits on older cards, but I saw no improvement on Fermi.
It also depends heavily on the driver and on how vertices are distributed across the multiple processing units. We would have to delve deep into GPU architecture and driver design to get a definitive answer, so it is much simpler to run experiments. That's why I'm asking for your results: is there any benefit to optimizing indexing on modern GPUs?
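To make such experiments comparable, one quick way to score an index buffer offline is to simulate a post-transform cache and compute the ACMR (average cache miss ratio). This is only a rough sketch: the cache size (32 entries) and FIFO policy are assumptions, and modern parallel hardware does not behave like a single sequential cache.

```python
# Rough offline score for an index buffer: simulate a small FIFO
# post-transform cache and compute the ACMR (average number of vertex
# shader runs per triangle).  Cache size and FIFO policy are assumptions;
# real post-Fermi hardware batches vertices across parallel units.
from collections import deque

def acmr(indices, cache_size=32):
    cache = deque(maxlen=cache_size)  # FIFO eviction
    misses = 0
    for idx in indices:
        if idx not in cache:
            misses += 1
            cache.append(idx)
    return misses / (len(indices) // 3)

# Two triangles sharing an edge transform 4 vertices -> ACMR 2.0;
# the worst case for any mesh is 3.0 (no reuse at all).
```

Comparing the ACMR of your original and reordered index buffers at least tells you whether a reordering *should* help before you spend time benchmarking it on real hardware.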


The biggest difference in the memory department in Fermi was that NVIDIA introduced a proper L1/L2 hierarchy: about 768 KB of L2, and 16 KB of L1 (in the default configuration). This means you can now reason about how to use the cache well, or how to defeat it (if you're really into that). Before Fermi, caching in NVIDIA's hardware was essentially a black box and involved a lot of finger-crossing.
Since the introduction of a conventional cache hierarchy, raw VRAM transfer rate is not so interesting; what matters now are L2 misses. A rule of thumb is that a miss at any cache level costs roughly 10 times more cycles, because the same data has to be fetched from the next level up. And you always start from L1, where fetches are very cheap.
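The ~10x-per-level rule can be turned into a back-of-envelope cost model. The base latency (4 cycles for an L1 hit) and the factor of 10 are illustrative assumptions, not measurements from any specific GPU.

```python
# Back-of-envelope model of the ~10x-per-level rule of thumb.  The base
# latency (4 cycles for an L1 hit) and the factor are illustrative.
def avg_fetch_cycles(l1_hit, l2_hit, l1_lat=4, factor=10):
    """Expected cycles per fetch given L1 and L2 hit rates."""
    l2_lat = l1_lat * factor       # ~40 cycles: L1 miss, L2 hit
    mem_lat = l2_lat * factor      # ~400 cycles: all the way to VRAM
    return (l1_hit * l1_lat
            + (1 - l1_hit) * l2_hit * l2_lat
            + (1 - l1_hit) * (1 - l2_hit) * mem_lat)

# Even a 90% hit rate at both levels nearly triples the average cost
# relative to pure L1 hits (about 11.2 cycles vs 4).
```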

With larger buffers, all you'll probably see are capacity misses, and they generally apply only to L1. These are not so bad, and this is what has been measured up until now, except that pre-Fermi hardware had no L1/L2, so you'd essentially pay something like an L2 miss every time.
Triggering L2 misses, on the other hand, won't go unnoticed. They are not hard to provoke: keep in mind that the cache line size is 128 B (so there are about 6k lines), assume some associativity/eviction policy, and then use any of the well-known ways to generate conflict misses for that configuration. Random jumps through the buffer won't give you the desired result, since they map roughly uniformly onto the L2, but a simple stride pattern does the trick quite well.
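The stride construction above can be sketched in a toy simulator. The 128-byte lines and ~6k-line capacity come from the post; the 16-way associativity and LRU eviction are pure assumptions, since NVIDIA does not document the real policy.

```python
# Toy conflict-miss model using the figures from the post: 128-byte
# lines, ~6k lines of L2.  The 16-way associativity and LRU eviction
# are assumptions; the real policy is undocumented.
LINE = 128            # bytes per cache line (from the post)
LINES = 6144          # 768 KB / 128 B
WAYS = 16             # assumed associativity
SETS = LINES // WAYS  # 384 sets

def simulate(addresses):
    """Count misses for a byte-address stream, LRU within each set."""
    sets = [[] for _ in range(SETS)]
    misses = 0
    for addr in addresses:
        line = addr // LINE
        s, tag = line % SETS, line // SETS
        ways = sets[s]
        if tag in ways:
            ways.remove(tag)      # hit: refresh LRU position
        else:
            misses += 1
            if len(ways) == WAYS:
                ways.pop()        # evict least-recently-used tag
        ways.insert(0, tag)
    return misses

# A stride of SETS * LINE bytes funnels every access into one set:
# cycling through WAYS + 1 such addresses thrashes it, while the same
# number of contiguous lines misses only once each.
thrash = [i * SETS * LINE for i in range(WAYS + 1)] * 100
spread = [i * LINE for i in range(WAYS + 1)] * 100
```

With these assumptions, the strided stream misses on every single access while the contiguous stream misses only on the first pass, which is exactly the uniform-mapping-vs-pattern distinction made above.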

