Sweet Spot for # of vertices in Vertex Buffer vs. # of instances

Started by
4 comments, last by stauffec 11 years, 4 months ago
I have been playing around with instancing for objects like grass, and I was struck by what I think is an interesting question. Let's say that an instanced billboard of grass is made up of a textured quad, and that there is a need for 10,000 such quads. What is the sweet spot in terms of # of instances vs. complexity of the vertex buffer? Should I have a single quad in the vertex buffer and 10,000 instances? Should I have 2 quads and 5,000 instances? I'm going to run some tests to find that sweet spot, but I was wondering if anyone was aware of a concrete method of reasoning as opposed to just trial and error?
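To make the search space concrete, here is a minimal sketch that lists every way to split the 10,000 quads evenly between "quads in the vertex buffer" and "number of instances". The function name and constants are illustrative only, not part of any graphics API, and it assumes 4 vertices per quad.

```python
TOTAL_QUADS = 10_000
VERTS_PER_QUAD = 4  # assumed: indexed quad, two triangles sharing two vertices


def candidate_configs(total_quads=TOTAL_QUADS):
    """Return (quads_per_instance, instance_count) pairs that cover total_quads exactly."""
    configs = []
    for quads_per_instance in range(1, total_quads + 1):
        if total_quads % quads_per_instance == 0:
            configs.append((quads_per_instance, total_quads // quads_per_instance))
    return configs


if __name__ == "__main__":
    for quads, instances in candidate_configs():
        print(f"{quads:>6} quads/buffer ({quads * VERTS_PER_QUAD:>6} verts) x {instances:>6} instances")
```

Each pair is one configuration worth testing. Note that putting several quads in the buffer fixes their layout relative to each other, so every instance becomes an identical clump.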
The hardware knows nothing about instances. It's all about batch length: how many primitives get transformed in a single DIP (DrawIndexedPrimitive) call.
In my experience, if you draw fewer than 1000 unique vertices per call, you're trashing GPU performance, even on ultra-low-end hardware, because the CPU is too busy dispatching.
That said, if all your models are static and/or guaranteed to be relatively coherent, pre-transform them and slap them in a single draw call (and therefore buffer), as long as you meet the required memory budget.
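As a rough illustration of the "pre-transform and slap them in a single buffer" option, the sketch below bakes world-space quads into one vertex list and one index list on the CPU. The name `bake_quads`, the quad dimensions, and the axis-aligned orientation are assumptions made for the example. A real implementation would also carry UVs and upload the result to a GPU buffer.

```python
def bake_quads(positions, half_width=0.5, height=1.0):
    """Pre-transform one upright quad per position into a single vertex/index list.

    positions: iterable of (x, y, z) world-space root positions.
    Returns (vertices, indices) where indices describe two triangles per quad.
    """
    vertices = []
    indices = []
    for px, py, pz in positions:
        base = len(vertices)  # index of this quad's first vertex
        vertices += [
            (px - half_width, py, pz),           # bottom left
            (px + half_width, py, pz),           # bottom right
            (px + half_width, py + height, pz),  # top right
            (px - half_width, py + height, pz),  # top left
        ]
        indices += [base, base + 1, base + 2, base, base + 2, base + 3]
    return vertices, indices


if __name__ == "__main__":
    verts, idx = bake_quads([(0, 0, 0), (10, 0, 5)])
    print(len(verts), "vertices,", len(idx), "indices")
```

For the 10,000-quad case this yields 40,000 vertices, which still fits within 16-bit indices (limit 65,536), at the cost of storing every vertex explicitly.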

For your specific example however, instancing is absolutely necessary, and I would not really worry about it.

Previously "Krohm"

What data is required to correctly place/orient each instance? Is that per-instance data larger than or equal to the vertex data that has been saved by increasing the instance count?
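One way to answer that question is to add up the bytes on each side. The sketch below does so under assumed layouts: a 20-byte vertex (float3 position + float2 UV) and a 16-byte instance record (float3 position + one float for wave phase). Both strides are hypothetical, so substitute your own formats. Index data is ignored.

```python
VERTS_PER_QUAD = 4
VERTEX_STRIDE = 20    # assumed layout: float3 position + float2 uv
INSTANCE_STRIDE = 16  # assumed layout: float3 world position + float wave phase


def buffer_bytes(total_quads, quads_per_instance,
                 vertex_stride=VERTEX_STRIDE, instance_stride=INSTANCE_STRIDE):
    """Return (vertex_buffer_bytes, instance_buffer_bytes) for one configuration."""
    if total_quads % quads_per_instance != 0:
        raise ValueError("quads_per_instance must divide total_quads")
    instances = total_quads // quads_per_instance
    vertex_bytes = quads_per_instance * VERTS_PER_QUAD * vertex_stride
    instance_bytes = instances * instance_stride
    return vertex_bytes, instance_bytes


if __name__ == "__main__":
    for quads in (1, 2, 10, 100, 10_000):
        v, i = buffer_bytes(10_000, quads)
        print(f"{quads:>6} quads/instance: vertex={v:>7} B  instance={i:>7} B  total={v + i:>7} B")
```

With these assumed strides, the instance buffer dominates when there is one quad per instance, and the vertex buffer dominates when everything is baked into one instance. Where the totals cross over depends entirely on the two strides.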
Ahh... I couldn't find it, but there was once a presentation or paper (nvidia?) on the performance of tri processing vs. draw calls. The GPU was able to process several hundred tris between two draw calls, so the rule of thumb was that drawing a single object with fewer than XYZ tris does not improve performance. For current hardware the number could be around 1k (guessing) per single object (batch).

But this does not consider the arrangement of objects in batches or instancing. At the very least, it should be clear that a single render call for 2 tris is sub-optimal.
I may be wrong here, but in that type of scenario you should just profile your scene with a few different configurations and see what works best for you. It is very likely that there is no single good answer to this question, since you may be limited elsewhere in the pipeline (my guess would be fillrate, but it's just a guess). I may sound like a broken record, but it really is good advice to just try out several different configurations and see which one performs the "most good".

Depending on what else you are rendering in the scene, it may even be that the optimal mix of vertices vs. instances changes depending on where you are within the scene... For example, if you are bound elsewhere in the pipeline, then you should go with Hodgman's tip: save memory where you can. However, if you are bound at vertex assembly (which is very unlikely!) then you would have to profile and see what combo works best for you.
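A minimal sketch of such a profiling loop is shown below. `draw_frame` is a hypothetical callback standing in for whatever renders one frame with a given (quads per buffer, instances) configuration. It must block until the GPU has finished (or you must use GPU timer queries instead), otherwise the numbers only reflect CPU submission cost.

```python
import time
from statistics import median


def time_config(draw_frame, config, frames=30, warmup=5, clock=time.perf_counter):
    """Return the median time of draw_frame(config) over `frames` runs, after a warmup."""
    for _ in range(warmup):
        draw_frame(config)
    samples = []
    for _ in range(frames):
        start = clock()
        draw_frame(config)
        samples.append(clock() - start)
    return median(samples)


def best_config(draw_frame, configs, **kwargs):
    """Time every configuration and return (fastest_config, all_timings)."""
    timings = {config: time_config(draw_frame, config, **kwargs) for config in configs}
    return min(timings, key=timings.get), timings
```

Using the median rather than the mean keeps a single hitch from skewing the result. Since the best mix may differ across the scene, it is worth repeating the measurement from a few representative camera positions.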

I hope that helps!
Thanks for the feedback. Hodgman, basically for the grass, I'm applying a simple wave function to each quad independently of the others. So there isn't much additional information needed.
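The exact wave function isn't given in the thread. As one plausible sketch, the code below sways each quad with a sine wave whose phase is derived from the quad's own position, so no per-instance data beyond the position is needed. All names and constants are assumptions, and in practice this would live in the vertex shader rather than on the CPU.

```python
import math


def wave_offset(base_x, base_z, t, amplitude=0.1, frequency=2.0, phase_scale=0.5):
    """Horizontal sway for a quad rooted at (base_x, base_z) at time t (seconds)."""
    phase = (base_x + base_z) * phase_scale  # position-derived phase, so neighbours differ
    return amplitude * math.sin(frequency * t + phase)


def sway_quad(corners, base_x, base_z, t):
    """Place a local-space quad in the world and bend it by the wave.

    corners: list of (x, y, z) with y = 0 at the root and y = 1 at the tip.
    The offset is scaled by y so the roots stay planted.
    """
    dx = wave_offset(base_x, base_z, t)
    return [(x + base_x + dx * y, y, z + base_z) for x, y, z in corners]


if __name__ == "__main__":
    quad = [(-0.5, 0, 0), (0.5, 0, 0), (0.5, 1, 0), (-0.5, 1, 0)]
    print(sway_quad(quad, base_x=2.0, base_z=3.0, t=1.0))
```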

This topic is closed to new replies.
