stauffec

Sweet Spot for # of vertices in Vertex Buffer vs. # of instances


I have been playing around with instancing for objects like grass, and I was struck by what I think is an interesting question. Let's say that an instanced billboard of grass is made up of a textured quad, and that there is a need for 10,000 such quads. What is the sweet spot in terms of number of instances vs. complexity of the vertex buffer? Should I have a single quad in the vertex buffer and draw 10,000 instances? Should I have 2 quads and draw 5,000 instances? I'm going to run some tests to find that sweet spot, but I was wondering if anyone is aware of a concrete method of reasoning, as opposed to just trial and error?
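To make the trade-off concrete, here is a minimal Python sketch of the arithmetic behind each split. It assumes an indexed quad uses 4 unique vertices and 6 indices; `plan_layout` and `InstancingLayout` are hypothetical names made up for illustration, not part of any graphics API.

```python
from dataclasses import dataclass

VERTS_PER_QUAD = 4    # assumption: indexed quad with 4 unique vertices
INDICES_PER_QUAD = 6  # assumption: two triangles per quad


@dataclass(frozen=True)
class InstancingLayout:
    quads_per_instance: int
    instance_count: int
    vertices_in_buffer: int
    indices_in_buffer: int
    quads_drawn: int  # can exceed the request when the split is uneven


def plan_layout(total_quads: int, quads_per_instance: int) -> InstancingLayout:
    """Work out the instance count and buffer sizes for one candidate split."""
    if total_quads <= 0 or quads_per_instance <= 0:
        raise ValueError("both arguments must be positive")
    # Ceiling division: round up so that every requested quad gets drawn.
    instance_count = -(-total_quads // quads_per_instance)
    return InstancingLayout(
        quads_per_instance=quads_per_instance,
        instance_count=instance_count,
        vertices_in_buffer=quads_per_instance * VERTS_PER_QUAD,
        indices_in_buffer=quads_per_instance * INDICES_PER_QUAD,
        quads_drawn=instance_count * quads_per_instance,
    )


# Tabulate a few candidate splits for the 10,000-quad case.
for k in (1, 2, 4, 8, 16):
    print(plan_layout(10_000, k))
```

The total number of vertices the GPU transforms stays roughly constant across the rows; only the split between per-instance work and per-vertex work changes, which is why the answer ends up depending on the hardware.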

The hardware knows nothing about instances. It's all about batch length: how many primitives get transformed in a single DrawIndexedPrimitive (DIP) call.
In my experience, if you draw fewer than about 1,000 unique vertices per call, you're trashing GPU performance, even on ultra-low-end hardware, because the CPU is too busy dispatching draw calls to keep the GPU fed.
That said, if all your models are static and/or guaranteed to be relatively coherent, pre-transform them and slap them in a single draw call (and therefore buffer), as long as you meet the required memory budget.
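As a rough illustration of the pre-transform idea, the sketch below merges several static meshes into one position list and one index list on the CPU. It is a simplification under stated assumptions: each mesh carries only positions, the per-object transform is a plain translation (a real scene would apply a full world matrix), and `merge_static_meshes` is a hypothetical helper name.

```python
from typing import Iterable, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
# Assumed mesh shape: (positions, indices, translation).
Mesh = Tuple[Sequence[Vec3], Sequence[int], Vec3]


def merge_static_meshes(meshes: Iterable[Mesh]) -> Tuple[List[Vec3], List[int]]:
    """Bake each mesh's translation into its vertices and append everything
    to one shared vertex list and one shared index list."""
    merged_positions: List[Vec3] = []
    merged_indices: List[int] = []
    for positions, indices, (tx, ty, tz) in meshes:
        # Indices of this mesh must be shifted past the vertices already stored.
        base_vertex = len(merged_positions)
        merged_positions.extend((x + tx, y + ty, z + tz) for x, y, z in positions)
        merged_indices.extend(base_vertex + i for i in indices)
    return merged_positions, merged_indices


# One unit quad reused at two places in the world.
quad_positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
quad_indices = [0, 1, 2, 0, 2, 3]
positions, indices = merge_static_meshes([
    (quad_positions, quad_indices, (0.0, 0.0, 0.0)),
    (quad_positions, quad_indices, (10.0, 0.0, 0.0)),
])
```

The merged lists can then be uploaded once and drawn with a single call. The price is memory: every copy of the quad now stores its own vertices, which is the budget concern mentioned above.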

For your specific example, however, instancing is absolutely necessary, and I would not really worry about it.

Ahh... I couldn't find the paper, but there was once a presentation or paper (NVIDIA?) on the performance of triangle processing vs. draw calls. The GPU was able to process several hundred tris in the time between two draw calls, so the rule of thumb was that drawing a single object with fewer than XYZ tris does not speed anything up. For current hardware the number could be around 1k tris (guessing) per single object (batch).

But this does not consider the arrangement of objects in batches or instancing. At the very least, it should be clear that a single render call for 2 tris is sub-optimal.
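The rule of thumb above can be written as a toy cost model. This is only a sketch: it assumes the CPU and GPU work in parallel, that every draw call costs the CPU a fixed amount of time, and that the GPU processes triangles at a constant rate. The figures used (20 microseconds per call, 50 tris per microsecond) are invented for illustration and are not measurements of any hardware.

```python
def break_even_tris_per_call(cpu_cost_per_call_us: float,
                             gpu_tris_per_us: float) -> float:
    """Triangles the GPU can process while the CPU issues one draw call.
    Calls smaller than this leave the GPU waiting on the CPU."""
    return cpu_cost_per_call_us * gpu_tris_per_us


def frame_time_us(draw_calls: int, tris_per_call: int,
                  cpu_cost_per_call_us: float, gpu_tris_per_us: float) -> float:
    """Frame time under the toy model: the slower of the two sides wins."""
    cpu_time = draw_calls * cpu_cost_per_call_us
    gpu_time = draw_calls * tris_per_call / gpu_tris_per_us
    return max(cpu_time, gpu_time)


# Hypothetical numbers, chosen only so the break-even lands near 1k tris.
CPU_COST_US = 20.0
GPU_RATE = 50.0

threshold = break_even_tris_per_call(CPU_COST_US, GPU_RATE)
one_call_per_quad = frame_time_us(5_000, 2, CPU_COST_US, GPU_RATE)
ten_big_batches = frame_time_us(10, 1_000, CPU_COST_US, GPU_RATE)
```

In this model, 5,000 separate calls of 2 tris each are completely CPU-bound, while the same 10,000 tris in 10 batches cost a small fraction of that. The model covers separate draw calls only; an instanced draw submits all instances with one call, so the per-call cost is paid once.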

I may be wrong here, but in that type of scenario you should just profile your scene with a few different configurations and see what works best for you. It is very likely that there is no single good answer to this question, since you may be limited elsewhere in the pipeline (my guess would be fillrate, but it's just a guess). I know it sounds like a broken record, but it really is good advice to just try out several different configurations and see which one performs the "most good".

Depending on what else you are rendering in the scene, it may even be that the optimal mix of vertices vs. instances could change depending on where you are within the scene... For example, if you are bound elsewhere in the pipeline, then you should go with Hodgeman's tip - save memory where you can. However, if you are bound at vertex assembly (which is very unlikely!) then you would have to profile and see what combo works best for you.
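A profiling sweep along those lines might be organized like this. It is a sketch, not a real profiler: `pick_fastest` is a hypothetical helper, and `measure_frame_seconds` is a callback you would have to supply yourself. For the numbers to mean anything, that callback must render one frame with the given configuration and return a time that includes the GPU work (for example via GPU timer queries or by waiting for the frame to finish).

```python
import statistics
from typing import Callable, Dict, Iterable, Tuple


def pick_fastest(candidates: Iterable[int],
                 measure_frame_seconds: Callable[[int], float],
                 frames: int = 30,
                 warmup: int = 5) -> Tuple[int, Dict[int, float]]:
    """Time each candidate quads-per-instance value and return the fastest,
    together with the median frame time recorded for every candidate."""
    medians: Dict[int, float] = {}
    for quads_per_instance in candidates:
        # Discard the first few frames so caches and driver state settle.
        for _ in range(warmup):
            measure_frame_seconds(quads_per_instance)
        samples = [measure_frame_seconds(quads_per_instance) for _ in range(frames)]
        # The median is less sensitive to occasional spikes than the mean.
        medians[quads_per_instance] = statistics.median(samples)
    if not medians:
        raise ValueError("no candidates given")
    best = min(medians, key=medians.get)
    return best, medians


def fake_measure(quads_per_instance: int) -> float:
    """Stand-in for a real renderer: pretends 16 quads per instance is ideal."""
    return 1e-3 + abs(quads_per_instance - 16) * 1e-4


best, medians = pick_fastest([1, 2, 4, 8, 16, 32, 64], fake_measure)
```

Since the best mix can change with the camera position, it is worth repeating the sweep at a few representative viewpoints rather than trusting a single run.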

I hope that helps!

