A question just occurred to me: the Crytek Sponza model has about 350 sub-models (i.e. subsets). If we don't merge the subsets that share the same mesh type and texture, wouldn't that result in a massive number of draw calls? When we merge subsets with the same material, the number of draw calls/subsets drops from 350 to 23. But now we have a frustum culling problem. How would you do frustum culling with this?
How would you do frustum culling on Sponza?
350 total renderables is doable even on mobile, so you could just brute-force it. In a real case where optimization matters, you can use a heuristic for sub-mesh merging that only merges sub-meshes that are close to each other.
In our current project I don't merge anything. I just frustum cull everything (that is actually loaded in memory), sort the results, and draw everything with instancing. This gives me accurate culling results, and the number of draw calls is still manageable.
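For illustration, the brute-force approach could look something like the sketch below. It is not the poster's actual code: the `Plane`, `AABB`, and `Frustum` types and the function names are made up for this example, and it assumes the six frustum planes are stored with their normals pointing inward.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical types for this sketch. A point p is on the inner side of a
// plane when nx*p.x + ny*p.y + nz*p.z + d >= 0.
struct Plane { float nx, ny, nz, d; };
struct AABB  { float min[3]; float max[3]; };
using Frustum = std::array<Plane, 6>;

// Returns false only when the box lies completely behind at least one plane.
bool intersectsFrustum(const Frustum& frustum, const AABB& box) {
    for (const Plane& p : frustum) {
        // Pick the box corner that lies farthest along the plane normal.
        const float x = (p.nx >= 0.0f) ? box.max[0] : box.min[0];
        const float y = (p.ny >= 0.0f) ? box.max[1] : box.min[1];
        const float z = (p.nz >= 0.0f) ? box.max[2] : box.min[2];
        if (p.nx * x + p.ny * y + p.nz * z + p.d < 0.0f) {
            return false;  // even the farthest corner is behind this plane
        }
    }
    return true;  // conservative: a box near a frustum corner may be kept
}

// Brute force: test every renderable and collect the indices that survive.
std::vector<std::size_t> cullVisible(const Frustum& frustum,
                                     const std::vector<AABB>& boxes) {
    std::vector<std::size_t> visible;
    for (std::size_t i = 0; i < boxes.size(); ++i) {
        if (intersectsFrustum(frustum, boxes[i])) {
            visible.push_back(i);
        }
    }
    return visible;
}
```

With a few hundred boxes this loop is cheap, and the surviving indices can then be sorted by material before drawing.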
But now, we have problem of frustum culling. How would you do frustum culling with this?
Why is this a problem unique to sub-meshes? They have bounding boxes the same as regular meshes do.
The only difference at all is that if the main mesh is entirely inside the frustum, none of the sub-meshes are checked (all are added to the render queue).
L. Spiro
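A minimal sketch of the early-out described above, under some assumptions: the type and function names are invented for this example (they are not from any particular engine), and the frustum planes are assumed to have inward-facing normals. The whole mesh is classified first, and the per-sub-mesh tests only run when its bounding box straddles the frustum.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical types for this sketch; a point is inside when n.p + d >= 0.
struct Plane { float nx, ny, nz, d; };
struct AABB  { float min[3]; float max[3]; };
using Frustum = std::array<Plane, 6>;

enum class Containment { Outside, Intersects, Inside };

Containment classify(const Frustum& frustum, const AABB& box) {
    bool fullyInside = true;
    for (const Plane& p : frustum) {
        const float n[3] = { p.nx, p.ny, p.nz };
        float farDist = p.d;   // signed distance of the corner farthest along n
        float nearDist = p.d;  // signed distance of the opposite corner
        for (int axis = 0; axis < 3; ++axis) {
            const bool positive = n[axis] >= 0.0f;
            farDist  += n[axis] * (positive ? box.max[axis] : box.min[axis]);
            nearDist += n[axis] * (positive ? box.min[axis] : box.max[axis]);
        }
        if (farDist < 0.0f) return Containment::Outside;  // fully behind plane
        if (nearDist < 0.0f) fullyInside = false;         // straddles plane
    }
    return fullyInside ? Containment::Inside : Containment::Intersects;
}

struct Mesh {
    AABB bounds;                      // encloses every sub-mesh
    std::vector<AABB> subMeshBounds;  // one box per sub-mesh
};

// Returns the indices of the sub-meshes to add to the render queue.
std::vector<std::size_t> visibleSubMeshes(const Frustum& frustum,
                                          const Mesh& mesh) {
    std::vector<std::size_t> queue;
    const Containment whole = classify(frustum, mesh.bounds);
    if (whole == Containment::Outside) return queue;  // nothing to draw
    for (std::size_t i = 0; i < mesh.subMeshBounds.size(); ++i) {
        // When the whole mesh is inside, the per-sub-mesh test is skipped.
        if (whole == Containment::Inside ||
            classify(frustum, mesh.subMeshBounds[i]) != Containment::Outside) {
            queue.push_back(i);
        }
    }
    return queue;
}
```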
You could split the model up into groups of polygons. First separate the central floor into one group, and then divide the rest into radial slices, like cutting a pizza.
Leave the model as "one draw", but sort all these separate groups into different contiguous regions of its index buffer.
E.g. say there are 7 groups: the index buffer contains the indices to draw group 1, followed by the indices to draw group 2, and so on up to group 7. Sort these groups according to their angular/slice location, so if two slices are next to each other around the "pizza", they're also next to each other in the index buffer.
Now that you've pre-prepared your model in this way:
If poly-groups 1, 2, 3 and 4 are visible, you issue a single draw with the indices from group1.begin to group4.end.
If poly-groups 1, 2, 6 and 7 are visible, then you issue two draws: one with the indices from group1.begin to group2.end, and one with the indices from group6.begin to group7.end.
P.s. I totally stole this idea from DigitalFragment.
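The range merging described in this post could be sketched roughly as follows. The `IndexRange` struct and `mergeVisibleRanges` function are hypothetical names for this example, and the sketch assumes the groups are listed in the same order as they appear in the index buffer, each with a non-zero index count.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of one poly-group's slice of the index buffer.
struct IndexRange {
    std::uint32_t first;  // offset of the group's first index
    std::uint32_t count;  // number of indices in the group
};

// Collapses runs of visible, back-to-back groups into single draw ranges.
// `groups` must be in index-buffer order; `visible[i]` is the culling
// result for groups[i].
std::vector<IndexRange> mergeVisibleRanges(const std::vector<IndexRange>& groups,
                                           const std::vector<bool>& visible) {
    std::vector<IndexRange> draws;
    for (std::size_t i = 0; i < groups.size(); ++i) {
        if (!visible[i]) continue;
        if (!draws.empty() &&
            draws.back().first + draws.back().count == groups[i].first) {
            draws.back().count += groups[i].count;  // extend the current run
        } else {
            draws.push_back(groups[i]);             // start a new draw
        }
    }
    return draws;
}
```

Each returned range maps to one indexed draw call with that start offset and count. Note that the first and last slices are neighbours around the "pizza" but not in the index buffer, so a view that sees both still needs two draws, as in the second example above.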
Suppose there are 20 sub-meshes (all using the same material and texture), of which 10 are outside the frustum. Without merging these sub-meshes, we would end up with 10 extra draw calls (20 if all are visible), whereas once we have merged them we can no longer frustum cull the individual sub-meshes. This is the situation I was asking about: what should I do? (The mesh is a level containing many objects, i.e. sub-meshes.)
I think the "cutting edge" solution to this problem would be to not merge, cull as usual, and use something like GL's MultiDraw* functions to submit many draw calls for distinct objects.
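As a rough sketch of what that could look like with `glMultiDrawElements`, the code below only builds the two argument arrays and shows the GL call in a comment, so it compiles without a GL header. Several things are assumptions for this example: the `SubMesh` and `MultiDrawArgs` types are hypothetical, all sub-meshes are assumed to share one vertex/index buffer and one material, the indices are 32-bit, and `GLsizei` is assumed to be a 32-bit signed integer (which it is on common platforms).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-sub-mesh record; `visible` is the frustum culling result.
struct SubMesh {
    std::uint32_t firstIndex;  // position of the first index in the shared buffer
    std::uint32_t indexCount;  // number of indices to draw
    bool visible;
};

// The two parallel arrays that glMultiDrawElements consumes.
struct MultiDrawArgs {
    std::vector<std::int32_t> counts;  // stands in for GLsizei in this sketch
    std::vector<const void*> offsets;  // byte offsets into the bound index buffer
};

MultiDrawArgs buildMultiDrawArgs(const std::vector<SubMesh>& subMeshes) {
    MultiDrawArgs args;
    for (const SubMesh& s : subMeshes) {
        if (!s.visible) continue;  // culled sub-meshes are simply left out
        args.counts.push_back(static_cast<std::int32_t>(s.indexCount));
        // With an element array buffer bound, GL reads each "pointer" as a
        // byte offset, hence the multiplication by the index size.
        const std::uintptr_t byteOffset =
            static_cast<std::uintptr_t>(s.firstIndex) * sizeof(std::uint32_t);
        args.offsets.push_back(reinterpret_cast<const void*>(byteOffset));
    }
    return args;
}

// With a GL context and the shared buffers bound, the submission would be:
//   glMultiDrawElements(GL_TRIANGLES, args.counts.data(), GL_UNSIGNED_INT,
//                       args.offsets.data(),
//                       static_cast<GLsizei>(args.counts.size()));
```

This keeps per-sub-mesh culling while submitting all visible sub-meshes of one material in a single API call.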
I'm not even sure why this needs a solution. 350 draw calls isn't huge on modern hardware; sure, it's not going to be as fast as 23, but if it gives you single-digit framerates it's going to be because you're doing something else wrong. The bad old days of D3D9 on XPDM are behind us.
By way of experimentation, I just unbatched some old D3D9 code so that it went from 116 draw calls to 1141, and got very similar framerates on WDDM.
None of this is to say that batching is bad, and I'm certainly not claiming that you shouldn't batch. But batching is just an optimization like any other, so treat it the same way as you would all other optimizations: don't do it pre-emptively, profile your program, determine if you need it, and be careful that the extra work you're doing to build batches doesn't wipe out the perf gain you get from using them.