Instancing vs. static VB?

Started by
15 comments, last by gjaegy 18 years, 6 months ago
For a project I need to render *tons* of simple and fairly static geometry. Each item is a plain box/cube, 12 triangles. If possible, I would like to render, say, 200k or even 1M of these. Right now, the only thing that differs per instance is its position relative to the others (say, in a uniform grid or something).

However, I do not need/want to render all boxes all the time. On command (relatively seldom) I would like to choose whether each box should be visible/rendered or not. This does not need to be lightning fast, as it is not done per frame.

Needless to say, I cannot have 100k-1M DP calls, so I guess I am left with instancing or a static vertex buffer. Most examples of instancing deal with more complex geometry (say 1k polys) and fewer instances (0.5-1k). Are there any significant per-instance overheads in the Instancing API that would cause problems for my case?

Given the requirements below, in order of importance, which method would be best?
* As fast rendering as possible
* As little memory overhead as possible, both vidmem and main mem.
* Updating which items should be rendered is possible, but is OK with slow.
Quote:As fast rendering as possible

A static VBO will give you this, although you might want to split the geometry up into a small number of VBOs if not all of the boxes will be visible at all times.

Quote:As little memory overhead as possible, both vidmem and main mem.

I haven't used DX instancing unfortunately, only read about it, but I think it would save you a lot of memory. It would probably render a fair bit slower than the VBOs though.
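To put rough numbers on the memory trade-off, here is a back-of-the-envelope sketch. It assumes position-only vertices (3 floats), 8 shared corners per box, 32-bit indices for the static buffers and 16-bit indices for the single instanced box; none of these layout choices come from the thread, and lit boxes with per-face normals would need 24 vertices each instead of 8. The names `staticBytes` and `instancedBytes` are made up for illustration.

```cpp
#include <cstdint>

// Hypothetical layout: position-only vertices, 8 shared corners per box.
constexpr std::uint64_t kBytesPerVertex = 3 * sizeof(float);  // 12 bytes with 4-byte floats
constexpr std::uint64_t kVerticesPerBox = 8;
constexpr std::uint64_t kIndicesPerBox  = 36;                 // 12 triangles * 3 indices

// Static buffers: every box is written out in full (vertices + 32-bit indices).
constexpr std::uint64_t staticBytes(std::uint64_t boxes) {
    return boxes * (kVerticesPerBox * kBytesPerVertex +
                    kIndicesPerBox * sizeof(std::uint32_t));
}

// Instancing: one shared box (16-bit indices) plus one position per instance.
constexpr std::uint64_t instancedBytes(std::uint64_t boxes) {
    return kVerticesPerBox * kBytesPerVertex +
           kIndicesPerBox * sizeof(std::uint16_t) +
           boxes * kBytesPerVertex;
}
```

Under these assumptions, 1M boxes come to about 240 MB for fully expanded static buffers versus about 12 MB for instancing, so the memory saving is roughly a factor of 20. This says nothing about rendering speed, which would still have to be measured.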

Quote:Updating which items should be rendered is possible, but is OK with slow.

This wouldn't be too tricky with VBOs. It will take a little time to do if they are static, but as you said, that is OK.

You might want to consider portability too; I thought instancing only worked on NVIDIA's 6x00+ cards.
OK, thanks for your input!

You hinted at splitting up into multiple VBs, for flexibility.
This would be kind of nice, in that if all boxes to be housed in the same VB were de-selected, the buffer wouldn't need to be created at all...

Do you have any ballpark estimate of how many DP calls say a P4 2.7 GHz and 6800GT can cope with, without choking?
To provide a feel for what granularity this partitioning could use...
Quote:Original post by freka586
Do you have any ballpark estimate of how many DP calls say a P4 2.7 GHz and 6800GT can cope with, without choking?
To provide a feel for what granularity this partitioning could use...


hmm, rumor has it at <5000 draw calls per frame (at 30fps) for modern computers, and that's with the CPU doing nothing but processing draw calls. I'd say it's very application dependent, so you will have to experiment and benchmark. Just set it up so you can change a constant for the partitioning and recompile/test until you hit a sweet spot.
Quote:hmm, rumor has it at <5000 draw calls per frame (at 30fps) for modern computers

Perhaps knock it up with OpenGL; from memory, you can better those figures by an order of magnitude.
Yeah, I was going to suggest that but skipped it. DX people tend to get a bit touchy about the issue. ;P
I was interested, so I threw together a quick test:
8000 cubes at 130fps with immediate mode, i.e. glBegin() .. glEnd().
I'm sure you can achieve a lot better.
Unfortunately OpenGL is not an option; I must use DirectX.
The most appealing option right now is to have a number of static VBs, say around 500 or so, each containing a large number of boxes.
Should provide a reasonable balance between batch size and number of draw calls.
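A minimal sketch of that partitioning, assuming the visibility flags live in a plain `std::vector<bool>` on the CPU. `kBoxesPerChunk`, `ChunkPlan` and `planChunks` are hypothetical names, and the value 2000 is only chosen because it yields 500 buffers for 1M boxes. Chunks with no visible box are dropped, so they would need neither a buffer nor a draw call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tuning constant: boxes stored in each static vertex buffer.
// Change it and re-benchmark to find the sweet spot.
constexpr std::size_t kBoxesPerChunk = 2000;

struct ChunkPlan {
    std::size_t firstBox;      // index of the first box covered by this chunk
    std::size_t visibleCount;  // number of visible boxes inside the chunk
};

// Splits the boxes into fixed-size chunks and keeps only chunks that contain
// at least one visible box. Each kept chunk corresponds to one draw call.
std::vector<ChunkPlan> planChunks(const std::vector<bool>& visible,
                                  std::size_t boxesPerChunk = kBoxesPerChunk) {
    assert(boxesPerChunk > 0);
    std::vector<ChunkPlan> plan;
    for (std::size_t first = 0; first < visible.size(); first += boxesPerChunk) {
        const std::size_t end = std::min(first + boxesPerChunk, visible.size());
        std::size_t count = 0;
        for (std::size_t box = first; box < end; ++box) {
            if (visible[box]) ++count;
        }
        if (count > 0) plan.push_back(ChunkPlan{first, count});
    }
    return plan;
}
```

The size of the returned plan is the number of draw calls per frame, which can be compared directly against the draw-call budgets mentioned earlier in the thread.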

Thanks for all input!

I think the draw call number is actually more like 1000, but it depends on the app.

Instancing is supported on ATI HW too (there's an 'enable instancing' checkbox in the Catalyst Control Center now), but IIRC there are a number of hacks that you have to go through in your app to get it to work. The only reason it isn't fully supported is that MS, for no good reason, decided that you should be required to support VS 3.0 in order to expose instancing support.
There are a couple of things you might consider:

1. Use an index buffer to choose which cubes to draw. That way the vertex buffer can remain static.

2. If the cubes are all the same and are regularly spaced, then you can render in large batches reusing a single vertex buffer and just changing the world transform (and using the index buffer to choose which to render).
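Suggestion 1 above could be sketched roughly as follows. The code is CPU-side only and makes several assumptions that are not in the thread: each box owns 8 consecutive corner vertices in the static vertex buffer, corner `i` has its x/y/z offsets in bits 0/1/2 of `i`, and indices are 32-bit. `kBoxIndices` and `buildVisibleIndices` are hypothetical names, and the triangle winding would have to be checked against the cull mode actually in use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Index pattern for one box (12 triangles), assuming corner i of a box has
// its x, y, z offsets encoded in bits 0, 1, 2 of i.
constexpr std::uint32_t kBoxIndices[36] = {
    0, 2, 1,  1, 2, 3,   // -z face
    4, 5, 6,  5, 7, 6,   // +z face
    0, 1, 4,  1, 5, 4,   // -y face
    2, 6, 3,  3, 6, 7,   // +y face
    0, 4, 2,  2, 4, 6,   // -x face
    1, 3, 5,  3, 7, 5    // +x face
};

// Builds the index list for the visible boxes only. The vertex buffer itself
// stays static; only this list is rebuilt when visibility changes.
std::vector<std::uint32_t> buildVisibleIndices(const std::vector<bool>& visible) {
    // 8 vertices per box must still fit into a 32-bit index.
    assert(visible.size() <= UINT32_MAX / 8);
    std::vector<std::uint32_t> indices;
    for (std::size_t box = 0; box < visible.size(); ++box) {
        if (!visible[box]) continue;
        const std::uint32_t base = static_cast<std::uint32_t>(box) * 8;
        for (std::uint32_t corner : kBoxIndices) {
            indices.push_back(base + corner);
        }
    }
    return indices;
}
```

The result would be uploaded to an index buffer whenever the selection changes, which fits the "updates may be slow" requirement. Note that a buffer with more than 65,536 vertices cannot be addressed with 16-bit indices, so either 32-bit indices or smaller per-chunk buffers are needed.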
John Bolton
Locomotive Games (THQ)
Current Project: Destroy All Humans (Wii). IN STORES NOW!

This topic is closed to new replies.
