Managing instancing

4 comments, last by kalle_h 9 years, 11 months ago

Hi,

Currently my engine only supports instancing in a few limited cases, and I'm trying to fix that by implementing a more generic system.

When do engines usually find objects that can be instanced? Dynamically every frame after culling? Or at a "higher-level" by keeping a list of objects that use the same model?

Currently my scene struct looks like this:


struct Scene
{
    uint             num_actors;
    InstanceData*    actors_instance_data;
    uint*            actors_node_handles; // handles of actors' scene graph nodes (used to get world matrices)
    Model**          actors_models;
    BoundingSphere** bounding_spheres;
};

I could do it every frame after culling: sort the actors by model, and if two actors a and b use the same model (actors_models[a] == actors_models[b]), instance them by copying actors_instance_data[a] and actors_instance_data[b] into the same constant buffer.
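A minimal sketch of that per-frame approach, with stand-in types for Model and InstanceData (the real definitions are whatever your engine uses): sort the visible actor indices by model pointer, then walk the sorted list and emit one batch per run of equal models, packing the instance data contiguously so it is ready to copy into a constant buffer.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Model {};        // stand-in for the engine's Model
struct InstanceData {}; // per-instance constants (e.g. world matrix)

struct Batch {
    const Model* model;
    std::size_t  first;  // offset into the packed instance array
    std::size_t  count;  // number of instances to draw together
};

// Sorts visible actors by model, packs their InstanceData contiguously
// (ready to upload to a constant buffer), and emits one Batch per run
// of actors that share the same model.
std::vector<Batch> BuildBatches(const std::vector<const Model*>& models,
                                const std::vector<InstanceData>& instance_data,
                                std::vector<InstanceData>& packed_out)
{
    std::vector<std::size_t> order(models.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return models[a] < models[b]; });

    std::vector<Batch> batches;
    packed_out.clear();
    for (std::size_t i = 0; i < order.size(); ) {
        const Model* m = models[order[i]];
        Batch batch{m, packed_out.size(), 0};
        while (i < order.size() && models[order[i]] == m) {
            packed_out.push_back(instance_data[order[i]]);
            ++batch.count;
            ++i;
        }
        batches.push_back(batch);
    }
    return batches;
}
```

Each resulting Batch with count > 1 becomes one instanced draw; a count of 1 falls back to a plain draw.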

Does this seem reasonable or should I go with a "higher level" option?

Thanks.

You will definitely want a more complex system as a foundation than what you have shown. A model is a collection of meshes, and those meshes are the atomic units of draw calls in a model. Instancing can only take place on those atomic units.

My approach is to have the system do it automatically.
Every atomic draw call gets sorted by the render queue. When objects have the same materials/shaders/textures they implicitly end up next to each other in the render queue (this reduces state changes).
When you want to support instancing, you simply add a vertex buffer ID to the sort key. Objects will already be next to each other if they have the same material/shader/textures/etc., so if they also have the same vertex buffer then they are candidates for instancing.
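One way to sketch that sort key (the field widths and IDs here are hypothetical, not taken from any particular engine): pack shader, material, texture, and vertex buffer IDs into a single 64-bit key, with the vertex buffer in the low bits so draws that share all other state but differ in vertex buffer still land next to each other.

```cpp
#include <cstdint>

// Hypothetical sort-key layout: shader, material, texture set, and vertex
// buffer IDs packed into one 64-bit key. Draws with equal keys share every
// piece of state *and* the vertex buffer, so consecutive equal keys in the
// sorted render queue are candidates for one instanced draw.
inline std::uint64_t MakeSortKey(std::uint32_t shader_id,
                                 std::uint32_t material_id,
                                 std::uint32_t texture_id,
                                 std::uint32_t vb_id)
{
    return (std::uint64_t(shader_id   & 0xFFFF) << 48)
         | (std::uint64_t(material_id & 0xFFFF) << 32)
         | (std::uint64_t(texture_id  & 0xFFFF) << 16)
         |  std::uint64_t(vb_id       & 0xFFFF);
}
```

Sorting draws by this key both minimizes state changes (high bits) and clusters instancing candidates (full key equality).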

The instance buffer is very small and cheap to create on the fly, so a double-buffered or triple-buffered pool of vertex buffers is maintained (iOS needs triple buffering). Since it is generally faster to avoid updating these buffers, a CPU-side key can be used to find an instance buffer that was already built for the same set of meshes, taking advantage of frame-to-frame temporal coherence.
Again, this is a double-buffered pool of vertex buffers: frame 1 might take 3 or 4 vertex buffers from pool #0, frame 2 likely takes about the same number from pool #1, frame 3 goes back to pool #0, and so on. These are not just double-buffered vertex buffers but a whole pool of them.
Each pool should only allocate as many buffers as a frame requests, but should not release their memory every frame.
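The pooling scheme above can be sketched as a ring of pools (2 for double-buffering, 3 for triple-buffering); the types and names here are illustrative, with the actual GPU allocation left as a stub:

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Stand-in for an API vertex buffer handle.
struct InstanceBuffer { std::size_t capacity = 0; };

// A ring of buffer pools. Each frame acquires buffers from the current
// pool only; a pool grows on demand but never releases memory per frame,
// so a buffer acquired last time around is simply recycled.
class InstanceBufferRing {
public:
    explicit InstanceBufferRing(std::size_t frames_in_flight)
        : pools_(frames_in_flight) {}

    void BeginFrame() {
        frame_ = (frame_ + 1) % pools_.size();
        pools_[frame_].next_free = 0;   // recycle this pool, don't release it
    }

    InstanceBuffer& Acquire(std::size_t bytes_needed) {
        Pool& pool = pools_[frame_];
        if (pool.next_free == pool.buffers.size())
            pool.buffers.emplace_back();        // grow only when a frame asks
        InstanceBuffer& buf = pool.buffers[pool.next_free++];
        if (buf.capacity < bytes_needed)
            buf.capacity = bytes_needed;        // (re)allocate GPU storage here
        return buf;
    }

private:
    struct Pool {
        std::deque<InstanceBuffer> buffers;     // deque: stable references
        std::size_t next_free = 0;
    };
    std::vector<Pool> pools_;
    std::size_t frame_ = 0;
};
```

The CPU-side lookup key mentioned above would sit in front of Acquire: if the same set of meshes was instanced last time this pool was current, the buffer contents can be reused without an update.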


The system catches all potential cases where instancing can be performed with minimal overhead.


L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

I was going to do what LS is suggesting: sort my draws (and filter redundant states), then merge consecutive draws and dynamically build the instance buffer.

But I ended up doing what you're describing in the OP - after culling, sort model instances by model asset (and material), then submit the submeshes of each asset to the render queue once (for each material).
Each high level drawing system (such as this ModelRenderer) has to come up with its own logic to make use of it.

I may still try out the other version eventually, but I went with this approach first so the lowest level makes no assumptions about what kinds of per-instance data the user might want: vertex buffer streams, cbuffers with VertexId, both at once, etc. The lowest level of my renderer is meant to be a general-purpose GPU API, like D3D/GL, without much magic.

The Battlefield 3 way is to use instancing for everything. Just pack all draw-call-related data into a single tbuffer; each instanced draw call then needs only a start index and an instance size, and the shader manually fetches the data it needs. Everything stays simple and clean.
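A CPU-side sketch of that scheme, with hypothetical names (the BF3 presentation linked below describes the real thing): all per-instance data for the frame goes into one linear buffer destined for the tbuffer, and each draw records only where its run starts and how large one instance record is. The vertex shader would then fetch from start_index + instance_id * instance_size.

```cpp
#include <cstdint>
#include <vector>

// One frame's worth of per-instance data, packed into a single linear
// array that would be uploaded to one big tbuffer / structured buffer.
struct FrameInstanceBuffer {
    std::vector<float> data;

    // Appends one instance record and returns its start index (in floats).
    std::uint32_t Push(const float* record, std::uint32_t floats) {
        std::uint32_t start = std::uint32_t(data.size());
        data.insert(data.end(), record, record + floats);
        return start;
    }
};

// Everything an instanced draw needs besides the mesh itself: the shader
// reads data[start_index + instance_id * instance_size + i].
struct InstancedDraw {
    std::uint32_t start_index;    // offset of this draw's first record
    std::uint32_t instance_size;  // floats per instance record
    std::uint32_t instance_count; // instances in this draw
};
```

Because every draw reads from the same buffer through the same indexing scheme, there is no need for a separate per-instance vertex stream layout per mesh type.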

We do everything at level build time based on the assets used in that level. The artists pick different mesh assets from a database and place instances of them in the level. When we build the level, we figure out which mesh assets are used and where the instances are located, then build dedicated data structures for each mesh that are optimized for submitting a list of currently visible instances at runtime.
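The core of that build-time step can be sketched as grouping artist placements by the mesh asset they reference (the Placement type and string-keyed asset names here are illustrative, not the actual pipeline's data):

```cpp
#include <map>
#include <string>
#include <vector>

// Stand-in for level-build data: an artist-placed instance references a
// mesh asset by name and carries its world transform.
struct Placement {
    std::string mesh_asset;
    float       world[16];
};

// Build-time pass: group all placements by the mesh asset they use,
// yielding one instance list per mesh. At runtime, each list is culled
// and the survivors are submitted as a single instanced draw.
std::map<std::string, std::vector<Placement>>
GroupByMesh(const std::vector<Placement>& placements)
{
    std::map<std::string, std::vector<Placement>> per_mesh;
    for (const Placement& p : placements)
        per_mesh[p.mesh_asset].push_back(p);
    return per_mesh;
}
```

Doing this offline means the runtime never has to discover instancing opportunities; it only culls precomputed per-mesh instance lists.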

Forgot to link the BF3 presentation: http://www.slideshare.net/DICEStudio/directx-11-rendering-in-battlefield-3

