Instancing Performance

1 comment, last by kauna 10 years, 4 months ago

With D3D11 instancing, you stream the instance data via a secondary vertex buffer into the vertex declaration. This can make the vertex structure look pretty big byte-wise (for example, four extra float4s just for the world matrix). I've read that you want to keep the byte size of vertex structures as low as possible to reduce memory bandwidth.
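To make the layout concrete, here is a rough sketch of what such a vertex shader input could look like. The semantic name WORLD, the viewProj constant and the particular per-vertex attributes are assumptions for illustration only. Which elements come from the per-instance buffer is decided on the CPU side, by marking them D3D11_INPUT_PER_INSTANCE_DATA in the D3D11_INPUT_ELEMENT_DESC array; the shader just sees one combined struct.

```hlsl
// Assumed per-frame constants (hypothetical name).
cbuffer PerFrame : register(b0)
{
    float4x4 viewProj;
};

struct VSInput
{
    // Per-vertex data, 32 bytes in this sketch.
    float3 position : POSITION;
    float3 normal   : NORMAL;
    float2 uv       : TEXCOORD0;

    // Per-instance data, 64 bytes: the four extra float4s of the world matrix.
    float4 world0 : WORLD0;
    float4 world1 : WORLD1;
    float4 world2 : WORLD2;
    float4 world3 : WORLD3;
};

float4 VSMain(VSInput input) : SV_Position
{
    // Rebuild the world matrix from its four rows
    // (assumes the instance buffer stores the matrix row by row).
    float4x4 world = float4x4(input.world0, input.world1,
                              input.world2, input.world3);

    float4 worldPos = mul(float4(input.position, 1.0f), world);
    return mul(worldPos, viewProj);
}
```

In this sketch the struct is 96 bytes per vertex as the shader sees it, but the 64 bytes of instance data are stored once per instance in the secondary buffer, not once per vertex.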

Is using a large vertex structure with instancing equivalent, in terms of memory bandwidth, to using the same large vertex structure without instancing? Or are GPUs efficient at assembling the vertices from the non-instanced data and the instanced data, so that I do not need to worry about this? I'm assuming it is efficient since instancing is a recommended optimization, but I wanted to check.

The reason I'm a bit concerned is that I want to add generic instancing support to my engine but I'm wondering if there is too much overhead if the instance count is small.

-----Quat

You can try using large constant buffers, something like:

#define MAX_INSTANCE_CNT ????

cbuffer PerInstanceData
{
    matrix world[MAX_INSTANCE_CNT];
    float4 somethingDifferent[MAX_INSTANCE_CNT];
};

and then index into the arrays in the vertex shader with SV_InstanceID.

Or use a generic buffer object to store the instance data.
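A minimal sketch of the constant buffer variant might look like the following. The value 256 for MAX_INSTANCE_CNT is only a placeholder picked for the example, and the PerFrame buffer, viewProj, and the struct and function names are hypothetical. Keep in mind that a D3D11 constant buffer holds at most 4096 float4 constants, so with five float4s per instance the cap cannot be very large.

```hlsl
// Placeholder value chosen for this sketch only.
#define MAX_INSTANCE_CNT 256

// Assumed per-frame constants (hypothetical name).
cbuffer PerFrame : register(b0)
{
    float4x4 viewProj;
};

cbuffer PerInstanceData : register(b1)
{
    matrix world[MAX_INSTANCE_CNT];
    float4 somethingDifferent[MAX_INSTANCE_CNT];
};

struct VSInput
{
    float3 position   : POSITION;       // per-vertex data only
    uint   instanceId : SV_InstanceID;  // generated by the runtime, not read from a buffer
};

struct VSOutput
{
    float4 position : SV_Position;
    float4 extra    : TEXCOORD0;
};

VSOutput VSMain(VSInput input)
{
    VSOutput output;

    // Look up this instance's data by its index within the draw call.
    float4 worldPos = mul(float4(input.position, 1.0f), world[input.instanceId]);

    output.position = mul(worldPos, viewProj);
    output.extra    = somethingDifferent[input.instanceId];
    return output;
}
```

SV_InstanceID starts at 0 for every draw call, so each DrawInstanced or DrawIndexedInstanced call should draw at most MAX_INSTANCE_CNT instances, with the constant buffer updated between batches.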

Cheers!

This topic is closed to new replies.