
Instancing Performance


With D3D11 instancing, you stream the instance data through a secondary vertex buffer that is bound into the same input layout.  This can make the vertex structure look pretty big byte-wise (for example, four extra float4s just for the world matrix).  I've read that you want to keep the byte size of vertex structures as low as possible to reduce memory bandwidth.


Bandwidth-wise, is a large vertex structure that comes from instancing equivalent to the same large vertex structure without instancing?  Or are GPUs efficient at assembling vertices from the non-instanced data and the instanced data, so that I don't need to worry about this?  I'm assuming it is efficient, since instancing is a recommended optimization, but I wanted to check.


The reason I'm a bit concerned is that I want to add generic instancing support to my engine but I'm wondering if there is too much overhead if the instance count is small. 
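
To put rough numbers on the question, here is a small C++ sketch of the arithmetic. It only counts the distinct bytes that have to live in the buffers: with a per-instance stream the world matrix is stored once per instance, whereas baking it into a "fat" vertex would store it once per vertex of every copy. The strides and counts are made-up illustration values, and the real memory traffic also depends on the GPU's caches, so treat this as a back-of-the-envelope estimate and not a measurement.

```cpp
#include <cstdint>

// Hypothetical strides, chosen only for illustration.
const std::uint64_t kVertexStride   = 32; // e.g. position + normal + uv
const std::uint64_t kInstanceStride = 64; // four float4s for the world matrix

// Distinct buffer bytes with instancing: the mesh is stored once in the
// per-vertex stream and each instance adds one entry to the per-instance stream.
inline std::uint64_t instancedBytes(std::uint64_t vertexCount,
                                    std::uint64_t instanceCount) {
    return vertexCount * kVertexStride + instanceCount * kInstanceStride;
}

// Distinct buffer bytes if the matrix were baked into every vertex of every
// copy of the mesh (no instancing, one big vertex structure).
inline std::uint64_t fatVertexBytes(std::uint64_t vertexCount,
                                    std::uint64_t instanceCount) {
    return vertexCount * instanceCount * (kVertexStride + kInstanceStride);
}
```

For a 1000-vertex mesh drawn 10 times, `instancedBytes(1000, 10)` is 32640 while `fatVertexBytes(1000, 10)` is 960000, so under these assumptions the per-instance stream adds very little on top of the mesh itself.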


You could try using a large constant buffer instead, something like:


#define MAX_INSTANCE_CNT ????

// Per-instance data, indexed by instance ID in the shader.
cbuffer PerInstanceData
{
    matrix world[MAX_INSTANCE_CNT];
    float4 somethingDifferent[MAX_INSTANCE_CNT];
};


and then index those arrays in the vertex shader with SV_InstanceID.
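
One thing to keep in mind when picking MAX_INSTANCE_CNT: a D3D11 constant buffer can hold at most 4096 four-component vectors (64 KB). The C++ sketch below works out what that implies for the layout above, where each instance costs four vectors for the matrix plus one for the float4. The helper names and the CPU-side mirror struct are hypothetical, not part of any API, and the sketch assumes the cbuffer contains nothing else.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// D3D11 limit: 4096 float4 vectors per constant buffer (64 KB).
const std::uint32_t kMaxCBufferVectors = 4096;

// Largest instance count that fits when each instance needs
// `vectorsPerInstance` float4 registers (5 for matrix + float4).
inline std::uint32_t maxInstancesPerBatch(std::uint32_t vectorsPerInstance) {
    assert(vectorsPerInstance > 0);
    return kMaxCBufferVectors / vectorsPerInstance;
}

// Number of draw calls needed to cover `instanceCount` instances when each
// draw can take at most `batchSize` instances (rounds up).
inline std::uint32_t batchesNeeded(std::uint32_t instanceCount,
                                   std::uint32_t batchSize) {
    assert(batchSize > 0);
    return (instanceCount + batchSize - 1) / batchSize;
}

// Hypothetical CPU-side mirror of the cbuffer: N matrices followed by N float4s.
template <std::size_t N>
struct PerInstanceData {
    float world[N][16];             // one 4x4 matrix per instance
    float somethingDifferent[N][4]; // one float4 per instance
};

// 819 instances * 80 bytes = 65520 bytes, which fits in 64 KB.
static_assert(sizeof(PerInstanceData<819>) <= 65536, "must fit in one cbuffer");
```

So with this layout the cap works out to 819 instances per draw, and anything larger would have to be split into several draws (for example, 2000 instances would need 3), updating the buffer between them.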
