Why is instancing so fast?

Started by
5 comments, last by Pyrogame 14 years, 1 month ago
Instancing seems to speed rendering up quite a bit, but I'm not sure why. Normally, if I have a model that I render with a shader, I set the appropriate matrices for it to be transformed by in the shader and call draw on it, and I do this multiple times for the same model. With instancing I create a large vertex buffer, fill it with the same model multiple times, and give each instance of the model an index which is stored in its vertices. Then in the shader I look up the stored index and use it to determine how the vertex should be transformed. The only difference I really see is one draw call versus multiple draw calls, but ultimately the amount of vertex data being submitted is the same (actually it could be slightly larger in an instanced vertex buffer). So why is it so fast? Is it because a draw call is so slow in DirectX? Why are draw calls slow then?
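The approach described in this post can be sketched on the CPU roughly as follows. This is only an illustration, not the poster's actual code: the `Vertex` and `Offset` layouts and the function names `buildBatchedBuffer` and `shadeVertex` are hypothetical, and a simple translation stands in for a full transform matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical vertex layout: a position plus the index of the instance it belongs to.
struct Vertex {
    float x, y, z;
    float instanceIndex;  // read by the vertex shader to select a transform
};

// Hypothetical per-instance transform; a translation stands in for a full matrix.
struct Offset { float x, y, z; };

// Copy the model once per instance into one large buffer, tagging every vertex
// with the index of the copy it belongs to.
std::vector<Vertex> buildBatchedBuffer(const std::vector<Vertex>& model,
                                       std::size_t instanceCount) {
    std::vector<Vertex> batched;
    batched.reserve(model.size() * instanceCount);
    for (std::size_t i = 0; i < instanceCount; ++i) {
        for (Vertex v : model) {
            v.instanceIndex = static_cast<float>(i);
            batched.push_back(v);
        }
    }
    return batched;
}

// CPU stand-in for the vertex shader: look up the transform by the stored index.
Vertex shadeVertex(const Vertex& v, const std::vector<Offset>& transforms) {
    const Offset& t = transforms[static_cast<std::size_t>(v.instanceIndex)];
    return Vertex{v.x + t.x, v.y + t.y, v.z + t.z, v.instanceIndex};
}
```

Calling `buildBatchedBuffer(model, 4)` on a three-vertex model yields twelve vertices, so the buffer grows with the instance count even though the whole batch can be submitted with a single draw call.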
Quote:Original post by chillypacman
Is it because a draw call is so slow in directx?
Yes
Quote:Why are draw calls slow then?
Quite a few reasons. Lots of validation is done, states are updated, and due to driver architecture decisions there are slow kernel/user mode transitions.
A lot of these issues aren't present in GL/DX10/DX11.
So if I were coding in OpenGL, DX10 or DX11 I wouldn't benefit as much from instancing? Would it still be worth implementing instancing with those APIs?
What about the speed gains that come from only sending the model data to the GPU once per model, rather than once per instance? I would have thought that played a fairly large part in the dramatic speed increases.
Quote:Original post by chillypacman
So if I were coding in OpenGL, DX10 or DX11 I wouldn't benefit as much from instancing? Would it still be worth implementing instancing with those APIs?
Yeah there's still a benefit, just not as much as in DX9. With GL it's a relatively new feature, so you've got to check if it's supported.
Quote:Original post by _Sauce_
What about the speed gains that come from only sending the model data to the GPU once per model, rather than once per instance? I would have thought that played a fairly large part in the dramatic speed increases.
Usually your vertex data lives in the GPU's memory, so it doesn't have to be sent to the GPU every draw. The data that is sent per draw is state updates (texture bindings, current shaders, uniforms, etc.).
It's the same as whenever you have two processors operating asynchronously: the less they talk to each other, the better. As such, saying "Draw this 5000 times with this data" once is always going to be faster than saying "Draw this 1 time with this data" 5000 times.
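To make the "talk less" point concrete, here is a minimal sketch that only counts how many calls cross from the application to the driver in each case. `MockDevice` and its methods are hypothetical stand-ins, not a real graphics API, and the sketch says nothing about how expensive each call actually is.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a graphics device; it only counts CPU-to-driver calls.
struct MockDevice {
    std::size_t calls = 0;
    void setWorldMatrix(std::size_t /*instance*/) { ++calls; }   // per-object state update
    void draw() { ++calls; }                                     // one object per call
    void drawInstanced(std::size_t /*instanceCount*/) { ++calls; }  // many objects per call
};

// "Draw this 1 time with this data", repeated n times.
std::size_t callsWithoutInstancing(std::size_t n) {
    MockDevice dev;
    for (std::size_t i = 0; i < n; ++i) {
        dev.setWorldMatrix(i);
        dev.draw();
    }
    return dev.calls;
}

// "Draw this n times with this data", said once.
std::size_t callsWithInstancing(std::size_t n) {
    MockDevice dev;
    dev.drawInstanced(n);
    return dev.calls;
}
```

For 5000 objects the first function issues 10000 calls (one state update plus one draw each) and the second issues a single call. Every one of those calls is a place where validation and state tracking can happen in the driver.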
There are 2 techniques for instancing:

Yours, non-indexed instancing, is not supported directly by the hardware. It is implemented in software, which results in a smaller benefit. As the others said, fewer calls to the hardware means less overhead, even if the amount of data you transmit remains the same.

The second is called indexed instancing. This is supported directly by the hardware. You have a vertex buffer for your mesh, and also an index buffer. You need both to use the hardware path. Then you also have a second vertex buffer (with no index buffer of its own) which contains the instance data for your mesh, one vertex per instance. Each of these vertices contains, for example, the world matrix, which is a struct of four float4 values. This type of instancing is fast because you only have to transmit the mesh data once, you only make a few calls, and all the remaining work is done by the hardware.
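As a rough CPU-side emulation of what the hardware does with those two streams: one mesh vertex buffer plus index buffer, and one per-instance buffer holding a world matrix. The type and function names here (`MeshVertex`, `InstanceData`, `translation`, `emulateInstancedDraw`) are hypothetical, and the row-vector convention with translation in the fourth row is an assumption. In Direct3D 9 the per-stream frequencies would be configured with `IDirect3DDevice9::SetStreamSourceFreq`; that setup is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stream 0: one vertex of the shared mesh.
struct MeshVertex { float x, y, z; };

// Stream 1: one element per instance, a 4x4 world matrix (four float4 rows).
struct InstanceData { float world[4][4]; };
static_assert(sizeof(InstanceData) == 64, "expected four float4 rows");

// Helper for the example: a world matrix that only translates.
InstanceData translation(float tx, float ty, float tz) {
    return InstanceData{{{1, 0, 0, 0}, {0, 1, 0, 0}, {0, 0, 1, 0}, {tx, ty, tz, 1}}};
}

// Row-vector transform p' = p * M, with translation stored in the fourth row.
MeshVertex transform(const MeshVertex& v, const InstanceData& inst) {
    const float (&m)[4][4] = inst.world;
    return MeshVertex{
        v.x * m[0][0] + v.y * m[1][0] + v.z * m[2][0] + m[3][0],
        v.x * m[0][1] + v.y * m[1][1] + v.z * m[2][1] + m[3][1],
        v.x * m[0][2] + v.y * m[1][2] + v.z * m[2][2] + m[3][2]};
}

// Emulates one instanced draw: the mesh is stored once and re-read per instance,
// while the instance stream advances by one element per copy of the mesh.
std::vector<MeshVertex> emulateInstancedDraw(
        const std::vector<MeshVertex>& vertices,
        const std::vector<std::uint16_t>& indices,
        const std::vector<InstanceData>& instances) {
    std::vector<MeshVertex> out;
    out.reserve(indices.size() * instances.size());
    for (const InstanceData& inst : instances) {
        for (std::uint16_t idx : indices) {
            out.push_back(transform(vertices[idx], inst));
        }
    }
    return out;
}
```

Note that the stored data is one copy of the mesh plus 64 bytes per instance, rather than one full copy of the mesh per instance as in the non-indexed approach.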

MSDN has a better description of this: http://msdn.microsoft.com/en-us/library/ee418549%28VS.85%29.aspx
