Passing matrices vs passing floats in instance buffer?

Started by
5 comments, last by CryZe 11 years, 8 months ago
I have noticed a lot of people use something like this:

//just for example:
struct InstanceStruct
{
XMFLOAT3 position;
XMFLOAT3 rotation;
XMFLOAT3 scale;
};

which would be equivalent to three float3s in the shader.


Yet others just pass a ready-made matrix:

struct InstanceStruct
{
XMFLOAT4X4 transform;
};


which is just one float4x4.
I was wondering: what's the point of the second method? Isn't it better to calculate all the simple things on the GPU instead of building a matrix on the CPU for each instance? Or have I misunderstood something?
The GPU would have to multiply those three matrices together for every vertex (or transform each vertex three times instead of once), which is a waste, considering they don't change across the many vertices of an instance. The three float3s do take less memory/bandwidth (9 floats vs. 16), but with a difference this small that is probably irrelevant.
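A minimal C++ sketch of that trade-off, assuming a row-vector convention (p' = p * M) and assuming `rotation` holds Euler angles in radians applied X, then Y, then Z (the original struct doesn't specify either). `Float3`, `Mat4`, `InstanceTRS`, `BuildWorld` and `TransformPoint` are hypothetical stand-ins so it compiles without DirectXMath. The matrix is composed once per instance on the CPU; the per-vertex work is then a single point-times-matrix transform.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical stand-ins for XMFLOAT3 / XMFLOAT4X4.
struct Float3 { float x, y, z; };
using Mat4 = std::array<std::array<float, 4>, 4>;  // row-vector convention: p' = p * M

// Hypothetical mirror of the first InstanceStruct; rotation = Euler angles (assumed).
struct InstanceTRS { Float3 position, rotation, scale; };

Mat4 Identity() {
    Mat4 m{};
    for (int i = 0; i < 4; ++i) m[i][i] = 1.0f;
    return m;
}

Mat4 Multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k) r[i][j] += a[i][k] * b[k][j];
    return r;
}

// Elementary rotations in row-vector form.
Mat4 RotationX(float a) { Mat4 m = Identity(); m[1][1] = m[2][2] = std::cos(a); m[1][2] = std::sin(a); m[2][1] = -std::sin(a); return m; }
Mat4 RotationY(float a) { Mat4 m = Identity(); m[0][0] = m[2][2] = std::cos(a); m[0][2] = -std::sin(a); m[2][0] = std::sin(a); return m; }
Mat4 RotationZ(float a) { Mat4 m = Identity(); m[0][0] = m[1][1] = std::cos(a); m[0][1] = std::sin(a); m[1][0] = -std::sin(a); return m; }

// CPU side, once per instance: scale, rotate X, Y, Z (assumed order), then translate.
Mat4 BuildWorld(const InstanceTRS& t) {
    Mat4 s = Identity();
    s[0][0] = t.scale.x; s[1][1] = t.scale.y; s[2][2] = t.scale.z;
    Mat4 tr = Identity();
    tr[3][0] = t.position.x; tr[3][1] = t.position.y; tr[3][2] = t.position.z;
    Mat4 r = Multiply(Multiply(RotationX(t.rotation.x), RotationY(t.rotation.y)), RotationZ(t.rotation.z));
    return Multiply(Multiply(s, r), tr);
}

// GPU side, once per vertex: a single transform by the prebuilt matrix (w = 1 implied).
Float3 TransformPoint(const Float3& p, const Mat4& m) {
    return {p.x * m[0][0] + p.y * m[1][0] + p.z * m[2][0] + m[3][0],
            p.x * m[0][1] + p.y * m[1][1] + p.z * m[2][1] + m[3][1],
            p.x * m[0][2] + p.y * m[1][2] + p.z * m[2][2] + m[3][2]};
}
```

Building the matrix costs a handful of 4x4 multiplies per instance, paid once, instead of once per vertex.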



Would you happen to know what SemanticName and Format to pass in the D3D11_INPUT_ELEMENT_DESC for a matrix? Or will any TEXCOORDx do?
There are no SV_ (system-value) semantics for a per-vertex matrix. Honestly, you probably don't need a matrix per vertex unless you're doing skinning, in which case it's better to pass the matrices in as an array in a constant buffer.
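For the skinning case, a minimal CPU-side sketch of matrix-palette (linear blend) skinning, assuming each vertex carries up to four bone indices and weights and that `palette` plays the role of the matrix array in the constant buffer. All type and function names are hypothetical, and the bone matrices are stored as 4x3 with the constant last column (0,0,0,1) implied.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Float3 { float x, y, z; };

// Hypothetical palette entry: affine bone matrix in row-vector form, p' = (p, 1) * M,
// stored as 4 rows x 3 columns (the fourth column is always 0,0,0,1 and is implied).
struct BoneMatrix { float m[4][3]; };

// Hypothetical skinned vertex with up to four influences.
struct SkinnedVertex {
    Float3 position;
    std::array<unsigned, 4> boneIndex;  // indices into the palette (the constant-buffer array)
    std::array<float, 4> boneWeight;    // expected to sum to 1
};

Float3 TransformPoint(const Float3& p, const BoneMatrix& b) {
    return {p.x * b.m[0][0] + p.y * b.m[1][0] + p.z * b.m[2][0] + b.m[3][0],
            p.x * b.m[0][1] + p.y * b.m[1][1] + p.z * b.m[2][1] + b.m[3][1],
            p.x * b.m[0][2] + p.y * b.m[1][2] + p.z * b.m[2][2] + b.m[3][2]};
}

// Linear blend skinning: weighted sum of the position transformed by each referenced bone.
Float3 Skin(const SkinnedVertex& v, const std::vector<BoneMatrix>& palette) {
    Float3 out{0.0f, 0.0f, 0.0f};
    for (std::size_t i = 0; i < 4; ++i) {
        if (v.boneWeight[i] == 0.0f) continue;  // unused influence slot
        assert(v.boneIndex[i] < palette.size());
        Float3 t = TransformPoint(v.position, palette[v.boneIndex[i]]);
        out.x += v.boneWeight[i] * t.x;
        out.y += v.boneWeight[i] * t.y;
        out.z += v.boneWeight[i] * t.z;
    }
    return out;
}
```

Only small per-vertex indices and weights travel with the vertex; the matrices themselves are shared by every vertex that references them.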
Both ways have their uses. If you can provide some code examples, it will be easier to see what's going on in the shader code, which may help explain the differences.
Sometimes it is enough to pass just a 4x3 matrix in order to save bandwidth (for skinning, for example).
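A minimal sketch of why 4x3 is enough, assuming a row-vector affine matrix whose last column is always (0,0,0,1): that column carries no information, so only the other three columns need to be stored, as three float4s (48 bytes instead of 64), and each output component is one 4-component dot product with (x, y, z, 1). `Packed4x3`, `Pack` and `Transform` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Mat4 = std::array<std::array<float, 4>, 4>;  // row-vector affine matrix
struct Float4 { float x, y, z, w; };

// Hypothetical per-instance layout: three float4 columns = 48 bytes.
struct Packed4x3 { Float4 c0, c1, c2; };
static_assert(sizeof(Packed4x3) == 48, "three float4s, no padding");

Packed4x3 Pack(const Mat4& m) {
    // The dropped fourth column must be (0,0,0,1), i.e. the matrix must be affine.
    assert(m[0][3] == 0 && m[1][3] == 0 && m[2][3] == 0 && m[3][3] == 1);
    auto col = [&](int j) { return Float4{m[0][j], m[1][j], m[2][j], m[3][j]}; };
    return {col(0), col(1), col(2)};
}

float Dot4(const Float4& a, const Float4& b) { return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w; }

// p is expected to have w = 1; three dot products rebuild x', y', z'.
Float4 Transform(const Float4& p, const Packed4x3& t) {
    return {Dot4(p, t.c0), Dot4(p, t.c1), Dot4(p, t.c2), 1.0f};
}
```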

Nowadays, shaders using instancing can easily read data from constant buffers or from generic Buffer<float4> objects. I find the latter quite flexible (it can be far bigger than a constant buffer, and each draw call can use a variable amount of data). Its usage is described in the Frostbite design docs.
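A CPU-side emulation of the idea, as a sketch under assumptions: each instance's matrix occupies four consecutive float4 elements of one large buffer, and the shader locates it from a per-draw start offset plus the instance ID. The `firstInstance` offset and both function names are hypothetical, and this is not necessarily the scheme described in the Frostbite documents.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Float4 = std::array<float, 4>;
using Mat4 = std::array<Float4, 4>;  // four float4 rows

// Hypothetical CPU side: append one draw's matrices to the shared buffer and
// return the slot (in matrices) where that draw's data starts.
std::size_t AppendDrawData(std::vector<Float4>& buffer, const std::vector<Mat4>& matrices) {
    std::size_t first = buffer.size() / 4;
    for (const Mat4& m : matrices)
        for (const Float4& row : m) buffer.push_back(row);
    return first;
}

// Hypothetical shader side: what a Buffer<float4> fetch by
// (per-draw firstInstance + instance ID) would read, four elements per matrix.
Mat4 FetchInstanceMatrix(const std::vector<Float4>& buffer, std::size_t firstInstance, std::size_t instanceId) {
    std::size_t base = (firstInstance + instanceId) * 4;
    assert(base + 3 < buffer.size());
    Mat4 m{};
    for (std::size_t r = 0; r < 4; ++r) m[r] = buffer[base + r];
    return m;
}
```

Because each draw only needs its start offset, different draws can consume different numbers of matrices from the same buffer.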

Cheers!

[quote name='Ripiz' timestamp='1345572955' post='4971912']
GPU would have to multiply those 3 matrices every vertex (or transform vertex 3 times instead of once), that's a waste, considering those matrices do not change for many vertices. Also it takes less memory/bandwidth, but with such difference probably irrelevant.


Would you happen to know what SemanticName and Format to pass in the D3D11_INPUT_ELEMENT_DESC for a matrix? Or will any TEXCOORDx do?
[/quote]

You have to set it up as four adjacent elements, each using DXGI_FORMAT_R32G32B32A32_FLOAT.
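A portable sketch of what "four adjacent elements" means for the byte layout. In real code each row becomes one D3D11_INPUT_ELEMENT_DESC with the same SemanticName, SemanticIndex 0 to 3, Format DXGI_FORMAT_R32G32B32A32_FLOAT, InputSlotClass D3D11_INPUT_PER_INSTANCE_DATA and an InstanceDataStepRate of 1; the struct below only mirrors the fields that differ between the four entries so it compiles without d3d11.h. `MatrixRowElement`, `MatrixRowElements` and the "WVP" semantic are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Stand-in for the second InstanceStruct: XMFLOAT4X4 is four rows of four floats.
struct InstanceStruct {
    float transform[4][4];
};

// Hypothetical mirror of the per-row fields of D3D11_INPUT_ELEMENT_DESC.
struct MatrixRowElement {
    std::string semanticName;
    unsigned semanticIndex;
    unsigned alignedByteOffset;
};

std::array<MatrixRowElement, 4> MatrixRowElements(const std::string& semantic, unsigned baseOffset) {
    std::array<MatrixRowElement, 4> rows{};
    for (unsigned i = 0; i < 4; ++i)
        // Each row is one float4 (16 bytes), so the rows sit back to back.
        rows[i] = {semantic, i, baseOffset + i * static_cast<unsigned>(sizeof(float) * 4)};
    return rows;
}
```

With a base offset of 0 this yields byte offsets 0, 16, 32 and 48 for semantic indices 0 to 3.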


You can store per-instance data in a second vertex buffer. The Input Assembler combines the per-vertex data and the per-instance data for each vertex shader invocation.
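A CPU-side model of that combination, as a sketch: slot 0 holds per-vertex data, slot 1 holds per-instance data that advances once every `stepRate` instances, and every vertex shader invocation sees one vertex plus its instance's data. The loop order is only conceptual (the GPU doesn't promise it), and `Vertex`, `InstanceData` and `EmulateDrawInstanced` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vertex { float x, y, z; };          // slot 0: per-vertex data
struct InstanceData { float offset[3]; };  // slot 1: per-instance data (hypothetical: a translation)

// Models the Input Assembler for a non-indexed instanced draw.
template <typename VertexShader>
void EmulateDrawInstanced(const std::vector<Vertex>& vb0, const std::vector<InstanceData>& vb1,
                          std::size_t instanceCount, std::size_t stepRate, VertexShader vs) {
    assert(stepRate > 0);
    for (std::size_t inst = 0; inst < instanceCount; ++inst) {
        assert(inst / stepRate < vb1.size());
        const InstanceData& perInstance = vb1[inst / stepRate];  // advances every stepRate instances
        for (const Vertex& v : vb0) vs(v, perInstance);          // one "invocation" per vertex
    }
}
```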

Back to the topic: it's better to upload just a single matrix to the GPU, because transforming a vertex by it takes just four DP4 instructions, whereas uploading position, rotation and scale would result in many more instructions. Quaternions are probably faster, though.
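A sketch comparing the two per-vertex paths, assuming the instance stores the matrix's columns (so each output component is one 4-component dot product, the DP4 case) and, for the alternative, a unit quaternion for rotation applied as t = 2·cross(q.xyz, v), v' = v + q.w·t + cross(q.xyz, t). All names are hypothetical; the actual instruction counts depend on the shader compiler.

```cpp
#include <cassert>
#include <cmath>

struct Float3 { float x, y, z; };
struct Float4 { float x, y, z, w; };

float Dot4(Float4 a, Float4 b) { return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w; }
Float3 Cross(Float3 a, Float3 b) { return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x}; }

// Matrix path (hypothetical layout): columns stored, four dot products per vertex.
struct MatrixInstance { Float4 col[4]; };
Float4 TransformByMatrix(Float3 p, const MatrixInstance& m) {
    Float4 v{p.x, p.y, p.z, 1.0f};
    return {Dot4(v, m.col[0]), Dot4(v, m.col[1]), Dot4(v, m.col[2]), Dot4(v, m.col[3])};
}

// TRS path (hypothetical layout): scale, rotate by a unit quaternion, translate, per vertex.
struct TrsInstance { Float3 position; Float4 rotation; Float3 scale; };
Float3 TransformByTrs(Float3 p, const TrsInstance& inst) {
    Float3 s{p.x * inst.scale.x, p.y * inst.scale.y, p.z * inst.scale.z};
    Float3 u{inst.rotation.x, inst.rotation.y, inst.rotation.z};
    Float3 t = Cross(u, s);
    t = {2 * t.x, 2 * t.y, 2 * t.z};  // t = 2 * cross(q.xyz, v)
    Float3 c = Cross(u, t);
    float w = inst.rotation.w;
    return {s.x + w * t.x + c.x + inst.position.x,
            s.y + w * t.y + c.y + inst.position.y,
            s.z + w * t.z + c.z + inst.position.z};
}
```

Both paths give the same point for the same transform; the matrix path does a fixed four dot products, while the TRS path spends two cross products plus the scale and adds on every vertex.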

Also, you don't need to use the TEXCOORD# semantics anymore: since DirectX 10 you can use any semantic name you want. To upload a matrix, you simply upload the four float4 values with the same semantic name but different indices, e.g. WVP0, WVP1, WVP2, WVP3.
