
Passing matrices vs passing floats in instance buffer?


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

6 replies to this topic

#1 mrheisenberg   Members   -  Reputation: 356


Posted 21 August 2012 - 12:07 PM

I have noticed a lot of people use something like this:

//just for example:
struct InstanceStruct
{
     XMFLOAT3 position;
     XMFLOAT3 rotation;
     XMFLOAT3 scale;
};

which would be equivalent to three float3s in the shader,

yet others just pass a ready matrix:

struct InstanceStruct
{
     XMFLOAT4X4 transform;
};

for just one float4x4.
I was wondering: what's the point of the second method? Isn't it better to calculate all the simple things on the GPU instead of making a matrix on the CPU for each instance? Or have I misunderstood something?


#2 Ripiz   Members   -  Reputation: 529


Posted 21 August 2012 - 12:15 PM

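The GPU would have to multiply those three matrices for every vertex (or transform each vertex three times instead of once). That's a waste, considering the matrices do not change across the many vertices of an instance. The separate position/rotation/scale layout does take less memory and bandwidth (36 bytes versus 64 per instance), but with such a small difference that is probably irrelevant.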
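
To make the trade-off concrete, here is a minimal CPU-side sketch of building the matrix once per instance, so that each vertex only needs a single vector-matrix product. `Float3` and `Mat4` are hypothetical stand-ins for XMFLOAT3 and XMFLOAT4X4, the row-vector convention is assumed, and the rotation order (Z, then X, then Y) is an assumption as well; none of this comes from the original poster's code.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-ins for XMFLOAT3 / XMFLOAT4X4 so no DirectX headers are needed.
struct Float3 { float x, y, z; };
struct Mat4   { float m[4][4]; };   // row-vector convention: v' = v * M, translation in row 3

struct InstanceTRS    { Float3 position, rotation, scale; };
struct InstanceMatrix { Mat4 transform; };
static_assert(sizeof(InstanceTRS) == 36, "three float3s per instance");
static_assert(sizeof(InstanceMatrix) == 64, "one float4x4 per instance");

Mat4 identity() {
    Mat4 r{};
    for (int i = 0; i < 4; ++i) r.m[i][i] = 1.0f;
    return r;
}

Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r.m[i][j] += a.m[i][k] * b.m[k][j];
    return r;
}

// Rotation about axis 0 (X), 1 (Y) or 2 (Z) by `angle` radians.
Mat4 rotationAbout(int axis, float angle) {
    const int u = (axis + 1) % 3, v = (axis + 2) % 3;
    const float c = std::cos(angle), s = std::sin(angle);
    Mat4 r = identity();
    r.m[u][u] = c;   r.m[u][v] = s;
    r.m[v][u] = -s;  r.m[v][v] = c;
    return r;
}

// Done once per instance on the CPU: scale, rotate (Z, X, Y), then translate.
Mat4 composeWorld(const InstanceTRS& in) {
    Mat4 w = identity();
    w.m[0][0] = in.scale.x;  w.m[1][1] = in.scale.y;  w.m[2][2] = in.scale.z;
    w = multiply(w, rotationAbout(2, in.rotation.z));
    w = multiply(w, rotationAbout(0, in.rotation.x));
    w = multiply(w, rotationAbout(1, in.rotation.y));
    Mat4 t = identity();
    t.m[3][0] = in.position.x;  t.m[3][1] = in.position.y;  t.m[3][2] = in.position.z;
    return multiply(w, t);
}

// What remains per vertex: one product of (x, y, z, 1) with the matrix.
Float3 transformPoint(const Float3& p, const Mat4& w) {
    return { p.x * w.m[0][0] + p.y * w.m[1][0] + p.z * w.m[2][0] + w.m[3][0],
             p.x * w.m[0][1] + p.y * w.m[1][1] + p.z * w.m[2][1] + w.m[3][1],
             p.x * w.m[0][2] + p.y * w.m[1][2] + p.z * w.m[2][2] + w.m[3][2] };
}

// Usage: scale by 2, yaw by 90 degrees, move to x = 10.
void exampleUsage() {
    const InstanceTRS inst{{10.0f, 0.0f, 0.0f}, {0.0f, 1.5707963f, 0.0f}, {2.0f, 2.0f, 2.0f}};
    const Float3 p = transformPoint({1.0f, 0.0f, 0.0f}, composeWorld(inst));
    assert(std::fabs(p.x - 10.0f) < 1e-4f && std::fabs(p.z + 2.0f) < 1e-4f);
}
```

If the float3 layout were sent instead, the work in `composeWorld` would have to be repeated in the vertex shader for every vertex of every instance.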

#3 mrheisenberg   Members   -  Reputation: 356


Posted 21 August 2012 - 12:58 PM

Ripiz, on 21 August 2012 - 12:15 PM, said:
GPU would have to multiply those 3 matrices every vertex (or transform vertex 3 times instead of once), that's a waste, considering those matrices do not change for many vertices. Also it takes less memory/bandwidth, but with such difference probably irrelevant.


Would you happen to know what SemanticName and Format to pass to the D3D11_INPUT_ELEMENT_DESC for a matrix? Or will any texcoordX do?

#4 Seabolt   Members   -  Reputation: 633


Posted 21 August 2012 - 01:29 PM

There are no SV semantics for a per-vertex matrix. Honestly, you probably don't need a matrix per vertex unless you're doing skinning, in which case it's better to pass the matrices in as an array in a constant buffer.

#5 kauna   Crossbones+   -  Reputation: 2744


Posted 21 August 2012 - 03:04 PM

Both ways have their uses. If you can provide some code examples, it will be easier to see what's going on in the shader code, which may help in explaining the differences.
Sometimes it is enough to pass just a 4x3 matrix in order to save bandwidth (for skinning, for example).
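
As a rough illustration of the 4x3 idea, the sketch below assumes an affine transform in the row-vector convention, whose fourth column is always (0, 0, 0, 1) and therefore need not be uploaded. `Mat4`, `PackedMat`, `packAffine` and `unpackAffine` are hypothetical names chosen for this example.

```cpp
#include <cassert>

struct Mat4      { float m[4][4]; };   // row-vector convention, translation in row 3
struct PackedMat { float c[3][4]; };   // three float4s: columns 0..2 of the Mat4
static_assert(sizeof(Mat4) == 64 && sizeof(PackedMat) == 48, "64 vs 48 bytes per matrix");

// Keep only the three columns that carry information.
PackedMat packAffine(const Mat4& a) {
    PackedMat p{};
    for (int col = 0; col < 3; ++col)
        for (int row = 0; row < 4; ++row)
            p.c[col][row] = a.m[row][col];
    return p;
}

// Rebuild the full matrix; the dropped column is restored as (0, 0, 0, 1).
Mat4 unpackAffine(const PackedMat& p) {
    Mat4 a{};
    for (int row = 0; row < 4; ++row) {
        for (int col = 0; col < 3; ++col) a.m[row][col] = p.c[col][row];
        a.m[row][3] = (row == 3) ? 1.0f : 0.0f;
    }
    return a;
}

// Usage: a uniform scale of 2 with a translation survives the round trip unchanged.
void exampleRoundTrip() {
    const Mat4 a = {{{2, 0, 0, 0}, {0, 2, 0, 0}, {0, 0, 2, 0}, {5, 6, 7, 1}}};
    const Mat4 b = unpackAffine(packAffine(a));
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            assert(a.m[r][c] == b.m[r][c]);
}
```

This only works for affine transforms; a projection matrix uses the fourth column and cannot be packed this way.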

Nowadays, shaders using instancing can easily read data from constant buffers or generic Buffer<float4> objects. I find the latter quite flexible (the size can be far larger than a constant buffer's, and each draw call can use a variable amount of data). Its usage is described in the Frostbite design docs.

Cheers!

#6 MJP   Moderators   -  Reputation: 11569


Posted 21 August 2012 - 06:32 PM


Ripiz, on 21 August 2012 - 12:15 PM, said:
GPU would have to multiply those 3 matrices every vertex (or transform vertex 3 times instead of once), that's a waste, considering those matrices do not change for many vertices. Also it takes less memory/bandwidth, but with such difference probably irrelevant.

mrheisenberg, on 21 August 2012 - 12:58 PM, said:
would you happen to know what SemanticName and Format to pass to the D3D11_INPUT_ELEMENT_DESC for a matrix? Or will any texcoordX do?


You have to set it up as 4 adjacent elements using DXGI_FORMAT_R32G32B32A32_FLOAT.

#7 CryZe   Members   -  Reputation: 768


Posted 22 August 2012 - 01:28 AM

Seabolt, on 21 August 2012 - 01:29 PM, said:
So there are no SV semantics for a matrix per vertex. Honestly you probably don't need a matrix per vertex unless you're doing skinning, in which case it's better to just pass it in as a matrix array in a constant buffer.

You can store per instance data in a second vertex buffer. The Input Assembler combines the per vertex data and the per instance data for each vertex shader call.

Back to the topic: it's better to upload just a single matrix to the GPU, because transforming a vertex by it takes just 4 DP4 instructions, whereas uploading position, rotation and scale would require many more instructions to rebuild the transform. Quaternions are probably faster, though.

Also, you don't need to use the TEXCOORD# semantics anymore. Since DirectX 10 you can use any semantic name you want. To upload a matrix, you simply upload the 4 float4 values with the same semantic name but different indexes, e.g. WVP0, WVP1, WVP2, WVP3.
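
Putting the answers in this thread together, an input layout for a per-instance matrix might look roughly like the sketch below: four adjacent DXGI_FORMAT_R32G32B32A32_FLOAT elements sharing one semantic name, read from a second vertex buffer with per-instance stepping. The POSITION element, the slot numbers and the array name `kInstancedLayout` are assumptions for illustration. The `#else` branch holds minimal stand-ins that mirror the D3D11 declarations only so the snippet compiles off Windows; on Windows the real `<d3d11.h>` is used.

```cpp
#include <cassert>
#include <cstring>

#ifdef _WIN32
#include <d3d11.h>
#else
// Stand-ins mirroring the D3D11 declarations used below (assumption: only
// needed so the sketch builds without the Windows SDK).
typedef unsigned int UINT;
typedef const char* LPCSTR;
enum DXGI_FORMAT { DXGI_FORMAT_R32G32B32A32_FLOAT = 2, DXGI_FORMAT_R32G32B32_FLOAT = 6 };
enum D3D11_INPUT_CLASSIFICATION { D3D11_INPUT_PER_VERTEX_DATA = 0, D3D11_INPUT_PER_INSTANCE_DATA = 1 };
struct D3D11_INPUT_ELEMENT_DESC {
    LPCSTR SemanticName;
    UINT SemanticIndex;
    DXGI_FORMAT Format;
    UINT InputSlot;
    UINT AlignedByteOffset;
    D3D11_INPUT_CLASSIFICATION InputSlotClass;
    UINT InstanceDataStepRate;
};
#endif

// Slot 0: per-vertex data (a bare position here, as an assumed example).
// Slot 1: one 4x4 matrix per instance, declared as four float4 elements that
// share the semantic name "WVP" and use semantic indexes 0..3.
static const D3D11_INPUT_ELEMENT_DESC kInstancedLayout[] = {
    { "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT,    0,  0, D3D11_INPUT_PER_VERTEX_DATA,   0 },
    { "WVP",      0, DXGI_FORMAT_R32G32B32A32_FLOAT, 1,  0, D3D11_INPUT_PER_INSTANCE_DATA, 1 },
    { "WVP",      1, DXGI_FORMAT_R32G32B32A32_FLOAT, 1, 16, D3D11_INPUT_PER_INSTANCE_DATA, 1 },
    { "WVP",      2, DXGI_FORMAT_R32G32B32A32_FLOAT, 1, 32, D3D11_INPUT_PER_INSTANCE_DATA, 1 },
    { "WVP",      3, DXGI_FORMAT_R32G32B32A32_FLOAT, 1, 48, D3D11_INPUT_PER_INSTANCE_DATA, 1 },
};
static const UINT kInstancedLayoutCount =
    sizeof(kInstancedLayout) / sizeof(kInstancedLayout[0]);

// Sanity check: the four matrix rows sit in slot 1 at 16-byte steps.
void checkInstanceRows() {
    for (UINT i = 0; i < 4; ++i) {
        const D3D11_INPUT_ELEMENT_DESC& e = kInstancedLayout[1 + i];
        assert(std::strcmp(e.SemanticName, "WVP") == 0);
        assert(e.SemanticIndex == i && e.InputSlot == 1);
        assert(e.AlignedByteOffset == 16 * i);
        assert(e.InputSlotClass == D3D11_INPUT_PER_INSTANCE_DATA);
    }
}
```

On Windows, the array and its count would be passed to ID3D11Device::CreateInputLayout together with the vertex shader bytecode. An InstanceDataStepRate of 1 advances the matrix once per instance, while the slot 0 data advances once per vertex.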

Edited by CryZe, 22 August 2012 - 01:33 AM.




