# Is computing translation matrices directly in the shader more efficient than on the CPU when instancing?

I am rendering quad particles that always turn to face the camera, whichever way you look. Until now, a single rotation matrix was shared by all 1000 particles, and the translation matrix that moves each particle to its place was computed on the CPU.

In HLSL:

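cbuffer PerInstanceBuffer : register(b10)
{
    matrix translationMatrix[1000]; // one CPU-built translation matrix per particle
}

struct VertexInput // struct name assumed; the original declaration line was lost
{
    float3 pos : POSITION;
    uint instanceID : SV_INSTANCEID;
};

VertexOutput VSMain(VertexInput input) // signature assumed; the original line was lost
{
    float4 pos = mul(model, float4(input.pos, 1.0f)); // promote the position to a point (w = 1)
    if (instancingOn)
    {
        pos = mul(translationMatrix[input.instanceID], pos);
    }
    ...
}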



Would it be better to send just 1000 position offsets and have a function in the shader that computes the translation matrix (or rather just fills it in, since no real computation is required) and then applies it? Would the pipelined nature of the GPU make that faster than doing it on the CPU?

Like this:

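cbuffer PerInstanceBuffer : register(b10)
{
    float4 instancePosition[1000]; // offsets only, so float4 instead of matrix
}

struct VertexInput // struct name assumed; the original declaration line was lost
{
    float3 pos : POSITION;
    uint instanceID : SV_INSTANCEID;
};

matrix GetTranslationMatrix(float3 instancePosition)
{
    ... // calculate the matrix
    return translationMatrix;
}

VertexOutput VSMain(VertexInput input) // signature assumed; the original line was lost
{
    float4 pos = mul(model, float4(input.pos, 1.0f)); // promote the position to a point (w = 1)
    if (instancingOn)
    {
        // The original call was missing a closing parenthesis.
        pos = mul(GetTranslationMatrix(instancePosition[input.instanceID].xyz), pos);
    }
    ...
}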


Thanks.

Edited by pseudomarvin
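One concrete difference between the two layouts in the question is how much data is uploaded per frame. The sketch below is a CPU-side illustration only: `Float4`, `Matrix44`, and `PackInstanceOffsets` are hypothetical names that do not appear in the thread, and the sizes assume 4-byte floats. It relies on the Direct3D 11 limit of 4096 four-component constants per constant buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU-side mirrors of the two constant-buffer element types.
struct Float4   { float x, y, z, w; };
struct Matrix44 { float m[4][4]; };

constexpr std::size_t kInstanceCount = 1000;

// Bytes uploaded per frame for each layout (assuming 4-byte floats).
constexpr std::size_t kMatrixBytes = kInstanceCount * sizeof(Matrix44); // 64000
constexpr std::size_t kOffsetBytes = kInstanceCount * sizeof(Float4);   // 16000

// A Direct3D 11 constant buffer holds at most 4096 float4 constants.
constexpr std::size_t kMaxCBufferBytes = 4096 * sizeof(Float4);         // 65536

// 1000 matrices use 4000 of the 4096 available constants, so they barely fit.
static_assert(kMatrixBytes <= kMaxCBufferBytes, "matrix array exceeds cbuffer limit");
static_assert(kOffsetBytes * 4 == kMatrixBytes, "offsets are a quarter of the size");

// Packs tightly stored x, y, z triples into float4 slots, matching the
// 16-byte element stride of a float4 array in a constant buffer.
std::vector<Float4> PackInstanceOffsets(const std::vector<float>& xyz)
{
    assert(xyz.size() % 3 == 0 && "expected x, y, z triples");
    std::vector<Float4> packed;
    packed.reserve(xyz.size() / 3);
    for (std::size_t i = 0; i + 2 < xyz.size(); i += 3)
    {
        packed.push_back({xyz[i], xyz[i + 1], xyz[i + 2], 0.0f}); // w is unused padding
    }
    return packed;
}
```

Under these assumptions the offset layout uploads a quarter of the data and leaves room for roughly four times as many instances in one constant buffer. Whether that matters in practice depends on the rest of the frame.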

If you only use translation (i.e. no rotation or scale), then you don't need a matrix at all.

Just do:

// Offset, then apply the main model matrix (offsets are in model space).
float4 pos = mul(model, float4(input.pos.xyz + instancePosition[input.instanceID].xyz, 1.0f));

// ...or apply the main model matrix, then offset (offsets are in world space).
float4 pos = mul(model, float4(input.pos.xyz, 1.0f));
pos.xyz += instancePosition[input.instanceID].xyz;
Edited by Matias Goldberg
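The claim that a pure translation needs no matrix can be checked on the CPU. The following is a minimal sketch, not code from the thread: `Vec4`, `Mat4`, `MakeTranslation`, `Mul`, and `AddOffset` are hypothetical helpers, and the matrix layout assumes the column-vector convention that `mul(matrix, vector)` implies, with the translation in the last column.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<Vec4, 4>; // row-major storage: m[row][col]

// Translation matrix for column vectors: the offset sits in the last column.
Mat4 MakeTranslation(float tx, float ty, float tz)
{
    return Mat4{{
        Vec4{1.0f, 0.0f, 0.0f, tx},
        Vec4{0.0f, 1.0f, 0.0f, ty},
        Vec4{0.0f, 0.0f, 1.0f, tz},
        Vec4{0.0f, 0.0f, 0.0f, 1.0f},
    }};
}

// CPU equivalent of mul(m, v) with v treated as a column vector:
// 16 multiplies and 12 adds per vertex.
Vec4 Mul(const Mat4& m, const Vec4& v)
{
    Vec4 r{};
    for (std::size_t row = 0; row < 4; ++row)
    {
        for (std::size_t col = 0; col < 4; ++col)
        {
            r[row] += m[row][col] * v[col];
        }
    }
    return r;
}

// The shortcut from the reply above: three adds, no matrix.
Vec4 AddOffset(const Vec4& p, float tx, float ty, float tz)
{
    assert(p[3] == 1.0f && "only points (w == 1) are translated");
    return Vec4{p[0] + tx, p[1] + ty, p[2] + tz, p[3]};
}
```

For any point with `w == 1`, `Mul(MakeTranslation(tx, ty, tz), p)` and `AddOffset(p, tx, ty, tz)` give the same position, so building the matrix, on either the CPU or the GPU, is extra work with no effect on the result.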

The longer your shaders, the better: push as much as you can into them. The more particles you have, the more you'll gain. You have hundreds of ALUs waiting for commands, and graphics shaders are very often too short for today's architectures.

Keep your CPU free for stuff you cannot trivially GPU-ize!

In addition to what Matias Goldberg said, if your particles are always facing the camera, you can use the vertex ID to figure out which of the 4 corners your vertex is and offset it accordingly.

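// VertexID is assumed to be a uint shader input bound to the SV_VertexID semantic.
float4 pos = mul(model, float4(instancePosition[input.instanceID].xyz, 1.0f));
pos.x += (VertexID & 1) ? -particleScale.x : particleScale.x; // bit 0 selects left/right
pos.y += (VertexID & 2) ? -particleScale.y : particleScale.y; // bit 1 selects top/bottom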

That way, your instance buffer is effectively the particle buffer, and the vertex data is generated from the vertex ID: simple and cheap.

Edited by Krypt0n
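To make the bit trick concrete, here is a small CPU-side sketch of the same corner selection. `Float2` and `CornerOffset` are hypothetical names used only for illustration, and the sketch assumes each particle is drawn with four vertices numbered 0 to 3, for example as a triangle strip. The thread itself does not name the topology.

```cpp
#include <cassert>
#include <cstdint>

struct Float2 { float x, y; };

// Mirrors the shader expression: bit 0 of the vertex ID picks the sign of
// the x offset, bit 1 picks the sign of the y offset.
Float2 CornerOffset(std::uint32_t vertexID, Float2 particleScale)
{
    assert(vertexID < 4 && "a quad has four corners");
    return Float2{
        (vertexID & 1u) ? -particleScale.x : particleScale.x,
        (vertexID & 2u) ? -particleScale.y : particleScale.y,
    };
}
```

With this numbering, IDs 0, 1, 2, 3 map to the corners (+x, +y), (-x, +y), (+x, -y), (-x, -y). Taken in that order they form a valid four-vertex strip covering the whole quad, so no per-vertex position data has to be stored.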