Is computing translation matrices directly in the shader more efficient than on the CPU when instancing?

3 comments, last by pseudomarvin 8 years, 6 months ago

I am rendering quad particles that always turn to face the camera, whichever way you look. Until now I have used a system in which one rotation matrix is shared by all 1000 particles, and the translation matrix that moves each particle to its place is computed on the CPU.

In HLSL:


cbuffer PerInstanceBuffer: register(b10)
{
    matrix translationMatrix[1000];
}

struct VertexShaderInput
{
    float3 pos : POSITION;
    uint instanceID : SV_INSTANCEID;
};

main (VertexShaderInput input)
{
     // input.pos is a float3, so extend it to a point (w = 1) before the 4x4 multiply.
     float4 pos = mul(model, float4(input.pos, 1.0f));
     if (instancingOn)
     {
         pos = mul(translationMatrix[input.instanceID], pos);
     }
      ...
}

Would it be a better idea to send just 1000 position offsets and have a function in the shader that computes the translation matrix (or rather just defines it, since no real computation is required in this case) and then uses it? My thinking is that the pipelined nature of the GPU would make this faster than doing it on the CPU.

Like this:


cbuffer PerInstanceBuffer: register(b10)
{
     // Position offsets, not matrices. Each array element is padded to 16 bytes in a cbuffer.
     float3 instancePosition[1000];
}

struct VertexShaderInput
{
     float3 pos : POSITION;
     uint instanceID : SV_INSTANCEID;
};

matrix GetTranslationMatrix(float3 instancePosition)
{
   ... //calculate the matrix
   return translationMatrix;
}

main (VertexShaderInput input)
{
    float4 pos = mul(model, input.pos);
    if (instancingOn)
    {
        pos = mul(GetTranslationMatrix(instancePosition[input.instanceID]), pos);
    }
...
}
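
For illustration, here is a minimal sketch of what such a function could look like. It assumes the column-vector convention used above (mul(matrix, vector)), in which the translation sits in the last column. The name BuildTranslationMatrix is hypothetical and is not the elided GetTranslationMatrix body.

```hlsl
// Hypothetical helper (an assumption, not the original function body).
// Builds a translation matrix meant to be used as mul(m, v),
// with v treated as a column vector.
float4x4 BuildTranslationMatrix(float3 offset)
{
    // The float4x4 constructor takes its elements in row order,
    // so each offset component ends up in the last column.
    return float4x4(
        1.0f, 0.0f, 0.0f, offset.x,
        0.0f, 1.0f, 0.0f, offset.y,
        0.0f, 0.0f, 1.0f, offset.z,
        0.0f, 0.0f, 0.0f, 1.0f);
}
```

Multiplying a point (x, y, z, 1) by this matrix yields (x + offset.x, y + offset.y, z + offset.z, 1), so for a pure translation the matrix does nothing more than a vector add.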

Thanks.


If you only use translation (i.e. no orientation nor scale) then you don't need a matrix at all.

Just do:


//Offset then apply main model matrix.
float4 pos = mul(model, float4( input.pos.xyz + instancePosition[input.instanceID].xyz, 1.0f ));

//...or apply main model matrix, then offset (use one variant or the other).
float4 pos = mul(model, float4( input.pos.xyz, 1.0f ));
pos.xyz += instancePosition[input.instanceID].xyz;

The longer your shaders, the better. Push as much as you can into them. The more particles you have, the more you'll gain: you have hundreds of ALUs waiting for commands, and graphics shaders are very often too short for today's architectures.

Keep your CPU free for stuff you cannot trivially GPU-ize!

Previously "Krohm"

In addition to what Matias Goldberg said, if your particles are always facing the camera, you can use the vertex ID to figure out which of the 4 corners your vertex is and offset it.


//VertexID is the SV_VertexID input; particleScale is the particle's half-size.
float4 pos = mul(model, float4( instancePosition[input.instanceID].xyz, 1.0f ));
pos.x += (VertexID & 1) ? -particleScale.x : particleScale.x;
pos.y += (VertexID & 2) ? -particleScale.y : particleScale.y;

That way your instance buffer is actually the particle buffer and the vertex data is generated from the vertex ID. Simple and cheap.
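
A fuller sketch of this idea as a complete vertex shader might look like the following. The names view and projection, the PerFrameBuffer layout, and register b0 are assumptions made for illustration. The corner offset is applied in view space here so that the quad stays screen-aligned, and particleScale is treated as the half-size of the quad.

```hlsl
// Assumed per-frame constants; names and register slot are hypothetical.
cbuffer PerFrameBuffer : register(b0)
{
    matrix view;
    matrix projection;
    float2 particleScale;   // half-width and half-height of each quad
};

cbuffer PerInstanceBuffer : register(b10)
{
    float4 instancePosition[1000];   // xyz = particle center in world space
};

// No vertex buffer is read: the corner is derived from the vertex ID alone.
float4 main(uint vertexID : SV_VertexID,
            uint instanceID : SV_InstanceID) : SV_POSITION
{
    // Move the particle center into view space, where x/y are screen-aligned.
    float4 viewPos = mul(view, float4(instancePosition[instanceID].xyz, 1.0f));

    // Bit 0 selects left/right, bit 1 selects bottom/top, for IDs 0..3.
    viewPos.x += (vertexID & 1) ? -particleScale.x : particleScale.x;
    viewPos.y += (vertexID & 2) ? -particleScale.y : particleScale.y;

    return mul(projection, viewPos);
}
```

With this layout each particle would be drawn as a 4-vertex triangle strip, for example with DrawInstanced(4, particleCount, 0, 0) in Direct3D 11. Depending on your cull mode you may need to flip the signs to get the winding order you expect.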

Great, thank you for all the advice.
