pseudomarvin

Is computing translation matrices directly in the shader more efficient than on the CPU when instancing?


pseudomarvin    379

I am rendering quad particles that always turn to face the camera, whichever way you look (billboards). Until now I have used a single rotation matrix shared by all 1000 particles, while the translation matrix that moves each particle to its place was computed on the CPU.

 

In HLSL:

cbuffer PerInstanceBuffer: register(b10)
{
    matrix translationMatrix[1000];
}

struct VertexShaderInput
{
    float3 pos : POSITION;
    uint instanceID : SV_INSTANCEID;
};

main (VertexShaderInput input)
{
    // Promote the float3 position to a point (w = 1) for the 4x4 multiply.
    float4 pos = mul(model, float4(input.pos, 1.0f));
    if (instancingOn)
    {
        pos = mul(translationMatrix[input.instanceID], pos);
    }
    ...
}

 

Would it be better to send just 1000 position offsets and have a function in the shader that computes the translation matrix (or rather just defines it, since no real computation is required in this case) and then uses it? Would the pipelined nature of the GPU make that faster than doing it on the CPU?

 

Like this:

cbuffer PerInstanceBuffer: register(b10)
{
     float4 instancePosition[1000]; // xyz = per-instance offset, w unused
}

struct VertexShaderInput
{
     float3 pos : POSITION;
     uint instanceID : SV_INSTANCEID;
};

matrix GetTranslationMatrix(float3 instancePosition)
{
   ... //calculate the matrix
   return translationMatrix;
}

main (VertexShaderInput input)
{
    float4 pos = mul(model, float4(input.pos, 1.0f));
    if (instancingOn)
    {
        pos = mul(GetTranslationMatrix(instancePosition[input.instanceID].xyz), pos);
    }
...
}

Thanks.

Matias Goldberg    9580

If you only use translation (i.e. no rotation or scale), then you don't need a matrix at all.

 

Just do:

//Offset, then apply the main model matrix.
float4 pos = mul(model, float4(input.pos.xyz + instancePosition[input.instanceID].xyz, 1.0f));
 
//...or apply the main model matrix, then offset.
float4 pos = mul(model, float4(input.pos.xyz, 1.0f));
pos.xyz += instancePosition[input.instanceID].xyz;
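
To spell out why no matrix is needed, here is a short worked sketch. The symbols are notation introduced for illustration only: $p$ is the vertex position, $t$ the per-instance offset, $T(t)$ the translation matrix built from it, and $M$ the model matrix with linear part $A$ and translation $m$, using the same column-vector convention as `mul(model, pos)`.

```latex
\[
T(t)\begin{pmatrix} p \\ 1 \end{pmatrix}
= \begin{pmatrix} I & t \\ 0 & 1 \end{pmatrix}
  \begin{pmatrix} p \\ 1 \end{pmatrix}
= \begin{pmatrix} p + t \\ 1 \end{pmatrix}
\]
\[
M\begin{pmatrix} p + t \\ 1 \end{pmatrix}
= \begin{pmatrix} A p + A t + m \\ 1 \end{pmatrix},
\qquad
T(t)\,M\begin{pmatrix} p \\ 1 \end{pmatrix}
= \begin{pmatrix} A p + m + t \\ 1 \end{pmatrix}
\]
```

Multiplying by a pure translation matrix is therefore just a vector add. The two variants agree only when $A t = t$ (for example, when the model matrix has no rotation or scale). The second variant matches the order in the original shader, which applies `model` first and the translation afterwards.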
Krohm    5031

The longer your shaders, the better. Push as much work as you can into them. The more particles you have, the more you'll gain: the GPU has hundreds of ALUs waiting for work, and graphics shaders are very often too short to keep today's architectures busy.

Keep your CPU free for stuff you cannot trivially GPU-ize!

Krypt0n    4721
In addition to what Matias Goldberg said, if your particles always face the camera, you can use the vertex ID to figure out which of the four corners the current vertex is and offset it accordingly.

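//VertexID is the SV_VertexID system value: bit 0 selects the x side, bit 1 the y side.
float4 pos = mul(model, float4(instancePosition[input.instanceID].xyz, 1.0f));
pos.x += (VertexID & 1) ? -particleScale.x : particleScale.x;
pos.y += (VertexID & 2) ? -particleScale.y : particleScale.y;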
That way your instance buffer is actually the particle buffer and the vertex data is generated from the vertex ID: simple and cheap.
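
For reference, the vertex-ID idea could be fleshed out into a complete vertex shader roughly like this. It is only a sketch under several assumptions that are not in the thread: the `PerFrameBuffer` layout, the `view` and `projection` matrices, register `b0`, and drawing each particle as a 4-vertex triangle strip with no vertex buffer bound (e.g. `DrawInstanced(4, particleCount, 0, 0)`). The corner offset is applied in view space so the quad stays parallel to the screen.

```hlsl
// Sketch only: buffer layouts, register slots and names are assumptions.
cbuffer PerFrameBuffer : register(b0)
{
    float4x4 view;          // world -> view space (assumed)
    float4x4 projection;    // view -> clip space (assumed)
    float2   particleScale; // half-width / half-height of a quad in view space
};

cbuffer PerInstanceBuffer : register(b10)
{
    float4 instancePosition[1000]; // xyz = particle center in world space, w unused
};

struct VertexShaderInput
{
    uint vertexID   : SV_VertexID;   // 0..3 within each quad
    uint instanceID : SV_InstanceID; // which particle
};

float4 main(VertexShaderInput input) : SV_Position
{
    // Transform the particle center into view space first, so that offsetting
    // along x/y produces a camera-facing quad.
    float4 pos = mul(view, float4(instancePosition[input.instanceID].xyz, 1.0f));

    // Bit 0 of the vertex ID picks the x side, bit 1 picks the y side.
    pos.x += (input.vertexID & 1) ? -particleScale.x : particleScale.x;
    pos.y += (input.vertexID & 2) ? -particleScale.y : particleScale.y;

    return mul(projection, pos);
}
```

With this layout the four corners come out in the order (+x,+y), (-x,+y), (+x,-y), (-x,-y), which forms a valid triangle strip. Depending on the rasterizer's cull mode, the winding may need flipping (for instance by swapping the sign on one axis) or culling may need to be disabled for the particles.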
