Is computing translation matrices directly in the shader more efficient than on the CPU when instancing?


Started by pseudomarvin September 30, 2015 10:20 AM


380

Author

September 30, 2015 10:20 AM

I am rendering quad particles that always turn to face the camera, whichever way you look. Until now I have used a system in which a single rotation matrix is shared by all 1000 particles, and the translation matrix that moves each particle to its place is computed on the CPU.

In HLSL:


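cbuffer PerInstanceBuffer : register(b10)
{
    matrix translationMatrix[1000];
}

struct VertexShaderInput
{
    float3 pos : POSITION;
    uint instanceID : SV_INSTANCEID;
};

float4 main(VertexShaderInput input) : SV_POSITION
{
    // model and instancingOn are assumed to be declared in another constant buffer (not shown).
    // Extend the position to a float4 (w = 1) so the 4x4 matrices can translate it.
    float4 pos = mul(model, float4(input.pos, 1.0f));
    if (instancingOn)
    {
        pos = mul(translationMatrix[input.instanceID], pos);
    }
    ...
}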

Would it be a better idea to send just 1000 position offsets and have a function in the shader that computes the translation matrix (or rather just defines it, since no real computation is required in this case) and then uses it? Would the pipelined nature of the GPU make that faster than doing it on the CPU?

Like this:


cbuffer PerInstanceBuffer : register(b10)
{
    // One position offset per instance (xyz used), instead of a full matrix.
    float4 instancePosition[1000];
}

struct VertexShaderInput
{
    float3 pos : POSITION;
    uint instanceID : SV_INSTANCEID;
};

matrix GetTranslationMatrix(float3 instancePosition)
{
    ... // calculate the matrix
    return translationMatrix;
}

float4 main(VertexShaderInput input) : SV_POSITION
{
    float4 pos = mul(model, float4(input.pos, 1.0f));
    if (instancingOn)
    {
        pos = mul(GetTranslationMatrix(instancePosition[input.instanceID].xyz), pos);
    }
    ...
}
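
For reference, here is a minimal sketch of what the body of GetTranslationMatrix could look like. It assumes the mul(matrix, vector) order used in the snippet above, so the offset goes in the last column. The parameter name offset is only illustrative, and float4x4 is the same type as matrix.

```hlsl
// Sketch only: builds a pure translation matrix for use as mul(M, columnVector).
// The 16 scalars of the constructor are given row by row.
float4x4 GetTranslationMatrix(float3 offset)
{
    return float4x4(
        1.0f, 0.0f, 0.0f, offset.x,
        0.0f, 1.0f, 0.0f, offset.y,
        0.0f, 0.0f, 1.0f, offset.z,
        0.0f, 0.0f, 0.0f, 1.0f);
}
```

If the code used mul(vector, matrix) instead, the offset would have to go in the last row rather than the last column.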

Thanks.

Matias Goldberg

9,637

September 30, 2015 01:46 PM

If you only use translation (i.e. no rotation or scale), then you don't need a matrix at all.

Just do:


// Offset, then apply the main model matrix.
float4 pos = mul(model, float4(input.pos.xyz + instancePosition[input.instanceID].xyz, 1.0f));

// ...or apply the main model matrix, then offset.
float4 pos = mul(model, float4(input.pos.xyz, 1.0f));
pos.xyz += instancePosition[input.instanceID].xyz;
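
The reason this works can be written out in one line. As a sketch, write the per-instance offset as (t_x, t_y, t_z) and the point as (x, y, z, 1); these symbols are introduced here only for illustration. Multiplying by a pure translation matrix then just adds the offset:

```latex
\begin{pmatrix}
1 & 0 & 0 & t_x \\
0 & 1 & 0 & t_y \\
0 & 0 & 1 & t_z \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}
=
\begin{pmatrix} x + t_x \\ y + t_y \\ z + t_z \\ 1 \end{pmatrix}
```

This holds as long as w = 1, which is the case for positions in the snippets above. So the 4x4 multiply collapses to three additions.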

Twitter: @matiasgoldberg

Distant Souls | Alliance AirWar | My Free Royalty-Free Music Library

Krohm

5,051

September 30, 2015 02:00 PM

The longer your shaders, the better. Push as much as you can into them. The more particles you have, the more you'll gain: you have hundreds of ALUs waiting for work, and graphics shaders are very often too short for today's architectures.

Keep your CPU free for stuff you cannot trivially GPU-ize!

Previously "Krohm"

Krypt0n

4,769

September 30, 2015 02:52 PM

In addition to what Matias Goldberg said, if your particles are always facing the camera, you can use the vertex ID to figure out which of the 4 corners your vertex is and offset it accordingly.


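// Transform the particle center, then push the vertex out to its corner.
float4 pos = mul(model, float4(instancePosition[input.instanceID].xyz, 1.0f));
pos.x += (VertexID & 1) ? -particleScale.x : particleScale.x; // bit 0 selects left/right
pos.y += (VertexID & 2) ? -particleScale.y : particleScale.y; // bit 1 selects bottom/top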

That way your instance buffer is actually the particle buffer and the vertex data is generated from the vertex ID. Simple and cheap.
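
Put together, a complete vertex shader along these lines might look like the sketch below. It is not code from this thread: the PerFrameBuffer layout, its register slot, and the view and projection matrices are assumptions. The corner offset is applied in view space here so that the quad stays parallel to the screen.

```hlsl
// Sketch only: buffer names, register slots, view and projection are hypothetical.
cbuffer PerFrameBuffer : register(b0)
{
    matrix model;          // shared model matrix, as in the thread
    matrix view;           // assumed: world-to-view matrix
    matrix projection;     // assumed: view-to-clip matrix
    float2 particleScale;  // half-width and half-height of each quad
};

cbuffer PerInstanceBuffer : register(b10)
{
    float4 instancePosition[1000]; // one particle center per instance (xyz used)
};

struct VertexShaderInput
{
    uint vertexID   : SV_VertexID;   // 0..3, generated by the pipeline
    uint instanceID : SV_InstanceID; // one instance per particle
};

float4 main(VertexShaderInput input) : SV_POSITION
{
    // Move the particle center into view space.
    float4 center = float4(instancePosition[input.instanceID].xyz, 1.0f);
    float4 pos = mul(view, mul(model, center));

    // Bit 0 of the vertex ID picks the x side, bit 1 picks the y side.
    pos.x += ((input.vertexID & 1) != 0) ? -particleScale.x : particleScale.x;
    pos.y += ((input.vertexID & 2) != 0) ? -particleScale.y : particleScale.y;

    return mul(projection, pos);
}
```

With this layout, no vertex buffer is needed. Each particle would be drawn as a 4-vertex triangle strip, for example with DrawInstanced(4, particleCount, 0, 0), where particleCount is a placeholder name. Depending on the winding order this produces, back-face culling may need to be disabled or flipped.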

video game porting and optimization service + consulting

pseudomarvin

380

Author

September 30, 2015 04:40 PM

Great, thank you for all the advice.