Compute shader for skinned mesh animation - D3D11

Published May 25, 2015
Preface

I was raised and educated as an engineer. Not an engineer for a locomotive (though I would've loved to have been such in the days of steam), but, by education and experience, an engineer in the fields of nuclear, controls and instrumentation, and a brief career at Oak Ridge National Lab in the Computer and Instrumentation section. My father was an engineer, as were my 2 older brothers and my older son - altogether representing careers in ceramics, petroleum, agricultural and industrial engineering, and the fields I played around with mentioned above.

The point I'm trying to make is that my engineering inclination is apparently genetic, and I'm the result of nature and nurture. By occupation, an engineer is an "idea implementer." Not to say I don't have an original idea once in a while, but I do enjoy taking someone else's ideas and seeing if I can implement them.

In my continuing self-education in, and exploration of, D3D11, I'm using a mesh editor as the vehicle. Several previous entries in this blog describe approaches to implementing pieces of that editor. The ideas for the important pieces are, alas, not my own, but suggestions by others that I've engineered.

Yet Another Acknowledgement

At a hint from gamedev member unbird that mesh skinning could be done in a compute shader, I decided to give compute shaders a try, as that was an area of the API I hadn't experimented with. As with the PPP (Pixel Perfect Picking) approach that proved so useful in my editor, credit for the idea of using a compute shader for mesh skinning goes to that li'l ol' shader wizard - unbird.

Why Use A Compute Shader

A common approach to animated skinned-mesh rendering is shown in the shader code below. The process consists of streaming vertex data to a vertex shader, applying animation, world, view and projection transforms to the position, transforming the vertex normal for vertex-lighting calculations, and passing the results to a pixel shader for texture sampling and color output.

For an indexed mesh, in which the same vertex is used in multiple triangles, that process can result in the same calculations being performed on the same vertex multiple times (to the extent the GPU's post-transform vertex cache doesn't catch the reuse). E.g., if the mesh contains N vertices, each of which is used in just 2 adjacent triangles, the transformation and lighting calculations are done N "extra" times. It's not uncommon for some meshes to use the same vertex in 4 or more triangles. If the vertex calculation results are cached, avoiding redundant calculations for the same vertex, efficiency can be improved.
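
To put a number on that for a particular mesh, here's a minimal sketch that counts how many times an index buffer references each vertex. The names ReuseStats and countVertexReuse are made up for illustration, and the count is a worst case that assumes no caching at all between triangles.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type: summary of vertex reuse in an indexed triangle list.
struct ReuseStats {
    std::size_t references;      // total index entries (worst-case vertex shader runs)
    std::size_t uniqueVertices;  // vertices referenced at least once
    std::size_t extra;           // references - uniqueVertices
};

// Counts how many index entries refer to a vertex that was already referenced.
inline ReuseStats countVertexReuse(const std::vector<std::uint32_t>& indices,
                                   std::size_t numVertices)
{
    std::vector<bool> used(numVertices, false);
    std::size_t unique = 0;
    for (std::uint32_t i : indices) {
        assert(i < numVertices);  // every index must address a real vertex
        if (!used[i]) {
            used[i] = true;
            ++unique;
        }
    }
    return { indices.size(), unique, indices.size() - unique };
}
```

For a quad built from two triangles (indices 0,1,2 and 0,2,3 over 4 vertices), that reports 6 references, 4 unique vertices, and 2 "extra" calculations.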

Further, N.B., that skinning process uses the same data (animation matrix array, world-view-projection matrices, lighting info, etc.) and the same instructions for every vertex calculation. That situation is right down a compute shader's alley. I.e., multiple parallel threads (that sounds oxymoronic**) can be used to perform the calculations once per vertex, caching the results. That combines the efficiency of one-calc-per-vertex with parallel processing.

** The phrase "multiple parallel threads" was provided by the Department of Redundancy Department.

Skinned Mesh Animation - A Compute Shader Approach

Briefly, I converted my mesh skinning shader into a compute shader by substituting an RWStructuredBuffer containing the mesh's vertices (with their bone indices and weights) for the "normal" indexed input vertex stream, and another RWStructuredBuffer for output in place of output to a pixel shader. Rendering is then done by copying the compute shader output buffer to a vertex buffer, and calling DrawIndexed on that vertex buffer with a pass-through vertex shader (output = input) and a simple texture-sampling pixel shader.

The implementation described below is very basic, with lots of room for improvement.

With very simple profile testing (calling QueryPerformanceCounter before and after the render calls, and repeatedly averaging 1000 render times), it appears the compute shader approach may be ~30% faster than the "traditional" mesh skinning shader. I have not done any further testing to determine exactly where the efficiency comes from. I'm just reporting the results I obtained.
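
For anyone wanting to repeat that kind of measurement, here's a rough sketch of the averaging. It is not my actual test code: it swaps in std::chrono::steady_clock for QueryPerformanceCounter so it compiles anywhere, and FrameTimeAverager and timeMs are hypothetical names. Keep in mind that timing on the CPU like this measures how long the calls take to return, which may not equal GPU execution time.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical accumulator: averages samples over fixed-size batches.
class FrameTimeAverager {
public:
    explicit FrameTimeAverager(std::size_t batchSize) : batchSize_(batchSize) {}

    // Adds one sample in milliseconds. Returns true when a batch completes,
    // at which point lastAverageMs() holds the average of that batch.
    bool addSample(double ms) {
        sumMs_ += ms;
        if (++count_ < batchSize_) return false;
        lastAverageMs_ = sumMs_ / static_cast<double>(count_);
        sumMs_ = 0.0;
        count_ = 0;
        return true;
    }

    double lastAverageMs() const { return lastAverageMs_; }

private:
    std::size_t batchSize_;
    std::size_t count_ = 0;
    double sumMs_ = 0.0;
    double lastAverageMs_ = 0.0;
};

// Times a single call of `work` and returns the elapsed milliseconds.
template <typename Fn>
double timeMs(Fn&& work) {
    const auto t0 = std::chrono::steady_clock::now();
    work();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Usage would be along the lines of avg.addSample(timeMs([&]{ /* render calls */ })) each frame, reading avg.lastAverageMs() whenever addSample returns true.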

More Information

Here's the general form of a mesh skinning shader I've used.

    // There are various and sundry constant buffers providing view and projection matrices,
    // material and lighting params, etc....
    // so, any variables appearing below that aren't local to the vertex shader are
    // in some constant buffer somewhere
    // And, of course, the "traditional" array of animation matrices used to skin the mesh vertices
    cbuffer cbSkinned : register(b1)
    {
        matrix meshWorld;
        matrix meshWorldInvTranspose;
        matrix finalMats[96];
    };

    Texture2D txDiffuse : register(t0);
    SamplerState samLinear : register(s0);

    //--------------------------------------------------------------------------------------
    struct VS_INPUT
    {
        float4 Pos : POSITION;
        float4 Norm : NORMAL;
        float2 Tex : TEXCOORD;
        uint4 index : BONE;
        float4 weight : WEIGHT;
    };

    struct PS_INPUT
    {
        float4 Pos : SV_POSITION;
        float4 Color : COLOR0;
        float2 Tex : TEXCOORD0;
    };

    //--------------------------------------------------------------------------------------
    // Vertex Shader
    //--------------------------------------------------------------------------------------
    PS_INPUT VS(VS_INPUT input)
    {
        PS_INPUT output = (PS_INPUT)0;

        int bidx[4] = { input.index.x, input.index.y, input.index.z, input.index.w };
        // only three weights are stored; the fourth is whatever remains of 1.0
        float weight[4] = { input.weight.x, input.weight.y, input.weight.z, 0 };
        weight[3] = 1.0f - weight[0] - weight[1] - weight[2];

        float4 Pos = 0.0f;
        float3 Normal = 0.0f;
        for (int b = 0; b < 4; b++)
        {
            Pos += mul(input.Pos, finalMats[bidx[b]]) * weight[b];
            Normal += mul(input.Norm.xyz, (float3x3)finalMats[bidx[b]]) * weight[b];
        }
        Normal = normalize(Normal);

        // legacy code - meshWorld should be combined with ViewProjection on the CPU side
        output.Pos = mul(Pos, meshWorld);
        output.Pos = mul(output.Pos, ViewProjection);
        output.Tex = input.Tex;

        float4 invNorm = float4(mul(Normal.xyz, (float3x3)meshWorldInvTranspose), 0.0f);
        output.Color = saturate(dot(invNorm, normalize(lightDir)) + faceColor);
        return output;
    }

    //--------------------------------------------------------------------------------------
    // Pixel Shader
    //--------------------------------------------------------------------------------------
    float4 PS(PS_INPUT input) : SV_Target
    {
        return txDiffuse.Sample(samLinear, input.Tex) * input.Color;
    }

The vertex buffer streamed into the vertex shader during the context->DrawIndexed call consists, not too surprisingly, of vertices matching struct VS_INPUT above. The process is, I believe, pretty standard.

With unbird's hint that mesh skinning could be done in a compute shader, I modified the vertex shader above as follows.

    // constant buffers similar to the above code
    // for the compute shader input and output ...
    struct CS_INPUT
    {
        float4 Pos;
        float4 Norm;
        float2 Tex;
        uint4 index;
        float4 weight;
    };

    struct CS_OUTPUT
    {
        float4 Pos;
        float4 Color;
        float2 Tex;
    };

    RWStructuredBuffer<CS_INPUT> vertsIn : register(u0);
    RWStructuredBuffer<CS_OUTPUT> vertsOut : register(u1);

    //--------------------------------------------------------------------------------------
    // Compute Shader
    //--------------------------------------------------------------------------------------
    [numthreads(64, 1, 1)]
    void CS_Anim(uint3 threadID : SV_DispatchThreadID)
    {
        // one thread per vertex: threadID.x is the vertex number
        CS_INPUT input = vertsIn[threadID.x];
        CS_OUTPUT output = (CS_OUTPUT)0;

        int bidx[4] = { input.index.x, input.index.y, input.index.z, input.index.w };
        // etc., etc. - same as the vertex shader above
        // but, instead of returning the output to be passed to the pixel shader ...
        vertsOut[threadID.x] = output;
    }
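
When checking what the compute shader writes, a CPU version of the bone-blending loop is handy for comparison. The following is only a sketch under my own assumptions, not code from the editor: Float4, Matrix4, mul and skinPosition are hypothetical names, matrices are row-major and applied to row vectors to mirror HLSL's mul(v, M), and only the position blend is shown.

```cpp
#include <array>
#include <cstdint>

struct Float4 { float x, y, z, w; };

// Row-major 4x4 matrix applied to row vectors, mirroring HLSL's mul(v, M).
struct Matrix4 { float m[4][4]; };

inline Float4 mul(const Float4& v, const Matrix4& M) {
    return {
        v.x * M.m[0][0] + v.y * M.m[1][0] + v.z * M.m[2][0] + v.w * M.m[3][0],
        v.x * M.m[0][1] + v.y * M.m[1][1] + v.z * M.m[2][1] + v.w * M.m[3][1],
        v.x * M.m[0][2] + v.y * M.m[1][2] + v.z * M.m[2][2] + v.w * M.m[3][2],
        v.x * M.m[0][3] + v.y * M.m[1][3] + v.z * M.m[2][3] + v.w * M.m[3][3]
    };
}

// Blends one position by four bones. Only three weights are stored;
// the fourth is derived as 1 - (w0 + w1 + w2), as in the shader.
inline Float4 skinPosition(const Float4& pos,
                           const std::array<std::uint32_t, 4>& bone,
                           const std::array<float, 3>& storedWeight,
                           const Matrix4* finalMats)
{
    const float weight[4] = {
        storedWeight[0], storedWeight[1], storedWeight[2],
        1.0f - storedWeight[0] - storedWeight[1] - storedWeight[2]
    };
    Float4 out = { 0.0f, 0.0f, 0.0f, 0.0f };
    for (int b = 0; b < 4; ++b) {
        const Float4 p = mul(pos, finalMats[bone[b]]);
        out.x += p.x * weight[b];
        out.y += p.y * weight[b];
        out.z += p.z * weight[b];
        out.w += p.w * weight[b];
    }
    return out;
}
```

As a quick sanity check, blending the point (1,1,1,1) half-and-half between an identity matrix and a translation of 2 units in x should land at (2,1,1,1).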

For support, several D3D11 buffer objects are created. The vertsIn and vertsOut RWStructuredBuffers above are backed by USAGE_DEFAULT buffers, bound as UNORDERED_ACCESS, with MiscFlags = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED and StructureByteStride set to the size of the corresponding struct.

vertsIn is sized as number-vertices * sizeof( struct CS_INPUT ), and is initialized with the mesh vertices. I initialize the data from a std::vector<> of the vertices, but it could also be done with CopyResource (I think).

vertsOut is sized as number-vertices * sizeof( struct CS_OUTPUT ).

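In addition, a vertex buffer is created for the final rendering, with attributes USAGE_DEFAULT and BIND_VERTEX_BUFFER, and ByteWidth the same as the vertsOut buffer.

Unordered access views ( vInView and vOutView ) are created for each of the two structured buffers, with Format = DXGI_FORMAT_UNKNOWN (as required for structured buffers), ViewDimension = D3D11_UAV_DIMENSION_BUFFER, and the Buffer member of the description covering all the vertices (FirstElement = 0, NumElements = number of vertices).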
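
The sizes involved can be checked on the CPU side. This is a sketch rather than my editor's code: CSInput, CSOutput, BufferSizes and sizesFor are hypothetical names, and it assumes structured-buffer elements are tightly packed so that plain 4-byte members reproduce the HLSL layout.

```cpp
#include <cstddef>
#include <cstdint>

// CPU-side mirrors of the HLSL structs (assumed tightly packed, 4-byte members).
struct CSInput {
    float         pos[4];
    float         norm[4];
    float         tex[2];
    std::uint32_t index[4];
    float         weight[4];
};

struct CSOutput {
    float pos[4];
    float color[4];
    float tex[2];
};

static_assert(sizeof(CSInput) == 72, "CSInput should match CS_INPUT (72 bytes)");
static_assert(sizeof(CSOutput) == 40, "CSOutput should match CS_OUTPUT (40 bytes)");

// Values that would feed the buffer and view descriptions.
struct BufferSizes {
    std::size_t stride;       // for StructureByteStride
    std::size_t byteWidth;    // for ByteWidth
    std::size_t numElements;  // for the view's NumElements
};

constexpr BufferSizes sizesFor(std::size_t stride, std::size_t numVertices) {
    return { stride, stride * numVertices, numVertices };
}
```

For a 1000-vertex mesh, sizesFor(sizeof(CSInput), 1000) gives a 72000-byte vertsIn buffer and sizesFor(sizeof(CSOutput), 1000) gives 40000 bytes for both vertsOut and the final vertex buffer.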

The compute shader is used as follows:

1. Set all the constant buffers similar to the skinning vertex shader, except, of course, with context->CSSetConstantBuffers.
2. Call context->CSSetShader( ... ).
3. Set the RWBuffers with:
ID3D11UnorderedAccessView* views[2] = { vInView.Get(), vOutView.Get() };
context->CSSetUnorderedAccessViews(0, 2, views, nullptr);
4. Call context->Dispatch( (numVertices + 63)/64, 1, 1 );
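
The (numVertices + 63)/64 in step 4 is just a round-up division by the thread-group size. A small sketch, with hypothetical names kThreadsPerGroup, groupCount and idleThreads, shows the arithmetic and how many threads in the last group have no vertex to work on.

```cpp
#include <cstdint>

// Must match the [numthreads(64, 1, 1)] attribute on the compute shader.
constexpr std::uint32_t kThreadsPerGroup = 64;

// Number of thread groups needed so that every vertex gets a thread.
constexpr std::uint32_t groupCount(std::uint32_t numVertices) {
    return (numVertices + kThreadsPerGroup - 1) / kThreadsPerGroup;
}

// Threads launched in the last group that have no vertex to process.
constexpr std::uint32_t idleThreads(std::uint32_t numVertices) {
    return groupCount(numVertices) * kThreadsPerGroup - numVertices;
}
```

With 65 vertices, 2 groups are dispatched and 63 threads index past the end of the buffers. If that's a concern, one option would be an early-out in the shader such as if (threadID.x >= numVertices) return;, with numVertices supplied through a constant buffer (my assumption, not something in the shader above).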

The vertsOut buffer then contains the transformed vertices (clip-space positions, plus tex coords and color) that would normally have been passed on to the pixel shader. The animated skinned mesh is then rendered with a simple pass-through vertex shader, and the same pixel shader originally used with the skinned mesh shader shown first above.

    //--------------------------------------------------------------------------------------
    struct VS_INPUT
    {
        float4 Pos : POSITION0;
        float4 Color : COLOR0;
        float2 Tex : TEXCOORD0;
    };

    struct PS_INPUT
    {
        float4 Pos : SV_POSITION;
        float4 Color : COLOR0;
        float2 Tex : TEXCOORD0;
    };

    //--------------------------------------------------------------------------------------
    // Vertex Shader
    //--------------------------------------------------------------------------------------
    PS_INPUT VS(VS_INPUT input)
    {
        PS_INPUT output = (PS_INPUT)input; // used to change the semantics
        return output;
    }

    //--------------------------------------------------------------------------------------
    // Pixel Shader
    //--------------------------------------------------------------------------------------
    float4 PS(PS_INPUT input) : SV_Target
    {
        return txDiffuse.Sample(samLinear, input.Tex) * input.Color;
    }

The drawing is done by:

1. Copy the data from the vertsOut buffer to the vertex buffer using CopyResource.
2. Set the appropriate input layout reflecting VS_INPUT, and PRIMITIVE_TOPOLOGY_TRIANGLELIST.
3. Set as input the newly copied vertex buffer with appropriate stride and zero (0) offset, and the mesh's index buffer (with its original format).
4. Set the vertex and pixel shaders ( created/compiled from the HLSL code shown immediately above ).
5. Set the pixel shader texture and sampler as appropriate.
6. Call context->DrawIndexed( numIndices, 0, 0 );
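
For steps 2 and 3, the "appropriate" input layout offsets and stride follow directly from the pass-through VS_INPUT. Here's a sketch using a hypothetical CPU-side mirror struct, PassThroughVertex, again assuming the compute shader output is tightly packed.

```cpp
#include <cstddef>

// CPU-side mirror of the pass-through VS_INPUT (assumed tightly packed).
struct PassThroughVertex {
    float pos[4];    // POSITION0
    float color[4];  // COLOR0
    float tex[2];    // TEXCOORD0
};

// Byte offsets of each element within one vertex.
constexpr std::size_t kPosOffset   = offsetof(PassThroughVertex, pos);
constexpr std::size_t kColorOffset = offsetof(PassThroughVertex, color);
constexpr std::size_t kTexOffset   = offsetof(PassThroughVertex, tex);

// Stride to use when binding the vertex buffer.
constexpr std::size_t kStride = sizeof(PassThroughVertex);

static_assert(kStride == 40, "stride should equal the size of CS_OUTPUT");
```

Those offsets (0, 16 and 32) would go into AlignedByteOffset of the three D3D11_INPUT_ELEMENT_DESC entries, with DXGI_FORMAT_R32G32B32A32_FLOAT for position and color and DXGI_FORMAT_R32G32_FLOAT for the tex coords, and the 40-byte stride would be passed to IASetVertexBuffers.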

The compute shader approach to mesh skinning described isn't generally compatible with the PPP (Pixel-Perfect-Picking) approach used for selecting vertices, faces and edges in a mesh editor I've described in other entries in this journal.

However, possible uses for the compute shader approach come to mind.

- increase the efficiency of animated skinned-mesh rendering in general.
- because the compute shader output is in the form of position-color-texcoords structures, that output could be batched with other mesh data (other skinned meshes, static meshes, etc.) for final rendering.