KaiserJohan

Skeletal animation shader questions


Hello,
I'm implementing skeletal animation and I've been looking at some examples on the net and compiled a list of questions.
 
(Snippet of typical shader code)
static const uint MAX_NUM_BONES = 64;

cbuffer SkeletalAnimConstants : register(CBUFFER_REGISTER_VERTEX)
{
    float4x4 gWorldViewMatrix;
    float4x4 gBoneMatrices[MAX_NUM_BONES];
};

1. Are constant buffers the ideal way to send bone transforms to the shader?

2. If so, what's a reasonable max limit?

 

 

(Snippet of typical shader code)

struct VSIn
{
    float3 mPosition : POSITION;
    float3 mNormal : NORMAL;
    float2 mTexcoord : TEXCOORD;
    float3 mTangent : TANGENT;
    float3 mBitangent : BITANGENT;
    uint4 mBoneIndices : BONEINDICES;   // integer indices; the input layout must use a UINT format
    float4 mBoneWeights : BONEWEIGHTS;
};

VSOut vs_main(VSIn input)
{
    float4x4 boneTransform = gBoneMatrices[input.mBoneIndices.x] * input.mBoneWeights.x;
    boneTransform += gBoneMatrices[input.mBoneIndices.y] * input.mBoneWeights.y;
    boneTransform += gBoneMatrices[input.mBoneIndices.z] * input.mBoneWeights.z;
    boneTransform += gBoneMatrices[input.mBoneIndices.w] * input.mBoneWeights.w;
    
    // ...
}

The shaders seem to assume each vertex is affected by at most four bone transforms.

 

3. Is four bones per vertex some kind of 3D model convention? It seems very common. Could there be models with vertices that have more than four bones affecting them? What is the best approach then?

4. Aren't there a lot of unnecessary matrix-scalar multiplications and additions if some vertices have fewer than four bones (i.e. some weights are zero)?

 

Thanks


1) That one is probably a little hardware specific. Constant buffers are geared towards access patterns where all the threads read the same piece of data on the same instruction. So gBoneMatrices[7] fits that pattern, but gBoneMatrices[input.mBoneIndices.z] does not: the bone index varies per vertex and can't be resolved at compile time, so threads executing the same instruction may read different elements.

 

Strictly speaking I believe 'tbuffers' are designed more for non-coherent random access, however I've never actually seen anyone use one (except in the Skinning10 sample). In practice I'd probably opt for a Buffer<float4> or StructuredBuffer<float4x4> instead. Buffer<float4> would let you compress your matrices to a more compact format if you can get away with it. Also note that you may only need 3 float4s per bone unless you're doing non-standard things: check the last row (or the last column, depending on your matrix convention) and see whether it's always the same across all bones.
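
To make the "3 float4s per bone" check concrete, here is a minimal CPU-side sketch. It assumes the row-vector convention used by the shaders in this thread (mul(position, matrix)), in which case the constant part of an affine matrix is its fourth column, (0, 0, 0, 1). The names Float4, Matrix44, IsAffine and PackBones3x4 are made up for illustration and are not from any library.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU-side types, for illustration only.
using Float4   = std::array<float, 4>;
using Matrix44 = std::array<Float4, 4>;   // m[row][col], used as position * matrix

// True if the fourth column is (0, 0, 0, 1), i.e. the matrix is affine.
bool IsAffine(const Matrix44& m, float eps = 1e-6f)
{
    return std::fabs(m[0][3]) <= eps && std::fabs(m[1][3]) <= eps &&
           std::fabs(m[2][3]) <= eps && std::fabs(m[3][3] - 1.0f) <= eps;
}

// Stores each bone as three Float4s holding the first three columns of its matrix.
// Returns an empty vector if any bone is not affine, so the caller can fall back
// to uploading full 4x4 matrices.
std::vector<Float4> PackBones3x4(const std::vector<Matrix44>& bones)
{
    std::vector<Float4> packed;
    packed.reserve(bones.size() * 3);
    for (const Matrix44& m : bones)
    {
        if (!IsAffine(m))
            return {};
        for (std::size_t col = 0; col < 3; ++col)
        {
            const Float4 column = { m[0][col], m[1][col], m[2][col], m[3][col] };
            packed.push_back(column);
        }
    }
    return packed;
}
```

With this layout the shader would rebuild each output component as a dot product between float4(position, 1) and one of the three packed float4s for that bone. If your engine uses column vectors instead, the constant part is the last row and the packing is transposed accordingly.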

 

2) With a constant buffer you're constrained to 1024 bones stored as float4x4 (a CB holds at most 4096 float4s and each matrix takes four), but with a Buffer/StructuredBuffer/tbuffer you effectively have no limit. If you opt for any of the last three options you can create the buffer at exactly the size needed for the number of bones you have.
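
The arithmetic behind that limit can be written down as a small sketch. The 4096-element figure is the Direct3D 11 constant buffer limit quoted above; the helper name MaxBonesInCBuffer and its parameters are hypothetical.

```cpp
#include <cstddef>

// Direct3D 11 limit quoted above: at most 4096 float4 constants per constant buffer.
constexpr std::size_t kMaxFloat4PerCBuffer = 4096;

// Number of bones that fit, given how many float4s are reserved for other data
// and how many float4s each bone occupies (4 for a float4x4, 3 for a packed 3x4).
constexpr std::size_t MaxBonesInCBuffer(std::size_t reservedFloat4, std::size_t float4PerBone)
{
    return (kMaxFloat4PerCBuffer - reservedFloat4) / float4PerBone;
}

static_assert(MaxBonesInCBuffer(0, 4) == 1024, "full 4x4 matrices, nothing else in the buffer");
static_assert(MaxBonesInCBuffer(4, 4) == 1023, "one extra float4x4 such as gWorldViewMatrix");
static_assert(MaxBonesInCBuffer(0, 3) == 1365, "bones packed as three float4s each");
```

So the cbuffer in the original question, which also holds gWorldViewMatrix, would top out at 1023 full matrices, far above the MAX_NUM_BONES = 64 it declares.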

 

3) I think it's just a reasonable trade-off between quality, performance and the fact that a single vertex attribute can only store 4 indices. If you want more, you add another attribute and can then index 8, which is probably more than necessary. I expect there are legacy reasons for it, but you can support as many as you feel necessary.
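
If a model does come with more than four influences per vertex and you want to stay with one index attribute, one common option (not something proposed in this thread, so treat it as an assumption) is to keep the four largest weights at export time and renormalise them. A minimal sketch, with hypothetical Influence and PackedInfluences types:

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types, for illustration only.
struct Influence
{
    std::uint32_t bone;
    float         weight;
};

struct PackedInfluences
{
    std::array<std::uint32_t, 4> bones{};    // unused slots stay 0
    std::array<float, 4>         weights{};  // unused slots stay 0.0f
};

// Keeps the (up to) four largest weights and rescales them to sum to 1.
PackedInfluences ReduceToFour(std::vector<Influence> influences)
{
    // Largest weights first.
    std::sort(influences.begin(), influences.end(),
              [](const Influence& a, const Influence& b) { return a.weight > b.weight; });

    const std::size_t count = std::min<std::size_t>(influences.size(), 4);
    float sum = 0.0f;
    for (std::size_t i = 0; i < count; ++i)
        sum += influences[i].weight;

    PackedInfluences out;
    if (sum <= 0.0f)
        return out;  // no usable influences; the caller decides how to handle this

    for (std::size_t i = 0; i < count; ++i)
    {
        out.bones[i]   = influences[i].bone;
        out.weights[i] = influences[i].weight / sum;  // renormalise
    }
    return out;
}
```

Dropping the smallest weights changes the deformation slightly, so whether this is acceptable depends on the asset; the alternative is the second index/weight attribute described above.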

 

4) Whether branching over a single matrix multiplication is worth it or not is again hardware and situation specific. These days I can certainly imagine that skipping zero-weight bones would be worth it, but the easiest thing to do is try it on recent hardware from 2 (or ideally all 3) vendors and see what results you get with and without the branches.

Edited by Adam Miles


Hi, this is a great list of questions!

 

In my experience constant buffers are the best way in DX9, 10 and 11 to send the updated bone transforms to the shader for vertex skinning. A buffer of about 50 or fewer 4x4 matrices should be able to handle most (> 90%) of the skinned actors in low- to mid-range game engines in one draw call (I can confirm this for Dark Age of Camelot, Unreal Tournament, Mortal Kombat Armageddon, Zelda Twilight Princess and TitanQuest skinned actors).

 

As for the 4 bone weights per vertex limit, this does seem to be an arbitrary, legacy-based limit from shader model 2 and the fixed-function pipeline before it, when the four blend indices were packed into the 4 bytes of a single DWORD (as in the D3DCOLOR element of the vertex declaration below). I think I saw some research paper that concluded that, for bipedal meshes, 4 blend indices are sufficient for a certain fidelity of realistic motion, but I can't find the paper at hand to link to it.

 

As for the last question of whether you should always blend in 4 bone weights, I actually use another number to tell me how many of the 4 blend indices are needed. Here is some example shader model 2 code for skinning:

 

CRenderableMesh 0F7B78E8:
=================================================================================
Vertex Declaration: 0B98C860
=================================================================================
 8 Vertex Elements
{ Stream = 0, Offset = 0, Type = D3DDECLTYPE_FLOAT3, Method = D3DDECLMETHOD_DEFAULT, Usage = D3DDECLUSAGE_POSITION, UsageIndex = 0 },
{ Stream = 0, Offset = 12, Type = D3DDECLTYPE_FLOAT3, Method = D3DDECLMETHOD_DEFAULT, Usage = D3DDECLUSAGE_NORMAL, UsageIndex = 0 },
{ Stream = 0, Offset = 24, Type = D3DDECLTYPE_FLOAT3, Method = D3DDECLMETHOD_DEFAULT, Usage = D3DDECLUSAGE_BLENDWEIGHT, UsageIndex = 0 },
{ Stream = 0, Offset = 36, Type = D3DDECLTYPE_D3DCOLOR, Method = D3DDECLMETHOD_DEFAULT, Usage = D3DDECLUSAGE_BLENDINDICES, UsageIndex = 0 },
{ Stream = 0, Offset = 40, Type = D3DDECLTYPE_FLOAT4, Method = D3DDECLMETHOD_DEFAULT, Usage = D3DDECLUSAGE_TEXCOORD, UsageIndex = 0 },
{ Stream = 0, Offset = 56, Type = D3DDECLTYPE_FLOAT4, Method = D3DDECLMETHOD_DEFAULT, Usage = D3DDECLUSAGE_TEXCOORD, UsageIndex = 1 },
{ Stream = 0, Offset = 72, Type = D3DDECLTYPE_FLOAT4, Method = D3DDECLMETHOD_DEFAULT, Usage = D3DDECLUSAGE_TEXCOORD, UsageIndex = 2 },
{ Stream = 0, Offset = 88, Type = D3DDECLTYPE_FLOAT4, Method = D3DDECLMETHOD_DEFAULT, Usage = D3DDECLUSAGE_TEXCOORD, UsageIndex = 3 }
Vertex Shader 0FA01450:
=================================================================================
//--------------------------------------------------------------------------------------
// Automatically generated Vertex Shader.
//
// Copyright (c) Steve Segreto. All rights reserved.
// Shader Flags = 887f9
// Shader Type = Linear-Based Quaternion Skinning
// Shader Quality = PHONG_LIGHTING
//--------------------------------------------------------------------------------------
 
struct DirLight
{
    float4 ambient;
    float4 diffuse;
    float4 spec;
    float3 dirW;
    float4 fogColor;
    float3 lightPosW;
};
 
struct Mtrl
{
    float4 ambient;
    float4 diffuse;
    float4 spec;
    float  specPower;
    float4 emissive;
};
 
//--------------------------------------------------------------------------------------
// Macro defines
//--------------------------------------------------------------------------------------
#define MATRIX_PALETTE_SIZE (13)
 
//--------------------------------------------------------------------------------------
// Global variables
//--------------------------------------------------------------------------------------
uniform extern DirLight gLight;
uniform extern Mtrl gMtrl;
uniform extern float4x4 gWorld;
uniform extern float4x4 gWVP;
uniform extern float4x4 gInvWorld;
uniform extern float4x4 gView;
uniform extern float3 gEyePosW;
uniform extern float gFarClipDist;
uniform extern float gAlphaRef = 0.29f;
uniform extern float gFogRange = 250.0f;
uniform extern float gFogStart = 1.0f;
uniform extern matrix amPalette[ MATRIX_PALETTE_SIZE ];
uniform extern float gNumBones;
 
//----------------------------------------------------------------------------
// Shader body - VS_Skin
//----------------------------------------------------------------------------
 
//
// Define the inputs -- caller must fill this, usually right from the VB.
//
struct VS_SKIN_INPUT
{
    float4 vPos;
    float3 vNor;
    float3 vBlendWeights;
    float4 vBlendIndices;
};
 
//
// Return skinned position and normal
//
struct VS_SKIN_OUTPUT
{
    float4 vPos;
    float3 vNor;
};
 
//
// Call this function to skin VB position and normal.
//
VS_SKIN_OUTPUT VS_Skin( const VS_SKIN_INPUT vInput, int iNumBones )
{
    VS_SKIN_OUTPUT vOutput = (VS_SKIN_OUTPUT) 0;
 
    float fLastWeight = 1.0;
    float afBlendWeights[ 3 ] = (float[ 3 ]) vInput.vBlendWeights;
    int aiIndices[ 4 ]        = (int[ 4 ])   D3DCOLORtoUBYTE4( vInput.vBlendIndices );
 
    for( int iBone = 0; (iBone < 3) && (iBone < iNumBones - 1); ++ iBone )
    {
        float fWeight = afBlendWeights[ iBone ];
        fLastWeight -= fWeight;
        vOutput.vPos.xyz += mul( vInput.vPos, amPalette[ aiIndices[ iBone  ] ] ) * fWeight;
        vOutput.vNor     += mul( float4(vInput.vNor, 0.0f), amPalette[ aiIndices[ iBone  ] ] ) * fWeight;
    }
 
    vOutput.vPos.xyz += mul( vInput.vPos, amPalette[ aiIndices[ iNumBones - 1 ] ] ) * fLastWeight;
    vOutput.vNor     += mul( float4(vInput.vNor, 0.0f), amPalette[ aiIndices[ iNumBones - 1 ] ] ) * fLastWeight;
 
    return vOutput;
}
struct VS_in
{
    float3 posL         : POSITION0;
    float3 normalL      : NORMAL0;
    float3 BlendWeights : BLENDWEIGHT;
    float4 BlendIndices : BLENDINDICES;
    float4 tex0_tex1    : TEXCOORD0;
    float4 tex2_tex3    : TEXCOORD1;
    float4 tex4_tex5    : TEXCOORD2;
    float4 tex6_tex7    : TEXCOORD3;
};
 
struct VS_out
{
    float4 posH         : POSITION0;
    float4 tex0_tex1    : TEXCOORD0;
    float4 tex2_tex3    : TEXCOORD1;
    float4 tex4_tex5    : TEXCOORD2;
    float4 tex6_tex7    : TEXCOORD3;
    float3 normalW      : TEXCOORD4;
    float4 posVS        : TEXCOORD5;
    float4 color        : COLOR0;
    float  fogLerpParam : COLOR1;
};
 
VS_out VS_Scene( VS_in i )
{
    //
    // Zero out our output.
    //
    VS_out o = (VS_out)0;
 
    //
    // Skin VB inputs
    //
    VS_SKIN_INPUT  vsi = { float4( i.posL, 1.0f ), i.normalL, i.BlendWeights, i.BlendIndices };
    VS_SKIN_OUTPUT vso = VS_Skin( vsi, gNumBones );
    i.posL = vso.vPos.xyz;
    i.normalL = vso.vNor;
 
    //
    // Transform normal to world space and pass along
    // to be interpolated by rasterizer.
    //
    o.normalW = mul( gInvWorld, float4(i.normalL, 0) ).xyz;
 
    //
    // Pass along per-vertex color to be interpolated by rasterizer.
    //
    o.color = gMtrl.diffuse;
 
    //
    // Transform position to homogeneous clip space.
    //
    float4 vPositionVS = mul(float4(i.posL, 1.0f), mul(gWorld, gView));
    o.posH = mul(float4(i.posL, 1.0f), gWVP);
 
    //
    // This position will be used to output view space depth.
    //
    o.posVS = vPositionVS;
    o.posVS.z = max(o.posVS.z, 0.0f);
 
    //
    // Pass on texture coordinates to be interpolated in rasterization.
    //
    o.tex0_tex1.xy = i.tex0_tex1.xy;
    o.tex0_tex1.zw = i.tex0_tex1.zw;
    o.tex2_tex3.xy = i.tex2_tex3.xy;
    o.tex2_tex3.zw = i.tex2_tex3.zw;
    o.tex4_tex5.xy = i.tex4_tex5.xy;
    o.tex4_tex5.zw = i.tex4_tex5.zw;
    o.tex6_tex7.xy = i.tex6_tex7.xy;
    o.tex6_tex7.zw = i.tex6_tex7.zw;
 
    //
    // Compute vertex distance from camera in world
    // space for fog calculation.
    //
    float dist = distance(mul(float4(i.posL, 1.0f), gWorld).xyz, gEyePosW);
    o.fogLerpParam = saturate((dist - gFogStart) / gFogRange);
 
    //
    // Done--return the output.
    //
    return o;
}
Pixel Shader 0FA11C80:
=================================================================================
//--------------------------------------------------------------------------------------
// Automatically generated Pixel Shader.
//
// Copyright (c) Steve Segreto. All rights reserved.
// Shader Flags = 827e8
// Shader Type = Linear-Based Quaternion Skinning
// Shader Quality = PHONG_LIGHTING
//--------------------------------------------------------------------------------------
 
struct DirLight
{
    float4 ambient;
    float4 diffuse;
    float4 spec;
    float3 dirW;
    float4 fogColor;
    float3 lightPosW;
};
 
struct Mtrl
{
    float4 ambient;
    float4 diffuse;
    float4 spec;
    float  specPower;
    float4 emissive;
};
 
//--------------------------------------------------------------------------------------
// Macro defines
//--------------------------------------------------------------------------------------
 
//--------------------------------------------------------------------------------------
// Global variables
//--------------------------------------------------------------------------------------
uniform extern DirLight gLight;
uniform extern Mtrl gMtrl;
uniform extern float4x4 gInvWorld;
uniform extern float4x4 gView;
uniform extern float3 gEyePosW;
uniform extern float gFarClipDist;
uniform extern float gAlphaRef = 0.29f;
uniform extern float3 gFogColor;
uniform extern texture gTex0;
 
struct PS_in
{
    float4 tex0_tex1    : TEXCOORD0;
    float4 tex2_tex3    : TEXCOORD1;
    float4 tex4_tex5    : TEXCOORD2;
    float4 tex6_tex7    : TEXCOORD3;
    float3 normalW      : TEXCOORD4;
    float4 posVS        : TEXCOORD5;
    float4 color        : COLOR0;
    float  fogLerpParam : COLOR1;
};
 
struct PS_out
{
    float4 vMaterial    : COLOR0;
    float4 vWorldNrm    : COLOR1;
    float4 vEmittance   : COLOR2;
    float4 vDepth       : COLOR3;
};
 
sampler TexS0 = sampler_state
{
    Texture   = <gTex0>;
    MinFilter = Linear;
    MagFilter = Linear;
    MipFilter = Point;
    AddressU  = Wrap;
    AddressV  = Wrap;
};
 
PS_out PS_Scene( PS_in i )
{
    //
    // Zero out our output.
    //
    PS_out o = (PS_out)0;
 
    //
    // Interpolated normals can become unnormal.
    //
    i.normalW   = normalize(i.normalW);
 
    //
    // VERT_MODE_SRC_IGNORE
    //
    float3 matAmbient  = gMtrl.ambient.rgb;
    float4 matDiffuse  = gMtrl.diffuse;
    float3 matEmissive = gMtrl.emissive.rgb;
 
    //
    // Incoming colors.
    //
    float3 color_stage0 = saturate((matAmbient * gLight.ambient) + matDiffuse + matEmissive);
    o.vEmittance.y = gMtrl.spec.r;
    o.vEmittance.z = gMtrl.specPower;
    float  alpha_stage0 = matDiffuse.a;
 
    //
    // Sample textures.
    //
    float4 color0 = tex2D(TexS0, i.tex0_tex1.xy);
 
    //
    // Apply texturing stages
    //
 
    //
    // Diffuse map.
    //
    float3 color_stage1  = color_stage0 * color0.rgb;
 
    //
    // Final (pre-fog) color.
    //
    float4 texColor = float4( color_stage1.rgb, alpha_stage0 );
 
    //
    // Add fog
    //
    o.vMaterial = texColor;
    o.vEmittance.w = i.fogLerpParam;
    // convert normal to texture space [-1;+1] -> [0;1]
    o.vWorldNrm.xyz = i.normalW * 0.5 + 0.5;
 
    // linear view-space depth, normalized by the far clip distance
    o.vDepth = i.posVS.z / gFarClipDist;
 
    //
    // Done--return the output.
    //
    return o;
}


1) and 2): It does indeed seem StructuredBuffer would be better due to random access.

 

Some additional references I found on the subject:

https://developer.nvidia.com/content/redundancy-and-latency-structured-buffer-use

http://www.gamedev.net/topic/624529-structured-buffers-vs-constant-buffers/

 

 

4): see below

 

 

uniform extern float gNumBones;

// ...

VS_SKIN_OUTPUT VS_Skin( const VS_SKIN_INPUT vInput, int iNumBones )
{
    VS_SKIN_OUTPUT vOutput = (VS_SKIN_OUTPUT) 0;
 
    float fLastWeight = 1.0;
    float afBlendWeights[ 3 ] = (float[ 3 ]) vInput.vBlendWeights;
    int aiIndices[ 4 ]        = (int[ 4 ])   D3DCOLORtoUBYTE4( vInput.vBlendIndices );
 
    for( int iBone = 0; (iBone < 3) && (iBone < iNumBones - 1); ++ iBone )
    {
        float fWeight = afBlendWeights[ iBone ];
        fLastWeight -= fWeight;
        vOutput.vPos.xyz += mul( vInput.vPos, amPalette[ aiIndices[ iBone  ] ] ) * fWeight;
        vOutput.vNor     += mul( float4(vInput.vNor, 0.0f), amPalette[ aiIndices[ iBone  ] ] ) * fWeight;
    }
 
    vOutput.vPos.xyz += mul( vInput.vPos, amPalette[ aiIndices[ iNumBones - 1 ] ] ) * fLastWeight;
    vOutput.vNor     += mul( float4(vInput.vNor, 0.0f), amPalette[ aiIndices[ iNumBones - 1 ] ] ) * fLastWeight;
 
    return vOutput;
}

// ...

VS_SKIN_OUTPUT vso = VS_Skin( vsi, gNumBones );

gNumBones is a uniform and so is set once per draw call. What if only a few vertices use four bones but the majority use fewer? It would still compute up to gNumBones for all vertices. Does it make sense to pack gNumBones as a vertex attribute instead?
 
Also, does a variable value per vertex avoid the problem of branching? Or is a fixed iteration count better, since the compiler could just unroll the loop?

That's correct: if you use a single shader constant it'll loop exactly that many times for all vertices. A more accurate name for gNumBones would be "gMaxBones", where the value means "the largest number of bones any vertex will reference in this draw". If you support up to 4 bones like the shader above and even a single vertex references 4 bones while all the rest reference 1, then you're paying the full cost of skinning against 4 bones for every vertex.
 
I would simply ensure that every vertex has its bone indices and weights "left-packed", so that any unused indices/weights occupy the W channel first, then ZW, then YZW. You can then iterate from X to W and stop as soon as you encounter a zero weight. Unfortunately, a loop that breaks on the first zero weight gave me what looked like less-than-ideal code-gen; reformulating it as a series of nested branches produced better code.
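
The left-packing itself would normally happen on the CPU when the mesh is built. A minimal sketch of that step, assuming a hypothetical SkinVertex layout with four indices and four weights; MaxInfluences also returns the "gMaxBones" value for a draw as described above:

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical vertex skinning data, for illustration only.
struct SkinVertex
{
    std::array<std::uint32_t, 4> bones;
    std::array<float, 4>         weights;
};

// Moves non-zero weights (and their bone indices) to the front, preserving their
// order, zeroes the remaining slots and returns how many influences are in use.
std::size_t LeftPack(SkinVertex& v)
{
    std::size_t used = 0;
    for (std::size_t i = 0; i < 4; ++i)
    {
        if (v.weights[i] > 0.0f)
        {
            // used <= i here, so this never overwrites a slot not yet visited.
            v.bones[used]   = v.bones[i];
            v.weights[used] = v.weights[i];
            ++used;
        }
    }
    for (std::size_t i = used; i < 4; ++i)
    {
        v.bones[i]   = 0;
        v.weights[i] = 0.0f;
    }
    return used;
}

// Left-packs every vertex and returns the largest influence count in the mesh.
std::size_t MaxInfluences(std::vector<SkinVertex>& vertices)
{
    std::size_t maxUsed = 0;
    for (SkinVertex& v : vertices)
        maxUsed = std::max(maxUsed, LeftPack(v));
    return maxUsed;
}
```

Zeroing the unused indices keeps them pointing at a valid bone, which matters if a shader variant ends up reading all four slots unconditionally.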
 
I like the "store three weights in 10 bits each and infer the fourth weight" idea, so you could add that too if you wanted.

Untested code:

StructuredBuffer<float4x4> bones;

struct VSINPUT
{
	float4 position : POSITION;
	uint4 boneIndices : INDICES;
	float4 boneWeights : WEIGHTS;
};

float4 ApplyBone(float4 inputPosition, uint boneIndex, float weight)
{
	return mul(inputPosition, bones[boneIndex]) * weight;
}

float4 main(in VSINPUT input) : SV_POSITION
{
	float4 pos = ApplyBone(input.position, input.boneIndices[0], input.boneWeights[0]);
	
	if(input.boneWeights[1] > 0)
	{
		pos += ApplyBone(input.position, input.boneIndices[1], input.boneWeights[1]);
		
		if(input.boneWeights[2] > 0)
		{
			pos += ApplyBone(input.position, input.boneIndices[2], input.boneWeights[2]);
			
			if(input.boneWeights[3] > 0)
			{
				pos += ApplyBone(input.position, input.boneIndices[3], input.boneWeights[3]);
			}
		}
	}
	
	return pos;
}
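
For the "three weights in 10 bits each" idea mentioned above, the packing side could look roughly like this on the CPU. This is only a sketch: the bit layout (weight 0 in the low 10 bits) and the names PackWeights10 and UnpackWeights10 are assumptions, and the unpack function just mirrors what a shader would do after reading the value.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdint>

// Quantises three weights in [0, 1] to 10 bits each and packs them into one 32-bit value.
std::uint32_t PackWeights10(float w0, float w1, float w2)
{
    auto quantise = [](float w) -> std::uint32_t {
        const float clamped = std::min(std::max(w, 0.0f), 1.0f);
        return static_cast<std::uint32_t>(std::lround(clamped * 1023.0f));
    };
    return quantise(w0) | (quantise(w1) << 10) | (quantise(w2) << 20);
}

// Recovers the three stored weights and infers the fourth as the remainder to 1.
std::array<float, 4> UnpackWeights10(std::uint32_t packed)
{
    const float w0 = static_cast<float>( packed        & 0x3FFu) / 1023.0f;
    const float w1 = static_cast<float>((packed >> 10) & 0x3FFu) / 1023.0f;
    const float w2 = static_cast<float>((packed >> 20) & 0x3FFu) / 1023.0f;
    // Rounding can push the stored sum slightly above 1, so clamp at zero.
    const float w3 = std::max(0.0f, 1.0f - (w0 + w1 + w2));
    return { w0, w1, w2, w3 };
}
```

One thing to watch when combining this with the zero-weight early-out: because of quantisation error the inferred fourth weight can come out slightly above zero even when the original was exactly zero, so a test against a small threshold may be safer than a plain "> 0".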
