

Hi all,

I have an Animator class that is responsible for animating one mesh.

So each animated mesh in the scene has its own animator.

Animation costs a lot of performance, so threading needs to be added.

Since I'm new to threading, my idea is to have an AnimatorManager through which every Animator instance is created.

The AnimatorManager holds an array of Animator* and an array of threads, one per CPU core.

The AnimatorManager will have an Update function that updates all the Animator* in the array using those threads.
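For reference, the idea described above could be sketched roughly as follows. This is only a minimal illustration, not a recommended final design: the `Animator` struct is a hypothetical stand-in, and the sketch assumes each animator touches only its own data, so the chunks can run without locks. It also creates and joins threads on every call, which is simple but costly; a persistent worker pool would normally replace that.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the Animator class described in the post.
struct Animator {
    float time = 0.0f;
    int updateCount = 0;
    void Update(float dt) { time += dt; ++updateCount; }
};

class AnimatorManager {
public:
    void Add(Animator* animator) {
        assert(animator != nullptr);
        animators_.push_back(animator);
    }

    // Splits the animator array into contiguous chunks and updates
    // each chunk on its own thread. Assumes animators share no state.
    void Update(float dt) {
        const std::size_t count = animators_.size();
        if (count == 0) return;

        std::size_t workers = std::thread::hardware_concurrency();
        if (workers == 0) workers = 1;  // the core count may be unknown
        workers = std::min(workers, count);
        const std::size_t chunk = (count + workers - 1) / workers;

        auto run = [this, dt](std::size_t begin, std::size_t end) {
            for (std::size_t i = begin; i < end; ++i) animators_[i]->Update(dt);
        };

        std::vector<std::thread> threads;
        for (std::size_t begin = chunk; begin < count; begin += chunk)
            threads.emplace_back(run, begin, std::min(begin + chunk, count));

        run(0, std::min(chunk, count));  // the calling thread takes the first chunk
        for (auto& t : threads) t.join();
    }

private:
    std::vector<Animator*> animators_;
};
```

Calling `Update(dt)` once per frame advances every registered animator exactly once; the function only returns after all chunks have finished, so the bone matrices are safe to upload afterwards.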

Is this a good approach, or is there a better way?

Thanks for the help

Share on other sites

What graphics library are you using?

I would recommend implementing GPU-accelerated animation rather than messing around with CPU threading. This way, the transformations of animation frames are offloaded to the GPU using vertex programs, which immensely increases performance.

Ogre3D supports this, for instance: http://www.ogre3d.org/docs/manual/manual_76.html

Share on other sites

The animation system is layer-based; each AnimatedMesh component contains an animator.

GPU skinning helps performance a lot, but it's not enough, which is why threading is needed to split up the updates.

I'm new to threading so I don't know much about it; my idea is just to have a manager and run its update across threads.

Share on other sites

Perhaps you are not utilizing the GPU properly; normally the GPU is plenty fast enough to handle skinning, but there is more than one way to lose performance.

Beating the GPU by using threads is non-trivial and can only happen if you really know what you are doing—it is not a learning point.

Make sure you are actually using the GPU correctly before you try to use threading to gain performance.  You may be inclined to post your vertex shader(s).

L. Spiro

Share on other sites

Here is the vertex shader I use for meshes that need skinning:

struct VS_INPUT
{
    float4 Position : POSITION;
    float3 Normal   : NORMAL;
    float2 TexCoord : TEXCOORD0;
    float4 Tangent  : TANGENT;
    float4 Weights  : WEIGHTS;
    uint4  Indices  : BONEINDICES;
};

struct VS_OUTPUT
{
    float4 Position : SV_POSITION;
    float3 Normal   : NORMAL;
    float4 Tangent  : TANGENT;
    float2 TexCoord : TEXCOORD0;
    float4 PosVS    : TEXCOORD1;
};

cbuffer WVP_WVIT_CBUFFER : register( b0 )
{
    float4x4 WorldView;
    float4x4 Projection;
    float4x4 WorldViewInverseTranspose;
};

cbuffer MESH_SKINNED_CBUFFER : register( b1 )
{
    float4x4 BoneMatrices[ 96 ];
};

VS_OUTPUT main( in VS_INPUT Input )
{
    VS_OUTPUT Output = (VS_OUTPUT)0;
    float4 SkinnedPos = float4( 0.0f, 0.0f, 0.0f, 0.0f );
    float3 SkinnedNormal = float3( 0.0f, 0.0f, 0.0f );
    float3 SkinnedTangent = float3( 0.0f, 0.0f, 0.0f );
    for( int i = 0; i < 4; ++i )
    {
        if( Input.Weights[ i ] > 0.0f )
        {
            SkinnedPos += mul( Input.Position, BoneMatrices[ Input.Indices[ i ] ] ) * Input.Weights[ i ];
            SkinnedNormal += mul( Input.Normal, (float3x3)BoneMatrices[ Input.Indices[ i ] ] ) * Input.Weights[ i ];
            SkinnedTangent += mul( Input.Tangent.xyz, (float3x3)BoneMatrices[ Input.Indices[ i ] ] ) * Input.Weights[ i ];
        }
    }
    float4 ViewPosition = mul( SkinnedPos, WorldView );
    Output.Position = mul( ViewPosition, Projection );
    Output.Normal = mul( SkinnedNormal, (float3x3)WorldViewInverseTranspose );
    Output.Tangent = float4( mul( SkinnedTangent, (float3x3)WorldView ), Input.Tangent.w );
    Output.TexCoord = Input.TexCoord;
    Output.PosVS = ViewPosition;
    return Output;
}

Share on other sites

What graphics library are you using?

I would recommend implementing GPU-accelerated animation rather than messing around with CPU threading. This way, the transformations of animation frames are offloaded to the GPU using vertex programs, which immensely increases performance.

Ogre3D supports this, for instance: http://www.ogre3d.org/docs/manual/manual_76.html

Even if the GPU handles actual vertex skinning according to the bone matrices it is given, the calculations to yield the final bone matrices (sample and blend between keyframes, blend if multiple animations, transform local-space bone transforms into the world space, multiply with inverse bind pose) can easily be a per-frame CPU hotspot if there are many (50+) characters onscreen, and thus will benefit from threading.
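To make the CPU-side cost concrete, here is a rough sketch of the last two steps named above: turning local bone transforms into world space and multiplying by the inverse bind pose. It assumes the local pose has already been sampled and blended from keyframes, that bones are stored with parents before children, and that matrices are row-major with the row-vector convention the shader's `mul( v, M )` implies. `Mat4`, `Bone`, `Mul`, `Translation`, and `BuildFinalMatrices` are hypothetical names, not from any particular engine.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Mat4 = std::array<float, 16>;  // row-major, used as v * M

Mat4 Identity() {
    Mat4 m{};
    m[0] = m[5] = m[10] = m[15] = 1.0f;
    return m;
}

Mat4 Translation(float x, float y, float z) {
    Mat4 m = Identity();
    m[12] = x; m[13] = y; m[14] = z;
    return m;
}

Mat4 Mul(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r[i * 4 + j] += a[i * 4 + k] * b[k * 4 + j];
    return r;
}

struct Bone {
    int parent;        // index of the parent bone, -1 for a root
    Mat4 inverseBind;  // inverse of the bone's world matrix in bind pose
};

// localPose[i] is bone i's already sampled and blended local transform.
std::vector<Mat4> BuildFinalMatrices(const std::vector<Bone>& bones,
                                     const std::vector<Mat4>& localPose) {
    assert(bones.size() == localPose.size());
    std::vector<Mat4> world(bones.size());
    std::vector<Mat4> result(bones.size());
    for (std::size_t i = 0; i < bones.size(); ++i) {
        const int p = bones[i].parent;
        assert(p < static_cast<int>(i));  // parents must come first
        world[i] = (p < 0) ? localPose[i] : Mul(localPose[i], world[p]);
        result[i] = Mul(bones[i].inverseBind, world[i]);
    }
    return result;
}
```

This runs once per character per frame, so with many characters it is independent work that splits cleanly across threads.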

- Make sure you're not calculating animation for characters outside view frustum

- When characters are far away, you can get away with not updating the animation every frame (a primitive form of LOD)

However, if we're talking about only a few, or a few tens of, characters, the CPU side of animation shouldn't be a significant hotspot.
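The two tips above (skip characters outside the frustum, update distant ones less often) could look something like this. It is only a sketch: the distance thresholds are made-up values, `CharacterLod` and both functions are hypothetical names, and the frustum test itself is assumed to be done elsewhere. Skipped time is accumulated so that an infrequently updated character still plays at the correct speed.

```cpp
#include <cassert>

// Hypothetical per-character bookkeeping for animation LOD.
struct CharacterLod {
    bool inFrustum = true;           // result of a culling test done elsewhere
    float distance = 0.0f;           // distance to the camera
    float pendingTime = 0.0f;        // time accumulated over skipped frames
    unsigned framesSinceUpdate = 0;
};

// Made-up thresholds: update every frame up close, less often far away.
unsigned UpdateInterval(float distance) {
    if (distance < 20.0f) return 1;
    if (distance < 60.0f) return 2;
    return 4;
}

// Returns the time step to advance the animation by this frame,
// or 0 if the character should be skipped.
float ConsumeAnimationStep(CharacterLod& c, float dt) {
    assert(dt >= 0.0f);
    c.pendingTime += dt;
    ++c.framesSinceUpdate;
    if (!c.inFrustum) return 0.0f;
    if (c.framesSinceUpdate < UpdateInterval(c.distance)) return 0.0f;
    const float step = c.pendingTime;
    c.pendingTime = 0.0f;
    c.framesSinceUpdate = 0;
    return step;
}
```

A caller would animate the character only when the returned step is greater than zero, passing that step as the elapsed time.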

Edited by AgentC

Share on other sites

Firstly, remove the branch from the for loop.  Iterate over all 4 weights regardless of them being 0 or not.  Negative weights should not be allowed by the CPU end.
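As an illustration of keeping negative weights out on the CPU side, a load-time clean-up pass might look like the sketch below. `SanitizeWeights` is a hypothetical helper, and falling back to the first influence when all weights are zero is just one possible policy. Once weights are guaranteed to be non-negative, a zero weight simply contributes nothing to the sum, so the shader loop gives the same result without the `if`.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Clamps negative weights to zero and renormalizes so the four
// weights sum to 1. Intended to run once when the mesh is loaded.
std::array<float, 4> SanitizeWeights(std::array<float, 4> w) {
    float sum = 0.0f;
    for (float& x : w) {
        x = std::max(x, 0.0f);
        sum += x;
    }
    if (sum <= 0.0f) {
        // Assumed policy: bind the vertex fully to its first bone.
        return {1.0f, 0.0f, 0.0f, 0.0f};
    }
    for (float& x : w) x /= sum;
    return w;
}
```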

Secondly, you only need to upload as many bones as are referenced by the part of the model you are rendering.  For instance, a mech-machine will likely be broken into 1 mesh for each leg, 1 or 2 or so for the body, some for the weapons, etc.
You aren’t rendering the entire model all in one pass, but in multiple passes in which smaller parts of the model are rendered at a time.  If you are rendering the front-left leg, there is no reason to send bone information for the back-right leg.  Reducing the number of bones you send reduces bandwidth heavily and will be one of the largest gains in performance you will see.
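One way to arrange this, sketched under the assumption that each sub-mesh stores a flat list of per-vertex bone indices into the full skeleton: build a small palette of only the bones that sub-mesh references and rewrite the vertex indices to point into it. `BonePalette` and `BuildPalette` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct BonePalette {
    // Palette slot -> bone index in the full skeleton. Only these
    // matrices need to be uploaded when drawing the sub-mesh.
    std::vector<std::uint32_t> globalBones;
    // The input indices rewritten to refer to palette slots.
    std::vector<std::uint32_t> remappedIndices;
};

// vertexBoneIndices: every bone index used by the sub-mesh's vertices,
// in vertex order (four per vertex in the shader shown earlier).
BonePalette BuildPalette(const std::vector<std::uint32_t>& vertexBoneIndices) {
    BonePalette palette;
    std::unordered_map<std::uint32_t, std::uint32_t> slotOf;
    for (std::uint32_t bone : vertexBoneIndices) {
        auto it = slotOf.find(bone);
        if (it == slotOf.end()) {
            const auto slot = static_cast<std::uint32_t>(palette.globalBones.size());
            it = slotOf.emplace(bone, slot).first;
            palette.globalBones.push_back(bone);
        }
        palette.remappedIndices.push_back(it->second);
    }
    return palette;
}
```

At draw time the renderer would copy only the final matrices listed in `globalBones` into the constant buffer, in palette order. Indices attached to zero-weight influences still count as references here, so pointing those at an already used bone beforehand keeps the palette small.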

The rest of my suggestions may be exactly the same performance or may be faster, so you would have to test.  The shader compiler will likely be smart enough not to perform array look-ups every time, but you can be sure by storing Input.Weights[ i ] to a temporary and using that instead of repeated array access.  Same thing with Input.Indices[ i ] and possibly even BoneMatrices[ Input.Indices[ i ] ].

Try various combinations of storing these to temporaries, benchmark, and repeat.

L. Spiro

Share on other sites

You should also look at your bone matrix multiplication and upload code; it's possible that you may have bottlenecks there that are solvable without even having to consider threading as an option.

Edited by mhagain

Share on other sites

Firstly, remove the branch from the for loop.  Iterate over all 4 weights regardless of them being 0 or not.  Negative weights should not be allowed by the CPU end.

I thought branching was now fast enough to be worth it for skipping the matrix multiplications; thanks for letting me know that it is still better to do the work than to branch.

Is it the same for a diffuse texture: is it better to bind a white texture and always sample it instead of branching?

Secondly, you only need to upload as many bones as are referenced by the part of the model you are rendering.

Yes, I already do that: each mesh is split into geometries by material, and each geometry has its own bone array.

You should also look at your bone matrix multiplication and upload code; it's possible that you may have bottlenecks there that are solvable without even having to consider threading as an option.

I already skip updating the final bone matrix array when no animation needs to be played.

Edited by Alundra

Share on other sites

I thought branching was now fast enough to be worth it for skipping the matrix multiplications; thanks for letting me know that it is still better to do the work than to branch.
Is it the same for a diffuse texture: is it better to bind a white texture and always sample it instead of branching?

It is very much worth testing.

On the CPU side, are you using SSE2 (at minimum) for matrix multiplication?
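For anyone unsure what that looks like, here is a small sketch of a 4x4 multiply using SSE intrinsics. The intrinsics used are from the original SSE set in `<xmmintrin.h>`, which any SSE2-capable CPU also supports. `Mul4x4` is a hypothetical function, matrices are assumed to be row-major arrays of 16 floats, and a scalar fallback is included for builds where SSE is not available.

```cpp
#include <cassert>
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_USE_SSE 1
#endif

// Row-major 4x4 multiply: out = a * b. out must not alias a or b.
void Mul4x4(const float* a, const float* b, float* out) {
    assert(out != a && out != b);
#ifdef SKETCH_USE_SSE
    // Load the four rows of b once; unaligned loads keep the sketch
    // safe for arrays without 16-byte alignment.
    const __m128 b0 = _mm_loadu_ps(b + 0);
    const __m128 b1 = _mm_loadu_ps(b + 4);
    const __m128 b2 = _mm_loadu_ps(b + 8);
    const __m128 b3 = _mm_loadu_ps(b + 12);
    for (int i = 0; i < 4; ++i) {
        // Row i of the result is a linear combination of the rows of b.
        __m128 r = _mm_mul_ps(_mm_set1_ps(a[i * 4 + 0]), b0);
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(a[i * 4 + 1]), b1));
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(a[i * 4 + 2]), b2));
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(a[i * 4 + 3]), b3));
        _mm_storeu_ps(out + i * 4, r);
    }
#else
    // Scalar fallback for targets without SSE.
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k) sum += a[i * 4 + k] * b[k * 4 + j];
            out[i * 4 + j] = sum;
        }
    }
#endif
}
```

Whether this beats what the compiler already generates is something to measure rather than assume; with 16-byte-aligned matrices the aligned load and store intrinsics could be used instead.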

L. Spiro
