Animation threading

General and Gameplay Programming

Started by Alundra October 21, 2013 02:38 AM

17 comments, last by Alundra 10 years, 5 months ago

Alundra

2,325

Author

October 21, 2013 02:38 AM

Hi all,

I have an Animator class that is responsible for animating one mesh.

So each animated mesh in the scene has its own Animator.

Animation takes a lot of CPU time, so I need to add threading.

Since I'm new to threading, my idea is to have an AnimatorManager through which every Animator instance is created.

The AnimatorManager holds an array of Animator* and an array of threads, one per CPU core.

The AnimatorManager will have an Update function that updates every Animator* in the array across those threads.

Is this a good approach, or is there a better way?

Thanks for the help
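
For reference, the manager described above could look roughly like the following C++ sketch. It is only an illustration under assumptions: the `Animator` body is a placeholder, the `Add` method and the chunking scheme are hypothetical, and it presumes that updating one Animator never touches another Animator's data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the Animator class described in the post.
struct Animator {
    float time = 0.0f;
    void Update(float dt) { time += dt; }  // placeholder for sampling/blending work
};

class AnimatorManager {
public:
    void Add(Animator* animator) {
        assert(animator != nullptr);
        animators_.push_back(animator);
    }

    // Splits the array into contiguous chunks and updates each chunk on its own thread.
    // Only safe if Animator::Update shares no mutable state between animators.
    void Update(float dt) {
        const std::size_t count = animators_.size();
        if (count == 0) return;

        // hardware_concurrency() may return 0 when the core count is unknown.
        const std::size_t hw = std::max(1u, std::thread::hardware_concurrency());
        const std::size_t workers = std::min<std::size_t>(hw, count);
        const std::size_t chunk = (count + workers - 1) / workers;  // ceiling division

        std::vector<std::thread> threads;
        threads.reserve(workers);
        for (std::size_t begin = 0; begin < count; begin += chunk) {
            const std::size_t end = std::min(begin + chunk, count);
            threads.emplace_back([this, begin, end, dt] {
                for (std::size_t i = begin; i < end; ++i) animators_[i]->Update(dt);
            });
        }
        for (std::thread& t : threads) t.join();  // wait before the frame continues
    }

private:
    std::vector<Animator*> animators_;
};
```

Spawning threads every frame keeps the sketch short but has a real cost; a persistent worker pool that is woken once per frame is the usual refinement.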

TheComet

3,927

October 21, 2013 09:23 AM

What graphics library are you using?

I would recommend implementing GPU-accelerated animation rather than messing around with CPU threading. This way, the transformations of animation frames are offloaded to the GPU using vertex programs, which immensely increases performance.

Ogre3D supports this, for instance: http://www.ogre3d.org/docs/manual/manual_76.html

"I would try to find halo source code by bungie best fps engine ever created, u see why call of duty loses speed due to its detail." -- GettingNifty

Alundra

2,325

Author

October 21, 2013 10:58 AM

I already do GPU skinning in the vertex shader (HLSL and GLSL).

The animation system is layer-based, and each AnimatedMesh component contains an Animator.

GPU skinning helps performance a lot, but it's not enough, which is why I need threading to split up all the updates.

I'm new to threading and don't know much about it; my idea is simply to have a manager and run its updates on threads.

L. Spiro

25,818

October 21, 2013 10:20 PM

Perhaps you are not utilizing the GPU properly; normally the GPU is plenty fast enough to handle skinning, but there is more than one way to lose performance.

Beating the GPU by using threads is non-trivial and can only happen if you really know what you are doing; it is not a good place to start learning threading.

Make sure you are actually using the GPU correctly before you try to use threading to gain performance. You may want to post your vertex shader(s).

L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

Alundra

2,325

Author

October 22, 2013 10:19 AM

Here is the vertex shader I use for meshes that need skinning:


struct VS_INPUT
{
  float4 Position : POSITION;
  float3 Normal   : NORMAL;
  float2 TexCoord : TEXCOORD0;
  float4 Tangent  : TANGENT;
  float4 Weights  : WEIGHTS;
  uint4  Indices  : BONEINDICES;
};

struct VS_OUTPUT
{
  float4 Position : SV_POSITION;
  float3 Normal   : NORMAL;
  float4 Tangent  : TANGENT;
  float2 TexCoord : TEXCOORD0;
  float4 PosVS    : TEXCOORD1;
};

cbuffer WVP_WVIT_CBUFFER : register( b0 )
{
  float4x4 WorldView;
  float4x4 Projection;
  float4x4 WorldViewInverseTranspose;
};

cbuffer MESH_SKINNED_CBUFFER : register( b1 )
{
  float4x4 BoneMatrices[ 96 ];
};

VS_OUTPUT main( in VS_INPUT Input )
{
  VS_OUTPUT Output = (VS_OUTPUT)0;
  float4 SkinnedPos = float4( 0.0f, 0.0f, 0.0f, 0.0f );
  float3 SkinnedNormal = float3( 0.0f, 0.0f, 0.0f );
  float3 SkinnedTangent = float3( 0.0f, 0.0f, 0.0f );
  for( int i = 0; i < 4; ++i )
  {
    if( Input.Weights[ i ] > 0.0f )
    {
      SkinnedPos += mul( Input.Position, BoneMatrices[ Input.Indices[ i ] ] ) * Input.Weights[ i ];
      SkinnedNormal += mul( Input.Normal, (float3x3)BoneMatrices[ Input.Indices[ i ] ] ) * Input.Weights[ i ];
      SkinnedTangent += mul( Input.Tangent.xyz, (float3x3)BoneMatrices[ Input.Indices[ i ] ] ) * Input.Weights[ i ];
    }
  }
  float4 ViewPosition = mul( SkinnedPos, WorldView );
  Output.Position = mul( ViewPosition, Projection );
  Output.Normal = mul( SkinnedNormal, (float3x3)WorldViewInverseTranspose );
  Output.Tangent = float4( mul( SkinnedTangent, (float3x3)WorldView ), Input.Tangent.w );
  Output.TexCoord = Input.TexCoord;
  Output.PosVS = ViewPosition;
  return Output;
}

AgentC

2,476

October 22, 2013 12:31 PM

TheComet wrote:

"What graphics library are you using?

I would recommend implementing GPU-accelerated animation rather than messing around with CPU threading. This way, the transformations of animation frames are offloaded to the GPU using vertex programs, which immensely increases performance.

Ogre3D supports this, for instance: http://www.ogre3d.org/docs/manual/manual_76.html"

Even if the GPU handles actual vertex skinning according to the bone matrices it is given, the calculations to yield the final bone matrices (sample and blend between keyframes, blend if multiple animations, transform local-space bone transforms into the world space, multiply with inverse bind pose) can easily be a per-frame CPU hotspot if there are many (50+) characters onscreen, and thus will benefit from threading.
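
As a rough C++ illustration of the last two CPU steps listed above (local-space to model-space, then multiply with the inverse bind pose), here is a minimal sketch. The `Mat4`, `Bone` and `BuildSkinningMatrices` names are made up for the example, keyframe sampling and blending are assumed to have already produced the local transforms, and the row-vector convention (`v * M`) is chosen to match `mul( vector, matrix )` in the shader posted earlier.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Mat4 = std::array<float, 16>;  // row-major, row-vector convention (v * M)

Mat4 Identity() {
    Mat4 m{};  // zero-initialised
    m[0] = m[5] = m[10] = m[15] = 1.0f;
    return m;
}

Mat4 Translation(float x, float y, float z) {
    Mat4 m = Identity();
    m[12] = x; m[13] = y; m[14] = z;  // translation lives in the last row
    return m;
}

Mat4 Multiply(const Mat4& a, const Mat4& b) {
    Mat4 out{};
    for (std::size_t r = 0; r < 4; ++r)
        for (std::size_t c = 0; c < 4; ++c)
            for (std::size_t k = 0; k < 4; ++k)
                out[4 * r + c] += a[4 * r + k] * b[4 * k + c];
    return out;
}

struct Bone {
    int parent;        // index of the parent bone, -1 for the root
    Mat4 inverseBind;  // inverse of the bone's model-space bind-pose matrix
};

// local[i] is bone i's animated transform relative to its parent.
// Bones must be ordered so that every parent comes before its children.
std::vector<Mat4> BuildSkinningMatrices(const std::vector<Bone>& bones,
                                        const std::vector<Mat4>& local) {
    assert(bones.size() == local.size());
    std::vector<Mat4> model(bones.size());
    std::vector<Mat4> skin(bones.size());
    for (std::size_t i = 0; i < bones.size(); ++i) {
        const int p = bones[i].parent;
        assert(p < static_cast<int>(i));
        // Apply the local transform first, then the parent's model-space transform.
        model[i] = (p < 0) ? local[i] : Multiply(local[i], model[static_cast<std::size_t>(p)]);
        // Undo the bind pose first, then apply the animated pose.
        skin[i] = Multiply(bones[i].inverseBind, model[i]);
    }
    return skin;
}
```

Each character's call is independent of every other character's, which is what makes this stage a natural unit of work to hand to a thread.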

In addition to just threading the animation work, workload can also be reduced:

- Make sure you're not calculating animation for characters outside the view frustum

- When characters are far away, you can get away with not updating the animation every frame (a primitive form of LOD)
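
The two reductions above can be sketched in C++ along these lines. The distance thresholds, the `AnimationLod` struct and both function names are hypothetical, and the frustum test itself is assumed to be done elsewhere.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical distance thresholds; real values would be tuned per game.
struct AnimationLod {
    float nearDistance = 20.0f;
    float farDistance = 60.0f;
};

// Returns how many frames pass between animation updates; 0 means "do not update".
inline std::uint32_t UpdateInterval(bool inFrustum, float distance, const AnimationLod& lod) {
    assert(distance >= 0.0f);
    if (!inFrustum) return 0;                   // culled: skip animation entirely
    if (distance < lod.nearDistance) return 1;  // close: every frame
    if (distance < lod.farDistance) return 2;   // mid-range: every second frame
    return 4;                                   // far away: every fourth frame
}

// Offsets by character index so distant characters do not all update on the same frame.
inline bool ShouldUpdate(std::uint32_t frame, std::uint32_t characterIndex, std::uint32_t interval) {
    if (interval == 0) return false;
    return (frame + characterIndex) % interval == 0;
}
```

A character that skips frames needs to be advanced by the accumulated elapsed time on its next update, otherwise its animation would play in slow motion.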

However, if we're talking about only a few or a few tens of characters, the CPU side of animation shouldn't be a significant hotspot.

Github: https://github.com/cadaver C64 development: http://covertbitops.c64.org/

L. Spiro

25,818

October 22, 2013 12:33 PM

Firstly, remove the branch from the for loop. Iterate over all 4 weights regardless of whether they are 0 or not. Negative weights should be ruled out on the CPU side.

Secondly, you only need to upload as many bones as are referenced by the part of the model you are rendering. For instance, a mech-machine will likely be broken into 1 mesh for each leg, 1 or 2 or so for the body, some for the weapons, etc.
You aren’t rendering the entire model all in one pass, but in multiple passes in which smaller parts of the model are rendered at a time. If you are rendering the front-left leg, there is no reason to send bone information for the back-right leg. Reducing the number of bones you send reduces bandwidth heavily and will be one of the largest gains in performance you will see.
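
One way to get there, sketched in C++ under assumptions: at load time, each sub-mesh gets its own small bone palette and its vertex bone indices are rewritten to point into that palette. `BonePalette` and `BuildPalette` are hypothetical names, and a flat list of 16-bit bone indices per sub-mesh is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct BonePalette {
    // Palette slot -> index of the bone in the full skeleton.
    std::vector<std::uint16_t> skeletonBones;
};

// Rewrites the sub-mesh's vertex bone indices in place so they index the palette,
// and returns the palette listing only the bones this sub-mesh references.
BonePalette BuildPalette(std::vector<std::uint16_t>& vertexBoneIndices,
                         std::size_t skeletonBoneCount) {
    constexpr std::uint16_t kUnused = 0xFFFF;
    std::vector<std::uint16_t> slotOf(skeletonBoneCount, kUnused);
    BonePalette palette;
    for (std::uint16_t& index : vertexBoneIndices) {
        assert(index < skeletonBoneCount);
        if (slotOf[index] == kUnused) {
            // First time this bone is seen: give it the next free palette slot.
            slotOf[index] = static_cast<std::uint16_t>(palette.skeletonBones.size());
            palette.skeletonBones.push_back(index);
        }
        index = slotOf[index];
    }
    return palette;
}
```

At draw time only `palette.skeletonBones.size()` matrices need to be copied into the constant buffer, in palette order.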

The rest of my suggestions may be exactly the same performance or may be faster, so you would have to test. The shader compiler will likely be smart enough not to perform array look-ups every time, but you can be sure by storing Input.Weights[ i ] to a temporary and using that instead of repeated array access. Same thing with Input.Indices[ i ] and possibly even BoneMatrices[ Input.Indices[ i ] ].

Try various combinations of storing these to temporaries, benchmark, and repeat.

L. Spiro

21st Century Moose

13,459

October 22, 2013 12:55 PM

You should also look at your bone matrix multiplication and upload code; it's possible that you may have bottlenecks there that are solvable without even having to consider threading as an option.

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

Alundra

2,325

Author

October 22, 2013 03:35 PM

L. Spiro wrote: "Firstly, remove the branch from the for loop. Iterate over all 4 weights regardless of them being 0 or not. Negative weights should not be allowed by the CPU end."

I thought branching was now cheap enough to be worth it for skipping the matrix multiplies. Thanks for letting me know that it is still better to drop the branch.

Is it the same for a diffuse texture: bind a white texture and always sample it instead of branching?

L. Spiro wrote: "Secondly, you only need to upload as many bones are as referenced by the part of the model you are rendering."

Yes, I already do that: the mesh is split into geometries by material, and each geometry carries its own bone array.

21st Century Moose wrote: "You should also look at your bone matrix multiplication and upload code; it's possible that you may have bottlenecks there that are solvable without even having to consider threading as an option."

I already skip updating the final bone matrix array when no animation is playing.

L. Spiro

25,818

October 22, 2013 03:56 PM

Alundra wrote: "I thought branching was now cheap enough to be worth it for skipping the matrix multiplies. [...] Is it the same for a diffuse texture: bind a white texture and always sample it instead of branching?"

It is very much worth testing.

On the CPU side, are you using SSE2 (at minimum) for matrix multiplication?
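
For anyone unsure what that means in practice, a SIMD 4x4 multiply can be sketched in C++ as below. This is a minimal example, not a tuned routine: `MultiplySse` is a made-up name, row-major float matrices are assumed, only x86 targets are covered, and unaligned loads are used so no alignment requirement is placed on the caller. The intrinsics themselves come from the original SSE set, which any SSE2-capable CPU supports.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics (x86 only)

// out = a * b for row-major 4x4 float matrices. out must not alias a or b.
inline void MultiplySse(const float* a, const float* b, float* out) {
    assert(out != a && out != b);
    // Load the four rows of b once; each holds 4 floats.
    const __m128 b0 = _mm_loadu_ps(b + 0);
    const __m128 b1 = _mm_loadu_ps(b + 4);
    const __m128 b2 = _mm_loadu_ps(b + 8);
    const __m128 b3 = _mm_loadu_ps(b + 12);
    for (int r = 0; r < 4; ++r) {
        // Row r of the result is a weighted sum of b's rows,
        // with the weights taken from row r of a.
        __m128 row = _mm_mul_ps(_mm_set1_ps(a[4 * r + 0]), b0);
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(a[4 * r + 1]), b1));
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(a[4 * r + 2]), b2));
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(a[4 * r + 3]), b3));
        _mm_storeu_ps(out + 4 * r, row);
    }
}
```

Whether this beats what the compiler already generates for a scalar loop depends on compiler flags, so it is worth benchmarking both.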

L. Spiro

This topic is closed to new replies.
