Animation threading

Started by
17 comments, last by Alundra 10 years, 5 months ago
On the CPU side, are you using SSE2 (at minimum) for matrix multiplication?

I actually have no SSE code in my math code; I should add it someday.

Can you clarify if you're CPU-bound or GPU-bound?

It sounds like you are CPU-bound (calculating the bone matrices?) - but some folks are offering shader optimizations, which won't make a difference if you're being limited on the CPU.

The final transform matrices are computed on the CPU; my GPU part is only the vertex shader I showed.

Profile it, please.

void hurrrrrrrr() {__asm sub [ebp+4],5;}

There are ten kinds of people in this world: those who understand binary and those who don't.

I profiled with Very Sleepy for one minute, and the matrix operator* is the heaviest function on the list:

CMatrix4::operator* = 2.43s (exclusive)

The second on the list is QuaternionSlerp:

QuaternionSlerp = 1.57s (exclusive)

I think QuaternionSlerp could be replaced by QuaternionNLerp alone. Inside it I already check whether to fall back to an NLerp:


const float CosPhi = QuaternionDot( q1, NewQ2 );
if( CosPhi > ( 1.0f - 0.001f ) )

But since the angle is small most of the time, this check could be removed and QuaternionNLerp used directly.
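For reference, a minimal NLerp sketch is below. The `Quat` type and the `Dot` and `NLerp` names are hypothetical stand-ins, since the thread does not show its own quaternion code, and the inputs are assumed to be unit quaternions.

```cpp
#include <cmath>

// Hypothetical quaternion type; the thread's own type is not shown.
struct Quat { float x, y, z, w; };

inline float Dot(const Quat& a, const Quat& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

// Normalized linear interpolation: lerp the components, then renormalize.
// q2 is negated when the dot product is negative so the shorter arc is taken.
// Assumes q1 and q2 are unit quaternions.
inline Quat NLerp(const Quat& q1, const Quat& q2, float t) {
    const float sign = (Dot(q1, q2) < 0.0f) ? -1.0f : 1.0f;
    Quat r{
        q1.x + t * (sign * q2.x - q1.x),
        q1.y + t * (sign * q2.y - q1.y),
        q1.z + t * (sign * q2.z - q1.z),
        q1.w + t * (sign * q2.w - q1.w)
    };
    const float invLen = 1.0f / std::sqrt(Dot(r, r));
    r.x *= invLen; r.y *= invLen; r.z *= invLen; r.w *= invLen;
    return r;
}
```

NLerp follows the same arc as Slerp but not at constant angular speed; for the small angles between neighbouring keyframes the difference is usually hard to see, and it avoids the trigonometric calls.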

My current performance, on a map of 250,000 triangles with 10 animated characters of 50 bones each, with textures and directional lighting, is 530 FPS.

How many calls were there to CMatrix4::operator*?

It's taking up 4.05% of your CPU time, so it almost sounds like... *drumroll, please* you should be offloading a lot of that work to the GPU through the use of vertex shaders.

Can you walk us step-by-step through the process you use to render a single entity once?

I don't think that's where the heaviest code is; the way I send the data for a skinned mesh to the GPU is nothing fancy:

1) Bind VertexBuffer/IndexBuffer

2) Bind VertexShader

3) Update the constant buffer; in this step I compute InverseBindPose*FinalTransform[ i ].

4) for loop of material subset

5) Check if we have a material

6) Bind PixelShader

7) Update constant buffer/textures

8) Draw the subset


My current performance, on a map of 250,000 triangles with 10 animated characters of 50 bones each, with textures and directional lighting, is 530 FPS.

530FPS! What makes you think you have a performance problem? What are your performance goals?

The matrix operations you listed are taking up 4% of your CPU time. Assuming that's all in the bone calculations, and you were able to successfully divide the work onto 4 cores, it would now be taking 1% of your CPU time. So the CPU time for a frame is now 97% of what it used to be (i.e. of questionable benefit considering the added complexity).
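The arithmetic above is Amdahl's law with a 4% parallel share. A small sketch, with a hypothetical function name and ignoring any threading overhead:

```cpp
// Fraction of the original frame time that remains after a share of the
// work (parallelFraction, between 0 and 1) is split evenly over `cores`
// cores. Threading overhead is ignored, so this is a best case.
inline double RemainingFrameTime(double parallelFraction, int cores) {
    return (1.0 - parallelFraction) + parallelFraction / cores;
}
```

With the numbers from this thread, `RemainingFrameTime(0.04, 4)` comes out at about 0.97, and even with an unlimited number of cores it could never drop below 0.96.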

If you're really worried your CPU bone matrix calculations are a performance issue, make a build where you can turn them off at will (i.e. just not update them each frame). Does it affect performance?

I just wanted to try some threading to see how it works, so I proposed a threading idea, but I don't know where the best place is to thread an animation system.
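One common place to thread is per character, since each character's pose can be computed without touching the others. The sketch below is only an illustration under that assumption: `Character`, `UpdateAnimation` and `UpdateAllCharacters` are hypothetical names, and the placeholder update stands in for the real sampling and blending.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for one animated character. Each character owns
// its own bone data, so different characters can be updated without locks.
struct Character {
    std::vector<float> boneAngles;  // placeholder for real pose data
    void UpdateAnimation(float dt) {
        for (float& a : boneAngles) a += dt;  // placeholder for sampling + blending
    }
};

// Updates all characters, giving each worker thread one contiguous slice.
void UpdateAllCharacters(std::vector<Character>& characters, float dt,
                         unsigned workerCount) {
    workerCount = std::max(1u, workerCount);
    const std::size_t total = characters.size();
    const std::size_t chunk = (total + workerCount - 1) / workerCount;

    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < total; begin += chunk) {
        const std::size_t end = std::min(begin + chunk, total);
        workers.emplace_back([&characters, begin, end, dt] {
            for (std::size_t i = begin; i < end; ++i)
                characters[i].UpdateAnimation(dt);
        });
    }
    // Wait for every slice before rendering reads the poses.
    for (std::thread& w : workers) w.join();
}
```

Creating threads every frame has a cost of its own, so a real engine would more likely keep a persistent worker pool; at 10 characters the overhead may well outweigh the gain.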

Besides threading, using SSE2 and replacing operator* with a function that takes a pointer to a matrix could improve performance too.
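A rough sketch of what such a function could look like is below. `Mat4`, `Mat4Multiply` and the `MAT4_USE_SSE2` macro are hypothetical, and a row-major layout is assumed because the thread does not show how CMatrix4 is stored. A scalar fallback is included for builds without SSE2.

```cpp
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define MAT4_USE_SSE2 1
#endif

// Hypothetical row-major 4x4 matrix; the thread's CMatrix4 layout is not shown.
struct Mat4 { float m[16]; };

// out = a * b, written through a pointer so no temporary is returned by value.
// `out` must not point to the same matrix as `a` or `b`.
inline void Mat4Multiply(const Mat4* a, const Mat4* b, Mat4* out) {
#ifdef MAT4_USE_SSE2
    // Load the four rows of b once (unaligned loads, no alignment assumed).
    const __m128 b0 = _mm_loadu_ps(b->m + 0);
    const __m128 b1 = _mm_loadu_ps(b->m + 4);
    const __m128 b2 = _mm_loadu_ps(b->m + 8);
    const __m128 b3 = _mm_loadu_ps(b->m + 12);
    for (int i = 0; i < 4; ++i) {
        const float* row = a->m + 4 * i;
        // Row i of the result is a linear combination of the rows of b.
        __m128 r = _mm_mul_ps(_mm_set1_ps(row[0]), b0);
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(row[1]), b1));
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(row[2]), b2));
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(row[3]), b3));
        _mm_storeu_ps(out->m + 4 * i, r);
    }
#else
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k) sum += a->m[4 * i + k] * b->m[4 * k + j];
            out->m[4 * i + j] = sum;
        }
    }
#endif
}
```

Whether this beats the existing operator* depends on the compiler and on how the matrices are laid out and aligned, so it is worth profiling both versions.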

I should also mention that operator* is used in Actor::Update, so a boolean needs to be added to skip the transform update when it isn't needed.
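That boolean is usually called a dirty flag. A minimal sketch, assuming a hypothetical `Actor` layout (the real class is not shown in this thread) and using a rebuild counter only to make the effect visible:

```cpp
// Hypothetical position type and Actor; the thread's own classes are not shown.
struct Vec3 { float x, y, z; };

class Actor {
public:
    // Any change to the transform inputs marks the cached matrix as stale.
    void SetPosition(const Vec3& p) {
        position_ = p;
        transformDirty_ = true;
    }

    // Returns true only when the world transform was actually rebuilt.
    bool Update() {
        if (!transformDirty_) return false;
        RebuildWorldTransform();
        transformDirty_ = false;
        return true;
    }

    int RebuildCount() const { return rebuildCount_; }

private:
    void RebuildWorldTransform() {
        ++rebuildCount_;  // the matrix multiplications would go here
    }

    Vec3 position_{0.0f, 0.0f, 0.0f};
    bool transformDirty_ = true;  // start dirty so the first Update builds the matrix
    int rebuildCount_ = 0;
};
```

Static actors then cost one branch per frame instead of a set of matrix multiplications.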