GPU Skinning and frame interpolation

14 comments, last by theagentd 12 years, 3 months ago
Regarding the static vertex data, most of the implementations I've seen use a UByte*4 for the associated bone indices and a UByte*4 for the weights for those indices. This limits each vertex to being associated with only 4 bones, and if a vertex is associated with fewer bones, it still performs the same math as if it were associated with 4, but uses weights of 0.0 for the extra bones.
I've usually seen the dynamic/animated bone data represented as a 4x3 (or 3x4) matrix containing rotation/scale/translation transforms relative to the bind-pose.
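As a rough illustration of the layout described above, here is a minimal CPU-side sketch of 4-bone linear blend skinning. It assumes byte weights are normalized by dividing by 255 and that each bone matrix is stored as three rows of four floats; the names `skin_vertex`, `translation` and `IDENTITY` are made up for this example and are not from any particular engine.

```python
def skin_vertex(position, bone_indices, bone_weights_u8, bone_matrices):
    """Blend one vertex by exactly four bone influences (linear blend skinning)."""
    x, y, z = position
    out = [0.0, 0.0, 0.0]
    for index, weight_u8 in zip(bone_indices, bone_weights_u8):
        w = weight_u8 / 255.0          # assumed normalization of the UByte weight
        m = bone_matrices[index]       # 3 rows x 4 columns: rotation/scale + translation
        for row in range(3):
            r = m[row]
            out[row] += w * (r[0] * x + r[1] * y + r[2] * z + r[3])
    return tuple(out)


IDENTITY = ((1.0, 0.0, 0.0, 0.0),
            (0.0, 1.0, 0.0, 0.0),
            (0.0, 0.0, 1.0, 0.0))


def translation(tx, ty, tz):
    """Hypothetical helper: a 3x4 matrix that only translates."""
    return ((1.0, 0.0, 0.0, tx),
            (0.0, 1.0, 0.0, ty),
            (0.0, 0.0, 1.0, tz))


bones = [IDENTITY, translation(2.0, 0.0, 0.0)]
# Only two bones really influence this vertex; the two unused slots keep
# index 0 and weight 0, so they run through the same math but add nothing.
skinned = skin_vertex((1.0, 1.0, 1.0), (0, 1, 0, 0), (128, 127, 0, 0), bones)
```

A vertex shader would do the same four multiply-adds per vertex, which is why padding with zero weights is usually simpler than branching on the influence count.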
Also, think about it realistically - if your model comes at 24 FPS, then multiplying it by 2 will be enough for any human. 100 keyframes per second will give you smooth slow motion, which you very likely won't be needing.
Where does the magic number 24 (or 48) come from? ;P
What dpadam450 means is that the only genre in which you will realistically encounter a large number of models that need to be animated individually is an RTS game. In the general case you'll have 10 models tops running around at one time, which is a breeze to animate on the CPU.
[/quote]Modern FPS games often have ~50 characters on-screen at once ;)
I'm doing a sports game at the moment with 30 characters, each with 60 bones, and who all have multiple different animation sources blended together unpredictably and IK applied on top -- the whole skeletal update part is still fairly cheap and only takes up a few milliseconds.

I'd personally just implement it in a way that is easily understood first (especially if I was fairly new to skinned animation, which admittedly, I am) and work on writing a more optimal version after I got the basic one working if it actually turns out to be performing badly.
@Irreversible
To be honest I probably won't be implementing any bullet-time effects, but I will have changeable game speed, which could drop the game speed to a very low value. I still think doing the interpolation in real-time is more accurate, since even if the animation speed matches the game FPS it would still be more accurate to do the interpolation for the exact time. Maybe it really is an unnoticeable difference in 99.999% of all cases. I might not be able to afford the additional cost of lots of slerps each frame even with multithreaded joint interpolation, so getting rid of it and just keeping the precomputed bone matrices in GPU memory might be the best choice anyway. Memory is something I can afford to use more of, so precomputing to about 60-120 frames per second should give enough smoothness in all possible cases. Now I know what the animation quality setting does in games... >_>
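To put rough numbers on the memory-versus-accuracy trade-off above, here is a small sketch. It assumes a looping clip, one 3x4 float matrix (48 bytes) per joint per baked frame, and nearest-frame lookup with no blending between baked frames. The 32 joints and the 5 second clip are placeholder values, not figures from the Bob model.

```python
def baked_frame_count(duration_s, bake_fps):
    """Frames needed to bake a looping clip at the given rate."""
    return int(round(duration_s * bake_fps))


def baked_bytes(joint_count, frame_count, floats_per_joint=12, bytes_per_float=4):
    """Storage for one 3x4 float matrix per joint per baked frame."""
    return joint_count * frame_count * floats_per_joint * bytes_per_float


def nearest_baked_frame(time_s, duration_s, bake_fps):
    """Index of the baked frame closest to time_s, wrapping around the loop."""
    count = baked_frame_count(duration_s, bake_fps)
    return int(round((time_s % duration_s) * bake_fps)) % count


def max_timing_error_s(bake_fps):
    """Worst-case time offset when snapping to the nearest baked frame."""
    return 0.5 / bake_fps


frames = baked_frame_count(5.0, 60)    # 300 frames for a 5 s clip at 60 fps
size = baked_bytes(32, frames)         # 460800 bytes for 32 joints
error = max_timing_error_s(60)         # 1/120 s, roughly 8.3 ms
```

Under these assumptions the cost scales linearly with the bake rate, so going from 60 to 120 baked frames per second doubles the memory and halves the worst-case timing error.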

I am actually making a real-time strategy game, so I might be having about 100 units on the screen at the same time.


@Hodgman
I've read up quite a lot on GPU skinning and I have more than enough experience with shaders to implement this. Storing the joint translation and orientation in a matrix is probably the best idea as it eliminates the weight positions that would have to be stored per vertex otherwise. I'm loading MD5 meshes and animations, so the maximum number of weights per vertex that format supports is 4, so I'll just stick with that. It also doesn't support joint scales, so that simplifies it further. If using MD5 is a bad idea for some reason, please stop me now!!!
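For reference, a minimal sketch of turning a joint's translation and orientation into a single 3x4 matrix. It assumes a unit quaternion in (x, y, z, w) order and a column-vector convention; `joint_matrix` and `transform_point` are illustrative names, not part of any MD5 loader.

```python
import math


def joint_matrix(translation, q):
    """Build a 3x4 matrix (rotation + translation) from a unit quaternion."""
    x, y, z, w = q
    tx, ty, tz = translation
    return (
        (1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w), tx),
        (2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w), ty),
        (2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y), tz),
    )


def transform_point(m, p):
    """Apply a 3x4 matrix to a point, treating the point as (x, y, z, 1)."""
    return tuple(r[0] * p[0] + r[1] * p[1] + r[2] * p[2] + r[3] for r in m)


# A 90 degree rotation about the z axis followed by a translation of 5 along x.
s = math.sqrt(0.5)
m = joint_matrix((5.0, 0.0, 0.0), (0.0, 0.0, s, s))
moved = transform_point(m, (1.0, 0.0, 0.0))   # approximately (5, 1, 0)
```

With no joint scale in the data, the upper-left 3x3 part stays a pure rotation, so the fourth row of a full 4x4 matrix would always be (0, 0, 0, 1) and can be left out.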
24 frames per second comes from the specific model I'm animating.


In other news, I just managed to get my software skinning working, so Bob is (happily?) waving his lantern around. FPS dropped from 83 FPS to 14 due to the skinning being done on the CPU (well, with 1000 instances though xD). Next I'll move the skinning to a vertex shader but keep joint interpolation on the CPU, which is the standard approach, right? Lastly I'll try a pure GPU solution with precomputed joints stored in a texture.

EDIT: My software implementation is obviously bottlenecked by the skinning. Skinning takes about 65% of the frame time at the moment, possibly a lot more if you count methods that are shared with other parts of the game.

Regarding the static vertex data, most of the implementations I've seen use a UByte*4 for the associated bone indices and a UByte*4 for the weights for those indices. This limits each vertex to being associated with only 4 bones, and if a vertex is associated with fewer bones, it still performs the same math as if it were associated with 4, but uses weights of 0.0 for the extra bones.
I've usually seen the dynamic/animated bone data represented as a 4x3 (or 3x4) matrix containing rotation/scale/translation transforms relative to the bind-pose.


Incidentally, I don't have this working yet, but I'm packing indexes with a ratio of 3:1 into float vectors while maintaining 8-bit precision (I haven't done the actual math as to what the maximum practical precision is, but the packing is the same as RGB2Float). This limits the model to 255 bones, which should be enough in even the most fringe cases, but it allows more concurrently influencing bones without increasing storage. As for packing weights into byte values, that results in a precision of about 0.0039 (1/255). I'm actually fairly curious as to whether this is enough (if it is, I'll definitely want to pack my weights as well). Incidentally, I'm limiting myself to 4 concurrent data streams since I'm using transform feedback to do the skinning, which supports 4 bones at most for now, as the largest vector stream that can be passed to TF is vec4, which limits the number of weights that can be blended.
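One way the 3:1 packing above could work, as a sketch only: it assumes the "RGB2Float" scheme means storing three 8-bit values as one integer-valued float (i0 + i1*256 + i2*65536), which stays exact because a 32-bit float represents every integer below 2^24. The function names are invented for this example.

```python
import struct


def pack_indices(i0, i1, i2):
    """Pack three 8-bit bone indices into one integer-valued float."""
    for i in (i0, i1, i2):
        if not 0 <= i <= 255:
            raise ValueError("bone index must fit in 8 bits")
    return float(i0 + i1 * 256 + i2 * 65536)


def unpack_indices(packed):
    """Recover the three indices; a shader would use floor() and mod() instead."""
    n = int(packed)
    return (n % 256, (n // 256) % 256, n // 65536)


def survives_float32(value):
    """Check that a value is unchanged after a round trip through 32 bits."""
    return struct.unpack("f", struct.pack("f", value))[0] == value


def quantize_weight(w):
    """Store a weight in [0, 1] as a byte."""
    return int(round(w * 255))


def dequantize_weight(b):
    """Recover the approximate weight from its byte value."""
    return b / 255.0


packed = pack_indices(7, 200, 255)
indices = unpack_indices(packed)                        # (7, 200, 255)
weight_error = abs(dequantize_weight(quantize_weight(0.37)) - 0.37)
```

The byte step is 1/255 (about 0.0039), so with rounding the worst-case error per weight is half of that, about 0.002. The quantized weights of a vertex may no longer sum to exactly 1, so renormalizing them after unpacking is a common precaution.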


[quote name='irreversible' timestamp='1327056930' post='4904544']Also, think about it realistically - if your model comes at 24 FPS, then multiplying it by 2 will be enough for any human. 100 keyframes per second will give you smooth slow motion, which you very likely won't be needing.
Where does the magic number 24 (or 48) come from? ;P
[/quote]

Oh, that's from the Bob model discussed above :)
Modern FPS games often have ~50 characters on-screen at once ;)
I'm doing a sports game at the moment with 30 characters, each with 60 bones, and who all have multiple different animation sources blended together unpredictably and IK applied on top -- the whole skeletal update part is still fairly cheap and only takes up a few milliseconds.
[/quote]

A fair point, but it really boils down to what the game is about. I'm personally targeting a non-kinematic solution (which, admittedly, begs the question why would one need skeletal animation anyway?).
What I'm saying is 2 things, which someone on gamedev who is a moderator apparently doesn't understand, so they rate it down.

Don't over-optimize something that doesn't need it. Whatever method you use will probably be fine, unless you are really drawing a massive or even moderate amount of animated stuff. Unless you have an artist to make 50 models for an FPS (which I find way too high a statistic anyway), don't worry too much about a bottleneck that may or may not exist for your specific game. In most cases, just take all the bones between the last keyframe and the next keyframe on the CPU, blend those bones into new ones and send them down to the GPU.
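A minimal sketch of that CPU-side blend, assuming each joint is stored as a position plus a unit quaternion in (x, y, z, w) order, and that `t` is the fraction of the way from the previous keyframe to the next one. The function names are illustrative only.

```python
import math


def lerp(a, b, t):
    """Component-wise linear interpolation between two tuples."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def slerp(q0, q1, t):
    """Spherical linear interpolation between two unit quaternions."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:                       # flip one end to take the shorter arc
        q1 = tuple(-c for c in q1)
        dot = -dot
    if dot > 0.9995:                    # nearly identical: normalized lerp is stable
        q = lerp(q0, q1, t)
        n = math.sqrt(sum(c * c for c in q))
        return tuple(c / n for c in q)
    theta = math.acos(dot)
    s0 = math.sin((1.0 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return tuple(s0 * a + s1 * b for a, b in zip(q0, q1))


def blend_joint(prev_joint, next_joint, t):
    """Blend one joint, given as (position, quaternion), between two keyframes."""
    return (lerp(prev_joint[0], next_joint[0], t),
            slerp(prev_joint[1], next_joint[1], t))


s = math.sqrt(0.5)
prev_joint = ((0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 1.0))   # no rotation
next_joint = ((2.0, 0.0, 0.0), (0.0, 0.0, s, s))       # 90 degrees about z
halfway = blend_joint(prev_joint, next_joint, 0.5)     # 45 degrees about z
```

Running `blend_joint` over every joint each frame and converting the results to matrices gives the bone array that is uploaded to the GPU.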

NBA2K, Madden, Maneater, Killing Floor, Sims http://www.pawlowskipinball.com/pinballeternal

I'm personally targeting a non-kinematic solution (which, admittedly, begs the question why would one need skeletal animation anyway?).[/quote]
Kinematics is moving, so you're probably thinking of inverse kinematics, or inverse of movement. If you have an animated character, it has bones created from art in order to make frames of animation. Any 3d object has a skeleton.


Thanks for the responses, everyone! I got some really interesting responses, so I'll probably be busy for a while now. =P

This topic is closed to new replies.
