Skinning... GPU or CPU?

12 comments, last by fsm 16 years, 10 months ago
Hi,

I'm currently working on skinning. I wanted an 'all on the GPU' approach, but it looks like there are some problems.

1. Multipass rendering means transforming the same vertex several times. (On the other hand, on the CPU it means storing vertex positions once per instance of an object, which uses a lot of memory...)

2. Number of bones: on the GPU, it seems I'm limited by the number of registers available to store my uniforms. My skinning shader just uses a uniform array of mat4x4. The problem is that this array cannot be larger than about 64 elements (at least in GLSL under Linux, but it must be the same on other systems), and recent models (the Doom 3 ones, for example) have about 80-100 bones. This is my main problem.

I'm not really fond of sending a new vertex array from RAM to VRAM for each object every frame (I did that in my previous engine, but it was a fixed-function one), especially if I go to deferred shading later, but I'd prefer not to be limited to 64 bones (128 would be better).

So, how is this problem handled in recent engines? Is skinning still done on the CPU, or is GPU skinning the norm? IIRC Quake 4 can now do GPU skinning (Doom 3 did CPU skinning); how do they do that with so many bones?

EDIT: I thought of an alternative to save memory on the GPU: I could store my bone transforms not as a mat4x4 but as a mat3x3 rotation plus a translation vector. That would free 4 floats per bone, so it might let me (I'm not sure how my card handles this, and I don't think uniforms are just one big buffer) gain 64*4/12 = 21 new bones in the best case. Is that a good idea? Will the mat3 + vec3 -> mat4 conversion be efficient?

Have a nice day.
Maxime.

[Edited by - Mawww on June 21, 2007 6:30:02 PM]
Tchou kanaky ! tchou !
Skinning with lots of bones can be quite efficient on the CPU. To skin on the GPU you need to create the inverse bind pose matrices, which is expensive, though not really worth worrying about.

I know that a lot of companies developing for the 360 skin on the CPU, on that platform at least: use all the GPU power for eye candy, and use an extra core for skinning.

Anyway, skinning on the CPU can be quite optimal if you use quaternions directly (a teeny bit less so using matrices), but on most PCs it's generally a bit better to do it on the GPU. Why waste the fact that you can dump the verts in VRAM and transform them in a vertex shader, rather than transform them on the CPU and send them to the graphics card every frame? It all depends on the target machine/platform though, really [smile]

If you're doing GPU skinning you'll need to split up the mesh to support a wider range of hardware. Someone posted a pseudocode algorithm a few months back to do this, although it's not hard to work out: basically you split up the mesh into sections which use 12 bones each, and then you have low-end support. I do GPU skinning in my engine, but I have a 36 bone cap in one pass; I'm going to split it up at some point.
Adventures of a Pro & Hobby Games Programmer - http://neilo-gd.blogspot.com/
Twitter - http://twitter.com/neilogd
You should have around 256 uniforms on a modern card. You only need a 4x3 matrix for skinning so you can send that up as 3 uniforms per bone. So at best you can do 85 bones per pass. That's actually a little unrealistic as you need to have other things in your uniforms, but 70-75 is not unreasonable.
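
The arithmetic here is easy to check. The following is only a minimal sketch, assuming each "uniform" is one vec4 register and that the 256 figure holds for the card in question; the function name and the 32 reserved registers are made up for illustration.

```python
def max_bones_per_pass(total_vec4_uniforms, reserved_vec4, vec4_per_bone=3):
    """Bones that fit in the vec4 uniform registers left after reserving some.

    vec4_per_bone is 3 for a 4x3 matrix and 4 for a full mat4x4.
    """
    available = total_vec4_uniforms - reserved_vec4
    if available < 0:
        raise ValueError("reserved more uniforms than exist")
    return available // vec4_per_bone


# 256 registers is the figure quoted in the post; 32 reserved is an assumption.
print(max_bones_per_pass(256, 0))      # 85 (4x3 matrices, nothing else stored)
print(max_bones_per_pass(256, 0, 4))   # 64 (full mat4x4 per bone)
print(max_bones_per_pass(256, 32))     # 74 (4x3 matrices, 32 registers kept back)
```

The 64 result for full mat4x4 storage matches the limit reported in the original question, and keeping a few dozen registers back for lights, materials and so on lands in the 70-75 range.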

If you need to render a model with more bones then you can split the mesh up into chunks, each of which uses a subset of the bones. This is pretty easy since a given chunk of the mesh is almost certainly only going to use a proportional number of bones (e.g. if you slice the mesh in half, each half will use roughly half the bones). This allows you to render each chunk in a separate pass without running out of uniforms. Given enough splits you can render meshes with arbitrarily complex skeletons this way.
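
One way the splitting could look is sketched below. It is a simple greedy grouping, not necessarily what any shipping engine does, and it assumes the mesh is given as a list of per-triangle bone sets (a hypothetical input format). A real implementation would also duplicate shared vertices and remap bone indices per chunk.

```python
def split_by_bone_limit(triangle_bones, max_bones):
    """Greedily group triangles into chunks whose combined bone set fits max_bones.

    triangle_bones: list of sets, the bone indices used by each triangle's vertices.
    Returns a list of (bone_set, triangle_indices) chunks.
    """
    chunks = []
    for tri_index, bones in enumerate(triangle_bones):
        if len(bones) > max_bones:
            raise ValueError("triangle %d alone needs %d bones" % (tri_index, len(bones)))
        for chunk_bones, chunk_tris in chunks:
            # Put the triangle in the first chunk that stays under the cap.
            if len(chunk_bones | bones) <= max_bones:
                chunk_bones.update(bones)
                chunk_tris.append(tri_index)
                break
        else:
            # No existing chunk can take it, so start a new one.
            chunks.append((set(bones), [tri_index]))
    return chunks


# Five triangles over five bones, with a cap of three bones per pass.
example = [{0, 1}, {1, 2}, {2, 3}, {3, 4}, {0, 4}]
chunks = split_by_bone_limit(example, 3)
print(len(chunks))  # 3
```

Each chunk is then drawn in its own pass with only its own bones uploaded.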
Quote:Original post by Richy2k
Skinning with lots of bones can be quite efficient on the CPU. To skin on the GPU you need to create the inverse bind pose matrices, which is expensive, though not really worth worrying about.

You only need to do that once when you load the skeleton, so pretty much a non-issue.
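
To make the "once at load" point concrete, here is a rough sketch. It assumes bones are rigid transforms (rotation plus translation, no scale) stored as 3x4 row-major lists; all the names are illustrative rather than taken from any engine.

```python
def invert_rigid(m):
    """Invert a 3x4 rigid transform [R | t]; the inverse is [R^T | -R^T t]."""
    r_t = [[m[c][r] for c in range(3)] for r in range(3)]
    t = [m[r][3] for r in range(3)]
    new_t = [-sum(r_t[r][c] * t[c] for c in range(3)) for r in range(3)]
    return [r_t[r] + [new_t[r]] for r in range(3)]


def compose(a, b):
    """Return a * b for 3x4 transforms, treating the missing row as (0, 0, 0, 1)."""
    out = []
    for r in range(3):
        row = [sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)]
        row.append(sum(a[r][k] * b[k][3] for k in range(3)) + a[r][3])
        out.append(row)
    return out


def transform_point(m, p):
    return [sum(m[r][c] * p[c] for c in range(3)) + m[r][3] for r in range(3)]


# At load time: the bone sits at (1, 2, 3) in the bind pose; invert it once.
bind = [[1, 0, 0, 1], [0, 1, 0, 2], [0, 0, 1, 3]]
inverse_bind = invert_rigid(bind)

# Every frame: the animated pose (here rotated 90 degrees about Z) times the
# stored inverse bind matrix gives the matrix that is uploaded for skinning.
pose = [[0, -1, 0, 1], [1, 0, 0, 2], [0, 0, 1, 3]]
skin = compose(pose, inverse_bind)
print(transform_point(skin, [2, 2, 3]))  # [1, 3, 3]
```

Only `compose` runs per bone per frame; the inversion never has to be repeated.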

Quote:Original post by Richy2k
I know that a lot of companies developing for the 360 skin on the CPU, on that platform at least: use all the GPU power for eye candy, and use an extra core for skinning.

Really, who? I've not worked on 360, but on xbox I always did skinning on the GPU, it's a lot faster.

Quote:Original post by Richy2k
Anyway, skinning on the CPU can be quite optimal if you use quaternions directly (a teeny bit less so using matrices), but on most PCs it's generally a bit better to do it on the GPU. Why waste the fact that you can dump the verts in VRAM and transform them in a vertex shader, rather than transform them on the CPU and send them to the graphics card every frame? It all depends on the target machine/platform though, really [smile]

Most games are CPU bound. You want to get as much stuff off the CPU as possible. Almost no games are vertex bound - you almost always get CPU bound or fragment bound well before that. Putting a bit more load on the vertex shader for skinning is thus a good idea.

Quote:Original post by Richy2k
If you're doing GPU skinning you'll need to split up the mesh to support a wider range of hardware. Someone posted a pseudocode algorithm a few months back to do this, although it's not hard to work out: basically you split up the mesh into sections which use 12 bones each, and then you have low-end support. I do GPU skinning in my engine, but I have a 36 bone cap in one pass; I'm going to split it up at some point.


Last time I had to target multiple hardware configs I did it dynamically (i.e. split on load) depending on the capabilities of the machine being used. So high end machines would render around 70 per pass, while low end would be closer to 20.
I don't know what your target platform is, but I generally suggest doing it on the GPU. Note that if you're targeting DX10, you can avoid all of your problems entirely. In particular:

1) You can StreamOut your transformed positions to a vertex buffer if you need to do multipass rendering. (Alternatively, deferred shading also avoids the problem).

2) You can read large textures and/or constant buffers efficiently in the vertex shader.

3) Instancing is a fundamental part of the API and very efficient.

With respect to transformations, you can certainly use a 4x3 matrix instead of a 4x4 one to save space and (a bit of) computation. You can also use quaternions to save even more space and computation, and they work just as well on the GPU. You may want to look into "Dual Quaternion Skinning" from I3D 2007 - it seems like a very efficient way to represent and compute rigid body transformations (you lose the "scale" parameter on your transforms, but you also lose candy-wrapper nonsense and so forth so you probably don't even need it).
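
As a rough illustration of the space saving, here is a sketch of a bone stored as a unit quaternion plus a translation (two vec4s instead of three). Note this is plain quaternion-plus-translation storage, not the dual quaternion blending from the I3D paper, and the function names are invented for the example.

```python
import math


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def rotate(q, v):
    """Rotate vector v by unit quaternion q = (x, y, z, w)."""
    qv, w = q[:3], q[3]
    t = [2.0 * c for c in cross(qv, v)]   # t = 2 * (q.xyz x v)
    qt = cross(qv, t)
    return [v[i] + w * t[i] + qt[i] for i in range(3)]


def transform_point_qt(q, translation, p):
    """Apply a rigid bone transform stored as quaternion + translation."""
    r = rotate(q, p)
    return [r[i] + translation[i] for i in range(3)]


# 90 degrees about Z, then a move to (1, 2, 3); sends (1, 0, 0) to about (1, 3, 3).
s = math.sqrt(0.5)
print(transform_point_qt((0.0, 0.0, s, s), (1.0, 2.0, 3.0), [1.0, 0.0, 0.0]))
```

At two vec4 registers per bone, a 256-register budget would hold 128 bones before anything else is reserved. How to blend several such bones per vertex is a separate question, and is what the dual quaternion work addresses.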
Quote:Original post by Jerax
Quote:Original post by Richy2k
I know that a lot of companies developing for the 360 skin on the CPU, on that platform at least: use all the GPU power for eye candy, and use an extra core for skinning.

Really, who? I've not worked on 360, but on xbox I always did skinning on the GPU, it's a lot faster.

Quote:Original post by Richy2k
Anyway, skinning on the CPU can be quite optimal if you use quaternions directly (a teeny bit less so using matrices), but on most PCs it's generally a bit better to do it on the GPU. Why waste the fact that you can dump the verts in VRAM and transform them in a vertex shader, rather than transform them on the CPU and send them to the graphics card every frame? It all depends on the target machine/platform though, really [smile]

Most games are CPU bound. You want to get as much stuff off the CPU as possible. Almost no games are vertex bound - you almost always get CPU bound or fragment bound well before that. Putting a bit more load on the vertex shader for skinning is thus a good idea.



I'm not a 360 programmer either, but there are widely known hardware features that may explain the possibility that skinning on the CPU makes sense for the 360.

Firstly, because the 360 uses a unified shader model, any pipes dedicated to vertex shading are essentially pipes subtracted from the overall pixel-shading capacity of the machine. If you're pixel-shader bound and the CPU has some idle cycles, skinning on the CPU makes sense. On the original Xbox (and basically all non-Direct3D 10 hardware) vertex and pixel pipes are distinct, so whether you're using your vertex pipes at 50 or at 100 percent makes no performance difference, assuming the bottleneck is elsewhere, and you may as well load them to the max. On the 360, vertex and pixel-processing power are inversely related.

Another possible reason why CPU skinning might make sense on the 360 is that it has specific hardware features that can aid this kind of setup. For example, the 360's CPU is able to stream data directly to the GPU, even bypassing the CPU cache. This may eliminate some of the negatives typically associated with CPU skinning. Also take into account that there are 3 PPC cores, each with an AltiVec unit that supports dot products and has 128 VMX registers per hardware thread (6 threads in total).

Another factor is that, on the 360, developers may just happen to have spare CPU cycles or even entire threads/cores. Until games are more pervasively threaded, developers will look for processes that they can just wrap up and toss onto a thread to fill the available resources.

throw table_exception("(? ???)? ? ???");

Skinning is done *much* faster on the GPU than it is on the CPU, even if you write highly optimised vector code.

However, if you are doing multiple passes on a single mesh, you're doing lots of post-process effects that will eat up your GPU time, or you have other cores to spare (*cough 360/ps3 cough*) then doing the skinning on the CPU can be a gain. It's all about experimentation and what your engine is currently doing.
Yeah, about the PS3 I'm not certain...

On the Xbox 360 titles I've worked on, to be honest, we did GPU skinning and seriously under-utilized the extra cores, so our engine, like many, was not optimal. Our models didn't have a lot of bones, but if they had too many we'd fall back to CPU skinning.

Our engine is multipass as well.

I think that's how my home project works too (fallback to CPU). My goal is to not go over 70 bone matrices, though. I did code the ability to break up a mesh into chunks, but I didn't maintain that code, so that is pretty much not an option for me now. My engine is multithreaded, and now that I recall, I think that part (the CPU fallback option) is broken anyway.
My advice from the last 3 or 4 years hasn't changed: go with CPU skinning.
There are no limitations like the ones you have with the GPU.
Don't get fooled by the supposedly greater GPU performance, because it comes with many strings attached.
Quote:Original post by ravyne2001
Firstly, because the 360 uses a unified shader model, any pipes dedicated to vertex shading are essentially pipes subtracted from the overall pixel-shading capacity of the machine. If you're pixel-shader bound and the CPU has some idle cycles, skinning on the CPU makes sense.


I'm fairly confident that it doesn't work quite the way you suggest. (Edit: It kind of does... but I guess what I mean is that lightening the load on the vertex pipe doesn't automatically give you better pixel performance. I'll admit that I haven't tried taking skinning off to see if it would help... but it's not a guarantee by any means.)

I'd also like to know who's skinning on CPU on the 360. Skinning is fast and easy on GPU. I suppose CPU skinning is a viable option on 360 because of the extra cores, but it comes at a memory hit, and you probably have to double buffer your VBs to prevent stalls. I know on our last 360 title we didn't have the memory left over for that, regardless of how much free CPU time we had floating around.
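
The memory cost of double buffering is easy to see in a toy model. The sketch below uses plain Python lists standing in for vertex buffers, with a made-up class name; the idea is that the CPU fills one buffer while the GPU is still reading the one written the previous frame.

```python
class DoubleBufferedVertices:
    """Two CPU-written vertex buffers per skinned instance, swapped each frame."""

    def __init__(self, vertex_count, floats_per_vertex=3):
        size = vertex_count * floats_per_vertex
        self.buffers = [[0.0] * size, [0.0] * size]
        self.write_index = 0

    def begin_write(self):
        """Buffer the CPU may fill this frame."""
        return self.buffers[self.write_index]

    def end_frame(self):
        """Swap buffers and return the one just written, ready to draw from."""
        ready = self.buffers[self.write_index]
        self.write_index = 1 - self.write_index
        return ready

    def memory_floats(self):
        return sum(len(b) for b in self.buffers)


verts = DoubleBufferedVertices(vertex_count=4)
first = verts.begin_write()
drawn = verts.end_frame()
print(drawn is first, verts.begin_write() is first)  # True False
print(verts.memory_floats())                         # 24, twice the 12 floats of one copy
```

That doubling applies per skinned instance, which is where the memory hit comes from.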

We have to do CPU skinning on other platforms and it becomes a huge optimization pain. It's just plain easier to do it on the GPU if you're working with fixed hardware. I can see it sucking on PC because of having to batch bone uploads, though.

FWIW on 360 we skin 120+ bone skeletons on the GPU without batching. I think a few people are doing something similar.

This topic is closed to new replies.
