benj

about hardware skinning performance

This topic is 4833 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


Hi there. I'm doing hardware smooth skinning (DirectX HLSL, 4 bones per vertex). I know GPU transforms are fast, but... can software smooth skinning perform better?

In hardware I'm computing (number of triangles * 3) operations, whereas in software I only need to compute skinning once per vertex. I'm currently working on a model with 1200 faces and 600 vertices, i.e. 1200*3 = 3600 operations in hardware versus 600 in software. (What I call an "operation" is one run of the smooth skinning algorithm on a vertex.)

Furthermore, each time I send my model to the GPU it has to be skinned again (I may need to send it more than once per frame, for shadows for example). In software I can also have just one influence on some vertices, three on others, and so on.

Has anyone done a performance comparison? Should I switch back to software, maybe with SIMD? I'd like to hear your opinions on that. Thank you!

PS: I know that with indexed vertices, some vertices don't have to be recomputed per triangle, so 3*1200 is the worst-case scenario.
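To make the "operation" in the question concrete, here is a minimal CPU-side sketch of smooth (linear blend) skinning in C++. The names `Vec3`, `Mat3x4`, `SkinnedVertex` and `skinOnCpu` are hypothetical and not taken from any engine, and the per-vertex `influenceCount` is an assumed field that lets the loop skip unused bone slots. Each vertex is skinned exactly once, no matter how many triangles share it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical 3x4 affine bone matrix, row-major; column 3 holds the translation.
struct Mat3x4 { float m[3][4]; };

struct SkinnedVertex {
    Vec3 position;                     // bind-pose position
    std::array<std::uint8_t, 4> bone;  // indices into the bone palette
    std::array<float, 4> weight;       // blend weights, expected to sum to 1
    std::uint8_t influenceCount;       // 1..4 (assumed field, not from the thread)
};

inline Vec3 transformPoint(const Mat3x4& b, const Vec3& p) {
    return Vec3{
        b.m[0][0] * p.x + b.m[0][1] * p.y + b.m[0][2] * p.z + b.m[0][3],
        b.m[1][0] * p.x + b.m[1][1] * p.y + b.m[1][2] * p.z + b.m[1][3],
        b.m[2][0] * p.x + b.m[2][1] * p.y + b.m[2][2] * p.z + b.m[2][3]};
}

// Skins every vertex once and returns how many bone transforms were evaluated.
std::size_t skinOnCpu(const std::vector<SkinnedVertex>& in,
                      const std::vector<Mat3x4>& palette,
                      std::vector<Vec3>& out) {
    out.resize(in.size());
    std::size_t boneTransforms = 0;
    for (std::size_t v = 0; v < in.size(); ++v) {
        const SkinnedVertex& sv = in[v];
        assert(sv.influenceCount >= 1 && sv.influenceCount <= 4);
        Vec3 acc{0.0f, 0.0f, 0.0f};
        // Only the influences actually used by this vertex are evaluated.
        for (std::size_t i = 0; i < sv.influenceCount; ++i) {
            assert(sv.bone[i] < palette.size());
            const Vec3 t = transformPoint(palette[sv.bone[i]], sv.position);
            acc.x += sv.weight[i] * t.x;
            acc.y += sv.weight[i] * t.y;
            acc.z += sv.weight[i] * t.z;
            ++boneTransforms;
        }
        out[v] = acc;
    }
    return boneTransforms;
}
```

A vertex shader would typically run all 4 influences for every vertex, which is the difference the question is pointing at; whether that matters in practice is what the replies discuss.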

It really depends on the data you're representing I guess.

Most of today's games aren't transform limited, so doing extra work in the vertex shader shouldn't be a problem (provided you have relatively complex pixel programs and a high pixel/vertex ratio).

If you're rendering large groups of characters in the same pose, say for a big army or something, then you might want to do the transform on the CPU, and batch together all the models in the same pose (either through hardware instancing, or building a new vertex buffer).
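As a rough illustration of the "same pose" idea, the sketch below groups character instances by a pose identifier so that each distinct pose only needs to be skinned once and the result shared by every instance in the group. `CharacterInstance`, `poseId` and `groupByPose` are hypothetical names; how a real game decides that two characters are in "the same pose" is not covered here.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical per-instance record: which character, and which pose it is in.
struct CharacterInstance {
    int characterId;
    int poseId;
};

// Maps each pose id to the ids of all characters currently in that pose.
// The number of keys is the number of times the mesh has to be skinned.
std::map<int, std::vector<int>> groupByPose(
    const std::vector<CharacterInstance>& instances) {
    std::map<int, std::vector<int>> groups;
    for (const CharacterInstance& c : instances) {
        assert(c.poseId >= 0);  // negative ids are treated as invalid in this sketch
        groups[c.poseId].push_back(c.characterId);
    }
    return groups;
}
```

Each group can then be drawn from one skinned vertex buffer, via instancing or a merged buffer as described above.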

On next-gen hardware, you will start seeing unified shader architectures with dynamic load balancing, so the pixel vs. vertex load will no longer be a consideration.

Benchmark :)
I know, you probably don't want to implement an algorithm just to find out that it's not the one you want to use... Post more problem-specific information and I'm sure you'll get some helpful responses.

Hardware skinning multiple times is faster than software skinning (for now) for several reasons. First, if you're processing in strips then you're only skinning one vertex per triangle, not 3. And if you're making nice use of the Post-Transform-Cache, you're potentially only skinning 0.5 verts / triangle.
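The effect of the post-transform cache can be estimated offline. The sketch below counts how often the vertex shader would run for an indexed triangle list, assuming a plain FIFO cache of `cacheSize` entries; real GPUs differ in cache size and replacement policy, so treat the numbers as indicative only. `countVertexShaderRuns` and `makeQuadRowIndices` are hypothetical helper names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Counts vertex shader invocations for an indexed triangle list, assuming a
// FIFO post-transform cache. A cache hit means the vertex is not re-skinned.
std::size_t countVertexShaderRuns(const std::vector<std::uint32_t>& indices,
                                  std::size_t cacheSize) {
    assert(indices.size() % 3 == 0);  // triangle list: 3 indices per triangle
    std::deque<std::uint32_t> cache;
    std::size_t runs = 0;
    for (std::uint32_t idx : indices) {
        if (std::find(cache.begin(), cache.end(), idx) != cache.end()) {
            continue;  // hit: reuse the already transformed vertex
        }
        ++runs;  // miss: the vertex shader (and the skinning) runs
        if (cacheSize == 0) {
            continue;  // no cache at all: nothing to remember
        }
        if (cache.size() == cacheSize) {
            cache.pop_front();  // evict the oldest entry
        }
        cache.push_back(idx);
    }
    return runs;
}

// Builds a single row of quads as a triangle list, ordered like a strip.
// Vertex 2*q is the top of column q, vertex 2*q + 1 the bottom.
std::vector<std::uint32_t> makeQuadRowIndices(std::uint32_t quadCount) {
    std::vector<std::uint32_t> indices;
    indices.reserve(static_cast<std::size_t>(quadCount) * 6);
    for (std::uint32_t q = 0; q < quadCount; ++q) {
        const std::uint32_t top = 2 * q;
        const std::uint32_t bottom = 2 * q + 1;
        const std::uint32_t nextTop = top + 2;
        const std::uint32_t nextBottom = bottom + 2;
        indices.insert(indices.end(),
                       {top, bottom, nextTop, nextTop, bottom, nextBottom});
    }
    return indices;
}
```

For a row of 100 quads (200 triangles), a 4-entry cache gives 202 shader runs, roughly one per triangle, against 600 with no cache. The 600 vertices for 1200 faces in the original question is exactly the 0.5 verts/triangle ratio, which is only reached if every vertex is transformed once.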

Additionally, SIMD on x86 is not very good. SSE has many pitfalls, and even if you code your skinning routine very carefully it won't be that fast. On top of that, software skinning leaves you with dynamic geometry that has to be uploaded for every character, instead of just the skeleton.

I said '(for now)' before, because on next-gen consoles all these problems go away. They have multiple cores which can be dedicated to software-skinning with a much faster SIMD than SSE. They have monstrous bandwidth to the GPU, so the upload time is not a factor, and in the case of multi-pass rendering (as you mentioned) the character only needs to be skinned once.

In the environment of 'many-CPU, one GPU' the GPU is likely the bottleneck.

IMHO, most PCs running recent games are more GPU bound than CPU bound. Of course, if your vertex program unit is free, USE THAT!

Quote:
Original post by Code-R
IMHO, most PCs running recent games are more GPU bound than CPU bound. Of course, if your vertex program unit is free, USE THAT!

Unfortunately your opinion is utterly wrong. The vast majority of PC games (especially recent games) are CPU bound.

Well, the way I see it is that you have two major advantages with GPU skinning:

1) It frees up the CPU processing power otherwise used for skinning. This is the obvious one, but also the most important. In many cases your GPU is going to be spending more time writing fragments than transforming vertices, so GPU skinning can be practically "free" under the right circumstances. And anything that frees up your CPU for game logic/physics/etc. is a good thing.

2) When skinning on the GPU you can eliminate a LOT of data transfer. If you're using D3D, this means only calling expensive Lock/Unlock procedures once per mesh. Since the data on the card doesn't have to change, all that bandwidth is freed up. That can result in a big speedup in and of itself, especially if you have multiple objects using the same underlying mesh.

I think that overall the good outweighs the bad when it comes to GPU skinning, but there are several situations where it's not feasible or desirable, such as when your game relies heavily on multi-pass algorithms, since the mesh would need to be re-skinned for every pass. (Of course, if we get a good render-to-vertex-buffer feature in newer hardware, it becomes a moot point.) Also, GPU skinning can inflate the number of batches quite a bit if you have a lot of objects with complex underlying skeletons. Most skinning shaders only handle 28 or so bones at once, so you end up sending your mesh through in 3-4 chunks. This isn't AS much of a problem in OpenGL, but it's still not desirable in extreme situations.
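The "3-4 chunks" come from splitting the mesh so that no chunk references more bones than the shader can hold. A minimal greedy sketch of that split is shown below; it walks the triangles in order and starts a new batch whenever adding a triangle would push the batch over `maxBones`. `Batch` and `splitByBonePalette` are hypothetical names, and real tools usually reorder triangles to get fewer batches than this simple pass does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

struct Batch {
    std::set<std::uint16_t> bones;       // distinct bones this batch needs
    std::vector<std::size_t> triangles;  // indices of the triangles in it
};

// triangleBones[t] lists every bone referenced by the vertices of triangle t.
std::vector<Batch> splitByBonePalette(
    const std::vector<std::vector<std::uint16_t>>& triangleBones,
    std::size_t maxBones) {
    std::vector<Batch> batches;
    for (std::size_t t = 0; t < triangleBones.size(); ++t) {
        const std::set<std::uint16_t> needed(triangleBones[t].begin(),
                                             triangleBones[t].end());
        assert(needed.size() <= maxBones);  // a single triangle must always fit

        if (!batches.empty()) {
            // Try to grow the current batch without exceeding the palette size.
            std::set<std::uint16_t> merged = batches.back().bones;
            merged.insert(needed.begin(), needed.end());
            if (merged.size() <= maxBones) {
                batches.back().bones = std::move(merged);
                batches.back().triangles.push_back(t);
                continue;
            }
        }
        // Otherwise open a new batch (one more draw call for this mesh).
        Batch fresh;
        fresh.bones = needed;
        fresh.triangles.push_back(t);
        batches.push_back(std::move(fresh));
    }
    return batches;
}
```

With an 80-bone chain where triangle t uses bones t and t+1, a 28-bone limit yields 3 batches from this pass, in line with the figure quoted above.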

If we knew more about your intended use (RTS, Fighting Game, FPS, RPG, etc...) we could give more specific suggestions, but the real answer is make an educated guess based on what you need and then benchmark benchmark benchmark!!!

Another thing to consider is that doing the skinning on the CPU allows you to render your character with fewer batches than if you do the skinning on the GPU. On the GPU, the number of bones per batch is limited by constant register space. On the CPU, there is no such limit. Keeping the number of batches per frame low is pretty key to getting good performance.

I wouldn't bother using hardware skinning on vs_1_1 hardware, where batch cost is high, vertex shader units are slow, and you are limited to 96 shader constants.
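The 96-constant limit translates into a bone budget with simple arithmetic. The sketch below assumes each bone is stored as a 4x3 matrix, which takes 3 float4 constant registers, and that some registers are reserved for everything else the shader needs (view-projection matrix, lighting and so on). The figure of 12 reserved registers is purely an assumption for illustration; `maxBonesPerBatch` is a hypothetical helper.

```cpp
#include <cassert>

// How many bones fit in one batch, given the constant register budget.
//   totalConstants    - float4 registers available (96 on vs_1_1, per the post)
//   reservedConstants - registers used by non-bone data (assumed value)
//   registersPerBone  - 3 for a 4x3 matrix, 4 for a full 4x4 matrix
inline int maxBonesPerBatch(int totalConstants, int reservedConstants,
                            int registersPerBone) {
    assert(registersPerBone > 0);
    assert(reservedConstants >= 0 && reservedConstants <= totalConstants);
    return (totalConstants - reservedConstants) / registersPerBone;
}
```

With 96 constants, 12 reserved and 3 registers per bone this gives (96 - 12) / 3 = 28 bones, which matches the "28 or so" figure mentioned earlier in the thread; with full 4x4 matrices the same budget drops to 21.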

On vs_2_0 hardware, I would expect the hardware path to outperform the SSE software path by only about 10% or so, even in a CPU-bound application. It's nowhere near the dramatic speedup you'd think you'd get.

If you want a real-world test, load up WoW, run to some busy place (like Ironforge), and toggle Vertex Animation Shaders in the video options (restart WoW for the change to take effect). This toggles between the vs_2_0 path and the SSE path. Ctrl-R will display your framerate.

xyzzy

This is one of those "it depends" situations. Different scenarios call for different solutions:

- Slow vs_1_1 shaders : you can reduce the number of influencing bones, if you can live with the quality loss.

- Too few constants : if you can afford coding both paths, you can always fall back to software if you can't fit your skeleton into the given hardware's constant pool. The rest of the characters can use the accelerated path.

Actually, accelerated is not the right word. I'd say it's parallelized. The most important thing (imho of course) when you're designing/profiling your rendering techniques is to remember that even if you're seeing a 10% advantage using the CPU instead of the GPU, that "10% slower" GPU path runs in _parallel_ with the CPU! The overlapping time can be used to calculate something else, which will be very important once you become CPU-limited (which is _VERY_ common when doing anything other than "draw object").
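A toy model makes the "think parallel" point easier to see. It assumes the CPU and GPU overlap perfectly, so the frame time is simply the larger of the two totals; real frames have sync points and stalls, so this is only a sketch. All names (`FrameCost`, `overlappedFrameMs`, `withGpuSkinning`, `withCpuSkinning`) and all timings used with it are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Per-frame work on each processor, in milliseconds (hypothetical numbers).
struct FrameCost {
    double cpuMs;
    double gpuMs;
};

// Idealized frame time when CPU and GPU run fully in parallel.
inline double overlappedFrameMs(const FrameCost& c) {
    return std::max(c.cpuMs, c.gpuMs);
}

// Adds the skinning cost to whichever processor does the skinning.
inline FrameCost withGpuSkinning(FrameCost base, double skinMs) {
    assert(skinMs >= 0.0);
    base.gpuMs += skinMs;
    return base;
}

inline FrameCost withCpuSkinning(FrameCost base, double skinMs) {
    assert(skinMs >= 0.0);
    base.cpuMs += skinMs;
    return base;
}
```

Take a frame with 10 ms of CPU work and 8 ms of GPU work before skinning. If skinning costs 1.0 ms on the GPU or 0.9 ms on the CPU, the CPU path wins the isolated benchmark by 10%, yet under this model the frame takes about 10.9 ms with CPU skinning and 10 ms with GPU skinning, because the GPU had idle time to absorb the work.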

Let me say again, this does _not_ always apply, but it's good practice to "think parallel" and not be obsessed by raw numbers extracted from one-object profile sessions.

Almost all current games are CPU-bound before you crank up the AA/Aniso level. Hardware vendors have repeatedly stated that their driver teams and developer relations engineers have yet to see a vertex shader bound game. I don't think there is a good reason to do *anything* on the CPU if you can do it in the vertex shader.
