gpu skinning +4 bones

Started by
6 comments, last by _the_phantom_ 11 years, 9 months ago
Hello,
I would like to hear from anyone who has ideas on how to implement GPU skinning with more than 4 bones.
I've been working with 4 bones per vertex, but now I would like to know whether I can remove this limitation in some way.

Thanks in advance.
Just pass more than 4 indices/weights per vertex to the shader?

As a side note, 4 bones per vertex is enough in most games.
Hey Wayoff,
After posting, it occurred to me that I could send another pack of 4 components. I will give this a try and see how it works out!
Thanks for the reply!
You should be able to pack as many bones + weights as you want in your vertex. If you're on relatively modern hardware (DX10-class or higher) you may want to branch on the weight being > 0 before fetching the bone matrix as an optimization.
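To make the suggestion concrete, here is a minimal CPU-side sketch of the blend with an arbitrary number of influences per vertex, skipping any influence whose weight is zero. It is written in Python purely for illustration; a real implementation would live in the vertex shader. The names `skin_position`, `transform_point`, and `translation`, and the 3x4 row-major matrix layout, are assumptions made for this example and are not taken from the thread.

```python
def transform_point(m, p):
    """Apply a 3x4 affine matrix (three rows of four floats) to a 3D point."""
    x, y, z = p
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in m)


def skin_position(position, indices, weights, bone_matrices):
    """Blend a position over any number of bone influences.

    indices and weights are parallel sequences; their length is the
    per-vertex influence count (8 below, but nothing depends on that).
    """
    out = [0.0, 0.0, 0.0]
    for bone, w in zip(indices, weights):
        if w <= 0.0:
            continue  # unused slot: skip the matrix fetch and the math
        t = transform_point(bone_matrices[bone], position)
        for i in range(3):
            out[i] += w * t[i]
    return tuple(out)


def translation(tx, ty, tz):
    """Hypothetical helper: a 3x4 matrix that only translates."""
    return ((1.0, 0.0, 0.0, tx), (0.0, 1.0, 0.0, ty), (0.0, 0.0, 1.0, tz))


# Eight influence slots, only three of them actually used.
bones = [translation(1, 0, 0), translation(0, 2, 0), translation(0, 0, 4)]
indices = [0, 1, 2, 0, 0, 0, 0, 0]
weights = [0.5, 0.25, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0]
print(skin_position((1.0, 1.0, 1.0), indices, weights, bones))
```

The unused slots still need a valid index (0 here), since whether they are skipped or multiplied by zero depends on how the branch is compiled.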
Hey MJP,
The hardware is DX10+. Currently I'm sending two vec4s in the vertex layout (e.g. TEXTURE2, TEXTURE3), which gives me 8 bone influences, plus two more for the indices.

Would there be a better way to do this, one that would let me save those 4 texture units?
Maybe you should send the bone matrices and other data to the GPU as a texture and then blend them in the shader (fragment shader).
As you know, there is a lot of extra work to do with vertex data.

And it's true that there's no need for more than 4 bones.
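One way to picture the texture idea is as an addressing scheme: each bone matrix is stored as a few RGBA texels, and the shader turns a bone index into texel coordinates. The sketch below only models that addressing on the CPU. The layout (a 3x4 matrix stored as three consecutive RGBA texels) and the names `pack_bone_texture` and `fetch_bone` are assumptions for illustration, not something the thread specifies.

```python
TEXELS_PER_BONE = 3  # assumed layout: one RGBA texel per row of a 3x4 matrix


def pack_bone_texture(bone_matrices, width):
    """Flatten 3x4 matrices into rows of RGBA texels, `width` texels per row."""
    texels = [tuple(row) for m in bone_matrices for row in m]
    while len(texels) % width:
        texels.append((0.0, 0.0, 0.0, 0.0))  # pad the last texture row
    return [texels[i:i + width] for i in range(0, len(texels), width)]


def fetch_bone(texture, bone_index):
    """Rebuild one 3x4 matrix from the texture, as a shader lookup would."""
    width = len(texture[0])
    rows = []
    for r in range(TEXELS_PER_BONE):
        linear = bone_index * TEXELS_PER_BONE + r
        rows.append(texture[linear // width][linear % width])
    return tuple(rows)


# Five bones in a 4-texel-wide texture: 15 texels padded to 16 (4 rows).
matrices = [
    tuple(tuple(float(b * 12 + r * 4 + c) for c in range(4)) for r in range(3))
    for b in range(5)
]
texture = pack_bone_texture(matrices, width=4)
print(len(texture), fetch_bone(texture, 3) == matrices[3])
```

With this layout a matrix may straddle two texture rows, which is why the lookup works from a linear texel index instead of assuming one bone per row.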

Eh? I was under the impression that, overall, it's cheaper to do the mathematics than to do such branching. Granted, modern hardware is more capable of dealing with branching, but I thought it was cheaper overall for the GPU to do the matrix math than to branch on dynamic data.

This depends on the granularity of the work.

In this case the choices are 'do work' or 'skip'. If all the threads in an execution group can skip the work, you get a net win, as doing no work is faster than looping over 4 zero-influence bones. If only some of them skip the work, you are more or less no worse off, because the 'else' branch does no work.

The problems can happen when you have an 'if' and 'else' block which both require work to be done; if all your threads can take one path or the other then you'll get a win as you'll only do the work you need to do. However if you have a situation where say 50% of your threads go down the 'if' path and 50% down the 'else' path you'll end up doing both chunks of work.
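The reasoning above can be written down as a very rough cost model: a group of threads running in lockstep pays for a path if at least one of its threads takes it. This is a simplification that ignores branch overhead and everything else about real hardware. The function name `group_cost`, the group size of 32, and the cost units are made up for illustration.

```python
def group_cost(takes_if, cost_if, cost_else):
    """Cost of one lockstep group: a path is paid for if any thread takes it."""
    cost = 0
    if any(takes_if):
        cost += cost_if    # at least one thread runs the 'if' body
    if not all(takes_if):
        cost += cost_else  # at least one thread runs the 'else' body
    return cost


GROUP = 32  # assumed group size; the real value depends on the hardware
uniform = [True] * GROUP
split = [True] * (GROUP // 2) + [False] * (GROUP // 2)

# 'Do work or skip': the else path is free, so divergence costs nothing extra.
print(group_cost(uniform, 10, 0), group_cost(split, 10, 0))
# Work on both sides: a split group pays for both paths.
print(group_cost(uniform, 10, 10), group_cost(split, 10, 10))
```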

So when dealing with branching you have to look at how the branches are going to be used and the chance of having to take both paths. Pixel processing tends to be the biggest problem here, as vertices tend to all take the same path across a model.

For example, on a console game we had roads being rendered with a texture which had an alpha'd edge. This could lead to large segments of polygons not contributing to the final output but, originally, still doing all the work. By doing a simple 'if' check on the alpha early in the shader we could eliminate most of that workload, as the pixels broke down into three groups:
- thread groups which did all the maths
- thread groups which could early out
- thread groups which had some threads doing all the work

In this case most of the thread groups were in groups 1 and 2, so the border case didn't affect the overall runtime of the shader, giving us a net win for performance.
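The three groups can be counted with a small sketch like the one below, which splits a run of per-pixel alpha values into fixed-size thread groups and classifies each one. The alpha data, the group size, and the name `classify_groups` are invented for illustration; they are not measurements from the game described above.

```python
def classify_groups(alphas, group_size, threshold=0.0):
    """Count thread groups that do all the work, early out, or are mixed."""
    counts = {"all_work": 0, "early_out": 0, "mixed": 0}
    for start in range(0, len(alphas), group_size):
        group = alphas[start:start + group_size]
        visible = [a > threshold for a in group]
        if all(visible):
            counts["all_work"] += 1   # every thread does the maths
        elif not any(visible):
            counts["early_out"] += 1  # every thread skips the work
        else:
            counts["mixed"] += 1      # the border case
    return counts


# A strip of opaque road, a strip of fully transparent edge, one border group.
alphas = [1.0] * 8 + [0.0] * 8 + [1.0, 0.0, 1.0, 0.0]
print(classify_groups(alphas, group_size=4))
```

Only the mixed groups pay for work that some of their threads discard, so the early check is a net win when such groups are a small share of the total.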
