#1 Members - Reputation: 136
Posted 17 July 2012 - 01:29 PM
I would like to hear from anyone who has ideas on how to implement GPU skinning with more than 4 bones per vertex.
I've been working with 4 bones per vertex, but now I would like to know whether I could remove this limitation in some way.
Thanks in advance.
#2 Members - Reputation: 345
Posted 17 July 2012 - 02:45 PM
As a side note, 4 bones per vertex is enough in most games.
Edited by Waaayoff, 17 July 2012 - 02:49 PM.
#5 Members - Reputation: 136
Posted 19 July 2012 - 02:37 AM
The hardware is DX10+. Currently I'm sending 2 vec4s along with the vertex layout (e.g. TEXTURE2, TEXTURE3), which gives me 8 bone influences, plus 2 others for the indices.
Is there a better way to do this, one that would save me the 4 texture units?
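For reference, one option (an assumption, not something established in this thread) is to store the 8 indices as bytes and the 8 weights as 8-bit normalized values. On DX10-class hardware that would map to two R8G8B8A8_UINT and two R8G8B8A8_UNORM attributes, 16 bytes in total instead of four float vec4s. A minimal CPU-side sketch of the packing in Python; `pack_influences` and `unpack_influences` are hypothetical names:

```python
import struct


def pack_influences(indices, weights):
    """Pack 8 bone indices and 8 weights into 16 bytes (hypothetical layout)."""
    if len(indices) != 8 or len(weights) != 8:
        raise ValueError("expected exactly 8 influences")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Quantize to 0..255, which is what an 8-bit UNORM attribute stores.
    quantized = [int(round(255 * w / total)) for w in weights]
    # Push the rounding error onto the largest weight so the sum is exactly 255.
    largest = quantized.index(max(quantized))
    quantized[largest] += 255 - sum(quantized)
    return struct.pack("<8B8B", *indices, *quantized)


def unpack_influences(data):
    """Return (indices, weights) with weights as floats in [0, 1]."""
    values = struct.unpack("<8B8B", data)
    return list(values[:8]), [q / 255 for q in values[8:]]
```

Whether 8-bit weights are precise enough depends on the content; unused influences simply get a weight of 0.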
#6 Members - Reputation: 148
Posted 20 July 2012 - 02:28 AM
As you know, there is a lot of extra work to do with vertex data.
And it's true that there's usually no need for more than 4 bones.
#7 Crossbones+ - Reputation: 1432
Posted 20 July 2012 - 03:48 AM
You should be able to pack as many bones + weights as you want in your vertex. If you're on relatively modern hardware (DX10-class or higher) you may want to branch on the weight being > 0 before fetching the bone matrix as an optimization.
Eh? I was under the impression that overall it's cheaper to do the mathematics than to do such branching. Granted, modern hardware is more capable of dealing with such branching; I was simply under the impression that it's cheaper overall for the GPU to do the matrix math than to do any branching on dynamic data sets.
Edited by slicer4ever, 20 July 2012 - 03:48 AM.
#8 Moderators - Reputation: 3962
Posted 20 July 2012 - 04:28 AM
Eh? I was under the impression that overall it's cheaper to do the mathematics than to do such branching. Granted, modern hardware is more capable of dealing with such branching; I was simply under the impression that it's cheaper overall for the GPU to do the matrix math than to do any branching on dynamic data sets.
This depends on the granularity of the work.
In this case the choices are 'do work' or 'skip'. If all the threads in an execution group can skip the work then you get a net win, as doing no work is faster than looping over 4 zero-influence bones. If only some skip the work then you are, more or less, no worse off, as the 'else' branch does no work.
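To make the 'do work' or 'skip' choice concrete, here is a minimal CPU-side sketch in Python of the per-vertex logic. It is an illustration only, not shader code; `skin_position` and the 3x4 row layout of `bone_matrices` are assumptions made for the example:

```python
def skin_position(position, indices, weights, bone_matrices):
    """Blend a position by its bone influences, skipping zero weights.

    Each bone matrix is 3 rows of 4 values: a 3x3 part plus a translation
    column (a hypothetical layout chosen for this sketch).
    Returns the skinned position and how many matrices were fetched.
    """
    x, y, z = position
    out = [0.0, 0.0, 0.0]
    fetched = 0
    for index, weight in zip(indices, weights):
        if weight <= 0.0:
            continue  # the early-out: no matrix fetch, no math
        m = bone_matrices[index]
        fetched += 1
        for row in range(3):
            out[row] += weight * (
                m[row][0] * x + m[row][1] * y + m[row][2] * z + m[row][3]
            )
    return out, fetched


identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
shift_x = [[1, 0, 0, 2], [0, 1, 0, 0], [0, 0, 1, 0]]  # translate by 2 on x
skinned, fetched = skin_position(
    (1.0, 2.0, 3.0), [0, 1, 0, 0], [0.5, 0.5, 0.0, 0.0], [identity, shift_x]
)
```

With two of the four weights at zero, only two matrices are fetched; on the GPU the saving only materializes when the whole execution group can skip together.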
The problems can happen when you have an 'if' and 'else' block which both require work to be done; if all your threads can take one path or the other then you'll get a win as you'll only do the work you need to do. However if you have a situation where say 50% of your threads go down the 'if' path and 50% down the 'else' path you'll end up doing both chunks of work.
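A rough way to reason about this is a toy cost model, assuming a group of threads runs in lockstep and pays for every path that at least one of its threads takes. That is a simplification of real hardware, and `group_cost` is a hypothetical helper:

```python
def group_cost(takes_if, if_cost, else_cost):
    """Cost of one lockstep thread group under a simple divergence model.

    takes_if: one boolean per thread, True if that thread takes the 'if' path.
    The group pays for a path as soon as any of its threads needs it.
    """
    pays_if = any(takes_if)
    pays_else = not all(takes_if)
    return (if_cost if pays_if else 0) + (else_cost if pays_else else 0)


uniform = group_cost([True] * 32, if_cost=10, else_cost=6)
split = group_cost([True] * 16 + [False] * 16, if_cost=10, else_cost=6)
skip_some = group_cost([True] * 31 + [False], if_cost=10, else_cost=0)
skip_all = group_cost([False] * 32, if_cost=10, else_cost=0)
```

In this model the 50/50 split pays for both paths, while a 'work or skip' branch (an `else_cost` of 0) never costs more than doing the work unconditionally.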
So when dealing with branching you have to look at how the branches are going to be used and the chance of having to take both paths. Pixel processing tends to be the biggest problem here, as vertices tend to all take the same path across a model.
For example, on a console game we had roads being rendered with a texture which had an alpha'd edge. This could lead to large segments of polygons not contributing to the final output but, originally, having to do all the work. By doing a simple 'if' check on the alpha early in the shader we could eliminate most of that workload, as the pixels broke down into three groups:
- thread groups which did all the maths
- thread groups which could early out
- thread groups which had some threads doing all the work while the others exited early
In this case most of the thread groups were in groups 1 and 2, so the border case didn't affect the overall runtime of the shader, granting us a net win for performance.
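The three groups above can be counted with a small Python sketch, assuming pixels are handed to thread groups in consecutive runs of `group_size` and that a pixel early-outs when its alpha is at or below a threshold. Both are assumptions for illustration, and `classify_groups` is a hypothetical helper:

```python
def classify_groups(alphas, group_size, threshold=0.0):
    """Count thread groups that do all the work, all skip, or are mixed."""
    counts = {"all_work": 0, "all_skip": 0, "mixed": 0}
    for start in range(0, len(alphas), group_size):
        group = alphas[start:start + group_size]
        visible = [alpha > threshold for alpha in group]
        if all(visible):
            counts["all_work"] += 1   # every thread runs the full shader
        elif not any(visible):
            counts["all_skip"] += 1   # the whole group can early out
        else:
            counts["mixed"] += 1      # border case: the group still pays for the work
    return counts


# Opaque road, fully transparent edge, and one group straddling the border.
road = [1.0] * 4 + [0.0] * 4 + [1.0, 0.0, 0.0, 0.0]
counts = classify_groups(road, group_size=4)
```

The fewer groups land in the mixed bucket, the closer the early-out gets to its ideal saving.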






