Skeletal animation with vertex shader

Started by
2 comments, last by wolfscaptain 12 years, 2 months ago
So, I finally want to move my (relatively) slow software skeletal animation into a vertex shader.

I created a 2D Wx1 texture containing all the bone matrices (where W is 16 floats, i.e. 4 RGBA texels, per matrix); I had to use a 2D texture because I am using WebGL, which doesn't support 1D textures.
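A sketch of how the packing step might look on the JavaScript side (the function name is my own, and I'm assuming column-major 4x4 matrices as flat arrays of 16 floats):

```javascript
// Pack an array of 4x4 bone matrices (column-major, 16 floats each)
// into one Float32Array suitable for a (4 * numBones) x 1 RGBA float
// texture: each matrix occupies 4 consecutive RGBA texels, one per column.
function packBoneMatrices(matrices) {
  const data = new Float32Array(matrices.length * 16);
  matrices.forEach((m, i) => data.set(m, i * 16));
  return data;
}

// Example: two identity matrices -> 8 texels, 32 floats;
// texel j holds data[j*4 .. j*4+3].
const identity = [1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1];
const data = packBoneMatrices([identity, identity]);
```

The upload itself would then be a single `gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, 4 * numBones, 1, 0, gl.RGBA, gl.FLOAT, data)`, which in WebGL 1 needs the `OES_texture_float` extension and `NEAREST` filtering.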

Each vertex comes with a vec4 of bone indices that point to the appropriate matrices. Currently I use -1 as a "no matrix" marker, since a vertex might have anywhere between 1 and 4 bones affecting it.

I am now wondering what would be a decent way to use this in the shader.
As far as I can see, I also need to send the shader the size of the texture in a separate uniform, since a texture fetch takes coordinates in the range [0,1] while I want indices in the range [0, number_of_bones].
Once I get to the index, I need to fetch 4 4D vectors to get the matrix columns, and with those construct the actual bone matrix.
To get consecutive vectors, I need to advance the index by the size of one 4D vector mapped to the range [0,1], i.e. 1/(4 * number_of_bones).
Finally, each result of the vertex multiplied by a bone matrix is added into a final position, which is then divided by the number of bones actually affecting this vertex (I don't know if this is normal, but it's not a format I made myself, so I have to adhere to it).

The following is pseudo-GLSL code for the above:
float texel = 1.0 / (4.0 * number_of_bones); // width of one texel (one matrix column) in [0,1]
vec3 final_position = vec3(0.0);
float bone_count = 0.0;

for (int i = 0; i < 4; i++) {
    float bone_index = a_bone_indices[i]; // vec4 attribute; -1.0 marks an unused slot
    if (bone_index >= 0.0) {
        // u coordinate of the matrix's first column, sampled at the texel centre
        float u = (4.0 * bone_index + 0.5) * texel;
        vec4 column1 = texture2D(bone_texture, vec2(u, 0.5));
        vec4 column2 = texture2D(bone_texture, vec2(u + texel, 0.5));
        vec4 column3 = texture2D(bone_texture, vec2(u + 2.0 * texel, 0.5));
        vec4 column4 = texture2D(bone_texture, vec2(u + 3.0 * texel, 0.5));
        mat4 bone_matrix = mat4(column1, column2, column3, column4);
        final_position += (bone_matrix * vec4(a_position, 1.0)).xyz; // a_position is the original position attribute
        bone_count += 1.0;
    }
}

final_position /= max(bone_count, 1.0); // average over the bones actually affecting this vertex


Now, I am not even sure if this will work. With all those texture fetches, it sounds like it will run into precision issues before I even begin writing the real shader.

Apart from that, the whole thing looks pretty cumbersome, and a lot more complicated than I would have expected.
I also believe loops are not really supported well on GPUs, no?

On the whole, I really would like to know if this can work (like I said, sounds like precision issues to me, getting values from a range of [0,1] really isn't ideal for arbitrary data), and whether it will actually be faster than doing the same on the CPU. All those texture fetches seem a bit wild (and also creating the texture itself, which is usually between 2 to 4 MB, every frame).

By the way, before you go and tell me to use uniform matrices instead of a texture - I have 60+ bones, and I don't believe older GPUs support that many uniform components. I would like this to be compatible with any WebGL (aka ES2) compatible GPU.

Thanks!
Unfortunately I don't know if WebGL supports this, but in regular OpenGL you could just use texelFetch instead of texture2D if you want to pull a specific x/y texel out of an image, instead of going through the typical 0-1 coordinates and interpolation.

Just be careful if you want to use integers as attributes instead of floats; there's a whole separate set of commands for that (glVertexAttribIPointer, etc).
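For comparison, in desktop GLSL (1.30 and up) the per-column fetch collapses to something like this sketch; texelFetch takes integer texel coordinates, so the whole [0,1] mapping and filtering question disappears:

```glsl
// Column c (0..3) of bone matrix b, fetched by texel index.
// The trailing 0 is the mip level.
vec4 column = texelFetch(bone_texture, ivec2(4 * bone_index + c, 0), 0);
```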
An older GPU won't support vertex textures either, so I don't believe that the argument against uniforms is sound.

For uniform counts you're generally looking at figures of 96, 256, or something higher than 256. Each matrix needs 4 uniform vectors (one vec4 per column), and you'll need to keep some spare for your own use, so you can do the calculations from there.

In general the 96 count is very very old hardware only - we're talking close enough to last century here (GL1.5/DX8 class). On any GL2/DX9 or better class hardware you're guaranteed at least 256, and if the hardware is 4/5/6 years old or less you'll have more.

One option is to use uniforms where the number of bones falls within your limit, but otherwise drop back to your old software animation. That does mean that the slower case gets the slower code path, which is generally not what you want, but on the other hand you'll be no worse off than you were before.
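The capability check described above can be done once at startup. A sketch, with the limit coming from `gl.getParameter(gl.MAX_VERTEX_UNIFORM_VECTORS)` in WebGL; the 8-vector reserve for the usual projection/modelview uniforms is an arbitrary assumption:

```javascript
// Decide whether a model's bones fit in the vertex uniform budget.
// Each mat4 costs 4 vec4 uniform vectors; keep a few spare for the
// other uniforms the shader needs.
function canUseUniformBones(numBones, maxVertexUniformVectors, reserve = 8) {
  return 4 * numBones + reserve <= maxVertexUniformVectors;
}

// e.g. 60 bones need 240 vectors plus the reserve, so a 254-vector
// card just fits, while 62 bones would not.
const fits60 = canUseUniformBones(60, 254);
const fits62 = canUseUniformBones(62, 254);
```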

Regarding your loop, worst case is that the compiler is just going to unroll it. That's not going to be too bad. The dynamic branching inside the loop is what you really need to be worried about, as this may cause trouble on the older hardware you seem to be aiming at.


I ended up just going the matrix-array way, since texelFetch is not accessible in WebGL. Apparently my own card supports 254 uniform vectors in the vertex shader (which is pretty weird, shouldn't it be a power of two, 256?), so it's enough for most if not all models I have.

As you suggested, I'll check if the target supports enough vectors for the model (this is a model viewer, so there's only one model, which makes this extremely easy), and according to that start calling either the software or hardware code.

Now all that's left is to check how much faster this is (uploading 30+ 4x4 matrices every frame doesn't sound too optimal to me).
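For scale, the per-frame uniform upload is small next to re-uploading the bone texture; using the numbers from the thread:

```javascript
// Per-frame cost of uploading the bone matrices as uniforms:
// 30 matrices * 16 floats * 4 bytes each.
const uniformBytesPerFrame = 30 * 16 * 4; // 1920 bytes

// versus the 2-4 MB texture mentioned earlier (lower bound),
// roughly a thousandfold difference in data moved per frame.
const textureBytesPerFrame = 2 * 1024 * 1024;
```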

Thanks guys! :)

