I created a 2D Wx1 texture containing all the bone matrices, where W is 4 * number_of_bones RGBA texels (one texel per matrix column, so 16 floats per matrix). I had to use a 2D texture because I am using WebGL, which doesn't support 1D textures.
Each vertex comes with a vec4 of bone indices that point to the appropriate matrices. Currently I use -1 to mean "no matrix", since a vertex might have anywhere between 1 and 4 bones affecting it.
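For reference, a minimal sketch of how such a texture could be packed and uploaded. The helper names `packBoneMatrices` and `uploadBoneTexture` are made up for illustration, and the matrices are assumed to be column-major arrays of 16 numbers. Note that float textures in WebGL 1 need the `OES_texture_float` extension, which is not guaranteed to be present.

```javascript
// Hypothetical helper: flatten an array of column-major 4x4 matrices
// (each an array-like of 16 numbers) into one Float32Array.
// With an RGBA float texture each texel holds one column (4 floats),
// so the texture is 4 * boneCount texels wide and 1 texel high.
function packBoneMatrices(matrices) {
  const data = new Float32Array(matrices.length * 16);
  matrices.forEach((m, i) => data.set(m, i * 16));
  return { data, width: matrices.length * 4, height: 1 };
}

// Hypothetical helper: upload the packed data into `texture`.
// `gl` is a WebGLRenderingContext. Returns false when float textures
// are not available on this device.
function uploadBoneTexture(gl, texture, packed) {
  if (!gl.getExtension("OES_texture_float")) return false;
  gl.bindTexture(gl.TEXTURE_2D, texture);
  // NEAREST so matrix data is never blended between texels. WebGL 1 also
  // needs non-mipmapped filtering and CLAMP_TO_EDGE for textures whose
  // dimensions are not powers of two.
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, packed.width, packed.height, 0,
                gl.RGBA, gl.FLOAT, packed.data);
  return true;
}
```

The packing step is plain array work and can be reused every frame with the same `Float32Array` if allocation turns out to matter.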
I am now wondering what would be a decent way to use this in the shader.
As far as I can see, I also need to pass the size of the texture to the shader in a separate uniform, since a texture fetch takes coordinates in the range [0,1] and my indices are in the range [0,number_of_bones].
Once I get to the index, I need to fetch 4 4D vectors to get the matrix columns, and with those construct the actual bone matrix.
To get consecutive vectors, I need to advance the coordinate by the size of one 4D vector (one texel) converted to the range [0,1], so 1/(4 * number_of_bones).
Finally, each result of a vertex multiplied by a bone matrix is added to a final position, which is later divided by the number of bones actually affecting this vertex (I don't know if this is normal, but it's not a format I designed myself, so I have to adhere to it).
The following is pseudo-GLSL code for the above:
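To sanity-check the coordinate arithmetic outside the shader, here is a small sketch in JavaScript. The function names are hypothetical, and it assumes the layout described above (4 texels per bone in a single row). It also adds half a texel so that each lookup lands on a texel center rather than on the edge between two texels, which is an assumption on my part about what the fetch should do.

```javascript
// u coordinate (range [0,1]) of the center of the texel that stores
// column `column` (0..3) of bone `boneIndex`.
function boneColumnCoord(boneIndex, column, boneCount) {
  const width = 4 * boneCount; // texture width in texels
  return (4 * boneIndex + column + 0.5) / width;
}

// Inverse mapping: which texel a NEAREST lookup at `u` would select.
function texelFromCoord(u, boneCount) {
  return Math.floor(u * 4 * boneCount);
}

// Usage: verify that every column of every bone maps back to its own texel.
function allCoordsRoundTrip(boneCount) {
  for (let bone = 0; bone < boneCount; bone++) {
    for (let column = 0; column < 4; column++) {
      const u = boneColumnCoord(bone, column, boneCount);
      if (texelFromCoord(u, boneCount) !== 4 * bone + column) return false;
    }
  }
  return true;
}
```

JavaScript numbers are double precision, so this only checks the arithmetic itself. It says nothing about how a GPU with lower float precision would round the same values.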
// number_of_bones is assumed to be a float uniform; a_bone_indices is the vec4 attribute of bone indices
float size_of_column = 1.0 / (4.0 * number_of_bones); // the size of every vec4 (one texel) scaled to the range [0,1]
vec3 final_position = vec3(0.0, 0.0, 0.0);
float indices_count = 4.0;
for (int i = 0; i < 4; i++) {
    float bone_index = a_bone_indices[i];
    if (bone_index >= 0.0) { // -1.0 means "no matrix"
        // start of this bone's four texels, moved half a texel to hit the texel center
        float index = bone_index / number_of_bones + 0.5 * size_of_column;
        vec4 column1 = texture2D(bone_texture, vec2(index, 0.5));
        vec4 column2 = texture2D(bone_texture, vec2(index + size_of_column, 0.5));
        vec4 column3 = texture2D(bone_texture, vec2(index + 2.0 * size_of_column, 0.5));
        vec4 column4 = texture2D(bone_texture, vec2(index + 3.0 * size_of_column, 0.5));
        mat4 matrix = mat4(column1, column2, column3, column4);
        final_position += (matrix * vec4(a_position, 1.0)).xyz; // a_position is the original position attribute
    } else {
        indices_count -= 1.0;
    }
}
final_position = final_position / indices_count;
Now, I am not even sure if this will work. Even before writing the real shader, the texture fetches sound prone to precision issues.
Apart from that, the whole thing looks pretty cumbersome, and a lot more complicated than I would have expected.
I also believe loops are not supported well on GPUs, is that right?
On the whole, I would really like to know whether this can work (like I said, fetching values through coordinates in the range [0,1] really isn't ideal for arbitrary data), and whether it will actually be faster than doing the same on the CPU. All those texture fetches seem a bit wild, and so does creating the texture itself (usually between 2 and 4 MB) every frame.
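For comparison, this is roughly what "the same on the CPU" would mean per vertex. It is only a sketch: `transformPoint` and `skinVertex` are hypothetical names, the matrices are assumed to be column-major arrays of 16 numbers, and the averaging follows the format described above (sum of transformed positions divided by the number of bones that are not -1).

```javascript
// Multiply a column-major 4x4 matrix by the point (x, y, z, 1).
// Returns the transformed [x, y, z].
function transformPoint(m, p) {
  const [x, y, z] = p;
  return [
    m[0] * x + m[4] * y + m[8] * z + m[12],
    m[1] * x + m[5] * y + m[9] * z + m[13],
    m[2] * x + m[6] * y + m[10] * z + m[14],
  ];
}

// Average the position transformed by every bone whose index is not negative.
// `boneIndices` holds up to 4 entries, with -1 meaning "no matrix".
function skinVertex(position, boneIndices, matrices) {
  const sum = [0, 0, 0];
  let count = 0;
  for (const index of boneIndices) {
    if (index < 0) continue;
    const t = transformPoint(matrices[index], position);
    sum[0] += t[0];
    sum[1] += t[1];
    sum[2] += t[2];
    count++;
  }
  // A vertex with no bones is left where it is (an assumption of this sketch).
  if (count === 0) return position.slice();
  return sum.map((v) => v / count);
}

// Usage: one identity bone and one bone translating by (2, 0, 0).
const identity = [1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1];
const shiftX = [1,0,0,0, 0,1,0,0, 0,0,1,0, 2,0,0,1];
const skinned = skinVertex([1, 1, 1], [0, 1, -1, -1], [identity, shiftX]);
// skinned is [2, 1, 1]
```

Running something like this over all vertices each frame is the baseline that the shader version would have to beat.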
By the way, before you tell me to use uniform matrices instead of a texture: I have 60+ bones, and I don't believe older GPUs support that many uniform components. I would like this to be compatible with any WebGL (aka ES2) compatible GPU.
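A sketch of how the limits in question could be checked at runtime. ES2 only guarantees 128 vertex uniform vectors, and 60 mat4 bones already need 240. ES2 also allows an implementation to report zero vertex texture units, so the texture approach is not guaranteed everywhere either. The function names and the `reservedVectors` parameter (uniform vectors used by everything other than bones) are hypothetical.

```javascript
// Read the relevant limits from a WebGLRenderingContext.
function queryLimits(gl) {
  return {
    maxVertexUniformVectors: gl.getParameter(gl.MAX_VERTEX_UNIFORM_VECTORS),
    maxVertexTextureUnits: gl.getParameter(gl.MAX_VERTEX_TEXTURE_IMAGE_UNITS),
    hasFloatTextures: gl.getExtension("OES_texture_float") !== null,
  };
}

// Decide where the bone matrices can live on this device.
// Each mat4 uniform occupies 4 uniform vectors.
function chooseBoneStorage(limits, boneCount, reservedVectors) {
  const needed = 4 * boneCount + reservedVectors;
  if (needed <= limits.maxVertexUniformVectors) return "uniforms";
  if (limits.maxVertexTextureUnits > 0 && limits.hasFloatTextures) return "texture";
  return "cpu"; // neither option fits, fall back to skinning on the CPU
}
```

In use this would be something like `chooseBoneStorage(queryLimits(gl), 60, 8)`, evaluated once at startup.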
Thanks!