### theagentd

Posted 19 January 2012 - 03:50 AM

Hello. I've been trying to do skinning, and so far I think I've mostly figured out the math behind it. I've successfully loaded an MD5 mesh and I'm working on implementing CPU skinning for a small animation following this guide: http://3dgep.com/?p=1053. I think I can get software skinning working quite easily, but I definitely want to do the skinning in a vertex shader. I found another tutorial on the same website that covers this: http://3dgep.com/?p=1356. The "problem" is that it does the bone interpolation on the CPU and then submits the data to the GPU each frame. Just transforming 100 bones shouldn't be that CPU-heavy, but I might potentially have a thousand instances of the same mesh, each in a different animation. Therefore I also want to use instancing to reduce the CPU load. Since my game is an RTS, the GPU is severely underused even though I've offloaded fog-of-war rendering to it, so trading GPU cycles for CPU cycles is a good thing in my case.

I could just multi-thread the bone interpolation for an almost linear increase in performance, but I still thought it should be possible to offload almost everything onto the GPU. That's when I stumbled upon this white paper: http://developer.dow...gWhitePaper.pdf. It seems to do exactly what I want by storing bone matrices in a texture, which was something I had thought of doing. However, the implementation in the white paper does not seem to do any kind of bone interpolation; it simply rounds to the closest frame (though this isn't stated anywhere). I'm 99.9% sure they don't re-upload the bone matrices each frame, since they seem to keep the bone data per animation and frame, not per individual instance. Losing interpolation seems like a huge step backwards, so I would definitely not implement GPU skinning this way if that turned out to be the cost.

I figured I could just upload even "rawer" bone data to my animation texture, i.e. keep a 3D vector and a quaternion per bone instead of a matrix, and then do the interpolation between the two keyframes in the vertex shader. The amount of data sampled would only increase by about 33%:

- 1 matrix per weight = RGBA 32-bit float × 3
- 2 vectors and 2 quaternions = RGBA 32-bit float × 4
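To spell out that estimate (the padding assumptions here are mine: the vec3 position is assumed to occupy a full RGBA texel, and two keyframes are fetched for interpolation):

```java
// Back-of-the-envelope texel counts per bone, assuming RGBA 32-bit float texels.
public class TexelCount {
    public static void main(String[] args) {
        int matrixTexels = 3;            // one 3x4 bone matrix = 3 RGBA texels
        int rawPerFrame = 1 + 1;         // vec3 position (padded) + quaternion = 2 RGBA texels
        int rawTexels = rawPerFrame * 2; // two keyframes sampled for interpolation = 4 RGBA texels
        double increase = 100.0 * (rawTexels - matrixTexels) / matrixTexels;
        System.out.printf("%d vs %d texels => +%.0f%%%n", matrixTexels, rawTexels, increase);
    }
}
```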

I would also have to upload additional static data for the weights (a position for each weight). The problem is the additional logic needed to transform each vertex, since the interpolation would have to be redone for every vertex. I think this additional cost will be almost unnoticeable though, since the vertex shader should be bandwidth/texture limited anyway. If there happen to be built-in functions for slerp (I've found mix(...), but I'm not sure it's the right one), I think the additional logic cost would be negligible.
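For reference, GLSL's built-in mix() is a plain component-wise linear interpolation, so a spherical interpolation would have to be written by hand. A minimal slerp sketch (in Java here, mirroring what the shader version would compute; the array-based quaternion layout x,y,z,w is my own choice for the example):

```java
// Minimal quaternion slerp sketch; quaternions are float[4] in x,y,z,w order.
public class Slerp {
    static float[] slerp(float[] a, float[] b, float t) {
        // Dot product = cosine of the angle between the two quaternions.
        float dot = a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3];
        float[] b2 = b.clone();
        if (dot < 0) { // negate one input to take the shorter arc
            dot = -dot;
            for (int i = 0; i < 4; i++) b2[i] = -b2[i];
        }
        float wa, wb;
        if (dot > 0.9995f) { // nearly parallel: fall back to plain lerp
            wa = 1 - t; wb = t;
        } else {
            float theta = (float) Math.acos(dot);
            float sinTheta = (float) Math.sin(theta);
            wa = (float) Math.sin((1 - t) * theta) / sinTheta;
            wb = (float) Math.sin(t * theta) / sinTheta;
        }
        float[] out = new float[4];
        for (int i = 0; i < 4; i++) out[i] = wa * a[i] + wb * b2[i];
        // Renormalize to guard against floating-point drift.
        float len = (float) Math.sqrt(out[0]*out[0] + out[1]*out[1]
                                    + out[2]*out[2] + out[3]*out[3]);
        for (int i = 0; i < 4; i++) out[i] /= len;
        return out;
    }

    public static void main(String[] args) {
        // Halfway between identity and a 90-degree rotation about Z
        // should be a 45-degree rotation about Z: (0, 0, sin 22.5°, cos 22.5°).
        float[] q0 = {0, 0, 0, 1};
        float[] q1 = {0, 0, (float) Math.sin(Math.PI / 4), (float) Math.cos(Math.PI / 4)};
        float[] half = slerp(q0, q1, 0.5f);
        System.out.printf("%.4f %.4f%n", half[2], half[3]); // 0.3827 0.9239
    }
}
```

A cheaper alternative that often looks identical for small per-frame rotations is nlerp (lerp followed by normalize), which the near-parallel branch above already amounts to.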

In short, I'd port this exact function to a GLSL vertex shader:
    for (int i = 0; i < m.vertices.size(); i++) {
        Vertex v = m.vertices.get(i);
        float x = 0, y = 0, z = 0;
        for (int k = 0; k < v.weightCount; k++) {
            Weight w = m.weights.get(v.startWeight + k); // v.startWeight = index into the list of weights
            Joint j = bindPoseJoints.get(w.joint); // Joint contains a position and an orientation. I'd be using the animated joints, not the bind pose joints, of course.
            rot(j.orientation, w.position, temp); // Quaternion rotation of the weight position; temp is a temporary Vec3.
            temp.add(j.position); // Translate by the joint position (MD5 skinning needs this in addition to the rotation).
            temp.scale(w.bias);
            x += temp.x;
            y += temp.y;
            z += temp.z;
        }
        vertexData[mesh].putFloat(x).putFloat(y).putFloat(z); // Write the skinned position into the buffer sent to OpenGL
    }
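For what it's worth, the GLSL version of that loop might look roughly like this. This is only a sketch, not tested code: fetchWeight(), fetchWeightJoint(), fetchJointPos(), fetchJointRot(), slerp() and rotate() are hypothetical helpers that would sample the weight/bone textures and do the quaternion math, and the attribute names are assumptions too.

```glsl
// Sketch: per-vertex skinning with two-keyframe joint interpolation on the GPU.
attribute float startWeight; // index of this vertex's first weight
attribute float weightCount; // number of weights for this vertex
uniform float frameBlend;    // interpolation factor between keyframes 0 and 1

void main() {
    vec3 skinned = vec3(0.0);
    for (int k = 0; k < int(weightCount); k++) {
        vec4 w = fetchWeight(int(startWeight) + k);       // xyz = weight position, w = bias
        int joint = fetchWeightJoint(int(startWeight) + k);
        // Interpolate the joint between the two keyframes directly in the shader.
        vec3 jPos = mix(fetchJointPos(joint, 0), fetchJointPos(joint, 1), frameBlend);
        vec4 jRot = slerp(fetchJointRot(joint, 0), fetchJointRot(joint, 1), frameBlend);
        skinned += (jPos + rotate(jRot, w.xyz)) * w.w;
    }
    gl_Position = gl_ModelViewProjectionMatrix * vec4(skinned, 1.0);
}
```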


I'm pretty much a skinning n00b, but these are my thoughts on it. The main problem is the amount of static data needed per vertex (4 × vec3 per vertex for the weight positions), but if that cost is acceptable, I strongly suspect this will perform better than doing the interpolation on the CPU, at least in my case.
