Model animation - too many calculations
#1 Members - Reputation: 906
Posted 04 February 2011 - 06:44 AM
The problem is then with animating the model. I'm using a low-polygon model: something like 760 faces, with fewer than 2000 vertices.
I then have to calculate the new position of each vertex based on its bone attachment. For each bone, this consists of 3 multiplications (type double) to rotate to the correct position, 3 additions (again doubles) to translate it to the correct position, and then a multiplication of that whole result by the weight assigned to that bone. At the end, I simply add up the results for each bone to get the new position.
The above method works, assuming the attachments are correct. The problem is that it seems to take too long to compute that for all vertices. I split the calculation into 4 parts, so that each update event handles 1/4 of the vertices, and the model is only updated once all of them are calculated. That gave me a bit over 60fps, but considering I'm only drawing one low-poly model, that framerate is unacceptable. (For comparison, without calculating the vertex positions, I get a framerate of 900+.) I'm not sure where to optimize the calculations. They're all done on the CPU, which is a pretty big bottleneck, but I can't figure out how, if it's possible at all, to use the GPU for that.
Anyone have any experience programming model animation? Perhaps animating a model from a skeleton from an existing animation format? Any help at all would be greatly appreciated.
#2 GDNet+ - Reputation: 319
Posted 04 February 2011 - 06:52 AM
Bear in mind that a Bone is effectively a transformation matrix, and your Graphics API (OpenGL/DirectX) will be performing matrix transformations anyway when you set the position of the Vertex (glVertex...(), etc). Offloading the calculations to a Vertex Shader would be a good boost.
#3 Members - Reputation: 1189
Posted 04 February 2011 - 07:32 AM
Also, it sounds like your weighting algorithm is suspect. As AndyEsser mentions, bones are really reference frames, commonly implemented as matrices, not separate rotations and translations.
In any case, a common approach (which can be implemented on the CPU or GPU):
// Given a vertex vin, 3 weights, 3 bone indices for that vertex, and an array of bone matrices..
vector3 vout = vector3(0,0,0);
for( int i=0; i<3; i++)
    vout += weight[i]*vectorMatrixMult( vin, boneMat[ boneIndex[i] ] );
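For anyone wanting to try the weighted-sum approach above on the CPU, here is a minimal self-contained sketch. The names (Vec3, BoneMatrix, SkinnedVertex, transformPoint, skinVertex) are made up for illustration and are not from any library. It assumes row-major 3x4 bone matrices that already map bind-pose positions to posed positions, and exactly 3 bone slots per vertex with weights summing to 1.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Row-major 3x4 matrix: 3x3 rotation with the translation in the 4th column.
// Assumption: it already maps a bind-pose position to its posed position.
using BoneMatrix = std::array<float, 12>;

// Hypothetical per-vertex attachment data: 3 bone slots per vertex.
struct SkinnedVertex {
    Vec3 bindPosition;
    std::array<int, 3> boneIndex;
    std::array<float, 3> weight;   // expected to sum to 1
};

// Rotate and translate a point with one 3x4 matrix.
inline Vec3 transformPoint(const BoneMatrix& m, const Vec3& v) {
    return {
        m[0] * v.x + m[1] * v.y + m[2]  * v.z + m[3],
        m[4] * v.x + m[5] * v.y + m[6]  * v.z + m[7],
        m[8] * v.x + m[9] * v.y + m[10] * v.z + m[11],
    };
}

// Weighted sum of the vertex as transformed by each attached bone.
inline Vec3 skinVertex(const SkinnedVertex& v, const std::vector<BoneMatrix>& bones) {
    Vec3 out{0.0f, 0.0f, 0.0f};
    for (std::size_t i = 0; i < v.boneIndex.size(); ++i) {
        assert(v.boneIndex[i] >= 0 &&
               static_cast<std::size_t>(v.boneIndex[i]) < bones.size());
        const Vec3 p = transformPoint(bones[v.boneIndex[i]], v.bindPosition);
        out.x += v.weight[i] * p.x;
        out.y += v.weight[i] * p.y;
        out.z += v.weight[i] * p.z;
    }
    return out;
}
```

The same loop body is roughly what a vertex shader would run per vertex, with the bone matrices passed in as uniforms.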
#4 Members - Reputation: 906
Posted 04 February 2011 - 07:56 AM
I thought I was minimizing calculations by using my method instead of a matrix. What I have is a set of axes that gets rotated, and that determines the bone's orientation. When loading the model, I store the original position of each vertex relative to the axes of every attached bone. Then to get the vertex's transformed location I essentially have a vector: ( origRelVer_x * x_axis + origRelVer_y * y_axis + origRelVer_z * z_axis), where x_axis, y_axis and z_axis are orthogonal vectors (i.e. the bone's local axes). I thought this would be fewer calculations than a matrix multiplication, but then, it's all CPU. (Edit: now that I've looked at my code again, I realize each term is actually a scalar times a vector = 3 mults, so 9 multiplications in total + 2 vector additions, and that's before the other 3 additions for the correct position. So, yes, quite a bit more than I originally mentioned, and not actually an improvement over matrix multiplication.)
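To make the comparison concrete, here is a small sketch showing that the axes formulation above is the same arithmetic as a 3x4 matrix multiply, with the three axes as the matrix columns and the bone origin as the fourth column. BoneFrame, transformWithAxes, toMatrix and transformWithMatrix are hypothetical names, and the bone origin field is an assumption about how the translation is stored.

```cpp
#include <array>

struct Vec3 { float x, y, z; };

// Hypothetical bone frame: three orthogonal axes plus the bone's origin.
struct BoneFrame { Vec3 xAxis, yAxis, zAxis, origin; };

// Axes formulation: rel.x * xAxis + rel.y * yAxis + rel.z * zAxis + origin.
inline Vec3 transformWithAxes(const BoneFrame& b, const Vec3& rel) {
    return {
        rel.x * b.xAxis.x + rel.y * b.yAxis.x + rel.z * b.zAxis.x + b.origin.x,
        rel.x * b.xAxis.y + rel.y * b.yAxis.y + rel.z * b.zAxis.y + b.origin.y,
        rel.x * b.xAxis.z + rel.y * b.yAxis.z + rel.z * b.zAxis.z + b.origin.z,
    };
}

// Pack the same frame into a row-major 3x4 matrix: axes become columns.
inline std::array<float, 12> toMatrix(const BoneFrame& b) {
    return {{
        b.xAxis.x, b.yAxis.x, b.zAxis.x, b.origin.x,
        b.xAxis.y, b.yAxis.y, b.zAxis.y, b.origin.y,
        b.xAxis.z, b.yAxis.z, b.zAxis.z, b.origin.z,
    }};
}

// Matrix formulation: one row of the matrix per output component.
inline Vec3 transformWithMatrix(const std::array<float, 12>& m, const Vec3& v) {
    return {
        m[0] * v.x + m[1] * v.y + m[2]  * v.z + m[3],
        m[4] * v.x + m[5] * v.y + m[6]  * v.z + m[7],
        m[8] * v.x + m[9] * v.y + m[10] * v.z + m[11],
    };
}
```

Both functions perform 9 multiplications and 9 scalar additions per bone, so neither form saves work; the matrix form is simply the layout that graphics APIs and shaders expect.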
You're right that FPS isn't the best measure, especially since I average it over a second. The function clearly is slow though, so I need to improve it. Though from what it sounds like I might have to rework the way my bone rotation is stored.
I'm starting to wonder whether it's worth reinventing the wheel for the experience. I know this problem has been solved before, though I haven't found any very detailed descriptions of it.
#5 Members - Reputation: 1189
Posted 04 February 2011 - 08:18 AM
Also, if you have 700 faces and 2000 vertices, sounds like you're not using indexed vertices. If you use an indexed mesh, that might cut down the
It also depends on the number of material changes that are made. Combining multiple textures into one will reduce the time spent changing texture samplers.
EDIT: With regard to experience, depending on how much animation work you'll be doing in the future, understanding what it takes to do skinned animation might serve you well. However, downloading some code and examining it for a good understanding may be better than working out all the details yourself.
EDIT2:
Well, that's unfortunate, because I have no experience with shaders.
No time like the present. If you continue with game development, shaders are where you need to be anyway.
#6 Members - Reputation: 906
Posted 04 February 2011 - 08:53 AM
As for the face count and vertex count... hmm, I just realized something. See, I wrote my own .obj loader, and the .obj format specifies a list of vertex positions, a separate list of normals, and a third list of texture coordinates. Each face is then a set of indices into each list. Because I'm storing the texture and normal data in my vertex struct, I had to read in all the lists, and then when reading in a face, create its vertices on the spot. I have a check to see if that particular vertex (combination of position, normal and texture) already exists, and if it does, link to that instead of creating a new one. But, as I just realized, I turned that off, because it lengthens loading time by about a second. I'm going to turn that check back on, which does indeed greatly decrease the number of vertices (as most faces share them). I don't think it will have a huge impact, but it's worth a try. Thanks for reminding me!
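If the existing duplicate check is a linear search over all vertices created so far (an assumption, since the loader code isn't shown), keying a map on the position/texcoord/normal index triple is one way to keep the check without the extra load time. A rough sketch, with ObjCorner, IndexedMesh and buildIndexedMesh as made-up names:

```cpp
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

// One corner of an OBJ face: indices into the position, texcoord and normal lists.
using ObjCorner = std::tuple<int, int, int>;

struct IndexedMesh {
    std::vector<ObjCorner> uniqueVertices;  // one entry per distinct combination
    std::vector<std::uint32_t> indices;     // one entry per face corner
};

// Deduplicate face corners so that shared vertices are stored (and skinned) once.
inline IndexedMesh buildIndexedMesh(const std::vector<ObjCorner>& faceCorners) {
    IndexedMesh mesh;
    std::map<ObjCorner, std::uint32_t> seen;  // corner -> index in uniqueVertices
    for (const ObjCorner& corner : faceCorners) {
        auto it = seen.find(corner);
        if (it == seen.end()) {
            // First time this combination appears: create a new vertex.
            const auto newIndex = static_cast<std::uint32_t>(mesh.uniqueVertices.size());
            mesh.uniqueVertices.push_back(corner);
            it = seen.emplace(corner, newIndex).first;
        }
        mesh.indices.push_back(it->second);
    }
    return mesh;
}
```

Each lookup is logarithmic in the number of unique vertices, so a mesh of this size should deduplicate almost instantly, and the skinning loop then only touches each shared vertex once.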
#7 Members - Reputation: 1773
Posted 04 February 2011 - 09:05 AM
// Synthetic benchmark: 2000 vertices, 3 weighted bones each, 20 bone matrices (3x4, row-major).
typedef unsigned int uint;
float matrices[20*4*3] = {};
float vertices[2000*3] = {};
float weights[2000*3] = {};
float results[2000*3];
int matrixIndex = -1;
float * vertex = vertices;
float * result = results;
float * weight = weights;
for(uint vertexIndex = 0; vertexIndex<2000; ++vertexIndex) {
    result[0] = result[1] = result[2] = 0.0f;
    for(uint counter = 0; counter<3; ++counter) {
        // Cycle through the matrices; real code would look up the bone index per vertex.
        matrixIndex = (matrixIndex+1)%20;
        const float * m = matrices+matrixIndex*4*3;
        // Index the matrix: chaining several *matrix++ in one expression is unsequenced in C++.
        result[0] += *weight * (m[0] * vertex[0] + m[1] * vertex[1] + m[2]  * vertex[2] + m[3]);
        result[1] += *weight * (m[4] * vertex[0] + m[5] * vertex[1] + m[6]  * vertex[2] + m[7]);
        result[2] += *weight * (m[8] * vertex[0] + m[9] * vertex[1] + m[10] * vertex[2] + m[11]);
        ++weight;
    }
    vertex += 3;
    result += 3;
}
Using doubles instead, it takes approx. 330 microseconds (debug) / 190 microseconds (release).
#8 Members - Reputation: 906
Posted 04 February 2011 - 09:18 AM
Here's what my vertex update code looks like:
mesh_vertex *currVer;
for (unsigned int i = start; i < end; i++)
{ // for each vertex of the model
    Vec3f newPos(0.0, 0.0, 0.0), tPos;
    currVer = vertexList.at(i);
    for (int k = 0; k < vertexAttachedBones.at(i)->getNumBones(); k++)
    { // for each of its attached bones
        tPos = getAbsolutePosition(
            vertexOriginalPosition.at(i).at(k),
            vertexAttachedBones.at(i)->getBoneAt(k)
        );
        newPos = newPos +
            tPos *
            vertexAttachedBones.at(i)->getNormalizedWeightAt(k);
    }
    // update position
    currVer->x = newPos.x;
    currVer->y = newPos.y;
    currVer->z = newPos.z;
}
Specifically, it runs in 1 millisecond or less (I used clock() to get the time). It could probably still use some improvement, but it's nowhere near as bad as I thought it was. Also, I know it's a bad measure, but I get ~650fps now. Much more acceptable.
Thanks for the help everyone.
edit: I have no clue why there's such a huge gap in performance between debug and release. Is it because I update everything with pointers? Not that it makes a big difference, since release is the mode to use for .. well, releasing.
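Since clock() can have coarse granularity on some platforms, a sub-millisecond routine like this is easier to measure by running it many times and averaging. Below is a rough sketch using std::chrono::steady_clock from the C++11 standard library; averageMicroseconds is a made-up helper name, and the iteration count is whatever gives a stable number on your machine.

```cpp
#include <cassert>
#include <chrono>

// Run `work` `iterations` times and return the mean duration in microseconds.
template <typename Fn>
double averageMicroseconds(Fn&& work, int iterations) {
    assert(iterations > 0);
    using Clock = std::chrono::steady_clock;  // monotonic, unaffected by wall-clock changes
    const auto begin = Clock::now();
    for (int i = 0; i < iterations; ++i) {
        work();
    }
    const auto end = Clock::now();
    const std::chrono::duration<double, std::micro> elapsed = end - begin;
    return elapsed.count() / iterations;
}
```

It would be called with a lambda wrapping the vertex update, e.g. averageMicroseconds([&] { /* update all vertices */ }, 1000), once in a debug build and once in a release build to compare the two.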
#10 Members - Reputation: 1189
Posted 04 February 2011 - 09:51 AM
#11 Members - Reputation: 906
Posted 04 February 2011 - 09:58 AM
Debug compiles (depending on what API you're using) usually link in debug (vs. release) libraries. Those debug libraries commonly do a lot of error-checking (that's what "debug" does for a living): looking for uninitialized variables, array indices out of bounds, etc. Your experience isn't necessarily out of the ordinary. E.g., I get ratios of 200-400 (180 ms vs. 0.7 ms for one of my routines comes to mind) between debug and release execution times.
Yeah, I knew debug does a lot for checking for errors, hence why I usually run in debug mode when debugging (actually, I've noticed you can't rely on the debugger in release mode). I've just never had such a huge gap in performance, until now. That'll teach me to post on here with questions that could've been resolved by two mouse clicks. =)
#12 Members - Reputation: 906
Posted 04 February 2011 - 10:33 AM
#13 Members - Reputation: 1189
Posted 04 February 2011 - 11:52 AM






