Well met!
I recently started optimising my engine. Using the VS2012 performance analysis tool, I quickly found the biggest bottlenecks.
By far the biggest one is my animation code. I have already managed to go from 100 FPS with 1 simple animated model to 150 FPS with 50 models. While that is not bad, I still need it to get better.
I fixed everything I could, but I don't know how to continue. The major performance-eaters are
D3DXQuaternionInverse
and
D3DXQuaternionMultiply
which are an essential part of the computation that bone-based animations need, right?
Is there a way to work around those, or are there cheaper ways to compute bone-based animations?
Those two take around 50%; the other 50% is being lost to stuff like
tempVert->pos.x += ( tempJoint->pos.x + rotatedPoint.x ) * tempWeight->bias;
// or
tempVert->normal.x -= rotatedPoint2.x * tempWeight->bias;
// or
currentSubset->vertices[i]->normal = -tempVert->normal;
Each of those lines takes about 5% of the total CPU time (all percentages are relative to the full run time of an uncapped application). I don't understand why they are so expensive.
Stuff like sqrt, sin and maybe even / is considered expensive, but why are the basic math operations also a problem?
I am adding the full code of the function in question below, adding the % of CPU to the end of each line which is over 0.1%.
I tried to make it as readable as possible, but the Copy-paste into this forum still screws it up, I'm sorry.
[spoiler]
uint numVCurrentSubset;
Vertex *tempVert;
Weight* tempWeight;
joint* tempJoint;
D3DXQUATERNION tempJointOrientation, tempWeightPos, tempJointOrientationConjugate, tempWeightNormal;
D3DXVECTOR3 rotatedPoint;
D3DXQUATERNION temp1, temp2;
D3DXVECTOR3 rotatedPoint2;
D3DXQUATERNION temp3, temp4;
ModelSubset *currentSubset;
for ( int k = 0; k < numSubsets; k++){
currentSubset = subsets[k];
numVCurrentSubset = currentSubset->vertices.size();
for ( uint i = 0; i < numVCurrentSubset; ++i ){
tempVert = currentSubset->vertices[i]; // ---- 0.3%
tempVert->pos.x = 0.0f; tempVert->pos.y = 0.0f; tempVert->pos.z = 0.0f;
tempVert->normal.x = 0.0f; tempVert->normal.y = 0.0f; tempVert->normal.z = 0.0f;
// Sum up the joints and weights information to get vertex's position and normal
for ( int j = 0; j < tempVert->WeightCount; ++j ){ // ---- 0.6% (only this line, not whole loop)
tempWeight = currentSubset->weights[tempVert->StartWeight + j]; // ---- 0.5%
tempJoint = interpolatedSkeleton[tempWeight->jointID]; // ---- 0.5%
// Convert joint orientation and weight pos to vectors for easier computation
tempJointOrientation.x = tempJoint->orientation.x;
tempJointOrientation.y = tempJoint->orientation.y;
tempJointOrientation.z = tempJoint->orientation.z;
tempJointOrientation.w = tempJoint->orientation.w;
tempWeightPos.x = tempWeight->pos.x;
tempWeightPos.y = tempWeight->pos.y;
tempWeightPos.z = tempWeight->pos.z;
tempWeightPos.w = 0.0f;
// We will need to use the conjugate of the joint orientation quaternion
D3DXQuaternionInverse(&tempJointOrientationConjugate, &tempJointOrientation); // ---- 20.0%
// Calculate vertex position (in joint space, eg. rotate the point around (0,0,0)) for this weight using the joint orientation quaternion and its conjugate
// We can rotate a point using a quaternion with the equation "rotatedPoint = quaternion * point * quaternionConjugate"
D3DXQuaternionMultiply(&temp1, &tempJointOrientation, &tempWeightPos); // ---- 3.5%
D3DXQuaternionMultiply(&temp2, &temp1, &tempJointOrientationConjugate); // ---- 3.5%
rotatedPoint.x = temp2.x;rotatedPoint.y = temp2.y;rotatedPoint.z = temp2.z;
// Now move the vertex position from joint space (0,0,0) to the joint's position in world space, taking the weight's bias into account
tempVert->pos.x += ( tempJoint->pos.x + rotatedPoint.x ) * tempWeight->bias; // ---- 5.1 % (???)
tempVert->pos.y += ( tempJoint->pos.y + rotatedPoint.y ) * tempWeight->bias;
tempVert->pos.z += ( tempJoint->pos.z + rotatedPoint.z ) * tempWeight->bias;
// Compute the normals for this frame's skeleton using the weight normals from before
// We can compute the normals the same way we compute the vertex positions, only we don't have to translate them (just rotate)
tempWeightNormal.x = tempWeight->normal.x;
tempWeightNormal.y = tempWeight->normal.y;
tempWeightNormal.z = tempWeight->normal.z;
tempWeightNormal.w = 0.0f;
// Rotate the normal
D3DXQuaternionMultiply(&temp3, &tempJointOrientation, &tempWeightNormal); // ---- 4.6 %
D3DXQuaternionMultiply(&temp4, &temp3, &tempJointOrientationConjugate); // ---- 6.2 %
rotatedPoint2.x = temp4.x; rotatedPoint2.y = temp4.y; rotatedPoint2.z = temp4.z;
// Add to the vertex normal and take the weight bias into account
tempVert->normal.x -= rotatedPoint2.x * tempWeight->bias; // ---- 4.9 %
tempVert->normal.y -= rotatedPoint2.y * tempWeight->bias;
tempVert->normal.z -= rotatedPoint2.z * tempWeight->bias;
}
currentSubset->vertices[i]->pos = tempVert->pos; // -- 0.3%
currentSubset->vertices[i]->normal = -tempVert->normal; // --- 6.1 %
//D3DXVec3Normalize(&currentSubset->vertices[i]->normal, &-tempVert->normal); // can be done on GPU, otherwise ---- 15.0%
}
// Put the positions into the vertices for this subset
for(uint i = 0; i < numVCurrentSubset; i++){ // ---- altogether 5%, but I can work around this one
currentSubset->verts[i].pos = currentSubset->vertices[i]->pos;
currentSubset->verts[i].normal = currentSubset->vertices[i]->normal;
currentSubset->verts[i].texcoord = currentSubset->vertices[i]->texCoord;
}
// Copy to the GPU
D3D11_MAPPED_SUBRESOURCE mappedVertBuff;
d3dev->devCon->Map(currentSubset->vertBuff, 0, D3D11_MAP_WRITE_DISCARD, 0, &mappedVertBuff);
VertexPosNormalTex *updatedV; updatedV = (VertexPosNormalTex *)mappedVertBuff.pData;
memcpy(updatedV, currentSubset->verts, numVCurrentSubset*sizeof(VertexPosNormalTex));
d3dev->devCon->Unmap(currentSubset->vertBuff, 0);
}
[/spoiler]
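A possible cheaper route, as a minimal sketch only: if the joint orientations are unit quaternions (an assumption, the code above does not guarantee it), the inverse is just the conjugate, and the rotation "quaternion * point * quaternionConjugate" can be expanded into two cross products instead of two full quaternion multiplies. Vec3, Quat, conjugate, cross and rotateByUnitQuat are hypothetical stand-ins so the snippet compiles without D3DX; they are not part of my engine or of the D3DX API. The sketch uses the Hamilton product convention, and I have not checked whether D3DXQuaternionMultiply uses the same operand order.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-ins for D3DXVECTOR3 / D3DXQUATERNION so the sketch is self-contained.
struct Vec3 { float x, y, z; };
struct Quat { float x, y, z, w; };

// Conjugate of a quaternion. For a unit quaternion this equals the inverse,
// so no length computation or division is needed.
inline Quat conjugate(const Quat& q) {
    return Quat{ -q.x, -q.y, -q.z, q.w };
}

inline Vec3 cross(const Vec3& a, const Vec3& b) {
    return Vec3{ a.y * b.z - a.z * b.y,
                 a.z * b.x - a.x * b.z,
                 a.x * b.y - a.y * b.x };
}

// Rotates v by the unit quaternion q. Equivalent to q * (v, 0) * conjugate(q)
// under the Hamilton product, expanded as:
//   t  = 2 * cross(q.xyz, v)
//   v' = v + q.w * t + cross(q.xyz, t)
// Assumption: q is normalised (checked in debug builds only).
inline Vec3 rotateByUnitQuat(const Quat& q, const Vec3& v) {
    assert(std::fabs(q.x * q.x + q.y * q.y + q.z * q.z + q.w * q.w - 1.0f) < 1e-3f);
    const Vec3 u{ q.x, q.y, q.z };
    Vec3 t = cross(u, v);
    t.x *= 2.0f; t.y *= 2.0f; t.z *= 2.0f;
    const Vec3 c = cross(u, t);
    return Vec3{ v.x + q.w * t.x + c.x,
                 v.y + q.w * t.y + c.y,
                 v.z + q.w * t.z + c.z };
}
```

If the convention matches, a call like rotateByUnitQuat(orientation, weightPos) would stand in for the D3DXQuaternionInverse call plus one pair of D3DXQuaternionMultiply calls; if the result rotates the wrong way, passing conjugate(orientation) gives the opposite rotation. Also, the orientation depends only on the joint, so anything derived from it could be computed once per joint per frame rather than once per weight.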
I am new to profiling and optimising, so any tips are welcome!
Thanks for your time!