glDrawArrays efficiency issue for a keyframing function

5 comments, last by V-man 12 years, 4 months ago
Hi all

The original question for this topic was how to refactor a keyframing/rendering function that uses immediate mode to make it more efficient, since I need to make two passes over all the vertices for an after-effect. This was the original function:



void MD2Model::draw() {
    glEnable(GL_TEXTURE_2D);
    glBindTexture(GL_TEXTURE_2D, textureId);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);

    //Figure out the two frames between which we are interpolating
    int frameIndex1 = (int)(time * (endFrame - startFrame + 1)) + startFrame;
    if (frameIndex1 > endFrame) {
        frameIndex1 = startFrame;
    }

    int frameIndex2;
    if (frameIndex1 < endFrame) {
        frameIndex2 = frameIndex1 + 1;
    }
    else {
        frameIndex2 = startFrame;
    }

    MD2Frame* frame1 = frames + frameIndex1;
    MD2Frame* frame2 = frames + frameIndex2;

    //Figure out the fraction that we are between the two frames
    float frac =
        (time - (float)(frameIndex1 - startFrame) /
         (float)(endFrame - startFrame + 1)) * (endFrame - startFrame + 1);

    //Draw the model as an interpolation between the two frames
    glBegin(GL_TRIANGLES);
    for(int i = 0; i < numTriangles; i++) {
        MD2Triangle* triangle = triangles + i;
        for(int j = 0; j < 3; j++) {
            MD2Vertex* v1 = frame1->vertices + triangle->vertices[j];
            MD2Vertex* v2 = frame2->vertices + triangle->vertices[j];
            Vec3f pos = v1->pos * (1 - frac) + v2->pos * frac;
            Vec3f normal = v1->normal * (1 - frac) + v2->normal * frac;
            if (normal[0] == 0 && normal[1] == 0 && normal[2] == 0) {
                normal = Vec3f(0, 0, 1);
            }
            glNormal3f(normal[0], normal[1], normal[2]);

            MD2TexCoord* texCoord = texCoords + triangle->texCoords[j];
            glTexCoord2f(texCoord->texCoordX, texCoord->texCoordY);
            glVertex3f(pos[0], pos[1], pos[2]);
        }
    }
    glEnd();
}


After a lot of work I changed it to store the vertices in std::vectors on each pass and draw them afterwards with glDrawArrays, which should theoretically be faster. But for now it is proving to be a LOT slower. Is this function implemented correctly, or am I doing something unnecessary?


void MD2Model::drawToon() {
    float outlineWidth = 3.0f; // Width Of The Lines ( NEW )
    float outlineColor[3] = { 0.0f, 0.0f, 0.0f }; // Color Of The Lines ( NEW )

    //Figure out the two frames between which we are interpolating
    int frameIndex1 = (int)(time * (endFrame - startFrame + 1)) + startFrame;
    if (frameIndex1 > endFrame) {
        frameIndex1 = startFrame;
    }

    int frameIndex2;
    if (frameIndex1 < endFrame) {
        frameIndex2 = frameIndex1 + 1;
    }
    else {
        frameIndex2 = startFrame;
    }

    MD2Frame* frame1 = frames + frameIndex1;
    MD2Frame* frame2 = frames + frameIndex2;

    //Figure out the fraction that we are between the two frames (same as in draw())
    float frac =
        (time - (float)(frameIndex1 - startFrame) /
         (float)(endFrame - startFrame + 1)) * (endFrame - startFrame + 1);

    //Interpolate every triangle corner and append it to the arrays
    for(int i = 0; i < numTriangles; i++) {
        MD2Triangle* triangle = triangles + i;
        for(int j = 0; j < 3; j++) {
            MD2Vertex* v1 = frame1->vertices + triangle->vertices[j];
            MD2Vertex* v2 = frame2->vertices + triangle->vertices[j];
            Vec3f pos = v1->pos * (1 - frac) + v2->pos * frac;
            Vec3f normal = v1->normal * (1 - frac) + v2->normal * frac;
            if (normal[0] == 0 && normal[1] == 0 && normal[2] == 0) {
                normal = Vec3f(0, 0, 1);
            }

            normals.push_back(normal[0]);
            normals.push_back(normal[1]);
            normals.push_back(normal[2]);

            MD2TexCoord* texCoord = texCoords + triangle->texCoords[j];
            textCoords.push_back(texCoord->texCoordX);
            textCoords.push_back(texCoord->texCoordY);

            vertices.push_back(pos[0]);
            vertices.push_back(pos[1]);
            vertices.push_back(pos[2]);
        }
    }

    glEnableClientState(GL_NORMAL_ARRAY);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);
    glEnableClientState(GL_VERTEX_ARRAY);

    glNormalPointer(GL_FLOAT, 0, &normals[0]);
    glTexCoordPointer(2, GL_FLOAT, 0, &textCoords[0]);
    glVertexPointer(3, GL_FLOAT, 0, &vertices[0]);

    //vertices holds 3 floats per vertex, so size()/3 is the vertex count
    glDrawArrays(GL_TRIANGLES, 0, vertices.size()/3);

    glDisableClientState(GL_VERTEX_ARRAY); // disable vertex arrays
    glDisableClientState(GL_TEXTURE_COORD_ARRAY);
    glDisableClientState(GL_NORMAL_ARRAY);

    vertices.clear();
    textCoords.clear();
    normals.clear();
}


Thanks in advance for any help you may be able to provide
The answer to your question is simply "YES". I'm not an OpenGL specialist, but I can see that you are looping through your triangles and performing the frame interpolation on the CPU, which is rather slow depending on the number of vertices and faces. Slow, that is, compared to doing the same thing on the GPU.

You should put the vertex data to a vertex buffer object and perform required interpolation on GPU. I think toon shading could be done as a post-process effect, without the need to go through the triangles / edges.

Cheers!
I have updated my original question, as I got past the optimization problems I was facing. Can anyone have a look at the way I am using glDrawArrays here and check whether I am using it correctly? It is currently giving me worse performance than the original function with immediate-mode drawing.

Thanks a lot
Don't use std::vector at runtime like this. You're basically allocating a lot of short-lived memory in tight little loops, potentially copying the contents each time the vector has to grow, and that is hurting you. You already know that each model has numTriangles * 3 verts, so allocate space for these up front at load time and just fill it in at runtime. If the memory overhead frightens you (but you really should tot it up first, as it may be lower than you think), then you can take the largest number of verts for all the models that you're going to draw, allocate one block of memory based on that, and just reuse it for each model.
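
A minimal sketch of that idea, assuming the positions are kept as a flat float array with 3 floats per drawn vertex. `FrameScratch`, `allocate` and `setPosition` are names invented for this example and are not part of the poster's `MD2Model` class.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical scratch buffer: sized once, then overwritten every frame.
struct FrameScratch {
    std::vector<float> positions;  // 3 floats per drawn vertex

    // Call once at load time: numTriangles * 3 vertices, 3 floats each.
    void allocate(std::size_t numTriangles) {
        positions.resize(numTriangles * 3 * 3);
    }

    // Call per frame: writes in place, so the storage is never reallocated.
    void setPosition(std::size_t vertexIndex, float x, float y, float z) {
        float* p = &positions[vertexIndex * 3];
        p[0] = x;
        p[1] = y;
        p[2] = z;
    }
};
```

The same pattern would apply to the normal and texture coordinate arrays; the draw call can then use `numTriangles * 3` as the vertex count instead of deriving it from `size()`.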

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

Also consider interleaving your arrays; non-interleaved arrays are slower with hardware T&L.
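
To illustrate what interleaving means here: each vertex keeps its normal, texture coordinate and position next to each other, and every array pointer uses the same stride with a different starting offset. The struct and constant names below are hypothetical and only one possible layout.

```cpp
#include <cstddef>

// Hypothetical interleaved vertex: normal, texture coordinate, position.
struct InterleavedVertex {
    float normal[3];
    float texCoord[2];
    float pos[3];
};

// All three gl*Pointer calls would share this stride (bytes per vertex).
const std::size_t kStride = sizeof(InterleavedVertex);

// Byte offset of each attribute from the start of a vertex.
const std::size_t kNormalOffset   = offsetof(InterleavedVertex, normal);
const std::size_t kTexCoordOffset = offsetof(InterleavedVertex, texCoord);
const std::size_t kPosOffset      = offsetof(InterleavedVertex, pos);

// With client-side arrays and an array `verts` of InterleavedVertex,
// the pointer setup would look like:
//   glNormalPointer(GL_FLOAT, kStride, verts[0].normal);
//   glTexCoordPointer(2, GL_FLOAT, kStride, verts[0].texCoord);
//   glVertexPointer(3, GL_FLOAT, kStride, verts[0].pos);
```

Using a struct with `offsetof` rather than hand-counted float indices means the offsets stay right if a field is added or reordered.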


Thank you very much. I actually do know the maximum number of vertices up front, since the MD2 format has a limit of 2048. So if I instead use a GLfloat vertices[2048] array and write the data into it at each pass, that should be faster? How can I interleave the data in one array? I understand that I'd have to use the same array reference in glVertexPointer, glTexCoordPointer and glNormalPointer, but how do I work out the other two members?

Thanks a lot for your help!
OK, I have refactored the function yet again, this time using one interleaved dynamic array, GLfloat displayList, that I allocate at load time according to the mesh vertex count. I get a fairly good boost, from 15 FPS to roughly 25 FPS. So thanks for the hint! This is the new function, including the two passes that I do to recreate NeHe's toon shading effect:



// ... calculate frame position....

int vCount = 0;

for(int i = 0; i < numTriangles; i++) {

    //Calculate vertices interpolation
    MD2Triangle* triangle = triangles + i;
    for(int j = 0; j < 3; j++) {
        MD2Vertex* v1 = frame1->vertices + triangle->vertices[j];
        MD2Vertex* v2 = frame2->vertices + triangle->vertices[j];
        Vec3f pos = v1->pos * (1 - frac) + v2->pos * frac;
        Vec3f normal = v1->normal * (1 - frac) + v2->normal * frac;
        if (normal[0] == 0 && normal[1] == 0 && normal[2] == 0) {
            normal = Vec3f(0, 0, 1);
        }

        //Each vertex takes 8 floats: normal (3), texture coordinate (2), position (3)
        displayList[vCount] = normal[0];
        displayList[vCount+1] = normal[1];
        displayList[vCount+2] = normal[2];
        vCount += 3;

        MD2TexCoord* texCoord = texCoords + triangle->texCoords[j];
        displayList[vCount] = texCoord->texCoordX;
        displayList[vCount+1] = texCoord->texCoordY;
        vCount += 2;

        displayList[vCount] = pos[0];
        displayList[vCount+1] = pos[1];
        displayList[vCount+2] = pos[2];
        vCount += 3;
    }
}

glEnableClientState(GL_NORMAL_ARRAY);
glEnableClientState(GL_TEXTURE_COORD_ARRAY);
glEnableClientState(GL_VERTEX_ARRAY);

//Same stride for all three; the start offsets match the layout written above
glNormalPointer(GL_FLOAT, 8*sizeof(float), &displayList[0]);
glTexCoordPointer(2, GL_FLOAT, 8*sizeof(float), &displayList[3]);
glVertexPointer(3, GL_FLOAT, 8*sizeof(float), &displayList[5]);

// Cel-Shading Code = Shade
glHint (GL_LINE_SMOOTH_HINT, GL_FASTEST); // Use The Fastest Line Smoothing ( NEW )
glEnable (GL_LINE_SMOOTH);
glEnable (GL_TEXTURE_1D); // Enable 1D Texturing ( NEW )
glBindTexture (GL_TEXTURE_1D, shaderTexture[0]); // Bind Our Texture ( NEW )
glColor3f (1.0f, 1.0f, 1.0f); // Set The Color Of The Model ( NEW )

//PASS 1
glDrawArrays(GL_TRIANGLES, 0, numTriangles * 3);

glDisable (GL_TEXTURE_1D); // Disable 1D Textures ( NEW )

//Cel-Shading Code = Outline
glEnable (GL_BLEND); // Enable Blending ( NEW )
glBlendFunc(GL_SRC_ALPHA,GL_ONE_MINUS_SRC_ALPHA); // Set The Blend Mode ( NEW )

glPolygonMode (GL_BACK, GL_LINE); // Draw Backfacing Polygons As Wireframes ( NEW )
glLineWidth (outlineWidth); // Set The Line Width ( NEW )
glCullFace (GL_FRONT); // Don't Draw Any Front-Facing Polygons ( NEW )
glDepthFunc (GL_LEQUAL); // Change The Depth Mode ( NEW )
glColor3fv (&outlineColor[0]); // Set The Outline Color ( NEW )

//PASS 2
glDrawArrays(GL_TRIANGLES, 0, numTriangles * 3);

glDepthFunc (GL_LESS); // Reset The Depth-Testing Mode ( NEW )
glCullFace (GL_BACK); // Reset The Face To Be Culled ( NEW )
glPolygonMode (GL_BACK, GL_FILL); // Reset Back-Facing Polygon Drawing Mode ( NEW )
glDisable (GL_BLEND);

glDisableClientState(GL_VERTEX_ARRAY); // disable vertex arrays
glDisableClientState(GL_TEXTURE_COORD_ARRAY);
glDisableClientState(GL_NORMAL_ARRAY);



Does this look as efficient as I can get without using vertex shaders?

Thanks a lot for your directions!
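
One CPU-side saving that may still be available: the loop above blends every triangle corner, so a vertex shared by several triangles gets interpolated several times. A sketch that blends each unique vertex of the two frames once, so the triangle loop only has to look up the result. `Vec3` and `lerpFrame` are stand-ins made up for this example, not the `Vec3f` class used in the thread.

```cpp
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };  // stand-in for Vec3f

// Blend every unique vertex of two keyframes exactly once.
// `out` must already hold at least a.size() elements, and b must be
// at least as long as a.
void lerpFrame(const std::vector<Vec3>& a, const std::vector<Vec3>& b,
               float frac, std::vector<Vec3>& out) {
    const float inv = 1.0f - frac;
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i].x = a[i].x * inv + b[i].x * frac;
        out[i].y = a[i].y * inv + b[i].y * frac;
        out[i].z = a[i].z * inv + b[i].z * frac;
    }
}
```

The texture coordinates do not change between frames either, so in principle they would only need to be written into the array once at load time. Whether any of this helps depends on where the time actually goes.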
Are you only getting 25 FPS rendering a single Quake 2 model, or are you rendering other things as well? What GPU do you have?
You can try to comment out the for loops to see if that is your bottleneck.
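
A less invasive alternative to commenting code out is to time the suspect section directly. A small sketch using `std::chrono`; the helper name `timeMs` is made up for this example.

```cpp
#include <chrono>

// Hypothetical helper: runs `work` once and returns how long it took,
// in milliseconds.
template <typename Fn>
double timeMs(Fn&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Wrapping the interpolation loops in a lambda, for example `double ms = timeMs([&] { /* the two for loops */ });`, gives a per-frame figure to compare against the total frame time. OpenGL calls may return before the GPU has finished, so timing the draw calls this way is less reliable than timing the CPU loops.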

Try running the Quake 2 demo on your machine. You should be getting a solid 60 FPS with plenty of CPU and GPU power left over on today's systems, and keep in mind that Quake 2 did software animation just like you are doing.
Sig: http://glhlib.sourceforge.net
an open source GLU replacement library. Much more modern than GLU.
float matrix[16], inverse_matrix[16];
glhLoadIdentityf2(matrix);
glhTranslatef2(matrix, 0.0, 0.0, 5.0);
glhRotateAboutXf2(matrix, angleInRadians);
glhScalef2(matrix, 1.0, 1.0, -1.0);
glhQuickInvertMatrixf2(matrix, inverse_matrix);
glUniformMatrix4fv(uniformLocation1, 1, FALSE, matrix);
glUniformMatrix4fv(uniformLocation2, 1, FALSE, inverse_matrix);
