Rendering performance

12 comments, last by dandrestor 13 years, 3 months ago
First of all hello everybody, this is my first post on the forums.

I am new to OpenGL and 3D programming, and I have recently started a spare-time project (for learning). Everything went well until I tried to generate a lot of objects on-screen, at which point I got very low FPS counts (<5).

My question is, how do I know if I made a mistake or if I'm just hitting the limit of my video card (Intel X3100)?

GPU utilization is at 100%. The number of triangles is around 500000. I'm using VBO, and have disabled any calculations (basically my main loop consists of just GL rendering instructions). If I glDisable(GL_LIGHTING) as well I get 10 more FPS.

Many thanks!
D.
As a rule of thumb, if you have an Intel card then you are hitting its limits. But you should post some code.
There are many and varied optimization techniques you can take advantage of to improve performance. Example 1: frustum culling. Basically, if an object is outside the viewable area of the camera, don't bother rendering it. You'd be amazed at the boost you can get just from something like that. (Obviously, the greatest benefit is when your drawn objects are spread out. If they are all in the camera's view then you will need to resort to other things like occlusion culling and whatnot.)
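As an illustration of the idea (not the poster's code), a bounding-sphere-vs-frustum test can be as small as this; `Plane` and `sphereInFrustum` are made-up names for the sketch, and the six planes are assumed to have been extracted from the combined projection-modelview matrix with inward-pointing, normalized normals:

```cpp
#include <array>

// A plane in the form ax + by + cz + d = 0, with (a,b,c) a unit normal
// pointing toward the inside of the frustum.
struct Plane { float a, b, c, d; };

// Signed distance from a point to a plane (positive = inside half-space).
inline float planeDistance(const Plane& p, float x, float y, float z) {
    return p.a * x + p.b * y + p.c * z + p.d;
}

// Returns false if the bounding sphere lies entirely outside any one plane;
// objects that fail this test can be skipped before any GL call is made.
bool sphereInFrustum(const std::array<Plane, 6>& frustum,
                     float cx, float cy, float cz, float radius) {
    for (const Plane& p : frustum)
        if (planeDistance(p, cx, cy, cz) < -radius)
            return false;  // completely outside this plane: cull it
    return true;           // intersecting or fully inside: draw it
}
```

The test is conservative: an object whose sphere clips a plane is still drawn, which is fine because the GPU clips it anyway.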
There was a saying we had in college: Those who walk into the engineering building are never quite the same when they walk out.
As a general rule frustum culling is the first thing you should do. You can also get better performance by using bigger batches in your draw calls, and by switching to indexed primitives (glDraw(Range)Elements instead of glDrawArrays). With an Intel, the unfortunate truth is that their overall performance is just not good, and their OpenGL performance is even worse; switching to Direct3D should give you a nice boost too.

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

Thanks everybody for your replies. I'm currently looking into frustum culling.
I can't switch to Direct3D (as far as I know) because I'm programming on Linux, but I can definitely relocate my work to a better laptop :)

I am currently using glDrawArrays, the buffers are interleaved, and I use one buffer per object, with glMultMatrix calls in between.
Would it be a huge performance improvement if I used one bigger buffer for multiple objects and modified the modelview matrix between glDrawRangeElements() calls?

Thanks again for the help!


My display function is currently this:



//set up modelview matrix, etc.

for (i = 0; i < object_count; i++)
{
    Object *object = objects[i]; // assuming the objects live in an array

    glPushMatrix();
    glMultMatrixf((GLfloat *)object->getMatrix()->getMatrixPointer());

    if (object->getTexture()) {
        glBindTexture(GL_TEXTURE_2D, object->getTexture()->getId());
        glEnable(GL_TEXTURE_2D);
    } else {
        glDisable(GL_TEXTURE_2D);
    }

    glBindBufferARB(GL_ARRAY_BUFFER_ARB, object->getMesh()->getGeometryBuffer());
    glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, object->getMesh()->getIndexBuffer());
    glEnableClientState(GL_NORMAL_ARRAY);
    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);
    glNormalPointer(GL_FLOAT, 32, BUFFER_OFFSET(object->getMesh()->getNormalOffset()));
    glVertexPointer(3, GL_FLOAT, 32, BUFFER_OFFSET(object->getMesh()->getVertexOffset()));
    glTexCoordPointer(2, GL_FLOAT, 32, BUFFER_OFFSET(object->getMesh()->getTexCoordOffset()));
    glDrawElements(GL_TRIANGLES, object->getMesh()->getPolygonCount() * 3, GL_UNSIGNED_INT, BUFFER_OFFSET(object->getMesh()->getIndexOffset()));
    glDisableClientState(GL_VERTEX_ARRAY);
    glDisableClientState(GL_NORMAL_ARRAY);
    glDisableClientState(GL_TEXTURE_COORD_ARRAY);
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, 0);
    glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, 0);
    glPopMatrix();
}

That really depends on the buffer size. If you are using an index and vertex buffer per cube and you have a thousand or so of them, combining them will give you a massive performance improvement. If you are drawing a very complicated mesh, then simplifying the mesh and using normal maps for the fine detail will give you more of an improvement.

Too many buffers, or buffers that are too big, can saturate the bandwidth to the GPU and cost you a lot of performance as well; as a general rule you should have around 10K vertices in a buffer.
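To make the combining idea concrete, here is a sketch of the bookkeeping for packing many small meshes into one shared vertex buffer and one shared index buffer; `MeshRange` and `packMeshes` are hypothetical names, and the actual GL draw call is only shown in a comment:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Where one mesh lands inside the shared buffers after concatenation.
struct MeshRange {
    std::size_t baseVertex;  // first vertex of this mesh in the shared VBO
    std::size_t firstIndex;  // first index of this mesh in the shared IBO
    std::size_t indexCount;  // number of indices to draw for this mesh
};

// Given (vertexCount, indexCount) per mesh, compute each mesh's range when
// all meshes are appended back to back into single vertex/index buffers.
std::vector<MeshRange>
packMeshes(const std::vector<std::pair<std::size_t, std::size_t>>& counts) {
    std::vector<MeshRange> ranges;
    std::size_t vtx = 0, idx = 0;
    for (const auto& c : counts) {
        ranges.push_back({vtx, idx, c.second});
        vtx += c.first;   // vertices consumed so far
        idx += c.second;  // indices consumed so far
    }
    return ranges;
}

// At draw time, bind the shared buffers once, then per visible mesh roughly:
//   glDrawRangeElements(GL_TRIANGLES, r.baseVertex, lastVertex,
//                       r.indexCount, GL_UNSIGNED_INT,
//                       BUFFER_OFFSET(r.firstIndex * sizeof(GLuint)));
// (each mesh's indices must be offset by its baseVertex when the shared
// index buffer is filled)
```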

Worked on titles: CMR:DiRT2, DiRT 3, DiRT: Showdown, GRID 2, theHunter, theHunter: Primal, Mad Max, Watch Dogs: Legion

Try moving anything not dependent on your index (i) outside the loop -- enables before, disables after, that sort of thing. That should buy you at least a little performance.
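A step further in the same direction is to swallow redundant state toggles entirely. This is a sketch with made-up names (the comment marks where the real glEnable/glDisable would be forwarded), not a real GL wrapper:

```cpp
#include <unordered_map>

// Tiny cache that drops enable/disable calls which would not change state,
// so only genuine transitions ever reach the driver.
class StateCache {
public:
    // Each returns true if the call actually had to go through to the API.
    bool enable(unsigned cap)  { return set(cap, true); }
    bool disable(unsigned cap) { return set(cap, false); }

private:
    bool set(unsigned cap, bool on) {
        auto it = states_.find(cap);
        if (it != states_.end() && it->second == on)
            return false;   // already in that state: skip the redundant call
        states_[cap] = on;  // record, then forward to glEnable/glDisable here
        return true;
    }
    std::unordered_map<unsigned, bool> states_;
};
```

On drivers of that era, redundant state changes were not always filtered out cheaply, so doing it application-side can help.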
There was a saying we had in college: Those who walk into the engineering building are never quite the same when they walk out.

My question is, how do I know if I made a mistake or if I'm just hitting the limit of my video card (Intel X3100)?

GPU utilization is at 100%.

What does your profiler tell you?

There are the basics like frustum culling and occlusion culling, but even recommending those is just guessing about your system.

It is not enough to simply guess at what is slow. Measure and find out exactly what is taking the time.

My question is, how do I know if I made a mistake or if I'm just hitting the limit of my video card (Intel X3100)?
GPU utilization is at 100%. The number of triangles is around 500000.

It doesn't sound like a card limit to me, at least not the number of vertices. I don't know what your shaders are doing, but when I was testing my game on an Intel GMA900 (much weaker, and additionally with no hardware vertex processing, so vertex shaders ran on the CPU) I got around 20 FPS for around 100,000 vertices, so that would be a result similar to what you describe (and that card really is slower).

I can't suggest anything regarding your rendering implementation because I am not familiar with OpenGL, but when I implemented my game, the biggest performance boosts were obtained by:
- limiting the number of texture switches (sorting objects by the textures they use)
- limiting the number of shader switches (additionally sorting them by shader)
- limiting the number of vertex and index buffer switches (if you can allocate all your meshes within a single VBO, try doing it)
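The sorting idea above can be sketched like this (`Draw` and `sortForStateCoherence` are hypothetical names; the point is just the sort key, most expensive switch first):

```cpp
#include <algorithm>
#include <vector>

// One queued draw: the ids of the state it needs plus which mesh to render.
struct Draw { unsigned shader; unsigned texture; unsigned mesh; };

// Order draws so consecutive ones share as much state as possible:
// group by shader first (usually the costliest switch), then by texture.
void sortForStateCoherence(std::vector<Draw>& draws) {
    std::sort(draws.begin(), draws.end(), [](const Draw& a, const Draw& b) {
        if (a.shader != b.shader) return a.shader < b.shader;
        return a.texture < b.texture;
    });
}
```

After sorting, the render loop only rebinds a shader or texture when the id actually changes from the previous draw.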

Hope this will help
Pomnico
I took the glEnableClientState and glDisableClientState calls out of the for loop. Also, as a lot of objects share the same mesh / VBO, I am now only calling glVertexPointer/glNormalPointer/etc. if I really need to. Unfortunately, the performance is still the same.
How do I go about profiling the app? Can you give me some pointers? I started looking into Valgrind (remember, this is Linux). Am I using the right tool?
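For reference, a typical CPU-side profiling workflow on Linux looks like the following (assuming Valgrind/KCachegrind are installed and `myapp` is the binary; note that neither tool sees time spent inside the GPU or the driver, so a GPU-bound frame will just show up as time in the GL calls):

```shell
# Callgrind (a Valgrind tool) records CPU cost per function. It slows the
# program down heavily, so expect unplayable frame rates while profiling.
valgrind --tool=callgrind ./myapp     # writes callgrind.out.<pid>
kcachegrind callgrind.out.*           # browse the call graph interactively

# gprof is a lighter alternative: build with -pg, run once, then inspect.
g++ -pg -O2 main.cpp -o myapp -lGL -lglut
./myapp
gprof myapp gmon.out | head -n 20     # flat profile of the hottest functions
```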
Thanks!

D.

This topic is closed to new replies.
