dandrestor

Rendering performance


First of all hello everybody, this is my first post on the forums.

I am new to OpenGL and 3D programming, and I have recently started a spare-time project (for learning). Everything went well until I tried to generate a lot of objects on-screen, at which point I got very low FPS counts (<5).

My question is, how do I know if I made a mistake or if I'm just hitting the limit of my video card (Intel X3100)?

GPU utilization is at 100%. The number of triangles is around 500000. I'm using VBO, and have disabled any calculations (basically my main loop consists of just GL rendering instructions). If I glDisable(GL_LIGHTING) as well I get 10 more FPS.

Many thanks!
D.

There are many and varied optimization techniques you can take advantage of to improve performance. Example 1: frustum culling. Basically, if an object is outside the viewable area of the camera, don't bother rendering it. You'd be amazed at the boost you can get just from something like that. (Obviously, the greatest benefit is when your drawn objects are spread out; if they are all in the camera's view then you will need to resort to other things like occlusion culling and whatnot.)
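A minimal sketch of the visibility test behind frustum culling, assuming you have already extracted the six frustum planes from the combined modelview-projection matrix (the `Plane` struct and `sphereInFrustum` are illustrative names, not from any library):

```cpp
// A plane ax + by + cz + d = 0 with (a,b,c) normalized and
// pointing toward the inside of the frustum.
struct Plane { float a, b, c, d; };

// Returns true if a bounding sphere is at least partially inside
// all six planes, i.e. the object may be visible and should be drawn.
bool sphereInFrustum(const Plane planes[6],
                     float cx, float cy, float cz, float radius)
{
    for (int i = 0; i < 6; ++i) {
        float dist = planes[i].a * cx + planes[i].b * cy
                   + planes[i].c * cz + planes[i].d;
        if (dist < -radius)
            return false;  // entirely behind this plane: cull it
    }
    return true;
}
```

You would run this per object before issuing its draw call, using a bounding sphere transformed into the same space as the planes.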

As a general rule frustum culling is the first thing you should do. You can also get better performance by using bigger batches in your draw calls, and by switching to indexed primitives (glDraw(Range)Elements instead of glDrawArrays). With an Intel, the unfortunate truth is that their overall performance is just not good, and their OpenGL performance is even worse; switching to Direct3D should give you a nice boost too.
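To see why indexed primitives help: vertices shared between triangles are stored and transformed only once. A minimal sketch (the array names are illustrative):

```cpp
// Four unique vertices describe a quad; six indices draw it as two
// triangles, so the two shared corners are processed only once.
float quadVerts[4][3] = {
    {0, 0, 0}, {1, 0, 0}, {1, 1, 0}, {0, 1, 0}
};
unsigned int quadIndices[6] = { 0, 1, 2,   0, 2, 3 };
// glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_INT, ...) then touches
// 4 vertices, where glDrawArrays would have to submit 6.
```

With larger meshes the savings grow, since interior vertices are typically shared by several triangles.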

Thanks everybody for your replies. I'm currently looking into frustum culling.
I can't switch to Direct3D (as far as I know) because I'm programming on Linux, but I can definitely relocate my work to a better laptop :)

I am currently using glDrawArrays, the buffers are interleaved, and I use one buffer per object, with glMultMatrix calls in between.
Would it be a huge performance improvement if I used a bigger buffer for multiple objects and modified the modelview matrix between glDrawRangeElements() calls?

Thanks again for the help!


My display function is currently this:



// set up modelview matrix, etc.

for (i = 0; i < object_count; i++)
{
    glPushMatrix();
    glMultMatrixf((GLfloat *)object->getMatrix()->getMatrixPointer());

    if (object->getTexture()) {
        glBindTexture(GL_TEXTURE_2D, object->getTexture()->getId());
        glEnable(GL_TEXTURE_2D);
    } else {
        glDisable(GL_TEXTURE_2D);
    }

    glBindBufferARB(GL_ARRAY_BUFFER_ARB, object->getMesh()->getGeometryBuffer());
    glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, object->getMesh()->getIndexBuffer());
    glEnableClientState(GL_NORMAL_ARRAY);
    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);
    glNormalPointer(GL_FLOAT, 32, BUFFER_OFFSET(object->getMesh()->getNormalOffset()));
    glVertexPointer(3, GL_FLOAT, 32, BUFFER_OFFSET(object->getMesh()->getVertexOffset()));
    glTexCoordPointer(2, GL_FLOAT, 32, BUFFER_OFFSET(object->getMesh()->getTexCoordOffset()));
    glDrawElements(GL_TRIANGLES, object->getMesh()->getPolygonCount() * 3, GL_UNSIGNED_INT, BUFFER_OFFSET(object->getMesh()->getIndexOffset()));
    glDisableClientState(GL_VERTEX_ARRAY);
    glDisableClientState(GL_NORMAL_ARRAY);
    glDisableClientState(GL_TEXTURE_COORD_ARRAY);
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, 0);
    glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, 0);
    glPopMatrix();
}

That really depends on the buffer size. If you are using an index and vertex buffer per cube and you have a thousand or so of them, combining them will give you a massive performance improvement. If you are drawing a very complicated mesh, then simplifying the mesh and using normal maps for the fine surface detail will give you more of an improvement.

Too many buffers, or buffers that are too big, can saturate the bandwidth to the GPU and cost you a lot of performance as well; as a general rule you should have around 10K vertices per buffer.

Try moving anything not dependent on your loop index (i) outside the loop -- enables before, disables after, that sort of thing. That should buy you at least a little performance.


My question is, how do I know if I made a mistake or if I'm just hitting the limit of my video card (Intel X3100)?

GPU utilization is at 100%.

What does your profiler tell you?

There are the basics like frustum culling and occlusion culling, but even those are guesses about your system.

It is not enough to simply guess at what is slow. Measure and find out exactly what is taking the time.
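Absent a full profiler, a crude first measurement is to time individual sections of the frame on the CPU; `timeMs` below is an illustrative helper, not from any library. Note that GL calls are asynchronous, so around GPU work you would call glFinish() before both readings to include pending commands in the measurement:

```cpp
#include <chrono>

// Run a piece of work and return how long it took in milliseconds.
template <typename F>
double timeMs(F&& work)
{
    auto t0 = std::chrono::steady_clock::now();
    work();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Timing the draw loop, the buffer binds, and the swap separately will at least tell you which part of the frame dominates before you reach for heavier tools.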


My question is, how do I know if I made a mistake or if I'm just hitting the limit of my video card (Intel X3100)?
GPU utilization is at 100%. The number of triangles is around 500000.

It doesn't sound like a card limit to me, at least not the number of vertices. I don't know what your shaders are doing, but when I was testing my game on an Intel GMA900 (much weaker, and additionally with no hardware vertex processing, so vertex shaders ran on the CPU) I got around 20 FPS for around 100,000 vertices, which is a result similar to what you describe (and that card is really slower).

I can't suggest anything specific about your rendering implementation because I am not familiar with OpenGL, but when I implemented my game, the biggest performance boosts came from:
- limiting the number of texture switches (sorting objects by the textures they use)
- limiting the number of shader switches (additionally sorting them by shader used)
- limiting the number of vertex and index buffer switches (if you can allocate all your meshes within a single VBO, try it)

Hope this helps,
Pomnico
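A minimal sketch of the state-sorting idea from the list above; the `DrawItem` record and its field names are illustrative assumptions, not from the thread:

```cpp
#include <algorithm>
#include <vector>

// One entry per object in the draw list, carrying the state it needs.
struct DrawItem {
    unsigned int shaderId;
    unsigned int textureId;
    int          objectIndex;  // index back into your object array
};

// Order by the most expensive switch first (shader), then texture,
// so consecutive items share bindings and switches happen rarely.
bool byState(const DrawItem& a, const DrawItem& b)
{
    if (a.shaderId != b.shaderId)
        return a.shaderId < b.shaderId;
    return a.textureId < b.textureId;
}

// std::sort(items.begin(), items.end(), byState); then walk the list,
// binding shader/texture only when it differs from the previous item.
```

The same pattern extends to sorting by VBO, so all three switch counts in the list drop at once.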

I took the glEnableClientState and glDisableClientState calls out of the for loop. Also, as a lot of objects share the same mesh/VBO, I am now only calling glVertexPointer/glNormalPointer/etc. when I really need to. Unfortunately, the performance is still the same.
How do I go about profiling the app? Can you give me some pointers? I have started looking into Valgrind (remember, this is Linux). Am I using the right tool?
Thanks!

D.
