Rendering performance

12 comments, last by dandrestor 13 years, 3 months ago
First of all hello everybody, this is my first post on the forums.

I am new to OpenGL and 3D programming, and I have recently started a spare-time project (for learning). Everything went well until I tried to generate a lot of objects on-screen, at which point I got very low FPS counts (<5).

My question is, how do I know if I made a mistake or if I'm just hitting the limit of my video card (Intel X3100)?

GPU utilization is at 100%. The number of triangles is around 500000. I'm using VBO, and have disabled any calculations (basically my main loop consists of just GL rendering instructions). If I glDisable(GL_LIGHTING) as well I get 10 more FPS.

Many thanks!
D.
As a rule of thumb, if you have an Intel card then you are hitting its limits. But you should post some code.
There are many and varied optimization techniques you can take advantage of to improve performance. Example 1: frustum culling. Basically, if an object is outside the viewable area of the camera, don't bother rendering it. You'd be amazed at the boost you can get just from something like that. (Obviously, the greatest benefit is when your drawn objects are spread out. If they are all in the camera's view then you will need to resort to other things like occlusion culling and whatnot.)
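As an illustration of the idea (not the poster's code), a bounding-sphere-vs-frustum test can be as small as this; `Plane` and `sphereInFrustum` are made-up names for the sketch, and the six planes are assumed to have been extracted from the combined projection-modelview matrix with inward-pointing, normalized normals:

```cpp
#include <array>

// A plane in the form ax + by + cz + d = 0, with (a,b,c) a unit normal
// pointing toward the inside of the frustum.
struct Plane { float a, b, c, d; };

// Signed distance from a point to a plane (positive = inside half-space).
inline float planeDistance(const Plane& p, float x, float y, float z) {
    return p.a * x + p.b * y + p.c * z + p.d;
}

// Returns false if the bounding sphere lies entirely outside any one plane;
// objects that fail this test can be skipped before any GL call is made.
bool sphereInFrustum(const std::array<Plane, 6>& frustum,
                     float cx, float cy, float cz, float radius) {
    for (const Plane& p : frustum)
        if (planeDistance(p, cx, cy, cz) < -radius)
            return false;  // completely outside this plane: cull it
    return true;           // intersecting or fully inside: draw it
}
```

The test is conservative: an object whose sphere clips a plane is still drawn, which is fine because the GPU clips it anyway.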
There was a saying we had in college: Those who walk into the engineering building are never quite the same when they walk out.
As a general rule frustum culling is the first thing you should do. You can also get better performance by using bigger batches in your draw calls, and by switching to indexed primitives (glDraw(Range)Elements instead of glDrawArrays). With an Intel, the unfortunate truth is that their overall performance is just not good, and their OpenGL performance is even worse; switching to Direct3D should give you a nice boost too.

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

Thanks everybody for your replies. I'm currently looking into frustum culling.
I can't switch to Direct3D (as far as I know) because I'm programming on Linux, but I can definitely relocate my work to a better laptop :)

I am currently using glDrawArrays, the buffers are interleaved, and I use one buffer per object, with glMultMatrix calls in between.
Would it be a huge performance improvement if I used one bigger buffer for multiple objects and modified the modelview matrix between glDrawRangeElements() calls?

Thanks again for the help!


My display function is currently this:



//set up modelview matrix, etc.

for (i = 0; i < object_count; i++)
{
    Object *object = objects[i]; // assuming the objects live in an array

    glPushMatrix();
    glMultMatrixf((GLfloat *)object->getMatrix()->getMatrixPointer());

    if (object->getTexture()) {
        glBindTexture(GL_TEXTURE_2D, object->getTexture()->getId());
        glEnable(GL_TEXTURE_2D);
    } else {
        glDisable(GL_TEXTURE_2D);
    }

    glBindBufferARB(GL_ARRAY_BUFFER_ARB, object->getMesh()->getGeometryBuffer());
    glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, object->getMesh()->getIndexBuffer());
    glEnableClientState(GL_NORMAL_ARRAY);
    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);
    glNormalPointer(GL_FLOAT, 32, BUFFER_OFFSET(object->getMesh()->getNormalOffset()));
    glVertexPointer(3, GL_FLOAT, 32, BUFFER_OFFSET(object->getMesh()->getVertexOffset()));
    glTexCoordPointer(2, GL_FLOAT, 32, BUFFER_OFFSET(object->getMesh()->getTexCoordOffset()));
    glDrawElements(GL_TRIANGLES, object->getMesh()->getPolygonCount() * 3, GL_UNSIGNED_INT, BUFFER_OFFSET(object->getMesh()->getIndexOffset()));
    glDisableClientState(GL_VERTEX_ARRAY);
    glDisableClientState(GL_NORMAL_ARRAY);
    glDisableClientState(GL_TEXTURE_COORD_ARRAY);
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, 0);
    glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, 0);
    glPopMatrix();
}

That really depends on the buffer size. If you are using an index and vertex buffer per cube and you have a thousand or so of them, combining them will give you a massive performance improvement. If you are drawing a very complicated mesh, then simplifying the mesh and using normal maps for the fine detail will give you more of an improvement.

Too many buffers, or buffers that are too big, can saturate the bandwidth to the GPU and cost you a lot of performance as well; as a general rule you should have around 10K vertices in a buffer.
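To make the combining idea concrete, here is a sketch of the bookkeeping for packing many small meshes into one shared vertex buffer and one shared index buffer; `MeshRange` and `packMeshes` are hypothetical names, and the actual GL draw call is only shown in a comment:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Where one mesh lands inside the shared buffers after concatenation.
struct MeshRange {
    std::size_t baseVertex;  // first vertex of this mesh in the shared VBO
    std::size_t firstIndex;  // first index of this mesh in the shared IBO
    std::size_t indexCount;  // number of indices to draw for this mesh
};

// Given (vertexCount, indexCount) per mesh, compute each mesh's range when
// all meshes are appended back to back into single vertex/index buffers.
std::vector<MeshRange>
packMeshes(const std::vector<std::pair<std::size_t, std::size_t>>& counts) {
    std::vector<MeshRange> ranges;
    std::size_t vtx = 0, idx = 0;
    for (const auto& c : counts) {
        ranges.push_back({vtx, idx, c.second});
        vtx += c.first;   // vertices consumed so far
        idx += c.second;  // indices consumed so far
    }
    return ranges;
}

// At draw time, bind the shared buffers once, then per visible mesh roughly:
//   glDrawRangeElements(GL_TRIANGLES, r.baseVertex, lastVertex,
//                       r.indexCount, GL_UNSIGNED_INT,
//                       BUFFER_OFFSET(r.firstIndex * sizeof(GLuint)));
// (each mesh's indices must be offset by its baseVertex when the shared
// index buffer is filled)
```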

Worked on titles: CMR:DiRT2, DiRT 3, DiRT: Showdown, GRID 2, theHunter, theHunter: Primal, Mad Max, Watch Dogs: Legion

Try moving anything not dependent on your index (i) outside the loop -- enables before, disables after, that sort of thing. That should buy you at least a little performance.
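A step further in the same direction is to swallow redundant state toggles entirely. This is a sketch with made-up names (the comment marks where the real glEnable/glDisable would be forwarded), not a real GL wrapper:

```cpp
#include <unordered_map>

// Tiny cache that drops enable/disable calls which would not change state,
// so only genuine transitions ever reach the driver.
class StateCache {
public:
    // Each returns true if the call actually had to go through to the API.
    bool enable(unsigned cap)  { return set(cap, true); }
    bool disable(unsigned cap) { return set(cap, false); }

private:
    bool set(unsigned cap, bool on) {
        auto it = states_.find(cap);
        if (it != states_.end() && it->second == on)
            return false;   // already in that state: skip the redundant call
        states_[cap] = on;  // record, then forward to glEnable/glDisable here
        return true;
    }
    std::unordered_map<unsigned, bool> states_;
};
```

On drivers of that era, redundant state changes were not always filtered out cheaply, so doing it application-side can help.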
There was a saying we had in college: Those who walk into the engineering building are never quite the same when they walk out.

My question is, how do I know if I made a mistake or if I'm just hitting the limit of my video card (Intel X3100)?

GPU utilization is at 100%.

What does your profiler tell you?

There are the basics like frustum culling and occlusion culling, but even recommending those is just guessing about your system.

It is not enough to simply guess at what is slow. Measure and find out exactly what is taking the time.

My question is, how do I know if I made a mistake or if I'm just hitting the limit of my video card (Intel X3100)?
GPU utilization is at 100%. The number of triangles is around 500000.

It doesn't sound like a card limit to me, at least not the number of vertices. I don't know what your shaders are doing, but when I was testing my game on an Intel GMA900 (much weaker, and additionally with no hardware vertex processing, so vertex shaders ran on the CPU) I got around 20 FPS for around 100,000 vertices, so that would be a result similar to what you describe (and that card really is slower).

I can't suggest anything regarding your rendering implementation because I am not familiar with OpenGL, but when I implemented my game, the biggest performance boosts were obtained by:
- limiting the number of texture switches (sorting objects by the textures they use)
- limiting the number of shader switches (additionally sorting them by shader)
- limiting the number of vertex and index buffer switches (if you can allocate all your meshes within a single VBO, try doing it)
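The sorting idea above can be sketched like this (`Draw` and `sortForStateCoherence` are hypothetical names; the point is just the sort key, most expensive switch first):

```cpp
#include <algorithm>
#include <vector>

// One queued draw: the ids of the state it needs plus which mesh to render.
struct Draw { unsigned shader; unsigned texture; unsigned mesh; };

// Order draws so consecutive ones share as much state as possible:
// group by shader first (usually the costliest switch), then by texture.
void sortForStateCoherence(std::vector<Draw>& draws) {
    std::sort(draws.begin(), draws.end(), [](const Draw& a, const Draw& b) {
        if (a.shader != b.shader) return a.shader < b.shader;
        return a.texture < b.texture;
    });
}
```

After sorting, the render loop only rebinds a shader or texture when the id actually changes from the previous draw.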

Hope this will help
Pomnico
I took the glEnableClientState and glDisableClientState calls out of the for loop. Also, as a lot of objects share the same mesh / VBO, I am now only calling glVertexPointer/glNormalPointer/etc. if I really need to. Unfortunately, the performance is still the same.
How do I go about profiling the app? Can you give me some pointers? I started looking into Valgrind (remember, this is Linux). Am I using the right tool?
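For reference, a typical CPU-side profiling workflow on Linux looks like the following (assuming Valgrind/KCachegrind are installed and `myapp` is the binary; note that neither tool sees time spent inside the GPU or the driver, so a GPU-bound frame will just show up as time in the GL calls):

```shell
# Callgrind (a Valgrind tool) records CPU cost per function. It slows the
# program down heavily, so expect unplayable frame rates while profiling.
valgrind --tool=callgrind ./myapp     # writes callgrind.out.<pid>
kcachegrind callgrind.out.*           # browse the call graph interactively

# gprof is a lighter alternative: build with -pg, run once, then inspect.
g++ -pg -O2 main.cpp -o myapp -lGL -lglut
./myapp
gprof myapp gmon.out | head -n 20     # flat profile of the hottest functions
```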
Thanks!

D.

This topic is closed to new replies.
