
Bad performance when rendering medium amount of meshes


I'm getting a massive performance loss when rendering a medium-sized scene with a reasonable(?) number of meshes.

The stats for all rendered (non-occluded) objects in the scene are:

Triangles: 204432
Vertices: 68449
Shader Changes (glUseProgram): 3
Material Changes (glBindTexture): 239
Meshes: 7153
Render Duration: 23ms (~43fps)

I know that rendering a lot of low-poly meshes is a lot more expensive than rendering a handful of high-poly meshes, but still, 7153 meshes with an average of ~28 triangles doesn't seem like a big deal to me, and yet the performance goes down the drain.

 

Before rendering, all of my meshes are first sorted by shader, then by material. The main render process is as follows:

foreach shader
	glUseProgram(shader)
	foreach material
		glBindTexture(GL_TEXTURE_2D, material) // assuming 2D textures
		foreach mesh
			glBindVertexArray(vao) // Vertex Array (Vertex +UV +Normal Buffers)
			glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo) // Index Buffer
			glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, (void*)0) // count = number of indices
		end
	end
end

(Pseudo Code)
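
For illustration, here is a minimal C++ sketch of the "sort by shader, then by material" step and of how the resulting state changes could be counted. `DrawItem`, `sortByState` and `countStateChanges` are hypothetical names, not part of the original renderer, and the IDs stand in for GL object handles.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical draw record: IDs stand in for GL program/texture/VAO handles.
struct DrawItem {
    std::uint32_t shader;
    std::uint32_t material;
    std::uint32_t mesh;
};

struct StateChanges {
    std::size_t shaderBinds = 0;   // would map to glUseProgram calls
    std::size_t materialBinds = 0; // would map to glBindTexture calls
};

// Sort so that items sharing a shader, then a material, end up adjacent.
void sortByState(std::vector<DrawItem>& items) {
    std::sort(items.begin(), items.end(), [](const DrawItem& a, const DrawItem& b) {
        return std::tie(a.shader, a.material) < std::tie(b.shader, b.material);
    });
}

// Count how many binds a loop like the one above would issue for this order.
StateChanges countStateChanges(const std::vector<DrawItem>& items) {
    StateChanges changes;
    for (std::size_t i = 0; i < items.size(); ++i) {
        const bool shaderChanged = (i == 0) || items[i].shader != items[i - 1].shader;
        // A new shader also forces the material to be bound again.
        const bool materialChanged = shaderChanged || items[i].material != items[i - 1].material;
        if (shaderChanged) ++changes.shaderBinds;
        if (materialChanged) ++changes.materialBinds;
    }
    return changes;
}
```

Note that sorting only reduces shader and texture binds; the number of draw calls stays the same.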

 

I have a decent graphics card (AMD Radeon R9 200 Series) which I believe should be able to handle a lot more stress than this. I've spent hours profiling with both CPU and GPU profilers, debugging, trying various optimization methods, but the bottleneck is definitely the central rendering process (Code above).

 

Is the number of meshes really the problem here? If not, what could be causing this massive decrease in performance?

I'm not looking for culling methods; right now I'm just trying to improve my rendering pipeline.


Do you have a lot of overdraw (many overlapping pixels)? What you described is the number of vertices you have, but there are usually A LOT more pixels than vertices when rendering a mesh. You can quickly check whether fill rate (overdraw, or just a slow pixel shader) is the problem by changing the window size (this changes the number of pixels but keeps the vertex count roughly the same).

 

EDIT:

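btw, you can associate the index buffer with the VAO (just like the VBOs) if you don't specifically need to use multiple index buffers with the same VAO. The GL_ELEMENT_ARRAY_BUFFER binding is stored in the VAO, so you don't have to rebind it for every mesh.

and you should probably use 16-bit indices if your meshes only have ~28 triangles each.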
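
A minimal C++ sketch of the 16-bit index suggestion, assuming the mesh data currently holds 32-bit indices on the CPU side. `narrowIndices` is a hypothetical helper, not an OpenGL call; it only succeeds when every index fits in an unsigned short.

```cpp
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Hypothetical helper: converts 32-bit indices to 16-bit ones.
// Returns std::nullopt if any index does not fit, in which case the mesh
// has to keep using GL_UNSIGNED_INT.
std::optional<std::vector<std::uint16_t>> narrowIndices(
        const std::vector<std::uint32_t>& indices) {
    std::vector<std::uint16_t> narrowed;
    narrowed.reserve(indices.size());
    for (std::uint32_t index : indices) {
        if (index > std::numeric_limits<std::uint16_t>::max()) {
            return std::nullopt;
        }
        narrowed.push_back(static_cast<std::uint16_t>(index));
    }
    return narrowed;
}
```

If the conversion succeeds, the narrowed data would be uploaded to the index buffer and the draw call would pass GL_UNSIGNED_SHORT instead of GL_UNSIGNED_INT, halving the index memory.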

Edited by Waterlimon


It is the number of draw calls you are making. Every time you call glDrawElements you incur an overhead. You are much better off combining meshes together. Like sethhope said, you will want to combine anything static into batches. If you have a bunch of crates in your scene, for example, combine them all into a single mesh and draw that once instead of drawing them individually.
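
As a rough sketch of what combining static meshes could look like on the CPU side: each copy's vertices are moved into world space and its indices are shifted by the number of vertices already in the batch. `Vec3`, `Mesh`, `Placement` and `bakeStaticBatch` are hypothetical names, and only a translation is applied to keep the example short (a real version would apply the full transform and carry normals and UVs along).

```cpp
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };

struct Mesh {
    std::vector<Vec3> positions;
    std::vector<std::uint32_t> indices;
};

// Hypothetical: one static object, i.e. a mesh placed at a world-space offset.
struct Placement {
    const Mesh* mesh;
    Vec3 offset;
};

// Bakes all placements into one mesh that can be drawn with a single call.
Mesh bakeStaticBatch(const std::vector<Placement>& placements) {
    Mesh batch;
    for (const Placement& p : placements) {
        // Indices of this copy must point past the vertices already in the batch.
        const auto base = static_cast<std::uint32_t>(batch.positions.size());
        for (const Vec3& v : p.mesh->positions) {
            batch.positions.push_back(
                Vec3{v.x + p.offset.x, v.y + p.offset.y, v.z + p.offset.z});
        }
        for (std::uint32_t index : p.mesh->indices) {
            batch.indices.push_back(base + index);
        }
    }
    return batch;
}
```

The baked mesh is uploaded once and drawn with one glDrawElements call, at the cost of duplicating vertex data for every copy.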


Yeah, draw calls. Around 1,000 per frame is a decent maximum to aim for. About 4,000 is a rough upper limit for the older APIs.

 

You want both instancing and mesh combining (baking). Which is better is a trade-off you have to evaluate for your specific case.

 

Baking "all crates" is a bad idea, since that creates a single mesh that spans your whole level, which is really bad for culling and any dynamic bounding box system you have in place. You can bake localized clusters of objects, but then you lose out on instancing, and of course the objects must be static for baking to be possible.

 

Instancing doesn't scale forever, though, so just relying on instancing to solve everything isn't a guaranteed solution either. But for ~7,000 objects, it's almost certainly what you want, assuming most of those 7,000 objects are the same mesh drawn with different transforms.
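
A minimal sketch of the CPU-side preparation for instancing, assuming each object references a shared mesh by ID. `SceneObject`, `InstanceGroup` and `buildInstanceGroups` are hypothetical names; the GL side is only described in comments.

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

struct Mat4 { float m[16]; };

// Hypothetical scene record: which shared mesh to draw, and where.
struct SceneObject {
    std::uint32_t meshId;
    Mat4 transform;
};

// All transforms for one mesh; this maps to a single instanced draw.
struct InstanceGroup {
    std::uint32_t meshId;
    std::vector<Mat4> transforms;
};

// Groups objects by mesh so each distinct mesh is drawn once per frame.
std::vector<InstanceGroup> buildInstanceGroups(const std::vector<SceneObject>& objects) {
    std::map<std::uint32_t, std::vector<Mat4>> byMesh;
    for (const SceneObject& object : objects) {
        byMesh[object.meshId].push_back(object.transform);
    }
    std::vector<InstanceGroup> groups;
    groups.reserve(byMesh.size());
    for (auto& [meshId, transforms] : byMesh) {
        // Each group's transforms would be uploaded to a per-instance buffer
        // (attributes set up with glVertexAttribDivisor) and drawn with one
        // glDrawElementsInstanced call.
        groups.push_back(InstanceGroup{meshId, std::move(transforms)});
    }
    return groups;
}
```

With this grouping, the number of draw calls becomes the number of distinct meshes rather than the number of objects.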

How are you measuring time? I'm guessing that's total CPU time per frame?
Add some more timing code to measure the buffer swap (SwapBuffers, glXSwapBuffers, or whatever your windowing library calls it), so you can exclude it from the per-frame total. Also get some timings for how long your mesh loop takes.
You can also use ARB_timer_query to measure GPU time per frame.

If your GPU time per frame is the bottleneck, then you'll have to optimize your shaders / data formats / overdraw / etc.
If your CPU time per frame is the bottleneck, then it's a more traditional optimization problem. Measure your CPU-side code to see where the time is going.
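
One possible way to get the CPU-side numbers is a small scoped timer built on std::chrono. `ScopedTimer` is a hypothetical helper, not part of any GL library; it adds the elapsed milliseconds of its scope to a counter you own.

```cpp
#include <chrono>

// Hypothetical helper: accumulates the wall-clock time of a scope, in
// milliseconds, into a caller-owned counter.
class ScopedTimer {
public:
    explicit ScopedTimer(double& accumulatedMs)
        : accumulatedMs_(accumulatedMs), start_(Clock::now()) {}

    ~ScopedTimer() {
        const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start_;
        accumulatedMs_ += elapsed.count();
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    using Clock = std::chrono::steady_clock;
    double& accumulatedMs_;
    Clock::time_point start_;
};
```

Wrapping the mesh loop in one scope and the buffer swap in another gives two separate per-frame totals. Keep in mind that GL calls are queued asynchronously, so this only measures CPU submission cost; GPU time needs a GL_TIME_ELAPSED query (glBeginQuery / glEndQuery) from ARB_timer_query.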


What is also bad, unless this is just a test, is that your objects are so small that after every 28 triangles it draws, you have to stall the GPU to figure out what is going to happen next and set things up. You want the GPU to just draw as many triangles in one go as it can.


Uhm, am I wrong, or might it just be the high number of buffer binds (both the VAO and the index buffer)?

It would be way better to pack stuff into bigger VAOs and use an offset in glDrawElements.



glBindVertexArray(vao) // Vertex Array (Vertex +UV +Normal Buffers)
Yep, that's going to be slow.

 

As Ryokeen suggested, you totally can pack meshes into a single buffer and just send an offset to the draw call. Check ARB_draw_elements_base_vertex; there are similar calls for drawing plain arrays and for instanced array/element draws.

 

That way you can just pack all your static meshes in one big buffer, managing the offsets yourself (which is fun :P), and have only a couple of VAO switches. Since you're essentially doing memory management there, you need to keep things like memory fragmentation in mind (i.e., what happens if you pack 500 meshes and then remove 200 at random from the same buffer: things get fragmented), so beware.

 

The idea is not to use VAOs to specify "this is a single mesh that I can draw and the buffers attached have only that mesh" but more like "this is one kind of vertex format I support, and the buffers attached have tons of meshes with the same format".
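
A minimal sketch of the bookkeeping for "one VAO per vertex format, many meshes per buffer", assuming an append-only pool with no removal (so the fragmentation issue is not handled). `Vertex`, `MeshRange` and `MeshPool` are hypothetical names; the vertex layout is an assumption based on the position/UV/normal buffers mentioned in the question.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed vertex format shared by every mesh in the pool.
struct Vertex {
    float position[3];
    float normal[3];
    float uv[2];
};

// Where one mesh lives inside the shared buffers.
struct MeshRange {
    std::uint32_t indexCount;    // 'count' argument of the draw call
    std::size_t indexByteOffset; // byte offset into the shared index buffer
    std::int32_t baseVertex;     // added to every index by the draw call
};

// Append-only pool: all meshes share one vertex array and one index array.
class MeshPool {
public:
    MeshRange add(const std::vector<Vertex>& vertices,
                  const std::vector<std::uint16_t>& indices) {
        MeshRange range;
        range.indexCount = static_cast<std::uint32_t>(indices.size());
        range.indexByteOffset = indices_.size() * sizeof(std::uint16_t);
        range.baseVertex = static_cast<std::int32_t>(vertices_.size());
        // Indices are stored unchanged; baseVertex does the rebasing at draw time.
        vertices_.insert(vertices_.end(), vertices.begin(), vertices.end());
        indices_.insert(indices_.end(), indices.begin(), indices.end());
        return range;
    }

    const std::vector<Vertex>& vertices() const { return vertices_; }
    const std::vector<std::uint16_t>& indices() const { return indices_; }

private:
    std::vector<Vertex> vertices_;
    std::vector<std::uint16_t> indices_;
};
```

After uploading `vertices()` and `indices()` once and binding the single VAO, each mesh would be drawn with something like `glDrawElementsBaseVertex(GL_TRIANGLES, range.indexCount, GL_UNSIGNED_SHORT, (void*)range.indexByteOffset, range.baseVertex)`, with no per-mesh VAO or buffer binds.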
