Bad performance when rendering medium amount of meshes

Started by
24 comments, last by JohnnyCode 8 years, 6 months ago

I'm getting massive performance loss when rendering a medium-sized scene with a reasonable(?) amount of meshes.

The stats for all rendered (non-occluded) objects in the scene are:


Triangles: 204432
Vertices: 68449
Shader Changes (glUseProgram): 3
Material Changes (glBindTexture): 239
Meshes: 7153
Render Duration: 23ms (~43fps)

I know that rendering a lot of low-poly meshes is a lot more expensive than rendering a handful of high-poly meshes, but still, 7153 meshes with an average of ~28 triangles doesn't seem like a big deal to me, and yet the performance goes down the drain.

Before rendering, all of my meshes are first sorted by shader, then by material. The main render process is as follows:


foreach shader
	glUseProgram(shader)
	foreach material
		glBindTexture(material)
		foreach mesh
			glBindVertexArray(vao) // Vertex Array (Vertex +UV +Normal Buffers)
			glBindBuffer(ibo) // Index Buffer
			glDrawElements(GL_TRIANGLES,vertexCount,GL_UNSIGNED_INT,(void*)0)
		end
	end
end

(Pseudo Code)

I have a decent graphics card (AMD Radeon R9 200 Series) which I believe should be able to handle a lot more stress than this. I've spent hours profiling with both CPU and GPU profilers, debugging, trying various optimization methods, but the bottleneck is definitely the central rendering process (Code above).

Is the amount of meshes really the problem here? If not, what could be causing this massive decrease in performance?

I'm not looking for culling methods; right now I'm just trying to improve my rendering pipeline.

When I was working on performance with a lot of meshes, I found it very profitable to 'flag' which of my meshes were static and then, at runtime, combine all static meshes into one large mesh object and render that. It worked rather well because it was iterating through a small number of meshes (the vertex count was high for each mesh, however).
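
A minimal sketch of that combining step, on the CPU side only. The Mesh and Vertex types and the combineStaticMeshes name are made up for illustration, and the sketch assumes the vertices are already in world space and that all combined meshes share one shader and material, since a single draw call gets a single set of state:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical CPU-side vertex: position only, to keep the sketch short.
struct Vertex { float x, y, z; };

// Hypothetical mesh with the 'static' flag described above.
struct Mesh {
    std::vector<Vertex> vertices;
    std::vector<std::uint32_t> indices;  // triangle list
    bool isStatic = false;
};

// Appends every static mesh into one combined mesh. Each index is rebased
// by the number of vertices already in the combined buffer, so it keeps
// pointing at the vertex it referred to in the source mesh.
Mesh combineStaticMeshes(const std::vector<Mesh>& meshes) {
    Mesh combined;
    combined.isStatic = true;
    for (const Mesh& m : meshes) {
        if (!m.isStatic) continue;  // dynamic meshes are still drawn one by one
        const auto base = static_cast<std::uint32_t>(combined.vertices.size());
        combined.vertices.insert(combined.vertices.end(),
                                 m.vertices.begin(), m.vertices.end());
        for (std::uint32_t i : m.indices) {
            assert(i < m.vertices.size());  // source index must be in range
            combined.indices.push_back(base + i);
        }
    }
    return combined;
}
```

The combined vertex and index arrays would then be uploaded once and drawn with a single glDrawElements call.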

I develop to expand the universe. "Live long and code strong!" - Delta_Echo (dream.in.code)

Do you have a lot of overdraw (many overlapping pixels)? What you described is the number of vertices you have, but there are usually A LOT more pixels than vertices when rendering a mesh. You can quickly check whether fill rate (overdraw, or just a slow pixel shader) is the problem by changing the window size, which changes the number of pixels but keeps the vertex count roughly the same.

EDIT:

btw, you can associate the index buffer with the VAO (just like VBOs) if you don't specifically need to use multiple index buffers with the same VAO.

And you should probably use 16-bit indices if your meshes only have ~28 triangles.
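
A rough sketch of the 16-bit index suggestion, assuming the indices currently sit in a 32-bit array on the CPU before upload. The narrowIndices helper is hypothetical, not part of any library:

```cpp
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Narrows a 32-bit index list to 16 bits when every index fits.
// Returns std::nullopt if any index is too large, in which case the mesh
// has to keep its 32-bit (GL_UNSIGNED_INT) indices.
std::optional<std::vector<std::uint16_t>>
narrowIndices(const std::vector<std::uint32_t>& indices) {
    std::vector<std::uint16_t> out;
    out.reserve(indices.size());
    for (std::uint32_t i : indices) {
        if (i > std::numeric_limits<std::uint16_t>::max()) {
            return std::nullopt;
        }
        out.push_back(static_cast<std::uint16_t>(i));
    }
    return out;
}
```

When narrowing succeeds, the draw call takes GL_UNSIGNED_SHORT instead of GL_UNSIGNED_INT as its index type, and the index buffer is half the size.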

o3o

It is the number of draw calls you are making. Every time you call glDrawElements you incur an overhead. You are much better off combining meshes together. Like sethhope said, you will want to combine anything static into batches. If you have a bunch of crates in your scene, for example, combine them all into a single mesh and draw that once instead of drawing them individually.

My current game project Platform RPG

Yeah, draw calls. Around 1,000 is a decent maximum to aim for. ~4,000 is a rough upper limit for the older APIs.

You want both instancing and mesh combining (baking). Which is better is a trade-off you have to evaluate for your specific case.

Baking "all crates" is a bad idea, since that creates a single mesh that spans your whole level, which is really bad for culling and any dynamic bounding box system you have in place. You can bake localized clusters of objects, but then you lose out on instancing, and of course the objects must be static for baking to be possible.

Instancing doesn't scale forever, though, so just relying on instancing to solve everything isn't a guaranteed solution either. But for ~7,000 objects, it's almost certainly what you want, assuming most of those 7,000 objects are the same mesh drawn with different transforms.
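
To make the instancing side concrete, here is a sketch of the CPU-side grouping it needs, assuming each object refers to a shared mesh by ID. Object, Mat4 and buildInstanceBatches are illustrative names only:

```cpp
#include <array>
#include <cstdint>
#include <map>
#include <vector>

using Mat4 = std::array<float, 16>;  // model matrix, 16 floats

// Hypothetical scene entry: which shared mesh to draw, and where.
struct Object {
    std::uint32_t meshId;
    Mat4 transform;
};

// One entry per distinct mesh, holding the transforms of all its instances.
// Each entry would map to a single instanced draw call, with the transforms
// uploaded as per-instance data.
using InstanceBatches = std::map<std::uint32_t, std::vector<Mat4>>;

InstanceBatches buildInstanceBatches(const std::vector<Object>& objects) {
    InstanceBatches batches;
    for (const Object& o : objects) {
        batches[o.meshId].push_back(o.transform);
    }
    return batches;
}
```

With this grouping the number of draw calls drops from the object count to the number of distinct meshes (one glDrawElementsInstanced per batch), which only pays off if many objects really do share a mesh.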

Sean Middleditch – Game Systems Engineer – Join my team!

How are you measuring time? I'm guessing that's total CPU per frame?
Add some more timing code to measure the buffer swap (SwapBuffers, glXSwapBuffers, or your windowing library's equivalent), so you can exclude it from the per-frame total. Also get some timings for how long your mesh loop takes.
You can also use ARB_timer_query to measure GPU time per frame.

If your problem is that your GPU time per frame is the bottleneck, then you'll have to optimize your shaders / data formats / overdraw / etc.
If your problem is that your CPU time per frame is the bottleneck, then it's a more traditional optimization problem. Measure your CPU-side code to see where the time is going.
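
A small sketch of the CPU-side timing, assuming wall-clock time from std::chrono is precise enough. FrameProfiler and the section names are invented for this example. It only measures how long the CPU spends submitting work, since GL calls return before the GPU has finished; GPU time still needs timer queries.

```cpp
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Accumulates wall-clock CPU time per named section of a frame.
class FrameProfiler {
public:
    using Clock = std::chrono::steady_clock;

    // RAII helper: measures from construction to destruction and adds the
    // elapsed time to the named section.
    class Scope {
    public:
        Scope(FrameProfiler& profiler, std::string name)
            : profiler_(profiler), name_(std::move(name)), start_(Clock::now()) {}
        ~Scope() { profiler_.totals_[name_] += Clock::now() - start_; }
    private:
        FrameProfiler& profiler_;
        std::string name_;
        Clock::time_point start_;
    };

    // Total time recorded for a section, in milliseconds (0 if never timed).
    double milliseconds(const std::string& name) const {
        auto it = totals_.find(name);
        if (it == totals_.end()) return 0.0;
        return std::chrono::duration<double, std::milli>(it->second).count();
    }

private:
    std::map<std::string, Clock::duration> totals_;
};
```

Wrapping the mesh loop in one Scope and the buffer swap in another gives the two numbers separately, so time spent waiting in the swap does not get counted against the draw loop.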

What is also bad, unless this is just a test, is that your objects are so small that after every 28 triangles it draws, the GPU has to stall while you figure out what is going to happen next and set things up. You want the GPU to draw as many triangles in one go as you can.

NBA2K, Madden, Maneater, Killing Floor, Sims http://www.pawlowskipinball.com/pinballeternal


Material Changes (glBindTexture): 239

This is extreme. Post your GPU first, so we can tell whether you have a performance issue at all.

Uhm, am I wrong, or might it just be the high number of buffer binds (both the VAO and the index buffer)?

It would be way better to pack stuff into bigger VAOs and use an offset in glDrawElements.


glBindVertexArray(vao) // Vertex Array (Vertex +UV +Normal Buffers)
Yep, that's going to be slow.

As Ryokeen suggested, you totally can pack meshes into a single buffer and just send an offset to the draw call. Check ARB_draw_elements_base_vertex; there are similar calls for drawing plain arrays and for instanced array/element draws.

That way you can just pack all your static meshes in one big buffer, managing the offsets yourself (which is fun :P), and have only a couple of VAO switches. Since you're essentially doing memory management there, you need to keep in mind things like memory fragmentation (i.e., if you pack 500 meshes and then remove 200 at random from the same buffer, things get fragmented), so beware.

The idea is not to use VAOs to specify "this is a single mesh that I can draw and the buffers attached have only that mesh" but more like "this is one kind of vertex format I support, and the buffers attached have tons of meshes with the same format".
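
A sketch of the bookkeeping this implies, assuming append-only packing (no removal, so it sidesteps the fragmentation issue entirely) and 16-bit indices. MeshPacker and MeshRange are hypothetical names:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vertex { float x, y, z; };  // stand-in for one shared vertex format

// Where one mesh lives inside the shared buffers. These are the values a
// base-vertex draw call needs.
struct MeshRange {
    std::uint32_t indexCount;
    std::size_t   indexByteOffset;  // byte offset into the shared index buffer
    std::int32_t  baseVertex;       // added to every index at draw time
};

// Appends meshes of the same vertex format into one vertex array and one
// index array, and records where each mesh ended up.
class MeshPacker {
public:
    MeshRange add(const std::vector<Vertex>& vertices,
                  const std::vector<std::uint16_t>& indices) {
        MeshRange range;
        range.indexCount = static_cast<std::uint32_t>(indices.size());
        range.indexByteOffset = indices_.size() * sizeof(std::uint16_t);
        range.baseVertex = static_cast<std::int32_t>(vertices_.size());
        // Indices are stored unchanged; baseVertex does the rebasing.
        vertices_.insert(vertices_.end(), vertices.begin(), vertices.end());
        indices_.insert(indices_.end(), indices.begin(), indices.end());
        return range;
    }

    const std::vector<Vertex>& vertices() const { return vertices_; }
    const std::vector<std::uint16_t>& indices() const { return indices_; }

private:
    std::vector<Vertex> vertices_;        // uploaded once into one VBO
    std::vector<std::uint16_t> indices_;  // uploaded once into one index buffer
};
```

Each mesh is then drawn with glDrawElementsBaseVertex(GL_TRIANGLES, range.indexCount, GL_UNSIGNED_SHORT, (void*)range.indexByteOffset, range.baseVertex) while the same VAO stays bound for every mesh of that format.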

"I AM ZE EMPRAH OPENGL 3.3 THE CORE, I DEMAND FROM THEE ZE SHADERZ AND MATRIXEZ"

My journals: dustArtemis ECS framework and Making a Terrain Generator

