
OpenGL performance tips


Hi all,


For my engine I need to render a bunch of different meshes, including an alpha pass.
Currently the way I do it is fairly straightforward:


- I have one big vertex buffer and one big index buffer into which I load all my mesh data. During the render phase I use this to "instance" my geometry.

- I use frustum culling to skip scenery that doesn't need to be drawn

- During the update part of the game-loop, I create a render-queue, which sorts the meshes so they can be drawn more efficiently


- At render-time, I bind the large mesh-buffer one time before drawing any meshes

- Drawing is done by going through the render-queue. I bind the correct texture_2D_Array for the mesh and I draw the mesh with glDrawElementsBaseVertex. Thus I just pass in the correct index to draw a certain mesh, using one and the same buffer (instancing)

- I disable the buffer at the end of the render loop

- After all opaque objects were rendered, I do the alpha pass in a similar way, also using the big buffer. Although in this case I cannot sort them mesh per mesh, since they are sorted by depth.


- I use one and the same shader-program for drawing all these meshes, and only one sampler2DArray at texture index 0. The array contains a diffuse map and an optional bumpmap.
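
For readers following along, the shared-buffer setup described in the list above could be sketched roughly as follows. This is only an illustration, not the poster's actual code: `Vertex`, `MeshRange` and `SharedMeshData` are hypothetical names, the vertex layout is assumed, and meshes are assumed to be triangle lists with 32-bit indices. Each mesh keeps its own zero-based indices, and the base vertex passed to glDrawElementsBaseVertex shifts them into the shared vertex buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed vertex layout, for illustration only.
struct Vertex { float position[3]; float normal[3]; float uv[2]; };

// Where one mesh lives inside the shared buffers.
struct MeshRange {
    std::uint32_t indexCount;  // number of indices to draw
    std::uint32_t firstIndex;  // offset into the shared index buffer, in indices
    std::int32_t  baseVertex;  // added to every index when vertices are fetched
};

// CPU-side staging for the one big vertex buffer and one big index buffer.
struct SharedMeshData {
    std::vector<Vertex> vertices;
    std::vector<std::uint32_t> indices;

    // Appends a mesh whose indices are zero-based relative to its own vertices.
    MeshRange append(const std::vector<Vertex>& v,
                     const std::vector<std::uint32_t>& i) {
        assert(!v.empty() && i.size() % 3 == 0);  // assumption: triangle lists
        MeshRange r{static_cast<std::uint32_t>(i.size()),
                    static_cast<std::uint32_t>(indices.size()),
                    static_cast<std::int32_t>(vertices.size())};
        vertices.insert(vertices.end(), v.begin(), v.end());
        indices.insert(indices.end(), i.begin(), i.end());
        return r;
    }
};

// With the shared VAO bound, a range r would then be drawn roughly like this:
//   glDrawElementsBaseVertex(
//       GL_TRIANGLES, r.indexCount, GL_UNSIGNED_INT,
//       reinterpret_cast<const void*>(std::size_t(r.firstIndex) * sizeof(std::uint32_t)),
//       r.baseVertex);
```

Note that this still issues one draw call per mesh instance, so it is sub-range drawing from a shared buffer rather than hardware instancing in the glDrawElementsInstanced sense.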


I'm finding that with the current setup I'm not quite getting the performance I'd like. Therefore I'm hoping to receive some tips on how this sort of mesh-rendering problem is usually tackled by more experienced programmers. For example, is it common practice to use just one shader program for rendering all meshes? Or is there a much more efficient way that would remove the need to always re-bind the correct texture when switching between meshes?


Any suggestions are very welcome!



You don't give nearly enough information to stimulate a meaningful answer.   For example, "...which sorts meshes so they can be rendered more efficiently..." says little.


Pictures, polygon counts, texture sizes, number of GL calls... these things build a basis for consideration where performance is concerned. Given the vagueness of what you provide, I can imagine scenarios where you are bus bound, geometry bound, fill-rate bound, or ALU bound.


I really would like to help :-)


You should be sorting by texture as a second criterion if shaders match (which would seem to always be your case).

In both cases, setting textures and shaders should be done only through custom wrappers that keep track of the last shader/textures set and early-out if the same is being set again.
And not just shaders and textures but every state change should be redundancy checked. Culling on/off, depth-test function, nothing should be set to the same value that it already is.
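
A minimal sketch of what such a redundancy-checking wrapper might look like for texture binds. The class name `TextureBindCache` is hypothetical, and the real GL call is injected as a callback so the caching logic stays independent of any particular GL loader; the same pattern would apply to shaders, culling mode, depth function and so on.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical wrapper: remembers the last texture bound and skips repeats.
// bindFn stands in for the real GL call, for example
//   [](std::uint32_t id) { glBindTexture(GL_TEXTURE_2D_ARRAY, id); }
class TextureBindCache {
public:
    explicit TextureBindCache(std::function<void(std::uint32_t)> bindFn)
        : bindFn_(std::move(bindFn)) {
        assert(bindFn_ != nullptr);
    }

    // Returns true if a real bind was issued, false if it was redundant.
    bool bind(std::uint32_t textureId) {
        if (hasBound_ && textureId == current_) {
            return false;  // early-out: this texture is already bound
        }
        bindFn_(textureId);
        current_ = textureId;
        hasBound_ = true;
        return true;
    }

    // Call when code outside the wrapper may have changed the binding.
    void invalidate() { hasBound_ = false; }

private:
    std::function<void(std::uint32_t)> bindFn_;
    std::uint32_t current_ = 0;
    bool hasBound_ = false;
};
```

The cache is only trustworthy if every bind goes through it, which is why the advice above says state should be set only through the wrappers.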


Ah yes, currently I sort per "mesh". So for example I'll have a few pinetree variants, and I'd first go over pine variant 1 and draw all those instances, then variant 2 etc... As they do share textures, it would indeed be a good move to sort per texture instead of per mesh.
The custom wrapper is also a great suggestion; I'll get to work on that too.


A bad render queue is worse than no render queue at all. Did you time it?
Make sure you are taking advantage of per-frame temporal coherence with an insertion sort on item indices.
Do not sort actual render-queue objects and do not use std::sort().
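
One way to read this advice, as a rough sketch: keep an array of indices into the render items, carry it over from frame to frame, and re-sort it with an insertion sort. The function name `sortIndicesByKey` and the use of a 64-bit key per item are assumptions for illustration. Insertion sort approaches linear time when the input is already nearly sorted, which is what frame-to-frame coherence tends to give you, and moving 4-byte indices is cheaper than moving whole render-queue objects.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sorts `order` (indices into `keys`) so that keys[order[i]] is ascending.
// `order` is meant to be kept between frames so it starts out nearly sorted.
inline void sortIndicesByKey(const std::vector<std::uint64_t>& keys,
                             std::vector<std::uint32_t>& order) {
    for (std::size_t i = 1; i < order.size(); ++i) {
        const std::uint32_t idx = order[i];
        assert(idx < keys.size());
        const std::uint64_t key = keys[idx];
        // Shift larger entries one slot to the right, then drop idx in place.
        std::size_t j = i;
        while (j > 0 && keys[order[j - 1]] > key) {
            order[j] = order[j - 1];
            --j;
        }
        order[j] = idx;
    }
}
```

Items with equal keys keep their relative order, so the result is stable from frame to frame. Whether this actually beats std::sort depends on how much the visible set changes per frame, so it is worth timing both.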


I do suppose my current queue, based on per-mesh sorting, is inefficient. To clarify, my current render queue is essentially a map<int MeshID, vector<MeshInstance>>.
As for how I construct it per frame, I use an octree to do frustum culling. For any mesh instance that falls in the view frustum, I check if its mesh type is already in the render queue. If so, I append the instance to the corresponding MeshInstance vector; if the mesh type is not yet in the queue, I add a new MeshID to the map.
This queue worked well back when I was only testing instances that didn't share any textures (1 pinetree variant, 1 house, 1 bush, etc.). It definitely gave a performance boost compared to just switching between meshes randomly (I did time this), but it is now outdated. So yeah, I'll look into improving this by sorting per texture.


Is your shader optimized? Are you reducing overdraw with a render-queue check on depth (following matching shaders and textures)?
Are you doing something silly such as recreating or copying over vertex buffers that are in use each frame?

I am only calling this each frame: glBindVertexArray(s_MeshBuffer.VAO);

So not recreating or copying over buffers.
As for reducing overdraw, could you elaborate on that? I'm not familiar with this.
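
On the overdraw question: overdraw is shading a pixel that ends up covered by something nearer. If opaque objects are drawn roughly front to back, the depth test can reject hidden fragments before they are shaded on most GPUs. A common way to combine this with the shader and texture criteria is a single integer sort key, sketched below. The function name `makeOpaqueSortKey` and the field widths (16 bits each for shader and texture, 32 bits of depth) are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 64-bit sort key for opaque draws; field widths are assumptions.
// The most significant bits are compared first:
//   [63..48] shader id | [47..32] texture id | [31..0] quantized view depth
inline std::uint64_t makeOpaqueSortKey(std::uint16_t shaderId,
                                       std::uint16_t textureId,
                                       float viewDepth, float farPlane) {
    assert(farPlane > 0.0f);
    // Normalize depth to [0, 1]; nearer objects get smaller values.
    float t = viewDepth / farPlane;
    if (t < 0.0f) t = 0.0f;
    if (t > 1.0f) t = 1.0f;
    // Multiply in double so the result never exceeds the 32-bit range.
    const auto depthBits = static_cast<std::uint32_t>(t * 4294967295.0);
    return (static_cast<std::uint64_t>(shaderId) << 48) |
           (static_cast<std::uint64_t>(textureId) << 32) |
           static_cast<std::uint64_t>(depthBits);
}
```

Sorting ascending by this key groups draws by shader, then by texture, and within each group draws the nearest objects first. The alpha pass would need a different key, since transparent objects have to go back to front.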



No. Use permutations, breaking shaders reasonably between run-time branches and compile-time variants.

Could you also elaborate on this? Currently all my scenery requires the same shader code. They have the same lighting calculations, apply an optional bumpmap (I use a uniform boolean to check if a bumpmap needs to be sampled), sample the diffuse texture, and sample shadowmaps.
I'd like to have some examples of when one would switch to a separate shader program rather than just using a boolean to check whether a certain feature is needed.
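
As a rough illustration of the compile-time side of permutations: one shared GLSL source is compiled several times, each time with a different set of `#define` lines in front of it, so that `#ifdef USE_BUMPMAP` blocks are either compiled in or removed entirely. The flag names, the macro names and the `#version 330 core` line below are all assumptions, not something stated in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical feature flags; the names are illustrative only.
enum ShaderFeature : std::uint32_t {
    kFeatureBumpMap = 1u << 0,
    kFeatureShadows = 1u << 1,
};
constexpr std::uint32_t kAllFeatures = kFeatureBumpMap | kFeatureShadows;

// Builds the text to place in front of a shared GLSL source. The shared
// source must not contain its own #version line, since that has to come first.
inline std::string buildShaderPreamble(std::uint32_t features) {
    assert((features & ~kAllFeatures) == 0);  // reject unknown feature bits
    std::string preamble = "#version 330 core\n";
    if (features & kFeatureBumpMap) preamble += "#define USE_BUMPMAP 1\n";
    if (features & kFeatureShadows) preamble += "#define USE_SHADOWS 1\n";
    return preamble;
}
```

The preamble and the shared source can be handed to glShaderSource as two strings, and the resulting programs cached in a table keyed by the feature bits. A plausible rule of thumb is to make a feature a compile-time variant when it is constant per material and changes the amount of texture sampling or math noticeably (the optional bumpmap is a candidate), and to leave cheap, frequently changing switches as uniforms.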


You don't give nearly enough information to stimulate a meaningful answer.


I'm afraid that's because I don't have sufficient OpenGL monitoring yet; I was first trying to make things "work" before seriously considering performance. It's definitely on the todo list though. My main purpose for this thread was to get general performance improvement tips.

To sketch a bit of context, this is the type of scene I'm rendering:
Polygon count for the scenery isn't anything out of the ordinary (although I can't give a number at the moment). Texture sizes depend on the asset, but for example both the bark texture on the tree and the texture on the rocks are 512x512.
Scenery does not have LODs yet (another item on the infamous todo list), terrain however does (terrain performs decently on its own).



Thanks for the feedback so far.


Use mipmaps.


Realtime shadows are slow.


You can potentially batch draw calls together by sending in texture ids through a vertex attribute.
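
A sketch of the idea, under the assumption that the textures involved share a size and format and can therefore live in one texture array: each vertex carries the array layer it should sample, so geometry using different textures can be merged into one buffer and drawn with a single call. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct SourceVertex { float position[3]; float uv[2]; };
// Same data plus the texture array layer, passed as a vertex attribute.
struct BatchVertex { float position[3]; float uv[2]; float layer; };

struct Batch {
    std::vector<BatchVertex> vertices;
    std::vector<std::uint32_t> indices;
};

// Appends one mesh to the batch, stamping every vertex with `layer` and
// rebasing the mesh's zero-based indices onto the batch's vertex array.
inline void appendToBatch(Batch& batch,
                          const std::vector<SourceVertex>& verts,
                          const std::vector<std::uint32_t>& indices,
                          std::uint32_t layer) {
    const auto base = static_cast<std::uint32_t>(batch.vertices.size());
    for (const SourceVertex& v : verts) {
        batch.vertices.push_back({{v.position[0], v.position[1], v.position[2]},
                                  {v.uv[0], v.uv[1]},
                                  static_cast<float>(layer)});
    }
    for (std::uint32_t i : indices) {
        assert(i < verts.size());
        batch.indices.push_back(base + i);
    }
}
```

In the fragment shader the layer would be used as the third texture coordinate, along the lines of `texture(diffuseArray, vec3(uv, layer))` with a sampler2DArray. For many copies of the same mesh, a per-instance attribute would carry the layer more compactly than a per-vertex one.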


Frustum culling on its own is not quite adequate for *indoor* scene geometry. Spatial culling is usually needed, with a PVS as in Quake, or Umbra, being the best. I use an octree with hardware occlusion.
