You should be sorting by texture as a second criterion if shaders match (which would seem to always be your case).
In both cases, setting textures and shaders should be done only through custom wrappers that keep track of the last shader/textures set and early-out if the same is being set again.
And not just shaders and textures: every state change should be redundancy-checked. Culling on/off, the depth-test function, nothing should be set to the same value it already has.
Ah yes, currently I sort per "mesh". So for example I'll have a few pine-tree variants, and I'd first go over pine variant 1 and draw all those instances, then variant 2, etc. As they do share textures, it would indeed be a good move to sort per texture instead of per mesh.
The custom wrapper is also a great suggestion, I'll get to work on that too.
A bad render queue is worse than no render queue at all. Did you time it?
Make sure you are taking advantage of per-frame temporal coherence with an insertion sort on item indices.
Do not sort actual render-queue objects and do not use std::sort().
I do suppose my current queue, based on per-mesh sorting, is inefficient. To clarify, my current render queue is essentially a map<int, vector<MeshInstance>> keyed on MeshID.
As for how I construct it per frame: I use an octree to do frustum culling. For any mesh instance that falls in the view frustum, I check whether its mesh type is already in the render queue. If so, I append the instance to the corresponding MeshInstance vector; if the mesh type is not yet in the queue, I add a new MeshID to the map.
This queue worked well back when I was only testing instances that didn't share any textures (1 pine-tree variant, 1 house, 1 bush, etc.). It definitely gave a performance boost compared to just switching between meshes randomly (I did time this), but it is now outdated. So yeah, I'll look into improving this by sorting per texture.
Is your shader optimized? Are you reducing overdraw by also sorting the render queue on depth (after matching shaders and textures)?
Are you doing something silly such as recreating or copying over vertex buffers that are in use each frame?
I am only calling this each frame: glBindVertexArray(s_MeshBuffer.VAO);
So not recreating or copying over buffers.
As for reducing overdraw, could you elaborate on that? I'm not familiar with this.
No. Use permutations, splitting shader work sensibly between run-time branches and compile-time variants.
Could you also elaborate on this? Currently all my scenery requires the same shader code. They have the same lighting calculations, calculate an optional bumpmap (I use a uniform boolean to check if a bumpmap needs to be sampled), sample the diffuse texture, sample shadowmaps.
I'd like to have some examples as to when one would really distinguish between using another shader program, or just having a boolean to check if a certain functionality is needed.
You don't give nearly enough information to stimulate a meaningful answer.
I'm afraid that's because I don't have sufficient OpenGL monitoring in place yet; I was first trying to make things "work" before properly considering performance. It's definitely on the todo list, though. My main purpose for this thread was to get general performance-improvement tips.
To sketch a bit of context, this is the type of scene I'm rendering:
Polygon count for the scenery isn't anything out of the ordinary (although I can't give a number at the moment), and texture sizes depend on the asset; for example, both the bark texture on the tree and the texture on the rocks are 512×512.
Scenery does not have LODs yet (another item on the infamous todo list), terrain however does (terrain performs decently on its own).
Thanks for the feedback so far.
Cheers!