OpenGL performance tips

Lodeman    1596

Hi all,


For my engine I need to render a bunch of different meshes, including an alpha pass.
Currently the way I do it is fairly straight-forward:


- I have one big vertex and index buffer in which I load in all my mesh data. During the render phase I use this to "instance" my geometry.

- I use frustum culling to skip scenery that doesn't need to be drawn

- During the update part of the game-loop, I create a render-queue, which sorts the meshes so they can be drawn more efficiently


- At render-time, I bind the large mesh-buffer one time before drawing any meshes

- Drawing is done by going through the render-queue. I bind the correct texture_2D_Array for the mesh and I draw the mesh with glDrawElementsBaseVertex. Thus I just pass in the correct index to draw a certain mesh, using one and the same buffer (instancing)

- I disable the buffer at the end of the render loop

- After all opaque objects were rendered, I do the alpha pass in a similar way, also using the big buffer. Although in this case I cannot sort them mesh per mesh, since they are sorted by depth.


- I use one and the same shader-program for drawing all these meshes, and only one sampler2DArray at texture index 0. The array contains a diffuse map and an optional bumpmap.
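As a rough illustration of the single-buffer setup described above, here is a minimal C++ sketch of how several meshes could be packed into one vertex and one index buffer while recording the offsets each draw call needs. The names `MeshData`, `MeshRange`, `SharedBuffers` and `packMeshes` are hypothetical, and the vertex layout (three floats per vertex) is an assumption made for brevity; the original post does not show its data structures.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical CPU-side mesh. Assumption: positions only, 3 floats per vertex.
struct MeshData {
    std::vector<float>         vertices;
    std::vector<std::uint32_t> indices;  // relative to this mesh's own first vertex
};

// Where one mesh lives inside the shared buffers.
struct MeshRange {
    std::uint32_t indexCount;  // 'count' for glDrawElementsBaseVertex
    std::uint32_t firstIndex;  // offset into the shared index buffer, in indices
    std::int32_t  baseVertex;  // 'basevertex' for glDrawElementsBaseVertex
};

struct SharedBuffers {
    std::vector<float>         vertices;
    std::vector<std::uint32_t> indices;
    std::vector<MeshRange>     ranges;  // ranges[i] describes meshes[i]
};

// Concatenates all meshes. Indices are copied unchanged, because the
// base vertex passed at draw time shifts them to the right place.
SharedBuffers packMeshes(const std::vector<MeshData>& meshes) {
    SharedBuffers out;
    for (const MeshData& m : meshes) {
        assert(m.vertices.size() % 3 == 0);
        MeshRange r;
        r.indexCount = static_cast<std::uint32_t>(m.indices.size());
        r.firstIndex = static_cast<std::uint32_t>(out.indices.size());
        r.baseVertex = static_cast<std::int32_t>(out.vertices.size() / 3);
        out.ranges.push_back(r);
        out.vertices.insert(out.vertices.end(), m.vertices.begin(), m.vertices.end());
        out.indices.insert(out.indices.end(), m.indices.begin(), m.indices.end());
    }
    return out;
}
```

With a range `r` from this sketch, the draw would look something like `glDrawElementsBaseVertex(GL_TRIANGLES, r.indexCount, GL_UNSIGNED_INT, (void*)(r.firstIndex * sizeof(GLuint)), r.baseVertex)`. Note that this is buffer sharing rather than hardware instancing; drawing many copies of one mesh in a single call is what `glDrawElementsInstancedBaseVertex` is for.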


I'm finding that the current setup doesn't quite give me the performance I'd like. I'm therefore hoping for some tips on how more experienced programmers usually tackle this sort of mesh-rendering problem. For example, is it common practice to use just one shader program for rendering all meshes? Or is there a more efficient approach that removes the need to re-bind the correct texture every time I switch between meshes?


Any suggestions are very welcome!


HScottH    520

You don't give nearly enough information to stimulate a meaningful answer.   For example, "...which sorts meshes so they can be rendered more efficiently..." says little.


Pictures, polygon counts, texture sizes, number of GL calls... these things build a basis for consideration where performance is concerned. Given the vagueness of what you provide, I can imagine scenarios where you are bus bound, geometry bound, fill-rate bound, or ALU bound.


I really would like to help :-)

Lodeman    1596

"You should be sorting by texture as a second criterion if shaders match (which would seem to always be your case).

In both cases, setting textures and shaders should be done only through custom wrappers that keep track of the last shader/textures set and early-out if the same is being set again.
And not just shaders and textures: every state change should be redundancy-checked. Culling on/off, depth-test function... nothing should be set to the value it already has."
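A redundancy-checking wrapper of the kind suggested here might look roughly like the sketch below. The class name `StateCache` and its interface are hypothetical; the actual binding calls are passed in as callbacks so the sketch stays independent of any GL loader. In a real renderer those callbacks would presumably wrap something like `glBindTexture(GL_TEXTURE_2D_ARRAY, id)` and `glUseProgram(id)`.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical wrapper that remembers the last texture and program set,
// and skips the underlying call when the same value is requested again.
class StateCache {
public:
    using BindFn = std::function<void(std::uint32_t)>;

    StateCache(BindFn bindTexture, BindFn useProgram)
        : bindTexture_(std::move(bindTexture)), useProgram_(std::move(useProgram)) {
        assert(bindTexture_ && useProgram_);
    }

    // Returns true if the underlying call was actually issued.
    bool bindTexture(std::uint32_t id) {
        if (hasTexture_ && id == texture_) return false;  // early-out: already bound
        bindTexture_(id);
        texture_ = id;
        hasTexture_ = true;
        return true;
    }

    bool useProgram(std::uint32_t id) {
        if (hasProgram_ && id == program_) return false;  // early-out: already in use
        useProgram_(id);
        program_ = id;
        hasProgram_ = true;
        return true;
    }

private:
    BindFn        bindTexture_;
    BindFn        useProgram_;
    std::uint32_t texture_ = 0;
    std::uint32_t program_ = 0;
    bool          hasTexture_ = false;  // nothing is assumed bound at start-up
    bool          hasProgram_ = false;
};
```

The same pattern extends to culling, depth function, blend state and so on. One caveat with this kind of cache: it only stays correct if every state change in the application goes through it.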


Ah yes, currently I sort per "mesh". So for example I'll have a few pinetree variants, and I'd first go over pine variant 1 and draw all those instances, then variant 2 etc... As they do share textures, it would indeed be a good move to sort per texture instead of per mesh.
The custom wrapper is also a great suggestion, will make work of that too.


"A bad render queue is worse than no render queue at all. Did you time it?
Make sure you are taking advantage of per-frame temporal coherence with an insertion sort on item indices.
Do not sort actual render-queue objects and do not use std::sort()."
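To make the "insertion sort on item indices" advice concrete, here is a minimal sketch. It assumes each queue entry carries a precomputed 64-bit sort key and that the index array is kept from one frame to the next; `RenderItem` and `sortIndices` are hypothetical names. The reasoning, as far as I understand it, is that the order barely changes between consecutive frames, and insertion sort approaches linear time on nearly sorted input, while moving 4-byte indices is cheaper than moving whole queue objects.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical render-queue entry; a real one would also carry mesh,
// material and transform data, which is why we avoid moving it around.
struct RenderItem {
    std::uint64_t sortKey;
};

// Sorts `order` (indices into `items`) by ascending sortKey.
// `items` itself is never reordered. The sort is stable.
void sortIndices(const std::vector<RenderItem>& items,
                 std::vector<std::uint32_t>& order) {
    for (std::size_t i = 1; i < order.size(); ++i) {
        const std::uint32_t current = order[i];
        assert(current < items.size());
        const std::uint64_t key = items[current].sortKey;
        std::size_t j = i;
        // Shift larger entries one slot right until the insertion point is found.
        while (j > 0 && items[order[j - 1]].sortKey > key) {
            order[j] = order[j - 1];
            --j;
        }
        order[j] = current;
    }
}
```

Usage would be along the lines of: keep `order` alive across frames, append indices for newly visible items, drop indices for items that left the frustum, then call `sortIndices`. Whether this actually beats `std::sort` for a given scene is something to measure rather than assume.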


I do suppose my current queue, based on per-mesh sorting, is inefficient. To clarify, my current render queue is essentially a map<int MeshID, vector<MeshInstance>>.
As for how I construct it each frame: I use an octree to do frustum culling. For any mesh instance that falls inside the view frustum, I check whether its mesh type is already in the render queue. If so, I append the instance to the corresponding MeshInstance vector; if not, I add a new MeshID to the map.
This queue worked well back when I was only testing instances that didn't share any textures (1 pinetree variant, 1 house, 1 bush, etc.). It definitely gave a performance boost compared to switching between meshes randomly (I did time this), but it is now outdated. So yeah, I'll look into improving this by sorting per texture.


"Is your shader optimized? Are you reducing overdraw with a render-queue check on depth (following matching shaders and textures)?
Are you doing something silly such as recreating or copying over vertex buffers that are in use each frame?"

I am only calling this each frame: glBindVertexArray(s_MeshBuffer.VAO);

So not recreating or copying over buffers.
As for reducing overdraw, could you elaborate on that? I'm not familiar with this.
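One common reading of "a render-queue check on depth" is to draw opaque objects roughly front to back within each shader/texture group, so that the depth test can reject hidden fragments before they are shaded. A sketch of a sort key that encodes this ordering is below. The 16/16/32 bit split and the name `makeOpaqueKey` are assumptions for illustration, not something stated in the thread.

```cpp
#include <cassert>
#include <cstdint>

// Packs shader, texture and depth into one integer so that an ascending
// sort groups by shader first, then by texture, then front to back.
// Assumed layout: [63..48] shader | [47..32] texture | [31..0] depth.
// depth01 is the view-space depth normalized to [0, 1] (0 = nearest).
std::uint64_t makeOpaqueKey(std::uint16_t shaderId,
                            std::uint16_t textureId,
                            float depth01) {
    assert(depth01 >= 0.0f && depth01 <= 1.0f);
    // Multiply in double precision so that 1.0 maps exactly to 0xFFFFFFFF.
    const std::uint32_t depthBits =
        static_cast<std::uint32_t>(static_cast<double>(depth01) * 4294967295.0);
    return (static_cast<std::uint64_t>(shaderId)  << 48) |
           (static_cast<std::uint64_t>(textureId) << 32) |
            static_cast<std::uint64_t>(depthBits);
}
```

The alpha pass would need the opposite priority: depth as the most significant field and sorted back to front, since blending correctness matters more there than state changes.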



"No. Use permutations, breaking shaders reasonably between run-time branches and compile-time variants."

Could you also elaborate on this? Currently all my scenery requires the same shader code: the same lighting calculations, an optional bumpmap (I use a uniform boolean to check whether a bumpmap needs to be sampled), a diffuse texture sample, and shadowmap samples.
I'd like to have some examples as to when one would really distinguish between using another shader program, or just having a boolean to check if a certain functionality is needed.
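For context, a "permutation" in this sense usually means compiling the same shader source several times with different `#define`s, so that a feature such as bump mapping is either compiled in or compiled out, instead of being tested through a uniform boolean on every fragment. The sketch below shows one way the source for each variant could be assembled. The feature flags, macro names and the `#version 330 core` line are assumptions; the thread does not say which GLSL version is in use.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical feature bits; each combination yields one shader variant.
enum ShaderFeature : std::uint32_t {
    FEATURE_BUMP_MAP = 1u << 0,
    FEATURE_SHADOWS  = 1u << 1
};

// Prepends a version line and one #define per enabled feature.
// `body` must not contain its own #version directive, because GLSL
// requires #version to come before anything else in the source.
std::string buildShaderSource(const std::string& body, std::uint32_t features) {
    std::string src = "#version 330 core\n";
    if (features & FEATURE_BUMP_MAP) src += "#define USE_BUMP_MAP 1\n";
    if (features & FEATURE_SHADOWS)  src += "#define USE_SHADOWS 1\n";
    src += body;
    return src;
}
```

The shader body would then wrap the optional code in `#ifdef USE_BUMP_MAP ... #endif`, and the compiled programs could be cached in a map keyed by the feature mask. A rough rule of thumb, offered as opinion rather than fact: options that are fixed per material (bump map or not) are good candidates for compile-time variants, while values that change per object or per frame are better left as uniforms.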


"You don't give nearly enough information to stimulate a meaningful answer."


I'm afraid that's because I don't have sufficient OpenGL monitoring yet; I was first trying to make things "work" before seriously considering performance. It's definitely on the todo list though. My main purpose for this thread was to get general performance improvement tips.

To sketch a bit of context, this is the type of scene I'm rendering:
Polygon count for the scenery isn't anything out of the ordinary (although I can't give a number at the moment). Texture sizes depend on the asset, but for example both the bark texture on the tree and the texture on the rocks are 512x512.
Scenery does not have LODs yet (another item on the infamous todo list), terrain however does (terrain performs decently on its own).



Thanks for the feedback so far.

3TATUK2    714

Use mipmaps.


Realtime shadows are slow.


You can potentially batch draw calls together by sending in texture ids through a vertex attribute.
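A sketch of what that batching could look like on the CPU side: several meshes are merged into one vertex/index stream, and each vertex carries the array-texture layer it should sample, so meshes with different textures can share one draw call. This assumes all the textures involved fit into a single array texture (same size and format), which may not match the diffuse-plus-bumpmap array layout described earlier in the thread. The names `Vertex`, `BatchVertex`, `Batch` and `appendMesh` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Vertex      { float x, y, z, u, v; };
struct BatchVertex { float x, y, z, u, v, layer; };  // layer = array-texture slice

struct Batch {
    std::vector<BatchVertex>   vertices;
    std::vector<std::uint32_t> indices;
};

// Appends one mesh to the batch, tagging every vertex with `layer`
// and rebasing the indices onto the batch's vertex numbering.
void appendMesh(Batch& batch,
                const std::vector<Vertex>& verts,
                const std::vector<std::uint32_t>& indices,
                float layer) {
    const std::uint32_t base = static_cast<std::uint32_t>(batch.vertices.size());
    for (const Vertex& v : verts) {
        batch.vertices.push_back({v.x, v.y, v.z, v.u, v.v, layer});
    }
    for (std::uint32_t i : indices) {
        assert(i < verts.size());
        batch.indices.push_back(base + i);
    }
}
```

In the fragment shader the layer would then be used as the third texture coordinate, along the lines of `texture(diffuseArray, vec3(uv, layer))`. The trade-off is a larger vertex format and less flexibility for per-object transforms, so it tends to suit static scenery best.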


Frustum culling alone is not quite adequate for *indoor* scene geometry. Spatial culling is usually needed, with a PVS (as in Quake) or Umbra being the best. I use an octree with hardware occlusion.
