
About this blog

Game dev notes ...



Researching render loops for OpenGL 4.0+

For a hobby project I had to look into OpenGL render loops, so here is a quick recap of the main things I pulled out of a day of reading and searching. The best hint I found was a talk from GDC 2014, linked down below. Also note that all code here is just pseudo code; I plan to actually implement the loop in the coming days.

Tutorial Loop

Most intro tutorials teach you a render loop like the one below. It is the most basic loop you can have to draw elements to the screen and it works well for explaining OpenGL behavior, but it is useless in production code.

Pseudo code:
    for each (object(s))
    {
        glBind(object.shader);
        glBind(object.textures);
        glBind(object.indexBuffer);
        glBind(object.vertexBuffer);
        glBind(object.normalBuffer);
        glUniform(...)
        glDraw(...)
    }
A decent tutorial will at some point replace the individual buffer binds with a vertex buffer object, removing some of the OpenGL calls made in the loop above. But that is still far from optimal. To optimize this loop you have to realize that state changes in OpenGL are slow and need to be minimized. (Note: almost every OpenGL call is a state change or requires a state to be changed while executing.)
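To make the cost of state changes a bit more tangible, here is a minimal C++ sketch that only counts the bind calls a loop would issue, without touching OpenGL at all. `DrawItem`, `countStateChanges` and `sortByState` are names I made up for illustration, and the ids simply stand in for real GL object names.

```cpp
#include <algorithm>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical draw item: the ids stand in for real GL object names.
struct DrawItem {
    unsigned shader;
    unsigned texture;
    unsigned vertexBuffer;
};

// Count the binds a loop would issue if it only rebinds a piece of
// state when it differs from what the previous item had bound.
std::size_t countStateChanges(const std::vector<DrawItem>& items) {
    std::size_t changes = 0;
    const DrawItem* prev = nullptr;
    for (const DrawItem& item : items) {
        if (!prev || prev->shader != item.shader) ++changes;
        if (!prev || prev->texture != item.texture) ++changes;
        if (!prev || prev->vertexBuffer != item.vertexBuffer) ++changes;
        prev = &item;
    }
    return changes;
}

// Sort so that items sharing shader, then texture, then buffer are adjacent.
std::vector<DrawItem> sortByState(std::vector<DrawItem> items) {
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) {
                  return std::tie(a.shader, a.texture, a.vertexBuffer) <
                         std::tie(b.shader, b.texture, b.vertexBuffer);
              });
    return items;
}
```

For four items that alternate between two shaders, the unsorted order needs six binds and the sorted order four. The gap grows quickly with the number of objects.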

Real render loop

In production code your render loop will look more like the one below. It groups the OpenGL state changes and executes all tasks that require the same state in a sub loop. This reduces the number of state changes dramatically and leaves you more time to actually draw objects to the screen.

A typical render loop in pseudo code:
    for each (shader)                      // Shader program
        glBind(shader)
        for each (material setup)          // Program configuration:
            glBind(material.textures)      // - Setup Texture binding
            glUniforms(...)                // - Setup program uniforms
            for each (vertex buffer)       // Vertex buffer object
                for each (object(s))       // Objects inside the buffer
                {
                    glUniform(...)         // Set object uniforms
                    glDraw(...)            // Draw the object
                }
It is nice, and it looks like the loop most OpenGL programmers have been using for years. But with changes in hardware and the move to concurrent programming, this loop needs to change. So let's dive in and look at the changes required!

The Inner loop:
    for each (object(s))
    {
        glUniform(...)
        glDraw(...)
    }
This loop interacts with the OpenGL driver on every iteration. These calls are not thread safe and may need to be synchronized internally by multi-threaded drivers. Let's improve this bit and use one or more uniform buffers in combination with a draw command buffer, which allows us to submit multiple draw commands in one draw call.

Optimized:
    for each (object(s))
    {
        uniforms = ...
        commands = ...
    }
    glUniformBuffer(uniforms)
    glDrawIndirect(commands)
You now have full control over the inner loop and could split it up into multiple threads without having to worry about driver synchronization. On top of that, the program only interrupts the driver once, submitting all the data in one call.
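As a rough idea of what filling those two arrays could look like on the CPU side, here is a C++ sketch. The `DrawElementsIndirectCommand` layout is the one OpenGL defines for indexed indirect draws. Everything else (`Mesh`, `ObjectUniforms`, `SceneObject`, `DrawBatch`, `buildBatch`) is hypothetical, and storing the object's slot in `baseInstance` is just one possible convention that I am assuming here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Command layout OpenGL expects in the indirect buffer for indexed draws.
struct DrawElementsIndirectCommand {
    std::uint32_t count;         // number of indices to draw
    std::uint32_t instanceCount; // number of instances
    std::uint32_t firstIndex;    // offset into the index buffer, in indices
    std::int32_t  baseVertex;    // value added to every index
    std::uint32_t baseInstance;  // assumption: slot of the object's uniforms
};
static_assert(sizeof(DrawElementsIndirectCommand) == 20, "must be tightly packed");

// Hypothetical scene data, not taken from any real engine.
struct Mesh { std::uint32_t indexCount; std::uint32_t firstIndex; std::int32_t baseVertex; };
struct ObjectUniforms { float model[16]; };
struct SceneObject { Mesh mesh; ObjectUniforms uniforms; };

struct DrawBatch {
    std::vector<ObjectUniforms> uniforms;               // goes into the uniform buffer
    std::vector<DrawElementsIndirectCommand> commands;  // goes into the indirect buffer
};

// Pure CPU work with no GL calls, so it can run on any thread.
DrawBatch buildBatch(const std::vector<SceneObject>& objects) {
    DrawBatch batch;
    batch.uniforms.reserve(objects.size());
    batch.commands.reserve(objects.size());
    for (const SceneObject& obj : objects) {
        const auto slot = static_cast<std::uint32_t>(batch.uniforms.size());
        batch.uniforms.push_back(obj.uniforms);
        batch.commands.push_back({obj.mesh.indexCount, 1u, obj.mesh.firstIndex,
                                  obj.mesh.baseVertex, slot});
    }
    return batch;
}
```

With a 4.3 context, the two vectors would be uploaded once and followed by a single `glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT, nullptr, drawCount, 0)`. How the shader reads the slot back is left out of this sketch.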

The Outer loop:
    for each (material setup)
    {
        glUniforms(material.config)
        glTextures(material.textures)
        ... (draw)
    }
Again you interact with the driver for every possible material configuration. Just like with the inner loop, we can optimize this part out by storing the configurations in buffers and allowing the shader to access those buffers. Of course this requires you to pass an index or handle along with each object that needs to be drawn, so that the shader knows which configuration to use for a given object.

So let's optimize again:
    for each (material setup)
    {
        matUniforms = material.config
        matTextures = material.textures
    }
    glUniformBuffer(matUniforms)
    glTextureArray(matTextures)
    ... (draw with id)
And again you get full control of the loop and remove a lot of interaction with the driver. Also note that the sub part of the loop is no longer nested inside the material loop, giving you the option to create bigger vertex buffers when possible and further reduce the final draw call count.
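A small C++ sketch of the "draw with id" idea: materials are appended to one table and every object only carries the returned id. `MaterialRecord` and `MaterialTable` are hypothetical names, and the padding assumes the table ends up in a uniform buffer with 16-byte aligned array elements, which is my assumption rather than something fixed by this post.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical material record as the shader would read it from a buffer.
struct MaterialRecord {
    float baseColor[4];
    std::uint32_t textureLayer; // layer inside one shared texture array
    std::uint32_t pad[3];       // assumed padding to a 16-byte multiple
};
static_assert(sizeof(MaterialRecord) == 32, "record should be 32 bytes");

class MaterialTable {
public:
    // Returns the id an object passes along so the shader can find the record.
    std::uint32_t add(const MaterialRecord& material) {
        records_.push_back(material);
        return static_cast<std::uint32_t>(records_.size() - 1);
    }

    // Contiguous data, ready to be copied into a uniform buffer in one go.
    const std::vector<MaterialRecord>& records() const { return records_; }

private:
    std::vector<MaterialRecord> records_;
};
```

The table is uploaded once per shader instead of once per material, and the per-object cost shrinks to a single integer.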

Modern render loop

If you put this together you have the base for a more modern loop that is already a lot better, but we can still dig deeper. Thanks to the changes in the structure of the loop you can optimize even more.

Base:
    for each (shader)
        glBind(shader)
        for each (material setup)
        {
            matUniforms = material.config
            matTextures = material.textures
        }
        glUniformBuffer(matUniforms)
        glTextureArray(matTextures)
        for each (vertex buffer)
            glBind(vertex buffer)
            for each (object(s))
            {
                uniforms = ...
                commands = ...
            }
            glUniformBuffer(uniforms)
            glDrawIndirect(commands)
At this point you can pull the buffer updates out of the render loop and use an "is dirty" flag to update them only when needed. (Note: the pseudo code is not using data oriented design to avoid branching in the loops; look that one up.)

Optimized:
    for each (shader)
        glBind(shader)
        if (shader.isDirty) UpdateMaterialBuffer()
        glUniformBuffer(matUniforms)
        glTextureArray(matTextures)
        for each (vertex buffer)
            glBind(vertex buffer)
            if (vertex buffer.isDirty) UpdateObjectBuffer()
            glUniformBuffer(uniforms)
            glDrawIndirect(commands)
And last but not least, thanks to the new structure it now becomes possible to move the buffer bindings into the vertex buffer object, so that all binds are stored on the driver side. This further reduces the communication with the driver in the render loop.
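The dirty flag itself is tiny. Here is a minimal C++ sketch, where `DirtyBuffer` is a name I made up and the `upload` callback stands in for whatever actually copies the data to the GPU.

```cpp
#include <cstddef>
#include <vector>

// CPU-side copy of a buffer plus a flag telling whether the GPU copy is stale.
template <typename T>
class DirtyBuffer {
public:
    // Change one element and remember that an upload is pending.
    void set(std::size_t index, const T& value) {
        if (index >= data_.size()) data_.resize(index + 1);
        data_[index] = value;
        dirty_ = true;
    }

    // Calls upload(data) only if something changed since the last flush.
    // Returns true when an upload actually happened.
    template <typename UploadFn>
    bool flush(UploadFn&& upload) {
        if (!dirty_) return false;
        upload(data_);
        dirty_ = false;
        return true;
    }

private:
    std::vector<T> data_;
    bool dirty_ = false;
};
```

In the render loop, `flush` would be called once per shader or vertex buffer. A frame in which nothing moved then costs a single boolean check per buffer.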

Optimal:
    for each (shader)
        glBind(shader)
        if (shader.isDirty) UpdateMaterialBuffer()
        for each (vertex buffer)
            glBind(vertex buffer)
            if (vertex buffer.isDirty) UpdateObjectBuffer()
            glDrawIndirect(commands)
And all was good ...


The end result is a simple loop that requires almost no state changes. With OpenGL v4.4, persistent mapped buffers mean no state changes are needed to update the buffers, but we do need to be careful to synchronize access when updating them. (Double buffering and sync objects will be needed, especially in a multi-threaded environment; more on that later.)
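To sketch the synchronization idea without a GL context: split the mapped buffer into N regions and only rewrite a region once the GPU has finished the frame that last read from it. `FrameRing` is a hypothetical helper. The `completedFrame` argument is an assumption standing in for what a real program would learn from a fence, for example one created with `glFenceSync` and checked with `glClientWaitSync`.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Ring of N regions inside one persistently mapped buffer.
// Frames are numbered from 1; a lastUse_ of 0 means "never used".
template <std::size_t N>
class FrameRing {
public:
    // Region the CPU may write this frame, or -1 while the GPU still reads it.
    // completedFrame: newest frame the GPU has finished (assumed fence result).
    int acquire(std::uint64_t completedFrame) const {
        const std::size_t region = frame_ % N;
        const bool inFlight = lastUse_[region] > completedFrame;
        return inFlight ? -1 : static_cast<int>(region);
    }

    // Record that this frame's draws reading the region were submitted.
    void submit() {
        lastUse_[frame_ % N] = frame_;
        ++frame_;
    }

private:
    std::uint64_t frame_ = 1;
    std::array<std::uint64_t, N> lastUse_{};
};
```

With `N = 2` this is plain double buffering: the CPU can run one frame ahead, and a third `acquire` returns -1 until the GPU reports the first frame as done.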

A simple fallback from OpenGL v4.4 to v4.0 is possible by issuing a single indirect draw call for each command in the buffer. The performance penalty for this fallback is extremely high (think a 5x to 10x reduction), so the fallback also requires you not only to lower the LOD on objects but also to reduce the number of objects being rendered. (Drop target examples: rocks, grass, plants, decorations, particle effects, ...)

That is it for now, happy hunting ;)

Techniques to use:
* Texture arrays (OpenGL v3.0)
Use texture arrays to pack multiple textures.
* Multi draw indirect (OpenGL v4.3)
Use draw commands to pack multiple draw calls.
(Fallback: one draw call for every object in a buffer, v4.0)
* Persistent mapped buffers (OpenGL v4.4)
Use persistent mapped buffers to avoid having to map/unmap whenever we add, remove or change an object in the scene graph. (Fallback: use map/unmap and/or SubData to update, v1.5)
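The version numbers in this list can be turned into a small path selection at startup. This is only a sketch: `GlVersion`, `choosePaths` and the enums are hypothetical names, and the `DirectDraws` case for contexts below v4.0 is my own assumption, since the list above only covers the v4.0 fallback.

```cpp
// Hypothetical version pair, e.g. filled from GL_MAJOR_VERSION / GL_MINOR_VERSION.
struct GlVersion { int majorVersion; int minorVersion; };

constexpr bool atLeast(GlVersion v, int majorVersion, int minorVersion) {
    return v.majorVersion > majorVersion ||
           (v.majorVersion == majorVersion && v.minorVersion >= minorVersion);
}

enum class DrawPath { MultiDrawIndirect, SingleDrawIndirect, DirectDraws };
enum class UpdatePath { PersistentMapping, MapOrSubData };

struct RenderPaths {
    bool textureArrays; // v3.0
    DrawPath draw;      // v4.3, fallback v4.0
    UpdatePath update;  // v4.4, fallback map/unmap or SubData
};

constexpr RenderPaths choosePaths(GlVersion v) {
    return RenderPaths{
        atLeast(v, 3, 0),
        atLeast(v, 4, 3) ? DrawPath::MultiDrawIndirect
        : atLeast(v, 4, 0) ? DrawPath::SingleDrawIndirect
                           : DrawPath::DirectDraws, // assumption: below v4.0
        atLeast(v, 4, 4) ? UpdatePath::PersistentMapping
                         : UpdatePath::MapOrSubData};
}
```

A real program would probably also check for the matching extensions on older contexts, which this sketch ignores.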


Nemo Persona
