Researching render loops for OpenGL 4.0+

Nemo Persona


For a hobby project I had to look into OpenGL render loops, so here is a quick recap of the main things I pulled out of a day of reading and searching. The best source I found was a talk from GDC 2014, linked below. Note that the render loops here are written in pseudo code; I plan to actually implement the loop in the coming days.

Tutorial Loop



Most intro tutorials teach a render loop like the one below. It is the most basic loop you can have for drawing elements to the screen, and it works well for explaining OpenGL behavior, but it is useless in production code.

Pseudo code:

    for each (object(s))
    {
        glBind(object.shader);
        glBind(object.textures);
        glBind(object.indexBuffer);
        glBind(object.vertexBuffer);
        glBind(object.normalBuffer);
        glUniform(object.info);
        glDraw(object.data);
    }

A decent tutorial will at some point replace the individual buffer binds with a vertex array object, removing some of the OpenGL calls in the loop above. That is still far from optimal. To optimize this loop you have to realize that state changes in OpenGL are slow and need to be minimized. (Note: almost every OpenGL call is a state change or requires a state to be changed while executing.)


Real render loop



In production code your render loop will look more like the one below. It groups the OpenGL state changes and executes the tasks that require the same state in a sub loop. This reduces the number of state changes dramatically and leaves you with more time to actually draw objects to the screen.

A typical render loop in pseudo code:

    for each (shader)                        // Shader program
        glBind(shader)
        for each (material setup)            // Program configuration:
            glBind(material.textures)        // - Setup texture binding
            glUniforms(material.info)        // - Setup program uniforms
            for each (vertex buffer)         // Vertex buffer object
                glBind(vertex buffer)
                for each (object(s))         // Objects inside the buffer
                {
                    glUniform(object.info)   // Set object uniforms
                    glDraw(object.data)      // Draw the object
                }

It is nice and looks like the loop most OpenGL programmers have been using for years. But with changes in hardware and the move to concurrent programming, this loop needs to change. So let's dive in and look at the changes required!
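To make the grouping concrete, here is a minimal C++ sketch of the CPU side only. It assumes a hypothetical `DrawItem` record with illustrative ids (not real OpenGL object names) and sorts the items so that equal states end up next to each other, in the same order as the loop nesting: shader, then material, then vertex buffer.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical CPU-side record for one drawable. The ids are
// illustrative handles, not real OpenGL object names.
struct DrawItem {
    std::uint32_t shader;        // shader program id
    std::uint32_t material;      // material setup id
    std::uint32_t vertexBuffer;  // vertex buffer id
    std::uint32_t object;        // object id
};

// Order items so equal states are adjacent: shader first, then
// material, then vertex buffer, mirroring the loop nesting.
inline void sortByState(std::vector<DrawItem>& items) {
    std::stable_sort(items.begin(), items.end(),
        [](const DrawItem& a, const DrawItem& b) {
            return std::tie(a.shader, a.material, a.vertexBuffer) <
                   std::tie(b.shader, b.material, b.vertexBuffer);
        });
}

// Count how often the shader would have to be rebound when the
// items are drawn in their current order.
inline int countShaderBinds(const std::vector<DrawItem>& items) {
    int binds = 0;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i == 0 || items[i].shader != items[i - 1].shader) ++binds;
    }
    return binds;
}
```

Counting binds before and after the sort gives a rough feel for how many state changes the grouping saves on a given scene.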

The Inner loop:

    for each (object(s))
    {
        glUniform(object.info)
        glDraw(object.data)
    }

This loop calls into the OpenGL driver on every iteration. These calls are not thread safe and may need to be synchronized inside multi-threaded drivers. Let's improve this part by using one or more uniform buffers in combination with a draw command buffer, which allows us to submit multiple draw commands in one draw call.

Optimized:

    for each (object(s))
    {
        uniforms += object.info
        commands += object.data
    }
    glUniformBuffer(uniforms)
    glDrawIndirect(commands)

You now have full control over the inner loop and could split it up into multiple threads without having to worry about driver synchronization. On top of that, the program only interrupts the driver once, by submitting all the data in one call.
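As a rough illustration of what the command buffer holds, the sketch below fills an array of indirect draw commands on the CPU. The struct follows the five 32-bit field layout that the OpenGL reference describes for `glMultiDrawElementsIndirect`; `MeshRange` and `buildCommands` are hypothetical names, and using `baseInstance` as a per-object index is an assumption of this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Five tightly packed 32-bit fields, the layout read from the
// indirect buffer for indexed indirect draws.
struct DrawElementsIndirectCommand {
    std::uint32_t count;          // number of indices to draw
    std::uint32_t instanceCount;  // number of instances
    std::uint32_t firstIndex;     // offset into the index buffer
    std::int32_t  baseVertex;     // value added to each index
    std::uint32_t baseInstance;   // first instance id
};
static_assert(sizeof(DrawElementsIndirectCommand) == 20,
              "commands must be tightly packed");

// Hypothetical description of where a mesh lives inside a shared
// vertex/index buffer.
struct MeshRange {
    std::uint32_t indexCount;
    std::uint32_t firstIndex;
    std::int32_t  baseVertex;
};

// Build one command per object. baseInstance is set to the object's
// position in the list so a shader could use it to look up per-object
// data (an assumption of this sketch).
inline std::vector<DrawElementsIndirectCommand>
buildCommands(const std::vector<MeshRange>& objects) {
    std::vector<DrawElementsIndirectCommand> commands;
    commands.reserve(objects.size());
    for (std::size_t i = 0; i < objects.size(); ++i) {
        const MeshRange& m = objects[i];
        commands.push_back({m.indexCount, 1u, m.firstIndex, m.baseVertex,
                            static_cast<std::uint32_t>(i)});
    }
    return commands;
}
```

In a real renderer the resulting array would be uploaded to a buffer bound to `GL_DRAW_INDIRECT_BUFFER` and submitted with a single multi draw indirect call. Because filling the array touches no GL state, this part can be split across threads.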

The Outer loop:

    for each (material setup)
    {
        glUniforms(material.config)
        glTextures(material.textures)
        ... (draw)
    }

Again you interact with the driver for every possible material configuration. Just like the inner loop, we can optimize this part away by using buffers to store the configuration and allowing the shader to access those buffers. Of course this requires you to pass an index or handle along with each object that needs to be drawn, so that the shader knows which configuration to use for a given object.

So let's optimize again:

    for each (material setup)
    {
        matUniforms += material.config
        matTextures += material.textures
    }
    glUniformBuffer(matUniforms)
    glTextureArray(matTextures)
    ... (draw with id)

And again you get full control of the loop and remove a lot of interaction with the driver. Also note that the draw part of the loop is no longer nested inside the material loop, giving you the option to create bigger vertex buffers when possible and further reduce the final draw call count.
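The "pass an index along" idea can be sketched on the CPU as a small table that stores each unique material once and hands out its index. `Material` and `MaterialTable` are hypothetical names. The string fields are placeholders for real uniform values and texture array layers.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical material description. In a real renderer this would
// hold uniform values and a texture array layer rather than names.
struct Material {
    std::string config;
    std::string texture;
};

// Collects unique materials into one array that could back a uniform
// buffer, and hands out the index each object passes along with its draw.
class MaterialTable {
public:
    std::uint32_t idFor(const Material& m) {
        const std::string key = m.config + '\n' + m.texture;
        auto it = lookup_.find(key);
        if (it != lookup_.end()) return it->second;  // already stored
        const auto id = static_cast<std::uint32_t>(materials_.size());
        materials_.push_back(m);
        lookup_.emplace(key, id);
        return id;
    }

    // The array that would be uploaded once instead of per material.
    const std::vector<Material>& materials() const { return materials_; }

private:
    std::vector<Material> materials_;
    std::map<std::string, std::uint32_t> lookup_;
};
```

Each object then only stores the returned id, and the shader uses that id to index into the material data.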

Modern render loop



If you put this together you have the base for a more modern loop that is already a lot better, but we can still dig deeper. The new structure of the loop allows for further optimization.

Base:

    for each (shader)
        glBind(shader)
        for each (material setup)
        {
            matUniforms += material.config
            matTextures += material.textures
        }
        glUniformBuffer(matUniforms)
        glTextureArray(matTextures)
        for each (vertex buffer)
            glBind(vertex buffer)
            for each (object(s))
            {
                uniforms += object.info
                commands += object.data
            }
            glUniformBuffer(uniforms)
            glDrawIndirect(commands)

At this point you can pull the buffer updates out of the render loop and use an isDirty flag to update them only when needed. (Note: the pseudo code does not use data-oriented design to avoid branching in the loops; look that one up.)
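A minimal sketch of the dirty flag, assuming a hypothetical `DirtyBuffer` class that keeps a CPU-side copy of the data. The private `upload()` is only a stand-in for a real buffer update (for example `glBufferSubData` or a write to mapped memory) and just counts how often it runs.

```cpp
#include <cstddef>
#include <vector>

// CPU-side shadow of a GPU buffer. upload() stands in for a real
// buffer update and only counts how often it runs.
template <typename T>
class DirtyBuffer {
public:
    // Change one element and remember that the GPU copy is stale.
    void set(std::size_t index, const T& value) {
        if (index >= data_.size()) data_.resize(index + 1);
        data_[index] = value;
        dirty_ = true;
    }

    // Called once per frame from the render loop. Uploads only when
    // something changed since the last flush.
    bool flushIfDirty() {
        if (!dirty_) return false;
        upload();
        dirty_ = false;
        return true;
    }

    int uploadCount() const { return uploads_; }

private:
    void upload() { ++uploads_; }  // placeholder for the GL call

    std::vector<T> data_;
    bool dirty_ = false;
    int uploads_ = 0;
};
```

Several `set` calls in one frame collapse into a single upload, and a frame without changes costs one branch and no driver call.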

Optimized:

    for each (shader)
        glBind(shader)
        if (shader.isDirty)
            UpdateMaterialBuffer()
        glUniformBuffer(matUniforms)
        glTextureArray(matTextures)
        for each (vertex buffer)
            glBind(vertex buffer)
            if (vertex buffer.isDirty)
                UpdateObjectBuffer()
            glUniformBuffer(uniforms)
            glDrawIndirect(commands)

And last but not least, the new structure makes it possible to move the buffer bindings into the vertex array object, so that all binds are stored on the driver side. This further reduces the communication with the driver in the render loop.

Optimal:

    for each (shader)
        glBind(shader)
        if (shader.isDirty)
            UpdateMaterialBuffer()
        for each (vertex buffer)
            glBind(vertex buffer)
            if (vertex buffer.isDirty)
                UpdateObjectBuffer()
            glDrawIndirect(commands)

And all was good ...


Conclusion



The end result is a simple loop that requires almost no state changes. With OpenGL v4.4, persistent mapped buffers mean that no state changes are needed to update the buffers, but we do need to be careful to synchronize access when updating them. (Double buffering and GLsync fence objects will be needed, especially in a multi-threaded environment. More on that later.)
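The double buffering mentioned above can be sketched without a GL context as follows. `RingBuffer` is a hypothetical helper that splits one mapped buffer into two regions. The `inFlight_` flags are only stand-ins for GLsync fences, which real code would create with `glFenceSync` and check with `glClientWaitSync`.

```cpp
#include <array>
#include <cstddef>

// Splits one persistently mapped buffer into kRegions slices. The CPU
// writes one slice while the GPU may still read the other. The
// inFlight_ flags stand in for GLsync fences.
class RingBuffer {
public:
    static constexpr std::size_t kRegions = 2;  // double buffering

    explicit RingBuffer(std::size_t regionSize) : regionSize_(regionSize) {}

    // Returns true and the byte offset to write this frame, or false
    // when the GPU has not finished with that region yet (real code
    // would wait on the fence here).
    bool beginFrame(std::size_t& offset) {
        if (inFlight_[current_]) return false;
        offset = current_ * regionSize_;
        return true;
    }

    // After issuing the draws: mark the region as in use and advance.
    void endFrame() {
        inFlight_[current_] = true;
        current_ = (current_ + 1) % kRegions;
    }

    // Stand-in for a fence being signaled by the GPU.
    void gpuFinished(std::size_t region) { inFlight_[region] = false; }

private:
    std::size_t regionSize_;
    std::size_t current_ = 0;
    std::array<bool, kRegions> inFlight_{};
};
```

The point of the sketch is the ordering: a region is only written again after the GPU has signaled that it is done reading it.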

A simple fall back from OpenGL v4.4 to v4.0 is possible by issuing a single indirect draw call per command in the buffers. The performance penalty for this fall back is extremely high (we are talking about a 5x to 10x reduction). As such, the fall back requires you not only to lower the LOD on objects but also to reduce the number of objects being rendered. (Drop target examples: rocks, grass, plants, decorations, particle effects, ...)
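The shape of that fall back can be sketched as below. `MultiDraw` and `SingleDraw` are hypothetical callbacks standing in for the real multi draw and single indirect draw entry points, and `kCommandSize` assumes the 20 byte indexed command layout. The function returns the number of driver calls, which is where the cost difference comes from.

```cpp
#include <cstddef>
#include <functional>

// Size of one indexed indirect draw command: five 32-bit fields.
constexpr std::size_t kCommandSize = 20;

// Stand-ins for the GL entry points. They receive the byte offset into
// the bound indirect buffer (and, for the multi version, the count).
using MultiDraw  = std::function<void(std::size_t offset, std::size_t count)>;
using SingleDraw = std::function<void(std::size_t offset)>;

// Submits 'count' commands. With multi draw indirect available this is
// one call; otherwise it falls back to one indirect draw per command.
// Returns the number of driver calls that were made.
inline std::size_t submit(std::size_t count, bool hasMultiDraw,
                          const MultiDraw& multi, const SingleDraw& single) {
    if (count == 0) return 0;
    if (hasMultiDraw) {
        multi(0, count);
        return 1;
    }
    for (std::size_t i = 0; i < count; ++i) {
        single(i * kCommandSize);  // offset of the i-th command
    }
    return count;
}
```

The command buffer itself stays the same in both paths, so only the submission code needs to know which OpenGL version is available.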

That is it for now, happy hunting ;)

Techniques to use:
* Texture arrays (OpenGL v3.0)
  Use a texture array to pack multiple textures.
* Multi draw indirect (OpenGL v4.3)
  Use draw commands to pack multiple draw calls.
  (Fall back: one draw call for every object in a buffer, v4.0)
* Persistent mapped buffers (OpenGL v4.4)
  Use persistent mapped buffers to avoid having to map/unmap whenever we add, remove or change an object in the scene graph.
  (Fall back: use map/unmap and/or glBufferSubData to update, v1.5)

Ref:
http://gdcvault.com/play/1020791
http://www.slideshare.net/CassEveritt/approaching-zero-driver-overhead
https://www.khronos.org/assets/uploads/developers/library/2014-gdc/Khronos-OpenGL-Efficiency-GDC-Mar14.pdf

https://www.opengl.org/wiki/GLAPI/glDrawArraysIndirect
https://www.opengl.org/wiki/GLAPI/glDrawElementsIndirect
https://www.opengl.org/wiki/GLAPI/glMultiDrawArraysIndirect
https://www.opengl.org/wiki/GLAPI/glMultiDrawElementsIndirect


2 Comments


Recommended Comments

"And last but not least, due to the new structure it now becomes possible to move the buffer binding in to the vertex buffer object so that all binds are stored on the driver side to further reduce the communication with the driver in the render loop."
I don't follow. Are you putting constant data in the same buffer as the geometry data?

 

And also, don't all textures in texture arrays need to be of the same size?


No, they have their own buffers and the shaders get read access to those (as uniform buffer objects). After that you pass an id for the material and/or textures to use with every command in the command buffer. The ids are stored in buffers as well and passed along to the shaders as vertex attributes (use a divisor so they only change per command and not per vertex). All those buffers can be bound in the vertex array object, so you don't need to bind/unbind in sub loops.

 

The texture size needs to be the same in a default texture array. This can be a problem in some cases (not for me at the moment); in theory it can be solved using sparse textures. (Or by using bindless textures; I might need to expand on that and reference the info on them.)

 

Proof of concept still in the pipeline, if I find the time to finish it.
