Overhead with gl* Calls

4 comments, last by Stainless 9 years, 8 months ago

This question regards OpenGL in general, but I think it's more specific to the mobile side of things where efficiency is especially important. Over the years, I've developed my OpenGL ES games with the mindset that making any sort of gl* call will almost always query the GPU, as mobile implementations of OpenGL are completely hardware-accelerated, like modern implementations of OpenGL. That being said, any thread making a gl* call will be halted while the CPU queries the GPU, waiting for a response. Is this correct so far?

I understand that cutting down on glEnable/glDisable calls by sorting renderable elements with similar states, and also wrapping those two in my own state manager, is important. I also do the same when binding buffers and textures. You could call SetTexture(GL_TEXTURE_2D, 0, &tex->name) to bind to the first texture unit in 2D. If that particular texture name has already been bound for that target at that texture unit, then it won't do it again. This comes in handy when rendering multiple instances of the same model, because it calls glBindTexture(), glBindBuffer(), etc. once for the first model, but all subsequent calls are skipped because they use the same texture/buffer parameters that are common to the loaded model they share. Same for checking shaders. It's pretty common that multiple models might use the same shader in a scene. Imagine rendering dozens of individual models to the screen, but only having to call glUseProgram() twice each frame instead of once per instance rendered. I mean, since I'm still using OpenGL 2.1 (OpenGL ES 2.0 for mobile), glDrawElements() is called once per mesh per instance of the model drawn. For example, drawing 12 instances of a model with 5 meshes would be 60 draw calls. This could be heavy on mobile until I learn about instancing in higher versions of OpenGL and support OpenGL ES 3.0 on mobile.
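For readers following along, here is a minimal sketch in C of the kind of texture-binding cache described above. It is not the poster's actual SetTexture() code: the names TextureCache, set_texture_2d, and demo_driver_calls are hypothetical, the cache only covers GL_TEXTURE_2D, and the two fake_gl* functions are stand-ins that just count calls so the example runs without a GL context.

```c
#include <assert.h>

/* Stand-ins so the sketch runs without a GL context. In a real build the
   types and constants come from the GLES2 headers, and the two fake_gl*
   functions would be glActiveTexture() and glBindTexture(). */
typedef unsigned int GLenum;
typedef unsigned int GLuint;
#define GL_TEXTURE_2D 0x0DE1
#define GL_TEXTURE0   0x84C0

static int g_driver_calls = 0;  /* counts calls that would reach the driver */

static void fake_glActiveTexture(GLenum unit) { (void)unit; g_driver_calls++; }
static void fake_glBindTexture(GLenum target, GLuint name)
{
    (void)target; (void)name;
    g_driver_calls++;
}

#define MAX_UNITS 8  /* assumed limit for this sketch only */

typedef struct {
    GLuint   bound_2d[MAX_UNITS]; /* shadow of the GL_TEXTURE_2D binding per unit */
    unsigned active_unit;         /* shadow of the active texture unit */
} TextureCache;

static void texture_cache_init(TextureCache *c)
{
    for (int i = 0; i < MAX_UNITS; i++)
        c->bound_2d[i] = 0;       /* 0 is the default texture name */
    c->active_unit = 0;           /* GL starts with GL_TEXTURE0 active */
}

/* Bind `name` to GL_TEXTURE_2D on `unit`, skipping the calls that the shadow
   state says are redundant. Returns 1 if a bind was issued, 0 if skipped. */
static int set_texture_2d(TextureCache *c, unsigned unit, GLuint name)
{
    assert(unit < MAX_UNITS);
    if (c->bound_2d[unit] == name)
        return 0;
    if (c->active_unit != unit) {
        fake_glActiveTexture(GL_TEXTURE0 + unit);
        c->active_unit = unit;
    }
    fake_glBindTexture(GL_TEXTURE_2D, name);
    c->bound_2d[unit] = name;
    return 1;
}

/* 12 instances of a 5-mesh model that all share one texture:
   60 bind requests, of which only the first reaches the driver. */
static int demo_driver_calls(void)
{
    TextureCache cache;
    texture_cache_init(&cache);
    g_driver_calls = 0;
    for (int instance = 0; instance < 12; instance++)
        for (int mesh = 0; mesh < 5; mesh++)
            set_texture_2d(&cache, 0, 7);  /* 7 is an arbitrary texture name */
    return g_driver_calls;
}
```

A cache like this is only valid while every bind goes through it. Deleting a texture, or letting third-party code bind behind its back, would need the affected shadow entries reset.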

My question is: is my managing OpenGL contexts internally in my engine worthwhile? Is it a huge performance hit to call glBindTexture() constantly (especially on mobile), or do OpenGL implementations usually check this already? Should I just focus on keeping draw calls down, or is the way I'm managing my states pretty important too?

From what I've read about OpenGL 4.5, it's going through a significant rewrite to be closer to vendor-specific implementations such as AMD's Mantle, NVIDIA's CUDA and even iOS's soon-to-be Metal API (ok, so that one's OS-specific, working on providing efficient OpenGL drivers under the hood), so we could set a texture at a specified target at a specific active texture unit in one function call instead of 2.



That being said, any thread making a gl* call will be halted while the CPU queries the GPU, waiting for a response. Is this correct so far?
No. Most gl calls will just do CPU work in a driver and not communicate with the GPU at all.

The GPU usually lags behind the CPU by about a whole frame (or more), so GPU->CPU data readback is terrible for performance (can instantly halve your framerate). glGet* functions are the scary ones that can cause this kind of thing.

Most gl functions are just setting a small amount of data inside the driver, doing some error checking on the arguments, and setting a dirty flag.

The glDraw* functions then check all of the dirty flags, and generate any required actual native GPU commands (bind this texture, bind this shader, draw these triangles...), telling the GPU how to draw things. This is why draw-calls are expensive on the CPU-side; the driver has to do a lot of work inside the glDraw* functions to figure out what commands need to be written into the command buffer.

These commands aren't sent to the GPU synchronously -- instead they're written into a "command buffer". The GPU asynchronously reads commands from this buffer and executes them, but like I said above, the GPU will usually have about a whole frame's worth of commands buffered up at once, so there's a big delay between the CPU writing commands and the GPU executing them.
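The dirty-flag and command-buffer behaviour described above can be pictured with a toy model in C. This is a conceptual sketch only, not how any real driver is written: ToyDriver and every toy_* name are made up for illustration, and whether a real driver compares values before flagging state as dirty is implementation-specific.

```c
#include <assert.h>

enum { DIRTY_TEXTURE = 1 << 0, DIRTY_PROGRAM = 1 << 1 };
enum { CMD_BIND_TEXTURE, CMD_BIND_PROGRAM, CMD_DRAW };

#define MAX_CMDS 64

typedef struct {
    unsigned texture, program; /* state as last set by the application */
    unsigned dirty;            /* bitmask of state touched since the last draw */
    int      cmds[MAX_CMDS];   /* toy command buffer the GPU would read later */
    int      cmd_count;
} ToyDriver;

static void toy_init(ToyDriver *d)
{
    d->texture = 0;
    d->program = 0;
    d->dirty = 0;
    d->cmd_count = 0;
}

static void toy_push(ToyDriver *d, int cmd)
{
    assert(d->cmd_count < MAX_CMDS);
    d->cmds[d->cmd_count++] = cmd;
}

/* State setters: a little CPU work and a flag, no GPU commands yet. */
static void toy_bind_texture(ToyDriver *d, unsigned name)
{
    d->texture = name;
    d->dirty |= DIRTY_TEXTURE;
}

static void toy_use_program(ToyDriver *d, unsigned name)
{
    d->program = name;
    d->dirty |= DIRTY_PROGRAM;
}

/* Draw: resolve the dirty flags into commands, then append the draw itself.
   Nothing is executed here; the commands just wait in the buffer. */
static void toy_draw(ToyDriver *d)
{
    if (d->dirty & DIRTY_TEXTURE) toy_push(d, CMD_BIND_TEXTURE);
    if (d->dirty & DIRTY_PROGRAM) toy_push(d, CMD_BIND_PROGRAM);
    d->dirty = 0;
    toy_push(d, CMD_DRAW);
}

static int demo_command_count(void)
{
    ToyDriver d;
    toy_init(&d);
    toy_bind_texture(&d, 3);
    toy_bind_texture(&d, 3); /* repeated set: only flags the state again */
    toy_use_program(&d, 1);
    toy_draw(&d);            /* writes bind texture, bind program, draw */
    toy_draw(&d);            /* nothing dirty: writes only the draw */
    return d.cmd_count;      /* 4 commands in total */
}
```

In this model the setters are cheap and all command generation happens inside toy_draw(), which matches the point that the CPU cost concentrates in the glDraw* calls.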

Ok, so that being said, is it ok to make common, repetitive calls to glEnable(), glDisable(), glUseProgram(), glBindTexture(), etc. with the same parameter values, or should I continue to provide extra logic to reduce the number of gl* calls being made? I never use glGet* calls unless it's glGetUniformLocation(), and that's just once when my shader is successfully compiled and loaded.

Apple's docs have stated that it's important to provide our own state machines for GL states in the past, but now I'm starting to think it's meant only to be an alternative to constantly querying the GPU for what states are enabled. It's been years since I've read that, anyway... I learned earlier this year that GL_TEXTURE_2D no longer needs to be enabled in OpenGL ES 2.0, which I always assumed was necessary as I came from using OpenGL ES 1.1.

I think it's considered good practice to remove unnecessary gl calls by shadowing the state on the application side. Certainly Apple's tools (OpenGL ES Analyzer, for instance) explicitly warn you about each and every redundant state change you make, so while we can't know exactly what their driver is doing, it'd be reasonable to assume that each redundant state change you make is causing the driver to actually add extra stuff into the command buffer.
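Shadowing pays off most when draws that share state sit next to each other, which is the sorting idea from the original post. The C sketch below counts how many program and texture switches a shadowing state manager would let through before and after sorting a small, made-up draw list. DrawItem, compare_by_state, and the sample values are all assumptions for illustration, and the sort ignores depth and transparency ordering, which a real renderer would also have to respect.

```c
#include <assert.h>
#include <stdlib.h>

/* One queued draw, identified by the state it needs (hypothetical layout). */
typedef struct {
    unsigned program;
    unsigned texture;
} DrawItem;

/* qsort comparator: group by program first, then by texture. */
static int compare_by_state(const void *a, const void *b)
{
    const DrawItem *x = a;
    const DrawItem *y = b;
    if (x->program != y->program) return x->program < y->program ? -1 : 1;
    if (x->texture != y->texture) return x->texture < y->texture ? -1 : 1;
    return 0;
}

/* Count the program/texture switches that survive shadowing, that is,
   the ones where the requested value differs from the current value. */
static int count_state_changes(const DrawItem *items, size_t n)
{
    unsigned program = 0, texture = 0; /* 0 = nothing bound yet */
    int changes = 0;
    for (size_t i = 0; i < n; i++) {
        assert(items[i].program != 0 && items[i].texture != 0);
        if (items[i].program != program) { program = items[i].program; changes++; }
        if (items[i].texture != texture) { texture = items[i].texture; changes++; }
    }
    return changes;
}

/* Same six draws, submitted interleaved and then sorted by state. */
static void demo_sorted_changes(int *before, int *after)
{
    DrawItem items[] = {
        {1, 10}, {2, 20}, {1, 10}, {2, 20}, {1, 11}, {2, 20}
    };
    size_t n = sizeof items / sizeof items[0];

    *before = count_state_changes(items, n);            /* 12 switches */
    qsort(items, n, sizeof items[0], compare_by_state);
    *after = count_state_changes(items, n);             /* 5 switches */
}
```

With the interleaved order every draw changes both program and texture, so shadowing alone removes nothing. After sorting, the same six draws need five switches.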


Apple's docs have stated that it's important to provide our own state machines for GL states

I'm no expert when it comes to OpenGL, but I think they recommend this mostly because of context resets (from https://www.khronos.org/registry/gles/extensions/EXT/EXT_robustness.txt):

If the reset notification behavior is NO_RESET_NOTIFICATION_EXT,
then the implementation will never deliver notification of reset
events, and GetGraphicsResetStatusEXT will always return
NO_ERROR[fn1].
[fn1: In this case it is recommended that implementations should
not allow loss of context state no matter what events occur.
However, this is only a recommendation, and cannot be relied
upon by applications.]

I have had issues in the past where enable and disable calls had a significant effect on performance.

You have to remember when working in the mobile world that not all devices are created equal.

Even devices with the same exact chipset will probably have a different software stack, and hence different performance.

A classic case is the nightmare of compiling shaders on mobile devices. I have had a case with two devices with the same GPU (Mali) and very similar hardware: one compiled the shader into 317 instructions, while the other failed to compile the shader at all because it went over the 512-instruction limit.

Doing things in the best possible way from day one can really help you down the line. Honestly, it may be boring and a pain in the posterior, but it is worth all the effort when the game "just runs" on every device you test it on.

There is nothing worse than sitting there trying to figure out why the game crashes on a device that you don't own :)

This topic is closed to new replies.
