Heelp

OpenGL Which per-frame operations are expensive in terms of performance in OpenGL?


Ok guys, I already made the camera and hero movement and I loaded one small map consisting of 100 cubes for my MMORPG. I made some ugly models using Blender, and I made a function that reads the vertices from an .obj file and stores everything in a GLfloat array. The next very important thing I need to know is: What kinds of operations should I try to keep to a minimum in my game?
What I gathered from previous posts is:

 

1. Don't make separate VAOs and VBOs for every object, because binding and redefining a thousand VBOs every frame takes time.

2. Don't make too many draw calls, because that takes time, too.

3. Calculate projection*view*model matrix outside the vertex shader, so you don't calculate the same stuff over and over.

4. Use spheres for collision whenever you can, because it's the simplest possible bounding object.
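
For point 4, a sphere-vs-sphere test could look roughly like the sketch below. The `Sphere` struct and `spheres_overlap` name are made up for illustration and are not from any library. It compares squared distances so no square root is needed per test.

```cpp
#include <cassert>

// Hypothetical bounding sphere; the names are illustrative only.
struct Sphere {
    float x, y, z;  // center
    float radius;   // expected to be non-negative
};

// Two spheres overlap when the distance between their centers is at most
// the sum of their radii. Comparing squared values avoids a sqrt per test.
bool spheres_overlap(const Sphere& a, const Sphere& b) {
    assert(a.radius >= 0.0f && b.radius >= 0.0f);
    const float dx = a.x - b.x;
    const float dy = a.y - b.y;
    const float dz = a.z - b.z;
    const float dist_sq = dx * dx + dy * dy + dz * dz;
    const float reach = a.radius + b.radius;
    return dist_sq <= reach * reach;
}
```

Touching spheres count as overlapping here; whether that is what you want depends on the game.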

 

If anything more comes to mind, write it here, please, even if it's not so beginner-friendly, because I can return to this post at some point in the future.

 

And one more thing: I have a question about the 1st principle I just mentioned. Now that I have 7 or 8 models, what I need is a good, logical, performance-friendly way of storing them in memory and drawing them on screen. How do I know when it is the right time to make another VAO or VBO? As I understand it, a VAO is a list of attributes for a number of objects, and it consists of several VBOs, and one VBO can contain vertex coordinates or texture coordinates or normals or stuff like that. So, for example, if I have 7-8 models with their own vertex coordinates, should I keep all their vertex coordinates in only one VBO or should I do something else?

Edited by Heelp


You are asking about a problem that may not matter to you for many years to come. A lot of people ask this, and I tell everyone: optimize when you need to. Hardware is so fast nowadays that this shouldn't be a concern. Focus on your game, if that is what you are building. If you just want to build the best tech in the world, then that is a different story.


This kind of thing is actually micro-optimizing with the kind of workload you're doing.

 

The worst operations for performance are and always will be: (1) reading anything back from the GPU, and (2) updating a resource that's in use.  Avoid those, and so long as the rest of your code isn't fighting the API you'll do OK.


I use DirectX, not OpenGL, but my understanding is that the two are similar when it comes to playing nice with the API.

 

1. Don't draw what you don't need to. So use frustum culling and, if possible, occlusion culling.

2. Draw front to back to take advantage of Hi-Z and early-Z.

3. Sort by state to reduce state changes between draw calls, and use texture atlases to further reduce state changes.

4. I agree with your 1, 2, and 3, except I would say only make as many draw calls as you need to, and minimize draw calls using techniques like instancing.

5. mhagain's advice is good as well.

6. Figure out the best way to upload uniforms to the GPU. (I remember a couple of threads in this forum where there were performance implications.)
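
To make point 3 a bit more concrete, here is a rough CPU-side sketch of sorting draws by state. Everything in it is hypothetical: `DrawItem` just holds integer ids standing in for shader and texture handles, and `count_state_changes` models a renderer that rebinds only when the id differs from the previous draw.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical draw record: plain ids stand in for GL program/texture handles.
struct DrawItem {
    unsigned shader;
    unsigned texture;
    unsigned mesh;
};

// Orders by shader first, then by texture.
bool state_less(const DrawItem& a, const DrawItem& b) {
    return std::tie(a.shader, a.texture) < std::tie(b.shader, b.texture);
}

// Reorder draws so items sharing a shader, then a texture, end up adjacent.
void sort_by_state(std::vector<DrawItem>& items) {
    std::sort(items.begin(), items.end(), state_less);
    assert(std::is_sorted(items.begin(), items.end(), state_less));
}

// Count the shader and texture binds issued if we rebind only when the
// value differs from the previous draw (the first draw binds both).
std::size_t count_state_changes(const std::vector<DrawItem>& items) {
    std::size_t changes = 0;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i == 0 || items[i].shader != items[i - 1].shader) ++changes;
        if (i == 0 || items[i].texture != items[i - 1].texture) ++changes;
    }
    return changes;
}
```

Sorting by shader first assumes a shader switch costs more than a texture switch, which is a common rule of thumb rather than something measured here. Note that this ordering can conflict with front-to-back drawing from point 2, so a real renderer has to pick a compromise.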


1. Don't make separate VAOs and VBOs for every object, because binding and redefining a thousand VBOs every frame takes time.
2. Don't make too many draw calls, because that takes time, too.
3. Calculate projection*view*model matrix outside the vertex shader, so you don't calculate the same stuff over and over.

1. Yes, if you've got different VBOs per object, you'll have to make extra GL function calls to rebind VBOs in between rendering each object. Whether or not this is a problem depends on your game. If you're trying to draw tens of thousands of unique objects, it's probably an issue. If you're trying to draw one thousand, not as much.

2. Basically, any GL function call takes up CPU time. If you design your renderer so that you make the fewest GL calls, then you'll save CPU time. However, in order to do this, sometimes you have to do things that are inefficient for the GPU... Depending on your game, it may be more important to optimize for the CPU or optimize for the GPU.

3. As above, if your game has spare CPU time and is over budget on the GPU, then this could be a good idea. However, if the opposite is true (spare GPU time, over CPU budget), then it may be harmful :P

Also, if you calculate projection*view*model on the CPU, then you need to repeatedly set a uniform value for every single object in the scene. On the other hand, if you put projection*view in one UBO and model in (many) others, then you only need to set projection*view once per frame instead of once per object. If your scene is full of static objects (so model doesn't need to be set every frame), this could be a significant reduction in GL calls...




These are great examples because they also illustrate some other factors to be aware (and beware) of.

 

The main thing that jumps out here is performance vs code complexity vs flexibility.  The example of putting all objects into a single VBO is a great use case for this.  On the one hand you get to reduce VBO binds, but on the other hand your model loading code becomes more complex because you've now got to record offsets for each object (as well as handle the case where a VBO you allocate up-front may not be large enough).  Then you lose the flexibility to load and unload models on the fly.

 

Depending on your program any or all of these may be a deal-breaker.  So don't let the quest for the absolute theoretical highest performance overrule other goals.
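
As a rough idea of the offset bookkeeping that the single-VBO approach needs, here is a CPU-side sketch. `PackedMeshes` and `MeshRange` are made-up names, and it assumes position-only vertices with 3 floats each. The actual buffer upload is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kFloatsPerVertex = 3;  // assumption: xyz positions only

// Where one model lives inside the shared vertex array, counted in vertices.
struct MeshRange {
    int first;  // index of the model's first vertex
    int count;  // number of vertices in the model
};

// Hypothetical packer: appends each model's floats to one big array (the data
// you would upload to a single VBO) and records the range it occupies.
struct PackedMeshes {
    std::vector<float> vertices;
    std::vector<MeshRange> ranges;

    // Returns the model's index into `ranges`.
    std::size_t add(const std::vector<float>& model) {
        assert(model.size() % kFloatsPerVertex == 0);
        const int first = static_cast<int>(vertices.size() / kFloatsPerVertex);
        const int count = static_cast<int>(model.size() / kFloatsPerVertex);
        vertices.insert(vertices.end(), model.begin(), model.end());
        ranges.push_back({first, count});
        return ranges.size() - 1;
    }
};
```

With the shared VAO bound, each model could then be drawn with `glDrawArrays(GL_TRIANGLES, range.first, range.count)`. Unloading a model from the middle of the array is the awkward part, which is the flexibility cost mentioned above.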

 

The example of matrices is another great one, this time illustrating how an optimization in one place can lead to loss of performance in another.  That's often the case with optimizations: you're making a trade-off and hoping that you come out with a net positive.  In this case setting projection * view one-time-only certainly looks attractive, but you end up trading that off against having to make an extra matrix multiplication per-vertex.  That's one of the reasons why we always say "benchmark", because unless you have measurements you'll never know if the work you did to optimize one area didn't come at a greater cost elsewhere.
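
A very small timing helper for the CPU side of such a benchmark might look like the sketch below. `average_ms` is a made-up name, and it only measures wall-clock time on the CPU. GL calls are largely asynchronous, so GPU time needs something else (for example GL timer queries).

```cpp
#include <cassert>
#include <chrono>

// Runs `work` `iterations` times and returns the average wall-clock
// milliseconds per run. This measures CPU-side time only.
template <typename F>
double average_ms(F&& work, int iterations) {
    assert(iterations > 0);
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < iterations; ++i) work();
    const std::chrono::duration<double, std::milli> elapsed = clock::now() - start;
    return elapsed.count() / iterations;
}
```

You would wrap the code path you changed in a lambda, time it before and after the optimization with the same scene, and compare the two numbers.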

Also, if you calculate projection*view*model on the CPU, then you need to repeatedly set a uniform value for every single object in the scene. On the other hand, if you put projection*view in one UBO and model in (many) others, then you only need to set projection*view once per frame instead of once per object. If your scene is full of static objects (so model doesn't need to be set every frame), this could be a significant reduction in GL calls...

Question. I'm a stupid person, maybe I don't understand something, but follow my logic here. If I have 2 million vertices, then the vertex shader runs 2 million times. Why would I want to calculate projection*view 2 million times per frame, when I can do it only once per frame on the CPU? What tradeoff???

Edited by Heelp



Also, if you calculate projection*view*model on the CPU, then you need to repeatedly set a uniform value for every single object in the scene. On the other hand, if you put projection*view in one UBO and model in (many) others, then you only need to set projection*view once per frame instead of once per object. If your scene is full of static objects (so model doesn't need to be set every frame), this could be a significant reduction in GL calls...

 

He explains it with the above.

Also, GPUs are ALU-heavy and they can tear through calculations.


Why do I need to repeatedly set a uniform value? A uniform value is only set once, and it works for all the vertices; that's why I declared it uniform in the first place.


Question. I'm a stupid person, maybe I don't understand something, but follow my logic here. If I have 2 million vertices, then the vertex shader runs 2 million times. Why would I want to calculate projection*view 2 million times per frame, when I can do it only once per frame on the CPU? What tradeoff???

The CPU and GPU are completely independent parallel processors running on their own timelines.
If game A takes 15ms of CPU time per frame and 4ms of GPU time per frame, then your optimization choices will be different to game B that takes 8ms of CPU time per frame and 30ms of GPU time per frame.
Whichever of the two processors is taking more time per frame, we say that that processor is bottlenecking the game.
Game A is bottlenecked by the CPU frametime. We can add another 11ms of per-frame work to the GPU and this game's framerate won't change at all!
Game B is bottlenecked by the GPU frametime. We can add another 22ms of per-frame work to the CPU and this game's framerate won't change at all!

When optimizing Game A, I'd try to reduce the CPU workload, even if that means increasing the GPU workload.
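
The Game A / Game B arithmetic above can be written down as a tiny model. This is only a sketch: `FrameCost` and the function names are invented, and it assumes the CPU and GPU overlap perfectly, which real frames only approximate.

```cpp
#include <algorithm>
#include <cassert>

// Per-frame cost of each processor, in milliseconds.
struct FrameCost {
    double cpu_ms;
    double gpu_ms;
};

// With both processors running in parallel, the slower one sets the frame time.
double frame_ms(const FrameCost& f) { return std::max(f.cpu_ms, f.gpu_ms); }

// True when the CPU is the bottleneck.
bool cpu_bound(const FrameCost& f) { return f.cpu_ms > f.gpu_ms; }

// Extra work the non-bottlenecked processor could absorb per frame
// without changing the frame time.
double headroom_ms(const FrameCost& f) {
    assert(f.cpu_ms >= 0.0 && f.gpu_ms >= 0.0);
    return std::max(f.cpu_ms, f.gpu_ms) - std::min(f.cpu_ms, f.gpu_ms);
}
```

For Game A (15ms CPU, 4ms GPU) this gives a 15ms frame with 11ms of GPU headroom, and for Game B (8ms CPU, 30ms GPU) a 30ms frame with 22ms of CPU headroom, matching the numbers above.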

So, given the example you quoted:
#1
Once per object (CPU):
    Update UBOs with every static object's world matrix
Per frame per camera (CPU):
    Update UBO with view*projection matrix
Per vertex (GPU):
    Calculate mul( view_projection, mul( world, position ) );

#2
Per frame per camera per object (CPU):
    Update UBOs with world*view*projection matrix
Per vertex (GPU):
    Calculate mul( world_view_projection, position );

#1 has a reduced CPU workload, but increased GPU workload.
#2 has an increased CPU workload, but reduced GPU workload.

Yes, #2 is more efficient in general (assuming you've got a large number of vertices), because the per-vertex part happens a large number of times... but generalizations aren't useful when optimizing specific games.
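
To show that #1 and #2 produce the same position and differ only in where the multiplications happen, here is a small CPU-side sketch. The `Mat4`/`Vec4` types and helper functions are hand-rolled for illustration (row-major, column vectors), not taken from any math library, and the "view_projection" used in practice would be a real camera matrix rather than a simple scale.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<std::array<float, 4>, 4>;  // indexed as m[row][col]

Mat4 identity() {
    Mat4 m{};  // all zeros
    for (std::size_t i = 0; i < 4; ++i) m[i][i] = 1.0f;
    return m;
}

Mat4 translation(float x, float y, float z) {
    Mat4 m = identity();
    m[0][3] = x; m[1][3] = y; m[2][3] = z;
    return m;
}

Mat4 uniform_scale(float s) {
    Mat4 m = identity();
    m[0][0] = m[1][1] = m[2][2] = s;
    return m;
}

// Matrix * column vector.
Vec4 mul(const Mat4& m, const Vec4& v) {
    Vec4 out{};
    for (std::size_t r = 0; r < 4; ++r)
        for (std::size_t c = 0; c < 4; ++c) out[r] += m[r][c] * v[c];
    return out;
}

// Matrix * matrix.
Mat4 mul(const Mat4& a, const Mat4& b) {
    Mat4 out{};
    for (std::size_t r = 0; r < 4; ++r)
        for (std::size_t c = 0; c < 4; ++c)
            for (std::size_t k = 0; k < 4; ++k) out[r][c] += a[r][k] * b[k][c];
    return out;
}

// #1: two matrix-vector products per vertex.
Vec4 transform_two_step(const Mat4& view_projection, const Mat4& world, const Vec4& position) {
    return mul(view_projection, mul(world, position));
}

// #2: the matrices are combined once per object, one product per vertex.
Vec4 transform_combined(const Mat4& world_view_projection, const Vec4& position) {
    return mul(world_view_projection, position);
}
```

With `mul(view_projection, world)` computed once per object and passed to `transform_combined`, both paths give the same result for the same inputs. With arbitrary float matrices the two can differ by rounding in the last bits.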

Also, a "position * matrix" operation takes something like 4 clock cycles. A low-spec PC GPU has about 1280 "shader cores" running at 1GHz, so 2M vertices at 4 clock cycles each, divided by 1280 parallel operations is 6250 clock cycles, which takes 6.25 microseconds. So then you've got to measure the CPU overhead of performing the per-object world-view-proj calculations and UBO updates, and then weigh up whether 6µs of GPU time is more important than whatever the CPU time cost is.
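
The back-of-the-envelope estimate above, as a function you can plug other numbers into. `estimate_microseconds` is a made-up name, and it assumes the work spreads perfectly across all cores, which a real GPU will not quite manage.

```cpp
#include <cassert>

// Rough GPU time for a fixed per-vertex cost, assuming perfect parallelism.
double estimate_microseconds(double vertices, double cycles_per_vertex,
                             double cores, double clock_hz) {
    assert(cores > 0.0 && clock_hz > 0.0);
    const double total_cycles = vertices * cycles_per_vertex;
    const double cycles_per_core = total_cycles / cores;
    const double cycles_per_microsecond = clock_hz / 1e6;
    return cycles_per_core / cycles_per_microsecond;
}
```

Calling `estimate_microseconds(2e6, 4, 1280, 1e9)` reproduces the 6.25 microseconds from the numbers in the post.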

Furthermore, GPUs are usually bottlenecked by memory access times. Often the GPU's ALUs are sitting idle, waiting for data to arrive from memory before they can compute anything. In many situations, you may be able to add extra ALU instructions to a shader without increasing its runtime whatsoever! So in some cases, this extra matrix multiply could work out to be free.
