Sign in to follow this  

OpenGL OpenGL performance question

Recommended Posts

metsfan    679
Hey all,

I have a dilema right now where I can go in 2 directions, and neither one seems great, but one has to be chosen. I have a lot of objects that need to be drawn on the screen (these are UI elements, so basically rectangles with a background color or image, or a label with some text, ect). As I see it these are my 2 options:

1. Use one large VBO which has the information for all elements which need to be drawn, then one call to glDrawArrays to render them all
2. Create a VBO for each element, and call glDrawArrays individually for each element.

The upside to option 1 is that calls to glDrawArrays are minimized, and due to the fact that I'm using shaders to draw everything, you get the parallelization of shader rendering maximized. The downside is that if there is even a small change to the scene, you need to recreate the VBO and set the attribute data, which could end up getting somewhat large with a lot of elements on the screen.

The upside of option 2 is that I can set up a VBO for each element, and only recreate the VBO when the element changes, so the changes to the VBO's are more granular. However, there are many more calls to glDrawArrays, which hurts performance in the long run.

My main question is, what is worse: to be recreating one large VBO and setting the attrib data every time there is a change to the scene, or make many more draw calls, but update VBO's and attrib data less?

Thank you.

Share this post

Link to post
Share on other sites
clb    2147
I use the second approach (although with glDrawElements). Performance is not a problem at the moment (I can do hundreds of UI windows), and if it gets too slow, I'll investigate whether batching manually might help.

In the first approach, it might not necessary to update the whole VB if one rectangle changes - you could update a sub-part of the vertex buffer, if you keep track of which UI element is at which index. Although, I've got to say in my codebase that might get a bit trickier than it sounds, since I'm double-buffering my dynamically updated VBs manually (which I have observed to give a performance benefit on GLES2 even when GL_STREAM_DRAW is being used), so the sub-updates should be made aware of double-buffering.

Share this post

Link to post
Share on other sites
mhagain    13430
An alternative approach that I just became aware of recently is to use instancing. This is a hybrid combination of instancing and your option #1, and may seem a little unintuitive, so bear with me.

When you think about it, the data required to draw a GUI quad is fairly well-specified for everyone: 2 positions, 1 colour and 2 texcoords per-vertex. Assuming you're using a 4-byte colour that adds up to 80 bytes per-quad.

What you can do is to set up this data as per-instance data. So you've got 4 float position (x, x + w, y, y + h), 4 byte colour and 4 float texcoords (s-low, s-high, t-low, t-high) per-quad which gives you a total of 36-bytes, cutting the amount of data you need to stream to the GPU by over half.

You need a vertex shader to extract the quad points from that, so set up an array of 4 x vec4 containing this (this is set up for a triangle strip):[code] vec4 (1, 0, 1, 0),
vec4 (0, 1, 1, 0),
vec4 (1, 0, 0, 1),
vec4 (0, 1, 0, 1)[/code]
Then each position.x is dot (incoming.xy, array[gl_VertexID].xy), position.y is dot (, array[gl_VertexID].zw), and likewise for texcoords.

The final draw call is glDrawArraysInstanced (GL_TRIANGLE_STRIP, 4, 0, numquads);

In this setup you'd have no per-vertex data so each attrib array has a divisor of 1. It's definitely a tradeoff so you need to be certain that the amount of data you send to the GPU is a bottleneck for you (which it may not actually be), but if it's the solution you need then it can work well enough. Edited by mhagain

Share this post

Link to post
Share on other sites
dpadam450    2357
I usually do what is simple and optimize if needed later. I have never seen a UI with more than 50 different textures/elements which would be something like starcraft 2. For one, you only need a vbo of a square -.5 to .5 in size and just scale it with a new texture on it. To optimize that a bit, you can use a texture arrray so that you don't have to bind a texture for each image. But even then, just go with the simple solution first.

I think a lot of people think about optimizing the dumbest things. This is negligible at this point. GPU's /CPU's and motherboards are very fast. If you end up making a game that even uses so much power that it dips below 30 or 60 fps (whichever is your goal), then optimize. Until then, just get the game working, you may not need to even optimize once its all done.

Share this post

Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this  

  • Similar Content

    • By Zaphyk
      I am developing my engine using the OpenGL 3.3 compatibility profile. It runs as expected on my NVIDIA card and on my Intel Card however when I tried it on an AMD setup it ran 3 times worse than on the other setups. Could this be a AMD driver thing or is this probably a problem with my OGL code? Could a different code standard create such bad performance?
    • By Kjell Andersson
      I'm trying to get some legacy OpenGL code to run with a shader pipeline,
      The legacy code uses glVertexPointer(), glColorPointer(), glNormalPointer() and glTexCoordPointer() to supply the vertex information.
      I know that it should be using setVertexAttribPointer() etc to clearly define the layout but that is not an option right now since the legacy code can't be modified to that extent.
      I've got a version 330 vertex shader to somewhat work:
      #version 330 uniform mat4 osg_ModelViewProjectionMatrix; uniform mat4 osg_ModelViewMatrix; layout(location = 0) in vec4 Vertex; layout(location = 2) in vec4 Normal; // Velocity layout(location = 3) in vec3 TexCoord; // TODO: is this the right layout location? out VertexData { vec4 color; vec3 velocity; float size; } VertexOut; void main(void) { vec4 p0 = Vertex; vec4 p1 = Vertex + vec4(Normal.x, Normal.y, Normal.z, 0.0f); vec3 velocity = (osg_ModelViewProjectionMatrix * p1 - osg_ModelViewProjectionMatrix * p0).xyz; VertexOut.velocity = velocity; VertexOut.size = TexCoord.y; gl_Position = osg_ModelViewMatrix * Vertex; } What works is the Vertex and Normal information that the legacy C++ OpenGL code seem to provide in layout location 0 and 2. This is fine.
      What I'm not getting to work is the TexCoord information that is supplied by a glTexCoordPointer() call in C++.
      What layout location is the old standard pipeline using for glTexCoordPointer()? Or is this undefined?
      Side note: I'm trying to get an OpenSceneGraph 3.4.0 particle system to use custom vertex, geometry and fragment shaders for rendering the particles.
    • By markshaw001
      Hi i am new to this forum  i wanted to ask for help from all of you i want to generate real time terrain using a 32 bit heightmap i am good at c++ and have started learning Opengl as i am very interested in making landscapes in opengl i have looked around the internet for help about this topic but i am not getting the hang of the concepts and what they are doing can some here suggests me some good resources for making terrain engine please for example like tutorials,books etc so that i can understand the whole concept of terrain generation.
    • By KarimIO
      Hey guys. I'm trying to get my application to work on my Nvidia GTX 970 desktop. It currently works on my Intel HD 3000 laptop, but on the desktop, every bind textures specifically from framebuffers, I get half a second of lag. This is done 4 times as I have three RGBA textures and one depth 32F buffer. I tried to use debugging software for the first time - RenderDoc only shows SwapBuffers() and no OGL calls, while Nvidia Nsight crashes upon execution, so neither are helpful. Without binding it runs regularly. This does not happen with non-framebuffer binds.
      GLFramebuffer::GLFramebuffer(FramebufferCreateInfo createInfo) { glGenFramebuffers(1, &fbo); glBindFramebuffer(GL_FRAMEBUFFER, fbo); textures = new GLuint[createInfo.numColorTargets]; glGenTextures(createInfo.numColorTargets, textures); GLenum *DrawBuffers = new GLenum[createInfo.numColorTargets]; for (uint32_t i = 0; i < createInfo.numColorTargets; i++) { glBindTexture(GL_TEXTURE_2D, textures[i]); GLint internalFormat; GLenum format; TranslateFormats(createInfo.colorFormats[i], format, internalFormat); // returns GL_RGBA and GL_RGBA glTexImage2D(GL_TEXTURE_2D, 0, internalFormat, createInfo.width, createInfo.height, 0, format, GL_FLOAT, 0); glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST); glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST); DrawBuffers[i] = GL_COLOR_ATTACHMENT0 + i; glBindTexture(GL_TEXTURE_2D, 0); glFramebufferTexture(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0 + i, textures[i], 0); } if (createInfo.depthFormat != FORMAT_DEPTH_NONE) { GLenum depthFormat; switch (createInfo.depthFormat) { case FORMAT_DEPTH_16: depthFormat = GL_DEPTH_COMPONENT16; break; case FORMAT_DEPTH_24: depthFormat = GL_DEPTH_COMPONENT24; break; case FORMAT_DEPTH_32: depthFormat = GL_DEPTH_COMPONENT32; break; case FORMAT_DEPTH_24_STENCIL_8: depthFormat = GL_DEPTH24_STENCIL8; break; case FORMAT_DEPTH_32_STENCIL_8: depthFormat = GL_DEPTH32F_STENCIL8; break; } glGenTextures(1, &depthrenderbuffer); glBindTexture(GL_TEXTURE_2D, depthrenderbuffer); glTexImage2D(GL_TEXTURE_2D, 0, depthFormat, createInfo.width, createInfo.height, 0, GL_DEPTH_COMPONENT, GL_FLOAT, 0); glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST); glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST); glBindTexture(GL_TEXTURE_2D, 0); glFramebufferTexture(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, depthrenderbuffer, 0); } if (createInfo.numColorTargets > 0) glDrawBuffers(createInfo.numColorTargets, DrawBuffers); else glDrawBuffer(GL_NONE); if (glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE) std::cout << "Framebuffer Incomplete\n"; glBindFramebuffer(GL_FRAMEBUFFER, 0); width = createInfo.width; height = createInfo.height; } // ... // FBO Creation FramebufferCreateInfo gbufferCI; gbufferCI.colorFormats =; gbufferCI.depthFormat = FORMAT_DEPTH_32; gbufferCI.numColorTargets = gbufferCFs.size(); gbufferCI.width = engine.settings.resolutionX; gbufferCI.height = engine.settings.resolutionY; gbufferCI.renderPass = nullptr; gbuffer = graphicsWrapper->CreateFramebuffer(gbufferCI); // Bind glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fbo); // Draw here... // Bind to textures glActiveTexture(GL_TEXTURE0); glBindTexture(GL_TEXTURE_2D, textures[0]); glActiveTexture(GL_TEXTURE1); glBindTexture(GL_TEXTURE_2D, textures[1]); glActiveTexture(GL_TEXTURE2); glBindTexture(GL_TEXTURE_2D, textures[2]); glActiveTexture(GL_TEXTURE3); glBindTexture(GL_TEXTURE_2D, depthrenderbuffer); Here is an extract of my code. I can't think of anything else to include. I've really been butting my head into a wall trying to think of a reason but I can think of none and all my research yields nothing. Thanks in advance!
    • By Adrianensis
      Hi everyone, I've shared my 2D Game Engine source code. It's the result of 4 years working on it (and I still continue improving features ) and I want to share with the community. You can see some videos on youtube and some demo gifs on my twitter account.
      This Engine has been developed as End-of-Degree Project and it is coded in Javascript, WebGL and GLSL. The engine is written from scratch.
      This is not a professional engine but it's for learning purposes, so anyone can review the code an learn basis about graphics, physics or game engine architecture. Source code on this GitHub repository.
      I'm available for a good conversation about Game Engine / Graphics Programming
  • Popular Now