OpenGL performance question
Members - Reputation: 546
Posted 16 June 2012 - 07:10 PM
I have a dilemma right now where I can go in 2 directions, and neither one seems great, but one has to be chosen. I have a lot of objects that need to be drawn on the screen (these are UI elements, so basically rectangles with a background color or image, or a label with some text, etc.). As I see it, these are my 2 options:
1. Use one large VBO which has the information for all elements which need to be drawn, then one call to glDrawArrays to render them all
2. Create a VBO for each element, and call glDrawArrays individually for each element.
The upside to option 1 is that calls to glDrawArrays are minimized, and since I'm using shaders to draw everything, the parallelism of shader execution is maximized. The downside is that if there is even a small change to the scene, you need to rebuild the VBO and re-set the attribute data, which could get expensive with a lot of elements on the screen.
The upside of option 2 is that I can set up a VBO for each element and only rebuild that VBO when the element changes, so the updates are more granular. However, there are many more calls to glDrawArrays, which hurts performance in the long run.
My main question is: which is worse, rebuilding one large VBO and setting the attrib data every time there is any change to the scene, or making many more draw calls but updating VBOs and attrib data less often?
Members - Reputation: 1604
Posted 17 June 2012 - 03:26 AM
In the first approach, it might not be necessary to update the whole VB when one rectangle changes - you could update just a sub-range of the vertex buffer (e.g. with glBufferSubData) if you keep track of which UI element sits at which index. Although, I've got to say that in my codebase this might get a bit trickier than it sounds, since I'm manually double-buffering my dynamically updated VBs (which I have observed to give a performance benefit on GLES2 even when GL_STREAM_DRAW is being used), so the sub-updates would need to be aware of the double-buffering.
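To make the sub-range idea concrete, here's a minimal sketch of the offset arithmetic you'd feed to glBufferSubData. The names and the stride are hypothetical; the real stride depends on your vertex format:

```c
#include <stddef.h>

enum { QUAD_STRIDE = 80 };  /* hypothetical: bytes of vertex data per quad */

/* Byte offset of quad `index` within the VBO -- the offset you'd pass
 * to glBufferSubData, together with a size of QUAD_STRIDE. */
size_t quad_offset(size_t index)
{
    return index * QUAD_STRIDE;
}
```

With manual double-buffering, the same logical update would be applied to whichever of the two buffers is current for the frame (or re-applied when the other buffer comes back into rotation), which is where the extra bookkeeping comes in.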
Members - Reputation: 4032
Posted 17 June 2012 - 06:37 AM
When you think about it, the data required to draw a GUI quad is fairly well-specified for everyone: 2 position floats, 1 colour and 2 texcoord floats per vertex. Assuming a 4-byte colour, that's 20 bytes per vertex, which adds up to 80 bytes per quad.
What you can do instead is set this data up as per-instance data. So you've got 4 position floats (x, x + w, y, y + h), a 4-byte colour and 4 texcoord floats (s-low, s-high, t-low, t-high) per quad, which gives a total of 36 bytes, cutting the amount of data you need to stream to the GPU by more than half.
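Those byte counts can be checked with a pair of C structs (hypothetical names; field order as described above):

```c
#include <stdint.h>

/* Plain per-vertex layout: 2 position floats, a 4-byte colour and
 * 2 texcoord floats = 20 bytes per vertex, 80 bytes per 4-vertex quad. */
typedef struct {
    float   pos[2];
    uint8_t rgba[4];
    float   st[2];
} Vertex;

/* Per-instance layout: one 36-byte record per quad. */
typedef struct {
    float   pos[4];   /* x, x + w, y, y + h */
    uint8_t rgba[4];
    float   st[4];    /* s-low, s-high, t-low, t-high */
} QuadInstance;
```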
You need a vertex shader to extract the quad points from that, so set up an array of 4 x vec4 containing this (this is set up for a triangle strip):
vec4 (1, 0, 1, 0), vec4 (0, 1, 1, 0), vec4 (1, 0, 0, 1), vec4 (0, 1, 0, 1)

Then each position.x is dot (incoming.xy, array[gl_VertexID].xy), position.y is dot (incoming.zw, array[gl_VertexID].zw), and likewise for texcoords.
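You can sanity-check that mapping on the CPU side (plain C, hypothetical names): expanding incoming = (x, x + w, y, y + h) against those four vec4s really does produce the four strip corners.

```c
typedef struct { float x, y, z, w; } Vec4;

/* Mirrors the shader-side constant array, indexed by gl_VertexID
 * (triangle-strip order). */
static const Vec4 corners[4] = {
    {1, 0, 1, 0}, {0, 1, 1, 0}, {1, 0, 0, 1}, {0, 1, 0, 1}
};

/* incoming = (x, x + w, y, y + h); writes corner `v` of the quad:
 *   position.x = dot(incoming.xy, corners[v].xy)
 *   position.y = dot(incoming.zw, corners[v].zw)           */
void quad_corner(Vec4 incoming, int v, float *px, float *py)
{
    *px = incoming.x * corners[v].x + incoming.y * corners[v].y;
    *py = incoming.z * corners[v].z + incoming.w * corners[v].w;
}
```

For a quad at (10, 20) with w = 5, h = 7 (so incoming = (10, 15, 20, 27)) this yields (10, 20), (15, 20), (10, 27), (15, 27) - the triangle-strip order.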
The final draw call is glDrawArraysInstanced (GL_TRIANGLE_STRIP, 0, 4, numquads); - i.e. first vertex 0, 4 vertices per instance, numquads instances.
In this setup you'd have no per-vertex data, so each attrib array gets a divisor of 1. It's definitely a tradeoff, so you need to be certain that the amount of data you're streaming to the GPU is actually a bottleneck for you (it may well not be), but if it is, this approach can work well.
Edited by mhagain, 17 June 2012 - 06:39 AM.
It appears that the gentleman thought C++ was extremely difficult and he was overjoyed that the machine was absorbing it; he understood that good C++ is difficult but the best C++ is well-nigh unintelligible.
Members - Reputation: 557
Posted 17 June 2012 - 01:30 PM
I think a lot of people worry about optimizing the wrong things. This is negligible at this point; GPUs, CPUs and motherboards are very fast. If you end up making a game that dips below 30 or 60 fps (whichever is your goal), then optimize. Until then, just get the game working - you may not need to optimize at all once it's done.