Sign in to follow this  
metsfan

OpenGL OpenGL performance question

Recommended Posts

Hey all,

I have a dilema right now where I can go in 2 directions, and neither one seems great, but one has to be chosen. I have a lot of objects that need to be drawn on the screen (these are UI elements, so basically rectangles with a background color or image, or a label with some text, ect). As I see it these are my 2 options:

1. Use one large VBO which has the information for all elements which need to be drawn, then one call to glDrawArrays to render them all
2. Create a VBO for each element, and call glDrawArrays individually for each element.

The upside to option 1 is that calls to glDrawArrays are minimized, and due to the fact that I'm using shaders to draw everything, you get the parallelization of shader rendering maximized. The downside is that if there is even a small change to the scene, you need to recreate the VBO and set the attribute data, which could end up getting somewhat large with a lot of elements on the screen.

The upside of option 2 is that I can set up a VBO for each element, and only recreate the VBO when the element changes, so the changes to the VBO's are more granular. However, there are many more calls to glDrawArrays, which hurts performance in the long run.

My main question is, what is worse: to be recreating one large VBO and setting the attrib data every time there is a change to the scene, or make many more draw calls, but update VBO's and attrib data less?

Thank you.

Share this post


Link to post
Share on other sites
I use the second approach (although with glDrawElements). Performance is not a problem at the moment (I can do hundreds of UI windows), and if it gets too slow, I'll investigate whether batching manually might help.

In the first approach, it might not necessary to update the whole VB if one rectangle changes - you could update a sub-part of the vertex buffer, if you keep track of which UI element is at which index. Although, I've got to say in my codebase that might get a bit trickier than it sounds, since I'm double-buffering my dynamically updated VBs manually (which I have observed to give a performance benefit on GLES2 even when GL_STREAM_DRAW is being used), so the sub-updates should be made aware of double-buffering.

Share this post


Link to post
Share on other sites
An alternative approach that I just became aware of recently is to use instancing. This is a hybrid combination of instancing and your option #1, and may seem a little unintuitive, so bear with me.

When you think about it, the data required to draw a GUI quad is fairly well-specified for everyone: 2 positions, 1 colour and 2 texcoords per-vertex. Assuming you're using a 4-byte colour that adds up to 80 bytes per-quad.

What you can do is to set up this data as per-instance data. So you've got 4 float position (x, x + w, y, y + h), 4 byte colour and 4 float texcoords (s-low, s-high, t-low, t-high) per-quad which gives you a total of 36-bytes, cutting the amount of data you need to stream to the GPU by over half.

You need a vertex shader to extract the quad points from that, so set up an array of 4 x vec4 containing this (this is set up for a triangle strip):[code] vec4 (1, 0, 1, 0),
vec4 (0, 1, 1, 0),
vec4 (1, 0, 0, 1),
vec4 (0, 1, 0, 1)[/code]
Then each position.x is dot (incoming.xy, array[gl_VertexID].xy), position.y is dot (incoming.zw, array[gl_VertexID].zw), and likewise for texcoords.

The final draw call is glDrawArraysInstanced (GL_TRIANGLE_STRIP, 4, 0, numquads);

In this setup you'd have no per-vertex data so each attrib array has a divisor of 1. It's definitely a tradeoff so you need to be certain that the amount of data you send to the GPU is a bottleneck for you (which it may not actually be), but if it's the solution you need then it can work well enough. Edited by mhagain

Share this post


Link to post
Share on other sites
I usually do what is simple and optimize if needed later. I have never seen a UI with more than 50 different textures/elements which would be something like starcraft 2. For one, you only need a vbo of a square -.5 to .5 in size and just scale it with a new texture on it. To optimize that a bit, you can use a texture arrray so that you don't have to bind a texture for each image. But even then, just go with the simple solution first.

I think a lot of people think about optimizing the dumbest things. This is negligible at this point. GPU's /CPU's and motherboards are very fast. If you end up making a game that even uses so much power that it dips below 30 or 60 fps (whichever is your goal), then optimize. Until then, just get the game working, you may not need to even optimize once its all done.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this