OpenGL performance question


3 replies to this topic

#1 metsfan   Members   -  Reputation: 546

Posted 16 June 2012 - 07:10 PM

Hey all,

I have a dilemma right now: I can go in two directions, and neither one seems great, but one has to be chosen. I have a lot of objects that need to be drawn on the screen (these are UI elements, so basically rectangles with a background color or image, or a label with some text, etc.). As I see it, these are my two options:

1. Use one large VBO which has the information for all elements which need to be drawn, then one call to glDrawArrays to render them all
2. Create a VBO for each element, and call glDrawArrays individually for each element.

The upside to option 1 is that calls to glDrawArrays are minimized, and since I'm using shaders to draw everything, the GPU gets a single large batch to process in parallel. The downside is that if there is even a small change to the scene, I need to recreate the VBO and set the attribute data again, which could get somewhat large with a lot of elements on the screen.

The upside of option 2 is that I can set up a VBO for each element and only recreate the VBO when that element changes, so the changes to the VBOs are more granular. However, there are many more calls to glDrawArrays, which hurts performance in the long run.

My main question is, what is worse: recreating one large VBO and setting the attrib data every time there is a change to the scene, or making many more draw calls but updating VBOs and attrib data less often?

Thank you.

#2 clb   Members   -  Reputation: 1604

Posted 17 June 2012 - 03:26 AM

I use the second approach (although with glDrawElements). Performance is not a problem at the moment (I can do hundreds of UI windows), and if it gets too slow, I'll investigate whether batching manually might help.

In the first approach, it might not be necessary to update the whole VB if one rectangle changes: you could update a sub-part of the vertex buffer, if you keep track of which UI element is at which index. Although, I've got to say that in my codebase that might get a bit trickier than it sounds, since I'm double-buffering my dynamically updated VBs manually (which I have observed to give a performance benefit on GLES2 even when GL_STREAM_DRAW is being used), so the sub-updates would have to be made aware of the double-buffering.
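A minimal sketch of the sub-update bookkeeping described above, assuming every UI element is one quad of four vertices stored at a fixed slot in the shared buffer. The names `UiVertex`, `UiBatch`, `quadRange` and `DirtyRange` are hypothetical, and the buffer here is a CPU-side mirror so the example runs without a GL context. The real upload would be a `glBufferSubData` call, shown only as a comment. Double-buffering is not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical per-vertex layout: 2 floats position, 4 bytes colour, 2 floats texcoord.
struct UiVertex {
    float x, y;
    unsigned char rgba[4];
    float s, t;
};

constexpr std::size_t kVertsPerQuad = 4;

// Byte range of the shared vertex buffer that has to be re-uploaded.
struct DirtyRange {
    std::size_t offsetBytes;
    std::size_t sizeBytes;
};

// Range occupied by the quad stored at slot `index`.
inline DirtyRange quadRange(std::size_t index) {
    const std::size_t quadBytes = kVertsPerQuad * sizeof(UiVertex);
    return DirtyRange{index * quadBytes, quadBytes};
}

// CPU-side mirror of one shared VBO holding `quadCount` quads.
class UiBatch {
public:
    explicit UiBatch(std::size_t quadCount)
        : bytes_(quadCount * kVertsPerQuad * sizeof(UiVertex)) {}

    // Overwrites one quad and returns the byte range that changed.
    DirtyRange updateQuad(std::size_t index, const UiVertex (&verts)[kVertsPerQuad]) {
        const DirtyRange r = quadRange(index);
        assert(r.offsetBytes + r.sizeBytes <= bytes_.size());
        std::memcpy(bytes_.data() + r.offsetBytes, verts, r.sizeBytes);
        // With a live GL context and the VBO bound to GL_ARRAY_BUFFER,
        // the matching upload would be:
        // glBufferSubData(GL_ARRAY_BUFFER, r.offsetBytes, r.sizeBytes,
        //                 bytes_.data() + r.offsetBytes);
        return r;
    }

    const unsigned char* data() const { return bytes_.data(); }
    std::size_t sizeBytes() const { return bytes_.size(); }

private:
    std::vector<unsigned char> bytes_;
};
```

Only the returned range is touched when a single element changes, so the cost of an update scales with the element rather than with the whole scene.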

#3 mhagain   Members   -  Reputation: 4032

Posted 17 June 2012 - 06:37 AM

An alternative approach that I just became aware of recently is to use instancing. This is a hybrid combination of instancing and your option #1, and may seem a little unintuitive, so bear with me.

When you think about it, the data required to draw a GUI quad is fairly well-specified for everyone: 2 position floats, 1 colour and 2 texcoord floats per-vertex. Assuming you're using a 4-byte colour, that is 20 bytes per-vertex, which adds up to 80 bytes per-quad.

What you can do is to set up this data as per-instance data. So you've got 4 float position (x, x + w, y, y + h), 4 byte colour and 4 float texcoords (s-low, s-high, t-low, t-high) per-quad which gives you a total of 36-bytes, cutting the amount of data you need to stream to the GPU by over half.
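The byte counts above can be checked with a pair of structs. This is only a sketch: `QuadVertex` and `QuadInstance` are hypothetical names, and the field order is an assumption (the post only fixes the component counts).

```cpp
#include <cstddef>
#include <cstdint>

// Per-vertex layout: 2 floats position, 4-byte colour, 2 floats texcoord.
struct QuadVertex {
    float position[2];
    std::uint8_t colour[4];
    float texcoord[2];
};

// Per-instance layout: (x, x + w, y, y + h), 4-byte colour,
// (s-low, s-high, t-low, t-high).
struct QuadInstance {
    float position[4];
    std::uint8_t colour[4];
    float texcoord[4];
};

// Four vertices per quad in the non-instanced layout.
constexpr std::size_t kBytesPerQuadAsVertices = 4 * sizeof(QuadVertex);
// One instance record per quad in the instanced layout.
constexpr std::size_t kBytesPerQuadAsInstance = sizeof(QuadInstance);

static_assert(sizeof(QuadVertex) == 20, "20 bytes per vertex");
static_assert(kBytesPerQuadAsVertices == 80, "80 bytes per quad");
static_assert(kBytesPerQuadAsInstance == 36, "36 bytes per quad");
```

36 of 80 bytes is 45% of the original size, which is where "cutting by over half" comes from.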

You need a vertex shader to extract the quad points from that, so set up an array of 4 x vec4 containing this (this is set up for a triangle strip):
	vec4 (1, 0, 1, 0),
	vec4 (0, 1, 1, 0),
	vec4 (1, 0, 0, 1),
	vec4 (0, 1, 0, 1)
Then each position.x is dot (incoming.xy, array[gl_VertexID].xy), position.y is dot (incoming.zw, array[gl_VertexID].zw), and likewise for texcoords.
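To see what those dot products select, here is a CPU-side emulation of the vertex shader logic in C++ (not GLSL). `expandCorner`, `Vec2`, `Vec4` and `kCornerSelect` are hypothetical names, and `vertexId` stands in for `gl_VertexID`. It is a sketch of the arithmetic only.

```cpp
#include <cassert>

struct Vec2 { float x, y; };
struct Vec4 { float x, y, z, w; };

// Selector table from the post, one entry per triangle-strip vertex.
// .xy picks between (x, x + w); .zw picks between (y, y + h).
constexpr Vec4 kCornerSelect[4] = {
    {1, 0, 1, 0},
    {0, 1, 1, 0},
    {1, 0, 0, 1},
    {0, 1, 0, 1},
};

inline float dot2(float ax, float ay, float bx, float by) {
    return ax * bx + ay * by;
}

// `incoming` is the per-instance attribute (x, x + w, y, y + h).
// Returns the corner that strip vertex `vertexId` expands to.
inline Vec2 expandCorner(const Vec4& incoming, int vertexId) {
    assert(vertexId >= 0 && vertexId < 4);
    const Vec4& sel = kCornerSelect[vertexId];
    return Vec2{
        dot2(incoming.x, incoming.y, sel.x, sel.y),  // position.x
        dot2(incoming.z, incoming.w, sel.z, sel.w),  // position.y
    };
}
```

For a quad at x = 10, y = 20 with w = 100, h = 50, the instance attribute is (10, 110, 20, 70) and vertex IDs 0 to 3 expand to (10, 20), (110, 20), (10, 70) and (110, 70), which is a valid triangle-strip ordering. The same function applied to (s-low, s-high, t-low, t-high) gives the texcoords.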

The final draw call is glDrawArraysInstanced (GL_TRIANGLE_STRIP, 0, 4, numquads);

In this setup you'd have no per-vertex data so each attrib array has a divisor of 1. It's definitely a tradeoff so you need to be certain that the amount of data you send to the GPU is a bottleneck for you (which it may not actually be), but if it's the solution you need then it can work well enough.

Edited by mhagain, 17 June 2012 - 06:39 AM.

#4 dpadam450   Members   -  Reputation: 557

Posted 17 June 2012 - 01:30 PM

I usually do what is simple and optimize later if needed. I have never seen a UI with more than 50 different textures/elements, which would be something like StarCraft 2. For one, you only need a VBO of a square from -0.5 to 0.5 in size, and you just scale it with a new texture on it. To optimize that a bit, you can use a texture array so that you don't have to bind a texture for each image. But even then, just go with the simple solution first.

I think a lot of people think about optimizing the dumbest things. This is negligible at this point. GPUs, CPUs and motherboards are very fast. If you end up making a game that uses so much power that it dips below 30 or 60 fps (whichever is your goal), then optimize. Until then, just get the game working; you may not even need to optimize once it's all done.
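As a rough illustration of the shared unit-square idea, the sketch below maps a corner of the -0.5 to 0.5 square onto a UI rectangle. In practice this would be a scale-and-translate matrix or a pair of uniforms applied in the vertex shader. `Rect`, `Vec2` and `placeUnitCorner` are hypothetical names, and the rectangle is assumed to be given by its low corner plus width and height.

```cpp
struct Vec2 { float x, y; };

// Hypothetical UI rectangle: low corner (x, y) plus width and height.
struct Rect { float x, y, w, h; };

// Maps a corner of the shared unit square (components in [-0.5, 0.5])
// to the matching corner of `r`: shift to [0, 1], scale, then translate.
inline Vec2 placeUnitCorner(Vec2 corner, const Rect& r) {
    return Vec2{
        r.x + (corner.x + 0.5f) * r.w,
        r.y + (corner.y + 0.5f) * r.h,
    };
}
```

With this approach the VBO never changes; only the per-element rectangle and the bound texture do.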



