
OpenGL Understanding the difference in how I should be drawing lots of objects


Think of using this "design" more so from a software standpoint than a game and I hope it makes more sense.


I've been dabbling in OpenTK/OpenGL. I've created a class that represents a 2D rectangle; it is called a button. When a user instantiates a button, they can define its top/left position, width/height, and a color. These can be changed at any time, so for my very first test I just used the normal glBegin ... glEnd methods to render each button. For now it works, but I am not satisfied.


So I started reading about and testing VBOs. From what I understand, we don't want to make a VBO per object, so the idea of making 1 VBO per button is not ideal. It is the easiest way for me to start with VBOs, though. I already did this as a quick test and it seemed to help performance a little bit, although I am still very underwhelmed. I understand that constantly switching the bound buffer "per object" is the biggest downfall of this implementation.


That leaves me with the understanding that we should minimize the total number of VBOs so that switching between bound buffers is kept to a minimum. How to actually do this is what escapes me...


Buttons are just a start and they are predefined with 4 vertices (an array[3] of vertex - another quick class that holds x, y, z floats). The plan is to keep adding more classes that users can use (labels, lists, etc). These can then be drawn in any fashion, not being forced into only 4 vertices "per control"...


I was thinking that anything with 4 vertices would go in 1 VBO, anything with 5 vertices in another, and so on... This way we only make new VBOs when they are needed; otherwise, if something can fit into an already made VBO, we add it there.


So, if someone renders a test of 100 buttons and 1 triangle, I will need 2 VBOs (100 buttons in 1 VBO and 1 triangle in the other VBO). Drawing all 101 objects then takes only 2 draw calls (1 per VBO).
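A rough sketch of that grouping idea, written in plain Python so it runs without a GL context. The function name `group_by_vertex_count` and the tuple-based vertices are made up for illustration; each resulting batch stands in for one VBO and one draw call.

```python
from collections import defaultdict


def group_by_vertex_count(shapes):
    """Map vertex count -> flat list of (x, y, z) vertices for that batch."""
    batches = defaultdict(list)
    for vertices in shapes:
        # Shapes with the same number of vertices share one batch (one VBO).
        batches[len(vertices)].extend(vertices)
    return dict(batches)


button = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
triangle = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.5, 1.0, 0.0)]

batches = group_by_vertex_count([button] * 100 + [triangle])
print(len(batches))      # number of batches, i.e. VBOs and draw calls
print(len(batches[4]))   # vertices stored in the quad batch
```

With 100 buttons and 1 triangle this yields two batches, matching the count described above.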


Does this sound reasonable given how VBOs work? Or is there something else I should know or be doing?


P.S. - "Buttons" are my first class to be made, and there will be both 2D and 3D stuff...

It is perfectly valid to use a single VBO for all objects with similar geometry. You simply pass additional information to tell the vertex shader how to scale and translate the vertices.
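As a minimal sketch of that idea, assuming a shared unit quad and a per-button scale and translation: the function `place` below mimics on the CPU what a vertex shader would compute per vertex. The names `UNIT_QUAD`, `place`, and `button_corners` are hypothetical, and Python is used only so the example runs without a GL context.

```python
# One shared piece of geometry: a unit quad stored once in the VBO.
UNIT_QUAD = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]


def place(vertex, scale, translate):
    """What the vertex shader would do: position * scale + translate."""
    return (vertex[0] * scale[0] + translate[0],
            vertex[1] * scale[1] + translate[1])


def button_corners(left, top, width, height):
    """Corners of one button, derived from the shared unit quad."""
    return [place(v, (width, height), (left, top)) for v in UNIT_QUAD]


print(button_corners(10.0, 20.0, 100.0, 30.0))
```

In a real renderer the scale and translation would be passed as uniforms or per-instance attributes rather than applied in Python.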

But if you were making a UI that changes very infrequently, or at least has components that change infrequently, then you would just group all of the buttons together into a single VBO.

L. Spiro



But if you were making a UI that changes very infrequently

I'm just not sure yet. The plan is to make it fully dynamic. That just gives me options from the start if I ever need them - changing the vertices, colors, etc.


It sounds like it works the way I imagined, then. I just need a way to keep track of which button (or class/control) corresponds to which set of vertices in the VBO.
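One possible way to do that bookkeeping, sketched in Python without any GL calls: keep a CPU-side copy of the buffer and a dictionary from a control id to its vertex range. The class `ControlBatch` and its methods are invented for this example, and it assumes every vertex is three floats.

```python
import uuid

FLOATS_PER_VERTEX = 3  # x, y, z


class ControlBatch:
    """CPU-side copy of one VBO plus a lookup from control id to its vertices."""

    def __init__(self):
        self.data = []     # flat list of floats, mirrors the VBO contents
        self.ranges = {}   # control id -> (first vertex index, vertex count)

    def add(self, vertices):
        """Append a control's vertices and return the id used to find them later."""
        control_id = uuid.uuid4()
        first = len(self.data) // FLOATS_PER_VERTEX
        for vertex in vertices:
            self.data.extend(vertex)
        self.ranges[control_id] = (first, len(vertices))
        return control_id

    def translate(self, control_id, dx, dy):
        """Move one control by rewriting only its own entries."""
        first, count = self.ranges[control_id]
        for i in range(first, first + count):
            base = i * FLOATS_PER_VERTEX
            self.data[base] += dx
            self.data[base + 1] += dy


batch = ControlBatch()
quad = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
first_id = batch.add(quad)
second_id = batch.add(quad)
batch.translate(second_id, 5.0, 2.0)
print(batch.ranges[second_id])  # (first vertex, vertex count) of the second quad
```

After a change like `translate`, the modified range would still have to be uploaded to the actual VBO.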

This is how I do it in my engine (first the basic idea):
1. One VBO for all GUI elements (text, buttons, etc.)
2. Map the VBO, then write all widgets to it in the correct order.
2.1 Remember, in CPU-side memory, the shader that will be used to render each widget (an ubershader + branching would work too).
3. Unmap the VBO.
4. Bind the VBO.
5. For each widget, activate its shader & draw the widget (ranged draws).

To optimize it:
I. Use two VBOs and double-buffer them, which will display the GUI with one frame of delay.
II. Merge the draw calls if consecutive widgets use the same shader.

The latter works especially well for text rendering.
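The merging in step II can be sketched like this, assuming widgets are stored contiguously in draw order and each one is described only by its shader and vertex count. The function name `merge_draw_ranges` and the shader labels are placeholders; each returned tuple corresponds to one shader bind plus one ranged draw.

```python
def merge_draw_ranges(widgets):
    """widgets: list of (shader, vertex_count) in draw order.

    Returns a list of (shader, first_vertex, vertex_count), where neighbouring
    widgets that use the same shader are merged into a single range.
    """
    ranges = []
    first = 0
    for shader, count in widgets:
        if ranges and ranges[-1][0] == shader:
            # Same shader as the previous widget: extend the previous range.
            prev_shader, prev_first, prev_count = ranges[-1]
            ranges[-1] = (prev_shader, prev_first, prev_count + count)
        else:
            ranges.append((shader, first, count))
        first += count
    return ranges


widgets = [("flat", 4), ("flat", 4), ("text", 4), ("text", 4), ("text", 4), ("flat", 4)]
for draw in merge_draw_ranges(widgets):
    print(draw)
```

Six widgets collapse into three draws here. The last "flat" widget cannot join the first two because draw order has to be preserved.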


I know that this isn't the right way to test this, but I put in an integer counter and a stopwatch. Every frame adds 1 to the counter, and once the stopwatch has run for 1000 milliseconds or more, I read the counter and reset it.


I spawned 1000 quads, each rendered with its own Begin/End. The counter averaged between 25-30. I'm assuming this is equal to 1000 draw calls.


I then spawned 1000 quads into 1 VBO. The VBO now consists of an array of 4000 Vertex (a class that just holds X, Y, Z floats). The counter averaged between 75-80. I'm assuming this is equal to just 1 draw call.


I guess the increase in performance is nice to have. I'm just now concerned about how to go about modifying the individual pieces of the massive VBO array.


Previously, with a single Begin/End for each object, I could move (translate), rotate, color, etc. each object individually as I saw fit.


Now, using a VBO, I have mapped each object (which now gets a Guid on creation) to its corresponding index entries in the massive array. So at least now I can control the positioning of the objects (translating) individually. This just updates the entries for that object in the VBO array.


I've got some thinking and trial and error to do about how to control individual rotations/scalings. For colors, I have done some reading about creating another array to hold the color values for each piece. That seems about the same as the individual positions, though.
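One option for per-object rotation and scaling, if the vertices are rewritten on the CPU anyway, is to transform each quad around its own center before writing it back into the array. This is only a sketch under that assumption; `transform_quad` is a made-up helper and works on 2D points.

```python
import math


def transform_quad(vertices, angle_degrees=0.0, scale=1.0):
    """Rotate and uniformly scale 2D vertices around their common center."""
    cx = sum(x for x, _ in vertices) / len(vertices)
    cy = sum(y for _, y in vertices) / len(vertices)
    cos_a = math.cos(math.radians(angle_degrees))
    sin_a = math.sin(math.radians(angle_degrees))
    result = []
    for x, y in vertices:
        # Offset from the center, scaled, then rotated counter-clockwise.
        dx, dy = (x - cx) * scale, (y - cy) * scale
        result.append((cx + dx * cos_a - dy * sin_a,
                       cy + dx * sin_a + dy * cos_a))
    return result


square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
rotated = transform_quad(square, angle_degrees=90.0)
doubled = transform_quad(square, scale=2.0)
print(doubled)
```

The alternative is to leave the vertices alone and pass a per-object transform to the vertex shader, which avoids touching the buffer for every rotation.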


Hopefully I am still on the right track...

My previous implementation was based on pure immediate mode too (begin/end). I converted my GUI rendering by doing this:
1. Choose a common format for all widgets (e.g. 4 vertices, each having 1 tex coord, color, etc.), one which can be mapped directly to the VBO.
2. Replace the begin/end calls with writes to a cache, so instead of glVertex(xx) use something like data.position=xx
3. Keep the cache between frames. Caching was really useful for static text, which has several hundred quads and changes very rarely.
4. Map the "back" VBO (i.e. access the VBO memory directly).
5. Copy the displayed widgets from the cache to the "back" VBO each(!) frame (a simple memcpy if both use the same memory layout).
6. Unmap the "back" VBO (i.e. upload it asynchronously to GPU memory).
7. Render the "front" VBO, which was updated and uploaded one frame earlier, so that the GPU renders one VBO while the other is being uploaded (important to avoid stalling!).
8. Swap back/front VBO for the next frame.

According to gDebugger, this reduced my API calls by 15k. Besides, immediate mode is deprecated in OpenGL 3.0 and above.
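The front/back bookkeeping from the steps above can be sketched without any GL calls. In this hypothetical `DoubleBufferedGui` class, two Python lists stand in for the two VBOs, and copying into the list stands in for map, memcpy, and unmap.

```python
class DoubleBufferedGui:
    """Two stand-in buffers: one is written this frame, the other is drawn."""

    def __init__(self):
        self.buffers = [[], []]  # placeholders for the two VBOs
        self.back = 0            # index of the buffer being written

    def frame(self, cached_vertices):
        """Write the cache to the back buffer, return what gets drawn."""
        front = 1 - self.back
        # Steps 4-6: map the back VBO, copy the cache into it, unmap.
        self.buffers[self.back] = list(cached_vertices)
        # Step 7: draw the front VBO, which holds last frame's upload.
        drawn = self.buffers[front]
        # Step 8: swap roles for the next frame.
        self.back = front
        return drawn


gui = DoubleBufferedGui()
print(gui.frame(["frame 1"]))  # nothing has been uploaded before the first frame
print(gui.frame(["frame 2"]))  # shows the data written one frame earlier
```

The output makes the one-frame delay visible: each call draws what the previous call wrote.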


If you don't need to read the VBO on the CPU side, and you don't have to update many different parts of it, it would probably be faster to just push updates to the VBO using GL.BufferSubData. That way the driver doesn't have to retrieve the data from the GPU. Mapping the buffer can have the driver copy the specified part of the VBO from VRAM into RAM and then, on unmap, transfer the whole thing back to VRAM. With BufferSubData it never retrieves anything from the GPU and only updates the part specified in the command.
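To update only part of the buffer, you need the byte offset and size of the changed region. Here is a small sketch of that arithmetic, assuming quads of 4 vertices with 3 four-byte floats each; the helper `dirty_byte_range` is hypothetical and simply covers all changed quads with one contiguous range.

```python
FLOAT_SIZE = 4                       # bytes per 32-bit float
BYTES_PER_QUAD = 4 * 3 * FLOAT_SIZE  # 4 vertices, 3 floats (x, y, z) each


def dirty_byte_range(dirty_quads):
    """Return (offset, size) in bytes covering every changed quad, or None."""
    if not dirty_quads:
        return None
    first = min(dirty_quads)
    last = max(dirty_quads)
    return first * BYTES_PER_QUAD, (last - first + 1) * BYTES_PER_QUAD


print(dirty_byte_range({3, 5}))  # offset and size to pass to a sub-data upload
```

If the changed quads are far apart, one range per quad may upload less data than a single covering range; which is faster is something to measure.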


But all of the methods suggested here are valid, and the only way of knowing which one performs best is to profile them with your actual use case. When doing that, you should probably look into a high-precision timer and time individual frames. Keep a list of, say, 600 frames and you can get the shortest, the longest, and the average. Measuring frame time in milliseconds (or even nanoseconds, if you want) is important, as it gives a much better idea of how much your performance drops than frames per second does. Keeping a record of individual frames also lets you see the longest frames and helps you find stutters and such more easily.
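A minimal version of that frame-time record, using Python's `time.perf_counter` as the high-precision timer. The class `FrameTimes` and the helper `timed_ms` are illustrative names; the 600-frame capacity comes from the suggestion above.

```python
import time
from collections import deque


class FrameTimes:
    """Keeps the most recent frame times and summarises them."""

    def __init__(self, capacity=600):
        # A bounded deque drops the oldest sample once it is full.
        self.samples = deque(maxlen=capacity)

    def add(self, milliseconds):
        self.samples.append(milliseconds)

    def summary(self):
        """Return (shortest, longest, average) in milliseconds."""
        return (min(self.samples),
                max(self.samples),
                sum(self.samples) / len(self.samples))


def timed_ms(work):
    """Run work() once and return how long it took in milliseconds."""
    start = time.perf_counter()
    work()
    return (time.perf_counter() - start) * 1000.0


times = FrameTimes()
for ms in (10.0, 20.0, 30.0):
    times.add(ms)
print(times.summary())
```

In a render loop you would call something like `times.add(timed_ms(render_frame))` each frame, where `render_frame` is your own function.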

All you have to worry about is whether your frame time exceeds about 16.7 milliseconds; if it does, you can no longer guarantee 60 fps and you might need to optimize. If you are targeting a different fps (for VR I think you need over 90, and for mobile 30 should be enough), you can calculate the millisecond limit as 1000/fps.
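The 1000/fps rule written out as a tiny helper (the function name `frame_budget_ms` is made up), with the three targets mentioned above:

```python
def frame_budget_ms(target_fps):
    """Time available per frame, in milliseconds, for a target frame rate."""
    return 1000.0 / target_fps


for fps in (30, 60, 90):
    print(fps, round(frame_budget_ms(fps), 2))
```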


Well, you can define the access mode when mapping the buffer, e.g. write-only, so the API doesn't need to download the mapped buffer at all! But a BufferSubData call will be executed synchronously and bears the danger of stalling your pipeline!
