Using VBOs for dynamic geometry

Ari Kontinen · 2014-07-05T20:31:37

You don't actually need to invalidate the VBO when using GL.BufferData(). Uploading new data already does everything you need. If you want to invalidate explicitly, you need to use GL.MapBuffer(). More on the subject here: http://www.opentk.com/doc/graphics/geometry/vertex-buffer-objects.

Also, if you render the buffer that you just updated, the potential speedup of using multiple buffers goes to waste, because the DrawArrays call has to wait for the data transfer to complete before the actual rendering can start, in which case you might as well use a single buffer. You need to update the buffer that is going to be rendered next frame instead. Besides, multi-buffering VBOs usually isn't going to give you much anyway, as the bottleneck is most of the time somewhere else.

Most times when graphics programmers talk about double or triple buffering, they mean that they have two or three "screens" to which they do all the rendering. With double buffering they swap the buffers after all rendering for the current frame has completed. With triple buffering they swap the two background rendering buffers after rendering is finished, and swap the background buffer that is not currently in use with the displayed buffer once the monitor has finished presenting it.

Be careful of over-optimization. Set yourself a target fps and only start optimizing if you drop below it. Anything above it shouldn't matter at all. If you want 60+ fps and adding a feature drops your fps from 200 to 120, just shrug it off and continue with the next feature. Always start with the easiest optimizations first, as they are likely to take less time to implement, and over half the time they will get you above the target fps.

Edit: oh, and before you start to optimize anything, profile the damn thing thoroughly so you avoid spending tens of hours optimizing the part that takes 0.01% of the actual process.

Use System.Diagnostics.Stopwatch (http://msdn.microsoft.com/en-us/library/system.diagnostics.stopwatch.aspx) to measure the time spent on "your" end of the process in different parts of the program, and GL.BeginQuery(QueryTarget.TimeElapsed, ...) and GL.EndQuery(QueryTarget.TimeElapsed) to measure the time the driver and the GPU take to perform the tasks that were issued between them.
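A minimal sketch of the CPU-side half of that advice, assuming a small helper of my own naming (SectionTimer and MeasureMs are hypothetical, not part of OpenTK or .NET). It only covers the Stopwatch measurement; the GPU-side TimeElapsed query is not shown.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (not part of OpenTK): times one section of CPU-side work.
public static class SectionTimer
{
    // Runs 'work' once and returns the elapsed wall-clock time in milliseconds.
    public static double MeasureMs(Action work)
    {
        Stopwatch sw = Stopwatch.StartNew();
        work();
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }
}
```

A call could look like SectionTimer.MeasureMs(() => RebuildMesh()), where RebuildMesh stands for whatever part of the frame is under suspicion. Wrapped around GL calls, this measures only the time it takes to issue them, not the time the GPU needs to execute them.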

Started by NMO June 30, 2014 08:59 PM

10 comments, last by PunCrathod 9 years, 10 months ago

NMO

220

Author

June 30, 2014 08:59 PM

Hello,

I am currently trying to implement realtime CSG. This means that I have a 3D mesh and want to apply boolean operations to it. After each boolean operation, vertices are added to or removed from the mesh. This should happen in realtime, so my vertex data changes every frame. Currently, it is extremely slow, because I'm rendering my mesh in the following way (C# with OpenTK):


// Immediate mode: one normal call and one position call per vertex, every frame.
GL.Begin(PrimitiveType.Triangles);
foreach (Polygon p in PolygonVector) {
  foreach (Vertex v in p) {
    GL.Normal3(v.Normal.x, v.Normal.y, v.Normal.z);
    GL.Vertex3(v.x, v.y, v.z);
  }
}
GL.End();

So my idea was to use VBOs instead. But this is not easy, because my geometry data is changing every frame and VBOs have a fixed size. So how do I handle this?

Would the best way be to create a new vertex buffer object from my current mesh every frame?

If someone could point me in the right direction I'd very much appreciate it.

Xycaleth

2,391

June 30, 2014 10:56 PM

How about a pool of preallocated vertex buffers of fixed sizes, for example 16 KB, 64 KB, 256 KB, etc.? Then each frame, you grab the smallest vertex buffer that can hold all your vertex data, upload the data to that buffer, and draw with it. To get around GPU stalling, you can have several buffers of each size (e.g. 10 of each buffer size), store them in a linked list, and grab the buffer at the start of the list. At the end of the frame you put it back at the end of the list. The list will naturally be ordered by least recently used, which means you won't use the same buffer more than once between frames, and more or less guarantees that the frame using that buffer has been rendered.

If you find you have too much vertex data to fit in the largest buffer, then allocate N number of buffers large enough (perhaps twice as large as the previous largest?) so the cost of the allocation isn't too problematic.
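The pool described above could be sketched roughly as follows. This is an illustration under assumptions, not Xycaleth's implementation: BufferPool and its members are hypothetical names, and plain int handles stand in for the VBO names so that the bookkeeping can be shown without a GL context.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of a fixed-size buffer pool. The int handles stand in
// for VBO names; creating the real buffers (e.g. with GL.GenBuffer) is left out.
public sealed class BufferPool
{
    private readonly int[] sizes;             // size classes in bytes, ascending
    private readonly LinkedList<int>[] free;  // per size class, least recently used first
    private readonly Dictionary<int, int> classOf = new Dictionary<int, int>(); // handle -> size class

    public BufferPool(int[] sizesAscending, int buffersPerSize)
    {
        sizes = (int[])sizesAscending.Clone();
        free = new LinkedList<int>[sizes.Length];
        int nextHandle = 1;
        for (int i = 0; i < sizes.Length; i++)
        {
            free[i] = new LinkedList<int>();
            for (int j = 0; j < buffersPerSize; j++)
            {
                classOf[nextHandle] = i;
                free[i].AddLast(nextHandle++);
            }
        }
    }

    // Takes the least recently used buffer from the smallest size class
    // that fits 'byteCount' and still has a free buffer.
    public int Acquire(int byteCount)
    {
        for (int i = 0; i < sizes.Length; i++)
        {
            if (sizes[i] >= byteCount && free[i].Count > 0)
            {
                int handle = free[i].First.Value;
                free[i].RemoveFirst();
                return handle;
            }
        }
        throw new InvalidOperationException("No free pooled buffer is large enough.");
    }

    // Call at the end of the frame: the buffer goes to the back of its list.
    public void Release(int handle)
    {
        free[classOf[handle]].AddLast(handle);
    }

    // Capacity in bytes of the size class the handle belongs to.
    public int SizeOf(int handle)
    {
        return sizes[classOf[handle]];
    }
}
```

Because released buffers go to the back of the list and Acquire takes from the front, a buffer is only handed out again after every other buffer of its size has been used, which is the least-recently-used ordering the post relies on.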

TheChubu

9,484

July 01, 2014 12:44 AM

You could just treat it as if it were a dynamic array.

Start with a fixed size, and when you modify the vertex data, if it's too much for the buffer to handle, orphan it and create a new one with a new size (you might want a "grow strategy", say, twice the previous size, or 1.5 times the previous size, whatever works best).

First few resizes won't be fast but after a while resizing won't happen that often.
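A small sketch of such a grow strategy, assuming capacities are tracked in bytes. GrowStrategy and NextCapacity are hypothetical names, and the default factor of 2 is just one of the two values mentioned above.

```csharp
using System;

// Hypothetical helper: decides how large the buffer should be after a change.
public static class GrowStrategy
{
    // Returns the capacity (in bytes) needed to hold 'required' bytes.
    // Keeps 'current' when it suffices, otherwise multiplies by 'factor' until it fits.
    public static int NextCapacity(int current, int required, double factor = 2.0)
    {
        if (current <= 0) throw new ArgumentOutOfRangeException("current");
        if (factor <= 1.0) throw new ArgumentOutOfRangeException("factor");

        long capacity = current;
        while (capacity < required)
        {
            // The "+ 1" guarantees progress even when rounding would leave the value unchanged.
            capacity = Math.Max(capacity + 1, (long)(capacity * factor));
        }
        return checked((int)capacity); // throws OverflowException if the result exceeds int range
    }
}
```

Only when the returned value differs from the current capacity does the buffer need to be reallocated; otherwise the existing storage can be reused.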

"I AM ZE EMPRAH OPENGL 3.3 THE CORE, I DEMAND FROM THEE ZE SHADERZ AND MATRIXEZ"

L. Spiro

25,818

July 01, 2014 12:53 AM

and VBOs have a fixed size.

Every call to glBufferData() changes the size of the VBO, so you are working on false assumptions.

Double-buffer or triple-buffer the vertex buffers and call glBufferData() every frame to update them.

L. Spiro

samoth

9,833

July 01, 2014 09:16 AM

If vertex data changes every frame (i.e. you're streaming), you have four options:

1. one buffer, three times the size, persistently mapped (GL_MAP_WRITE_BIT|GL_MAP_PERSISTENT_BIT|GL_MAP_COHERENT_BIT)

2. one buffer, three times the size, mapped with GL_MAP_UNSYNCHRONIZED_BIT

3. one buffer, three times the size, using glBufferSubData

4. one buffer, invalidated using glBufferData(..., 0), and replaced by another one (again, glBufferData)

(1) is the fastest, but requires ARB_buffer_storage and setting fences. You write to one third of the buffer while the second third is in transfer and the last third is being drawn from.

(2) avoids a CPU-GPU sync, but causes a client-server thread sync, requires GL 3.x and setting fences

(3) same as (2) but surprisingly faster on most IHVs, no fence needed

(4) compatibility mode, should probably even work with GL 1.5/2.0, somewhat slower but not that bad
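Options (1) to (3) share the "one buffer, three times the size" layout. The sketch below shows only the offset arithmetic for cycling through the three regions. TripleRegion and its members are hypothetical names, and regionSizeBytes is assumed to be the largest amount of vertex data a single frame can produce. The fences that (1) and (2) require are not shown.

```csharp
using System;

// Hypothetical helper for a buffer split into three equally sized regions.
public static class TripleRegion
{
    public const int RegionCount = 3;

    // Byte offset of the region to write during the given frame.
    // Frames 0, 1, 2 use regions 0, 1, 2; frame 3 wraps around to region 0.
    public static long WriteOffset(long frameIndex, long regionSizeBytes)
    {
        if (frameIndex < 0) throw new ArgumentOutOfRangeException("frameIndex");
        if (regionSizeBytes <= 0) throw new ArgumentOutOfRangeException("regionSizeBytes");
        return (frameIndex % RegionCount) * regionSizeBytes;
    }

    // Total size to allocate for the whole buffer.
    public static long TotalSize(long regionSizeBytes)
    {
        return RegionCount * regionSizeBytes;
    }
}
```

The offset would be used both for the upload into the buffer and as the starting point of the draw call for that frame, so that writing and drawing never touch the same region in the same frame.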

NMO

220

Author

July 01, 2014 03:46 PM

Creating a pool and resizing is a good idea. However there is another problem I'm facing when I use VBOs:

I'm currently using a List to store vertex data and render them as described above. This is very comfortable, because I can add/remove vertices from the list every frame. The downside is that rendering a huge amount of vertices takes ages. So VBOs are much faster for rendering, but a lot of flexibility is taken away, because now I need to use arrays to store vertex/index data. With VBOs I have to build a new vertex/index/normal buffer whenever the geometry changes and in my case, this can happen every frame. I know that this is not a VBO/OpenGL problem per se, but maybe someone has good ideas to solve that.

samoth

9,833

July 01, 2014 04:14 PM

Well, it does not matter a lot. You can keep your list and still use VBOs if you use any form of mapping (it's just 3 more lines of code, and somewhat slower copying the elements into the buffer due to cache misses traversing the list). Only glBuffer(Sub)Data won't work with a list, obviously.

That said, for very small objects such as vertices, a list is almost always a bad choice for a storage container (even though in theory, according to the textbook, it is the "correct one"). Despite anything that big-O might tell you, vectors (arrays) or deques (basically vectors of vectors) will perform equally or better for most operations most of the time. This, unintuitively, includes inserting at random locations, unless the objects are quite large (see for example this guy's benchmarks where list only breaks even for random inserts at an object size of 128 and loses everywhere else).

Why not try a vector (or a deque, if you will) and see how it works out? Notable C++ people nowadays suggest using vector as the default unless there is really a very urgent reason for something different.

And indeed, unless the dataset is huge, there's no issue with a vector for any operation (even random inserts), and it is very cache and copy friendly. A deque sits somewhere in the middle and combines the best (or worst, in some cases, e.g. frequent reallocations) of the two. Just give it a try, it doesn't really cost a lot of work.

PunCrathod

596

July 01, 2014 04:17 PM

Your code looks like C#, so I'm assuming you haven't looked at the List.ToArray() method, which is almost free, so you can pass list.ToArray() and its size to GL.BufferData(). Also, you don't need to generate new buffers. Just update the old ones.
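A sketch of the ToArray() step, under assumptions: the Vertex struct below is a hypothetical layout (position followed by normal), not the Vertex type from the original post, and MeshUpload is a made-up helper name. ToArray() copies the list's elements into a new array, so the list itself can keep changing afterwards.

```csharp
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Hypothetical vertex layout: position followed by normal, six floats in total.
[StructLayout(LayoutKind.Sequential)]
public struct Vertex
{
    public float X, Y, Z;
    public float NX, NY, NZ;
}

public static class MeshUpload
{
    // Copies the list into a new array and reports the array's size in bytes,
    // which is what a buffer upload needs alongside the data.
    public static Vertex[] ToUploadArray(List<Vertex> vertices, out int byteCount)
    {
        Vertex[] data = vertices.ToArray();
        byteCount = data.Length * Marshal.SizeOf(typeof(Vertex));
        return data;
    }
}
```

With the buffer bound, the result could then be handed to OpenTK, for example as GL.BufferData(BufferTarget.ArrayBuffer, (IntPtr)byteCount, data, BufferUsageHint.StreamDraw). The exact overload depends on the OpenTK version, so treat that call as an assumption to check against the bindings in use.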

NMO

220

Author

July 03, 2014 06:37 PM

I tried it now and it works. I use list.ToArray() to convert my list to an array.

To get around GPU stalling, you can have multiple of each size (e.g. 10 of each buffer size), and store them in a linked list and grab the buffer at the start of the linked list. At the end of the frame you can put it back at the end of the list.

But I don't understand what the advantage of multiple VBOs of same size is. Why does my GPU stall when I use just 1 vbo?

To understand you correctly: let's say I allocate 10 vertex buffers and store them in a list. In the first frame, I take the first VBO in the list, store vertex data in it, and send it to the graphics card. In the next frame I take the next empty VBO in the list and store my new vertex data in that one. Why shouldn't I just overwrite the first VBO with my new vertex data?

samoth

9,833

July 03, 2014 07:41 PM

But I don't understand what the advantage of multiple VBOs of same size is. Why does my GPU stall when I use just 1 vbo?

The whole point in using a buffer object in the first place is decoupling the rendering on the GPU from your drawing loop. If you use immediate mode (GL.Begin / End), the server conceptually must wait for you to submit one vertex after another, and it does not know when you'll be done before it sees GL.End(). At that point, it can upload the whole block of vertex data that it has collected and tell the GPU to do something with it. Which means that in the mean time, the GPU is doing nothing, which is not what you want. Ideally, you want the GPU and the CPU to work at the same time.

Similar thing when you draw with a vertex array (client side, not a buffer object). You save some API calls because instead of submitting every vertex one by one, you only submit one array and one draw command. Which is better already, but still the GPU has to wait. You could modify the data in that array at any time, so when is it safe for OpenGL to access this? The only time this is safe is within the draw call. As your thread is executing the draw call, the server knows that it can't execute something different, such as code that modifies the array. So, it has to wait until the draw call before it can make a copy and upload it.

A buffer object is owned by OpenGL. You cannot modify the contents except via the BufferData API or by mapping the buffer object. Which means that OpenGL knows that the buffer's contents are valid at all times. It can therefore upload the buffer without having to wait, and the GPU can start processing it as soon as it's done with whatever it was doing before.

In theory.

In practice, OpenGL must still make sure that "things work correctly", and it must fulfill the guarantees that the API provides. One such guarantee is that you are allowed to load data into a buffer and issue some drawing commands, then load different data into the buffer (while drawing isn't finished yet!) and issue some other drawing commands, and this must work "as expected". Which means no more and no less than if you use a single buffer, the server again has to synchronize.

Invalidating the buffer, or using several buffers or buffer sub-regions removes this need to synchronize. If, for example, you invalidate the buffer object with glBufferData(...,0) then you're telling OpenGL that you are done with this one, and it can do whatever it wants. OpenGL will keep the buffer contents around for as long as it still has unfinished drawing commands that read from it, and then it will throw it away. In the mean time, whenever you talk of that buffer, you are really talking of a new, different one. Which, of course, does not need to be synchronized, since no draw commands depend on it -- it's a totally different buffer.

Similar stuff with mapping persistent buffer subranges and such, except synchronizing properly (using fences) is your responsibility. In the average case, this does nothing because using 3 buffers is just good, and by the time you try to synchronize, it's all over already anyway. However, you must still do it to guarantee that everything still works correctly in the worst case.
