Using VBOs for dynamic geometry


Hello,

I am currently trying to implement real-time CSG. This means that I have a 3D mesh and want to apply boolean operations to it. After each boolean operation, vertices are added to or removed from the mesh. This should happen in real time, so my vertex data changes every frame. Currently it is extremely slow, because I'm rendering my mesh in the following way (C# with OpenTK):

GL.Begin(PrimitiveType.Triangles);
foreach (Polygon p in PolygonVector)
{
    foreach (Vertex v in p)
    {
        GL.Normal3(v.Normal.x, v.Normal.y, v.Normal.z);
        GL.Vertex3(v.x, v.y, v.z);
    }
}
GL.End();


So my idea was to use VBOs instead. But this is not easy, because my geometry data is changing every frame and VBOs have a fixed size. So how do I handle this?

Would the best way be to create a new vertex buffer object every frame with my current mesh?

If someone could point me in the right direction I'd very much appreciate it.


How about a pool of preallocated vertex buffers of fixed sizes, for example 16 KB, 64 KB, 256 KB, etc.? Each frame, you grab the smallest vertex buffer that can hold all your vertex data, upload the data to that buffer, and then draw with it. To get around GPU stalling, you can have several buffers of each size (e.g. 10 of each), store them in a linked list, and grab the buffer at the start of the list. At the end of the frame, you put it back at the end of the list. The list then stays naturally ordered from least to most recently used, so a buffer is not reused until several frames later, which more or less guarantees that the frame that last used it has finished rendering.

If you find you have too much vertex data to fit in the largest buffer, allocate N buffers of a larger size (perhaps twice as large as the previous largest?), so that allocations, and their cost, stay rare.
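A minimal C# sketch of that pool, under some assumptions: buffer handles are plain ints, and the hypothetical createBuffer callback stands in for the actual GL.GenBuffers + GL.BufferData allocation. VertexBufferPool, Acquire and Release are illustrative names, not an existing API.

```csharp
using System;
using System.Collections.Generic;

// Size-class pool of vertex buffers (hypothetical names). Buffer handles are plain ints;
// createBuffer(sizeInBytes) stands in for GL.GenBuffers + GL.BufferData.
public sealed class VertexBufferPool
{
    // Free buffers per size class; each list is ordered least recently used first.
    private readonly SortedDictionary<int, LinkedList<int>> _freeBySize = new();
    private readonly Func<int, int> _createBuffer;
    private readonly int _buffersPerSize;
    private int _largestSize;

    public VertexBufferPool(IEnumerable<int> sizeClasses, int buffersPerSize, Func<int, int> createBuffer)
    {
        _createBuffer = createBuffer;
        _buffersPerSize = buffersPerSize;
        foreach (int size in sizeClasses)
            AddSizeClass(size);
    }

    private void AddSizeClass(int size)
    {
        var free = new LinkedList<int>();
        for (int i = 0; i < _buffersPerSize; i++)
            free.AddLast(_createBuffer(size));
        _freeBySize[size] = free;
        _largestSize = Math.Max(_largestSize, size);
    }

    // Takes the least recently used free buffer of the smallest size class that fits.
    public (int Buffer, int Size) Acquire(int bytesNeeded)
    {
        if (bytesNeeded > _largestSize)
        {
            // Too much data for every class: add a new class, doubling the largest size until it fits.
            int newSize = Math.Max(_largestSize, 1);
            while (newSize < bytesNeeded)
                newSize *= 2;
            AddSizeClass(newSize);
        }

        foreach (var (size, free) in _freeBySize)
        {
            if (size >= bytesNeeded && free.Count > 0)
            {
                int buffer = free.First!.Value;
                free.RemoveFirst();
                return (buffer, size);
            }
        }
        throw new InvalidOperationException("Every buffer large enough is already in use.");
    }

    // Called at the end of the frame: the buffer goes to the back of its list,
    // so it is the last buffer of its size to be handed out again.
    public void Release(int buffer, int size) => _freeBySize[size].AddLast(buffer);
}
```

Each frame you would call Acquire with the byte size of the mesh, upload and draw with the returned buffer, and call Release at the end of the frame.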


You could just treat it as if it were a dynamic array.

Start with a fixed size, and when you modify the vertex data, if it's too much for the buffer to handle, orphan it and create a new one with a new size (you might want a "grow strategy", say, twice the previous size, or 1.5 times the previous size, whatever works best).

The first few resizes won't be fast, but after a while resizing won't happen that often.
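The grow strategy can be sketched as a small capacity helper, assuming the caller performs the actual orphan-and-reallocate with GL.BufferData whenever the returned capacity differs from the current one. BufferGrowth and NextCapacity are hypothetical names.

```csharp
using System;

// Capacity bookkeeping for a VBO treated like a dynamic array (hypothetical helper).
// When the result differs from currentCapacity, the caller would orphan the buffer and
// reallocate it, e.g. GL.BufferData(target, (IntPtr)newCapacity, IntPtr.Zero, usage).
public static class BufferGrowth
{
    // Returns the capacity to use for bytesNeeded, growing by factor (e.g. 2.0 or 1.5)
    // only when the current capacity is too small.
    public static long NextCapacity(long currentCapacity, long bytesNeeded, double factor = 2.0)
    {
        if (factor <= 1.0)
            throw new ArgumentOutOfRangeException(nameof(factor), "Growth factor must be > 1.");
        if (bytesNeeded <= currentCapacity)
            return currentCapacity;

        long capacity = Math.Max(currentCapacity, 1);
        while (capacity < bytesNeeded)
            capacity = (long)Math.Ceiling(capacity * factor);
        return capacity;
    }
}
```

Because the capacity grows geometrically, the number of reallocations grows only logarithmically with the final data size, which is why resizing soon becomes rare.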


Quote: "...and VBOs have a fixed size."

Every call to glBufferData() reallocates the VBO's storage with whatever size you pass, so the size can change with every call; you are working on false assumptions.

Double-buffer or triple-buffer the vertex buffers and call glBufferData() every frame to update them.

L. Spiro


If vertex data changes every frame (i.e. you're streaming), you have four options:

1. one buffer, three times the per-frame size, created with glBufferStorage and persistently mapped (GL_MAP_WRITE_BIT|GL_MAP_PERSISTENT_BIT|GL_MAP_COHERENT_BIT)

2. one buffer, three times the per-frame size, mapped with glMapBufferRange using GL_MAP_UNSYNCHRONIZED_BIT

3. one buffer, three times the per-frame size, updated using glBufferSubData

4. one buffer, orphaned using glBufferData(target, size, NULL, usage) and then refilled (again with glBufferData)

(1) is the fastest, but requires ARB_buffer_storage and setting fences. You write to one third of the buffer while the second third is in transfer and the last third is being drawn from.

(2) avoids a CPU-GPU sync, but causes a client-server thread sync, requires GL 3.x and setting fences

(3) same as (2) but surprisingly faster on most IHVs, no fence needed

(4) compatibility mode, should probably even work with GL 1.5/2.0, somewhat slower but not that bad
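A sketch of the bookkeeping for option (1), assuming the buffer has already been created with glBufferStorage and mapped once. The two callbacks are placeholders for glFenceSync and glClientWaitSync/glDeleteSync (however your binding exposes them), and TripleRegionRing is a hypothetical name. It simplifies the three stages above to a plain rotation: a region is reused only after the fence placed when it was last drawn from has been waited on.

```csharp
using System;

// Bookkeeping for a persistently mapped buffer of 3 * RegionSize bytes,
// where each frame writes one region (hypothetical helper).
public sealed class TripleRegionRing
{
    private const int RegionCount = 3;
    private readonly IntPtr[] _fences = new IntPtr[RegionCount];
    private readonly Func<IntPtr> _insertFence;           // placeholder for glFenceSync
    private readonly Action<IntPtr> _waitAndDeleteFence;  // placeholder for glClientWaitSync + glDeleteSync
    private int _current;

    public TripleRegionRing(int regionSize, Func<IntPtr> insertFence, Action<IntPtr> waitAndDeleteFence)
    {
        RegionSize = regionSize;
        _insertFence = insertFence;
        _waitAndDeleteFence = waitAndDeleteFence;
    }

    public int RegionSize { get; }

    // Before writing this frame's vertices: if the GPU may still be reading the region,
    // wait on the fence placed when it was last used. Returns the region's byte offset.
    public int BeginFrame()
    {
        IntPtr fence = _fences[_current];
        if (fence != IntPtr.Zero)
        {
            _waitAndDeleteFence(fence);
            _fences[_current] = IntPtr.Zero;
        }
        return _current * RegionSize;
    }

    // After issuing the draw calls that read this region: fence it and move on.
    public void EndFrame()
    {
        _fences[_current] = _insertFence();
        _current = (_current + 1) % RegionCount;
    }
}
```

In the common case the GPU is already past the fence when BeginFrame comes back around to a region, so the wait returns immediately.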

Edited by samoth


Creating a pool and resizing is a good idea. However there is another problem I'm facing when I use VBOs:

I'm currently using a List to store vertex data and render them as described above. This is very comfortable, because I can add/remove vertices from the list every frame. The downside is that rendering a huge amount of vertices takes ages. So VBOs are much faster for rendering, but a lot of flexibility is taken away, because now I need to use arrays to store vertex/index data. With VBOs I have to build a new vertex/index/normal buffer whenever the geometry changes and in my case, this can happen every frame. I know that this is not a VBO/OpenGL problem per se, but maybe someone has good ideas to solve that.


Well, it does not matter a lot. You can keep your list and still use VBOs if you use any form of mapping (it's just 3 more lines of code, and somewhat slower copying the elements into the buffer due to cache misses traversing the list). Only glBuffer(Sub)Data won't work with a list, obviously.
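A sketch of copying straight from a list (linked or not) into mapped memory, assuming a hypothetical PackedVertex layout of six floats (position, then normal). A Span<float> stands in for the mapped range; with unsafe code, one could be created over the pointer returned by GL.MapBufferRange, for example new Span<float>((void*)ptr, floatCount).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical vertex layout: position followed by normal, six floats in total.
public readonly record struct PackedVertex(float X, float Y, float Z, float NX, float NY, float NZ);

public static class VertexWriter
{
    public const int FloatsPerVertex = 6;

    // Copies vertices from any collection (List<T>, LinkedList<T>, ...) into dest,
    // which stands in for the mapped buffer range. Returns the number of vertices written.
    public static int Write(IEnumerable<PackedVertex> vertices, Span<float> dest)
    {
        int i = 0;
        foreach (PackedVertex v in vertices)
        {
            if (i + FloatsPerVertex > dest.Length)
                throw new ArgumentException("The mapped range is too small.", nameof(dest));
            dest[i++] = v.X;  dest[i++] = v.Y;  dest[i++] = v.Z;
            dest[i++] = v.NX; dest[i++] = v.NY; dest[i++] = v.NZ;
        }
        return i / FloatsPerVertex;
    }
}
```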

That said, for very small objects such as vertices, a list is almost always a bad choice of storage container (even though in theory, according to the textbook, it is the "correct" one). Despite anything big-O might tell you, vectors (arrays) or deques (basically vectors of vectors) will perform equally well or better for most operations most of the time. Unintuitively, this includes inserting at random locations, unless the objects are quite large (see for example this guy's benchmarks, where list only breaks even for random inserts at an object size of 128 and loses everywhere else).

Why not try a vector (or a deque, if you will) and see how it works out? Notable C++ people nowadays suggest using vector as the default unless there is really a very urgent reason for something different.

And indeed, unless the dataset is huge, there's no issue with a vector for any operation (even random inserts), and it is very cache- and copy-friendly. A deque sits somewhere in the middle and combines the best (or, in some cases such as frequent reallocations, the worst) of the two. Just give it a try; it doesn't cost a lot of work.
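A C#-specific aside: List<T> in .NET is already backed by a contiguous array (the counterpart of std::vector), while LinkedList<T> is the linked list, so the cache argument above applies to LinkedList<T>, not to List<T>. If triangle order does not matter, removing a triangle from a List<T> can also avoid shifting the whole tail by moving the last triangle into its slot. A sketch, assuming a flat triangle soup with three consecutive vertices per triangle (TriangleSoup and RemoveTriangleSwapBack are hypothetical names):

```csharp
using System;
using System.Collections.Generic;

public static class TriangleSoup
{
    // Removes triangle `index` (vertices 3*index .. 3*index+2) in O(1) by moving the
    // last triangle into its slot. Triangle order changes, which is fine for a soup
    // drawn with a single DrawArrays call.
    public static void RemoveTriangleSwapBack<T>(List<T> vertices, int index)
    {
        int triangleCount = vertices.Count / 3;
        if ((uint)index >= (uint)triangleCount)
            throw new ArgumentOutOfRangeException(nameof(index));

        int last = triangleCount - 1;
        for (int k = 0; k < 3; k++)
            vertices[3 * index + k] = vertices[3 * last + k];

        // Removing from the end does not shift any remaining elements.
        vertices.RemoveRange(3 * last, 3);
    }
}
```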


Your code looks like C#, so I'm assuming you haven't looked at the List<T>.ToArray() method, which is cheap (a single copy of the list's contents into a new array), so you can call GL.BufferData(target, (IntPtr)(list.Count * stride), list.ToArray(), usage). Also, you don't need to generate new buffers; just update the old ones.
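ToArray() still allocates a new array and copies every element, each frame. On .NET 5 or later, CollectionsMarshal.AsSpan gives a view of the list's backing array without any copy. A small sketch (ListUpload, View and ByteSize are hypothetical names); how the span is then handed to GL.BufferData depends on which overloads your OpenTK version provides, so that step is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

public static class ListUpload
{
    // A view of the List<T>'s backing array, without copying (.NET 5+).
    // It is only valid until the list is next resized, so use it right away for the upload.
    public static ReadOnlySpan<T> View<T>(List<T> list) => CollectionsMarshal.AsSpan(list);

    // Byte count to upload: based on Count, not on the list's (larger) Capacity.
    public static int ByteSize<T>(List<T> list) where T : unmanaged =>
        list.Count * Marshal.SizeOf<T>();
}
```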


I tried it now and it works. I use list.ToArray() to convert my list to an array.

Quote: "To get around GPU stalling, you can have several buffers of each size (e.g. 10 of each), store them in a linked list, and grab the buffer at the start of the list. At the end of the frame, you put it back at the end of the list."

But I don't understand what the advantage of multiple VBOs of the same size is. Why does my GPU stall when I use just one VBO?

To make sure I understand you correctly: let's say I allocate 10 vertex buffers and store them in a list. In the first frame, I take the first VBO in the list, store vertex data in it and send it to the graphics card. In the next frame, I take the next empty VBO in the list and store my new vertex data in that one. Why shouldn't I just overwrite the first VBO with my new vertex data?

Quote: "But I don't understand what the advantage of multiple VBOs of the same size is. Why does my GPU stall when I use just one VBO?"

The whole point of using a buffer object in the first place is to decouple rendering on the GPU from your drawing loop. If you use immediate mode (GL.Begin/GL.End), the server conceptually must wait for you to submit one vertex after another, and it does not know that you are done until it sees GL.End(). At that point, it can upload the whole block of vertex data it has collected and tell the GPU to do something with it. In the meantime, the GPU is doing nothing, which is not what you want. Ideally, the GPU and the CPU work at the same time.

The same applies when you draw with a vertex array (client-side, not a buffer object). You save API calls, because instead of submitting every vertex one by one, you submit only one array and one draw command. That is already better, but the GPU still has to wait. You could modify the data in that array at any time, so when is it safe for OpenGL to access it? Only during the draw call: while your thread is executing the draw call, the server knows that it cannot be running something else, such as code that modifies the array. So it has to wait until the draw call before it can make a copy and upload it.

A buffer object is owned by OpenGL. You cannot modify the contents except via the BufferData API or by mapping the buffer object. Which means that OpenGL knows that the buffer's contents are valid at all times. It can therefore upload the buffer without having to wait, and the GPU can start processing it as soon as it's done with whatever it was doing before.

In theory.

In practice, OpenGL must still make sure that "things work correctly" and fulfill the guarantees that the API provides. One such guarantee is that you may load data into a buffer, issue some drawing commands, then load different data into the same buffer (while drawing isn't finished yet!) and issue more drawing commands, and this must work "as expected". In other words, if you use a single buffer, the server has to synchronize again.

Invalidating the buffer, or using several buffers or buffer sub-regions, removes this need to synchronize. For example, if you invalidate (orphan) the buffer object with glBufferData(target, size, NULL, usage), you are telling OpenGL that you are done with its current contents and it can do whatever it wants with them. OpenGL keeps the old contents around for as long as unfinished drawing commands still read from them, and then throws them away. In the meantime, whenever you refer to that buffer, you are really referring to a new, different one, which of course does not need to be synchronized, since no draw commands depend on it: it's a totally different buffer.

The same goes for persistently mapped buffer sub-ranges and similar techniques, except that synchronizing properly (using fences) is your responsibility. In the average case, the wait costs nothing, because 3 buffers are usually enough, and by the time you try to synchronize, the GPU is already done with that range anyway. However, you must still do it to guarantee that everything works correctly in the worst case.
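The orphaning behaviour described above can be pictured with a toy model. This is not how a driver is implemented; it only illustrates the idea under the assumption that each queued draw keeps a reference to the storage block it reads. OrphanableBuffer and its methods are hypothetical.

```csharp
using System.Collections.Generic;

// Toy model of buffer orphaning: the buffer *name* refers to a storage block, and
// draws queued for the "GPU" keep a reference to the block that was current when queued.
public sealed class OrphanableBuffer
{
    private float[] _storage;
    private readonly Queue<float[]> _pendingDraws = new();

    public OrphanableBuffer(int size) => _storage = new float[size];

    // Like glBufferData(target, size, NULL, usage): the name gets a fresh block, while the
    // old block stays alive for as long as pending draws reference it. No waiting needed.
    public void Orphan() => _storage = new float[_storage.Length];

    public void Write(int index, float value) => _storage[index] = value;

    // Queues a draw that will read whatever block the name refers to *now*.
    public void QueueDraw() => _pendingDraws.Enqueue(_storage);

    // The "GPU" executes the oldest queued draw; returns the first value that draw sees.
    public float ExecuteNextDraw() => _pendingDraws.Dequeue()[0];
}
```

Without the Orphan() call, the second Write would modify the block the first draw is still waiting to read, which is exactly the case where a real driver has to synchronize.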

Edited by samoth


If the driver can see that updating a buffer that is still in use would stall the GPU, it might allocate a new block of memory as the buffer's new storage and discard the old block once the GPU has finished with it. This way, the driver avoids the stall because no two operations use the same memory. However, the driver is not required to do this, so there is no guarantee that it will.


Thank you guys. You have helped me a lot so far.

VBOs are harder to understand than I had expected.

Nevertheless, I tried to implement a triple-buffered VBO. I used the glBufferData orphaning trick that samoth described above. It's maybe not the fastest, but it should suffice for now.

Can you have a look at the code and tell me whether it's OK? It's C#, but it should be easy to understand for non-C# programmers. Maybe it's helpful for others who have similar problems.

First, initialization:

uint[] VBO_IDs = new uint[3];
int counter = 0;
long CAPACITY = 0x800000; // 8 MB

{
    GL.GenBuffers(3, VBO_IDs);
    ErrorCode code = GL.GetError();
    if (code != ErrorCode.NoError)
        throw new Exception(code.ToString());

    // Create the three buffers, each with CAPACITY bytes of uninitialized storage
    foreach (uint id in VBO_IDs)
    {
        GL.BindBuffer(BufferTarget.ArrayBuffer, id);
        GL.BufferData(BufferTarget.ArrayBuffer, (IntPtr)CAPACITY, IntPtr.Zero, BufferUsageHint.StreamDraw);
    }
    GL.BindBuffer(BufferTarget.ArrayBuffer, 0);
}


And rendering:

Vertex3f[] mesharr = Mesh.ToArray();
int bufferSize;
// Vertex Array Buffer
{
    // Bind this frame's buffer
    GL.BindBuffer(BufferTarget.ArrayBuffer, VBO_IDs[counter]);

    // Send data to buffer
    GL.BufferData(BufferTarget.ArrayBuffer, (IntPtr)(mesharr.Length * Vertex3f.Stride), mesharr, BufferUsageHint.StreamDraw);

    // Validate that the buffer is the correct size
    GL.GetBufferParameter(BufferTarget.ArrayBuffer, BufferParameterName.BufferSize, out bufferSize);
    if (mesharr.Length * Vertex3f.Stride != bufferSize)
        throw new ApplicationException("Vertex array not uploaded correctly");

    // Positions: 3 floats at offset 0 of each vertex in the bound buffer
    GL.VertexPointer(3, VertexPointerType.Float, Vertex3f.Stride, IntPtr.Zero);
    GL.EnableClientState(ArrayCap.VertexArray);

    // Normals: 3 floats following the position
    GL.EnableClientState(ArrayCap.NormalArray);
    GL.NormalPointer(NormalPointerType.Float, Vertex3f.Stride, (IntPtr)(3 * sizeof(float)));

    GL.DrawArrays(PrimitiveType.Triangles, 0, mesharr.Length);

    // Invalidate (orphan) the VBO
    GL.BufferData(BufferTarget.ArrayBuffer, (IntPtr)(CAPACITY), IntPtr.Zero, BufferUsageHint.StreamDraw);
    GL.BindBuffer(BufferTarget.ArrayBuffer, 0);
}

// Restore client state so later draw calls are not affected
// (there is no matching GL.PushClientAttrib, so GL.PopClientAttrib would underflow)
GL.DisableClientState(ArrayCap.NormalArray);
GL.DisableClientState(ArrayCap.VertexArray);

// Advance to the next of the three buffers
counter = (counter + 1) % VBO_IDs.Length;
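The code above relies on a Vertex3f type that is not shown. A hypothetical definition consistent with how it is used (a Stride member, and normals starting at byte offset 3 * sizeof(float)) might look like this; the field names are assumptions:

```csharp
using System.Runtime.InteropServices;

// Hypothetical vertex layout: position at byte offset 0, normal at byte offset 12.
[StructLayout(LayoutKind.Sequential)]
public struct Vertex3f
{
    public float X, Y, Z;    // position
    public float NX, NY, NZ; // normal

    public Vertex3f(float x, float y, float z, float nx, float ny, float nz)
    {
        X = x; Y = y; Z = z;
        NX = nx; NY = ny; NZ = nz;
    }

    // Byte distance between consecutive vertices, passed as the stride argument.
    public static readonly int Stride = Marshal.SizeOf<Vertex3f>();
}
```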



I benchmarked it: unfortunately, there is no real difference in performance between using 1 and 3 VBOs.

When I render 34960 triangles with 3 VBOs (see code above), I get ~200 FPS.

When I render 34960 triangles with just 1 VBO, I get ~210 FPS.
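Frame rates are easier to compare as frame times. The two measurements above work out to:

```latex
t_{200} = \frac{1000\ \text{ms}}{200} = 5.00\ \text{ms}, \qquad
t_{210} = \frac{1000\ \text{ms}}{210} \approx 4.76\ \text{ms}, \qquad
t_{200} - t_{210} \approx 0.24\ \text{ms}
```

So the gap is roughly a quarter of a millisecond per frame, about 5% of the frame time.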


You shouldn't need to explicitly invalidate the VBO you've just used after drawing.

There also won't necessarily be any performance gain by using double/triple/N buffering. If the GPU has finished all draw calls using the vertex buffer in the previous frame, before you update the same buffer in the current frame then you won't see any difference as the driver doesn't need to synchronize anything. Think of all your GL calls as submissions into a queue. The GPU may consume the items of the queue at a different rate than you submit them. So if the GPU consumes them faster than your CPU submissions, then no synchronization will ever be needed (as it seems to be in your case). It's only when the GPU consumes the items slower than your CPU submissions that it needs to synchronize.

Edited by Xycaleth

Quote: "You shouldn't need to explicitly invalidate the VBO you've just used after drawing."

You might not need to (on some drivers), but it is the "correct" thing to do. Or, well, one of several correct things to do (this one is the "traditional" recipe, as opposed to the more modern unsynchronized or persistent mapping approaches). See Server-side multi buffering.

Explicitly using several buffers is what that article calls "Client-side multi buffering".

Quote: "Unfortunately, there is no real difference in performance between using 1 and 3 VBOs."

You are using multi-buffering in both cases, only once it's explicit, client-side, and once it's happening invisibly in the server. Since you are using multi-buffering in either case, it's not surprising that there is no big difference.

Edited by samoth


You don't actually need to invalidate the VBO when using GL.BufferData(); uploading new data with it already does everything you need. If you want to use invalidation, you need to use GL.MapBuffer(). More on the subject here: http://www.opentk.com/doc/graphics/geometry/vertex-buffer-objects.

Also, if you render the buffer that you just updated, the potential speedup of using multiple buffers goes to waste, since DrawArrays has to wait for the data transfer to complete before it can start the actual rendering, in which case you might as well use a single buffer. Instead, update the data of the buffer that is going to be rendered next frame. Besides, multi-buffering VBOs usually won't gain you much anyway, as the bottleneck is most of the time somewhere else.
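The "update the buffer that is rendered next frame" scheme comes down to two indices per frame. A sketch with hypothetical names, assuming frames are numbered from 0 and slot 0 is filled once before the first frame:

```csharp
// Hypothetical helper: with N buffers, frame f draws from slot f % N (filled during
// frame f - 1) and fills slot (f + 1) % N, which the next frame will draw from.
public static class BufferSlots
{
    public static int DrawSlot(long frame, int slotCount) => (int)(frame % slotCount);

    public static int UploadSlot(long frame, int slotCount) => (int)((frame + 1) % slotCount);
}
```

The trade-off is one extra frame of latency: geometry changed during frame f appears on screen in frame f + 1.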

Most of the time, when graphics programmers talk about double or triple buffering, they mean that they have two or three "screens" (framebuffers) into which they do all the rendering. With double buffering, they swap the buffers after all rendering for the current frame has been completed. With triple buffering, they swap the two back buffers after rendering is finished, and swap the back buffer that is not being rendered to with the displayed buffer once the monitor has finished presenting it.

Be careful of over-optimization. Set yourself a target FPS, and only start optimizing if you drop below it; anything above it shouldn't matter at all. If you want 60+ FPS, and you add a feature and your FPS drops from 200 to 120, just shrug it off and continue with the next feature. And always start with the easiest optimizations, as they tend to take less time to implement, and more than half the time they will get you above the target FPS.

Edit: oh, and before you start to optimize anything, profile the damn thing thoroughly, so you avoid spending tens of hours optimizing the part that takes 0.01% of the actual processing time. Use http://msdn.microsoft.com/en-us/library/system.diagnostics.stopwatch.aspx (System.Diagnostics.Stopwatch) to measure how long different parts of the program take on "your" end, and GL.BeginQuery(QueryTarget.TimeElapsed, ...) and GL.EndQuery(QueryTarget.TimeElapsed) to measure how long the driver and the GPU take to perform the work issued between them.
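For the CPU side, a small per-section accumulator on top of Stopwatch can be enough to find where the time goes. A sketch (SectionTimer and its methods are hypothetical names); it measures only time spent on the calling thread, so GPU time still needs the TimeElapsed queries mentioned above:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Accumulates wall-clock time per named section on the calling thread.
public sealed class SectionTimer
{
    private readonly Dictionary<string, TimeSpan> _totals = new();

    public T Measure<T>(string section, Func<T> work)
    {
        var sw = Stopwatch.StartNew();
        try
        {
            return work();
        }
        finally
        {
            sw.Stop();
            _totals[section] = _totals.GetValueOrDefault(section) + sw.Elapsed;
        }
    }

    public void Measure(string section, Action work) =>
        Measure(section, () => { work(); return 0; });

    // Total time recorded for a section so far (zero if it was never measured).
    public TimeSpan Total(string section) => _totals.GetValueOrDefault(section);
}
```

In a render loop you might wrap the mesh rebuild, the ToArray() call, and the upload in separate sections and print the totals every few hundred frames.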

Edited by PunCrathod