But I don't understand what the advantage of multiple VBOs of same size is. Why does my GPU stall when I use just 1 vbo?
The whole point of using a buffer object in the first place is to decouple the rendering on the GPU from your drawing loop. If you use immediate mode (GL.Begin / End), the server conceptually must wait for you to submit one vertex after another, and it does not know that you are done until it sees GL.End(). Only at that point can it upload the whole block of vertex data that it has collected and tell the GPU to do something with it. This means that in the meantime the GPU is doing nothing, which is not what you want. Ideally, you want the GPU and the CPU to work at the same time.
Something similar happens when you draw with a vertex array (client side, not a buffer object). You save some API calls because instead of submitting every vertex one by one, you only submit one array and one draw command. That is better already, but the GPU still has to wait. You could modify the data in that array at any time, so when is it safe for OpenGL to read it? The only safe time is within the draw call. While your thread is executing the draw call, the server knows that your thread cannot be executing anything else, such as code that modifies the array. So it has to wait until the draw call before it can make a copy and upload it.
A buffer object is owned by OpenGL. You cannot modify its contents except via the BufferData / BufferSubData API or by mapping the buffer object. This means that OpenGL knows when the buffer's contents can change and that they are valid at all other times. It can therefore upload the buffer without having to wait, and the GPU can start processing it as soon as the GPU is done with whatever it was doing before.
In practice, OpenGL must still make sure that "things work correctly", and it must fulfill the guarantees that the API provides. One such guarantee is that you are allowed to load data into a buffer and issue some drawing commands, then load different data into the same buffer (while drawing isn't finished yet!) and issue some other drawing commands, and this must work "as expected": the first batch of draws sees the first data, the second batch sees the second. So if you use a single buffer, the server again has to synchronize. It must either wait for the pending draws to finish before overwriting the data, or keep a copy somewhere.
Invalidating the buffer, or using several buffers or buffer sub-regions, removes this need to synchronize. If, for example, you invalidate the buffer object with glBufferData(...,0) (that is, with a null data pointer), then you're telling OpenGL that you are done with the old contents and it can do whatever it wants with them. OpenGL will keep the old contents around for as long as it still has unfinished drawing commands that read from them, and then it will throw them away. In the meantime, whenever you refer to that buffer, you are really referring to a new, different block of storage. That one, of course, does not need to be synchronized, since no draw commands depend on it. It's effectively a totally different buffer.
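To make the bookkeeping concrete, here is a minimal sketch in Python. It is a toy model, not real OpenGL: `ToyDriver`, its methods and the `run` helper are all made up for illustration, and a real driver is free to do this differently (for instance by copying instead of stalling). `buffer_data` stands in for orphaning and refilling in one step, `buffer_sub_data` for overwriting the existing storage in place.

```python
class ToyDriver:
    """Toy model of a driver's buffer bookkeeping (hypothetical, not OpenGL)."""

    def __init__(self):
        self.storage = {}    # buffer name -> current backing store (a list)
        self.pending = []    # queued draws, each holding a reference to a store
        self.executed = []   # snapshot of what each finished draw actually read
        self.stalls = 0      # how often the CPU had to wait for the GPU

    def buffer_data(self, name, data):
        # Orphan + refill: the name gets a brand-new backing store.
        # Queued draws keep their reference to the old one, so nobody waits.
        self.storage[name] = list(data)

    def buffer_sub_data(self, name, data):
        # In-place overwrite: if a queued draw still reads this store,
        # the toy driver lets the GPU finish first, which counts as a stall.
        store = self.storage[name]
        if any(s is store for s in self.pending):
            self.stalls += 1
            self.finish()
        store[:len(data)] = data

    def draw(self, name):
        # Queue a draw that reads whatever store the name refers to right now.
        self.pending.append(self.storage[name])

    def finish(self):
        # The "GPU" executes every queued draw and records what it read.
        self.executed.extend(tuple(s) for s in self.pending)
        self.pending.clear()


def run(update_method):
    # Same frame twice: upload, draw, upload different data, draw again.
    d = ToyDriver()
    d.buffer_data("vbo", [1, 2, 3])
    d.draw("vbo")
    getattr(d, update_method)("vbo", [7, 8, 9])
    d.draw("vbo")
    d.finish()
    return d.executed, d.stalls


in_place = run("buffer_sub_data")
orphaned = run("buffer_data")
print(in_place)
print(orphaned)
```

In both runs the two draws read the data they were meant to read. The only difference in this model is the stall counter: the in-place update has to drain the queue once, the orphaning one never does.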
The same applies to persistently mapped buffer subranges and the like, except that synchronizing properly (using fences) is your responsibility. In the average case, waiting on the fence does nothing, because 3 buffers are enough that by the time you try to synchronize, the GPU has long finished with the region you are about to overwrite. However, you must still do it to guarantee that everything works correctly in the worst case.
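A rough way to see why 3 regions are usually enough is to count fence waits in a toy frame loop. Again this is a Python sketch under stated assumptions, not GL code: `count_fence_waits` and its parameters are hypothetical, and the model simply assumes the GPU trails the CPU by a fixed number of frames (`in_flight`). The comments mark where the real calls (glFenceSync after the draw, glClientWaitSync before the write) would go.

```python
def count_fence_waits(frames, regions, in_flight):
    """Count how often the CPU blocks on a fence in a toy frame loop.

    frames    -- number of frames to simulate
    regions   -- number of buffers / sub-regions used round-robin
    in_flight -- assumed number of frames the GPU lags behind the CPU
    """
    fences = [None] * regions   # per region: the frame whose draw last read it
    gpu_done = -1               # newest frame the GPU has finished drawing
    waits = 0
    for frame in range(frames):
        # Left alone, the GPU has finished every frame older than
        # `in_flight` frames by the time the CPU starts this one.
        gpu_done = max(gpu_done, frame - in_flight - 1)

        region = frame % regions
        last_reader = fences[region]
        if last_reader is not None and last_reader > gpu_done:
            waits += 1              # this is where glClientWaitSync would block
            gpu_done = last_reader  # the wait ends once that draw has completed

        # ... write new vertex data into `region` and issue the draw here ...
        fences[region] = frame      # this is where glFenceSync would be placed
    return waits


print(count_fence_waits(100, regions=3, in_flight=2))
print(count_fence_waits(100, regions=1, in_flight=2))
```

With 3 regions and the GPU two frames behind, the fence check never has to block in this model. With a single region it blocks on every frame after the first. The check itself still has to be there in both cases, since the real GPU lag is not a constant.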
Edited by samoth, 03 July 2014 - 01:43 PM.