# VBOs with DYNAMIC_DRAW slower than arrays?


## Recommended Posts

Hello,

I just implemented VBOs in my code and I have questions about the performance increase I should expect from them. I am rendering a scene with various static objects and 2 skinned characters. With vertex arrays, I get render times of about 30ms to 35ms, so I implemented VBOs in the engine to speed up rendering. I used GL_STATIC_DRAW for everything except the characters, which use GL_DYNAMIC_DRAW.

The performance is satisfying as long as the characters are not moving: render times range from 16ms to 20ms. In that case their mesh is not modified by the skinning and thus not re-uploaded to the GPU. But as soon as the characters move, their mesh needs to be re-uploaded to the GPU once per frame, and render times climb to 40ms to 45ms, which is even worse than with vertex arrays.

The method I am using right now is to keep a copy of all vertex data on the client side, allocate a VBO for each mesh, and call glBufferData() on each of them. If the vertex data gets dirty through skinning, it is re-uploaded to the GPU using glBufferData(). So I am not using glMapBuffer() anywhere in my code.

Any idea whether I am doing something wrong, or whether I should proceed some other way to make the performance even better?

Thanks for any input,
Happy Go Lucky

##### Share on other sites
For instance, I have a question about the way I am sequencing my GL calls.

Right now, an object only gets refreshed on the GPU (if needed) at the moment it is rendered. As a consequence, glBufferData() gets called almost right before glDrawElements(). Would it be faster to upload all the dirty buffers to the GPU before starting to render? Maybe it would reduce some bottlenecks in the driver?

Basically what I am doing right now is:

    RenderScene()
    {
        for (each object to render)
        {
            if (mesh is dirty)
                glBufferData();
            glVertexPointer();
            glNormalPointer();
            // etc.
            glDrawElements();
        }
    }

Would it be better to do this?

    RenderScene()
    {
        for (each object to render)
        {
            if (mesh is dirty)
                glBufferData();
        }
        for (each object to render)
        {
            glVertexPointer();
            glNormalPointer();
            // etc.
            glDrawElements();
        }
    }

Thanks,
Happy Go Lucky

##### Share on other sites
Quote:
Original post by happygolucky
The method I am using right now with VBOs is that I keep a copy of all vertex data on the client side, and allocate a VBO and call glBufferData() for each of them. If the vertex gets dirty by skinning, it gets re-uploaded to the GPU using glBufferData(). Thus I am not using glMapBuffer() in my code.

Never, never use BufferData to upload data!
The only thing for which you should call BufferData is to allocate memory by passing it a NULL pointer.

Theoretically, BufferSubData should be much faster (and it is here).

BufferSubData means "copy". Note that it lacks, for example, the usage parameter. BufferData means "create (and maybe copy)". It takes a size parameter, and because BufferData is allowed to redefine the buffer size, the driver may have to check its internal tables for a match, or it may round-trip to the hardware, or worse.
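
To make the "copy" versus "create (and maybe copy)" distinction concrete, here is a toy CPU-side model. It is not OpenGL code: `ToyBuffer`, `bufferData` and `bufferSubData` are hypothetical stand-ins that only mimic the documented contract (glBufferData creates a new data store and may change its size, glBufferSubData writes into the existing store and fails if the range does not fit). What a real driver does internally is not specified and may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Toy stand-in for a buffer object; NOT a real OpenGL type.
struct ToyBuffer {
    std::vector<unsigned char> store;  // models the buffer's data store
    int allocations = 0;               // how many times a store was created
};

// Models glBufferData: always creates a new data store of `size` bytes,
// then optionally copies `data` into it (`data` may be null).
void bufferData(ToyBuffer& b, std::size_t size, const void* data) {
    b.store = std::vector<unsigned char>(size);
    ++b.allocations;
    if (data != nullptr && size > 0) {
        std::memcpy(b.store.data(), data, size);
    }
}

// Models glBufferSubData: copies into the existing store only.
// Returns false where GL would raise GL_INVALID_VALUE (range does not fit).
bool bufferSubData(ToyBuffer& b, std::size_t offset, std::size_t size,
                   const void* data) {
    if (offset > b.store.size() || size > b.store.size() - offset) {
        return false;
    }
    assert(data != nullptr || size == 0);
    if (size > 0) {
        std::memcpy(b.store.data() + offset, data, size);
    }
    return true;
}
```

In this model, calling `bufferData(b, n, nullptr)` once and then `bufferSubData` every frame keeps `allocations` at 1, which is the pattern recommended above.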
Quote:
Original post by happygolucky
Right now it's only when a specific object gets rendered that it gets refreshed to the GPU if needed. This has for consequence that glBufferData() gets called almost right before glDrawElements(). Would it be faster to actually upload all the dirty buffers to the GPU before starting rendering? Maybe it would somewhat reduce some bottlenecks in the driver?

Yes and no. It wouldn't go magically faster, but it will provide a speedup if you can do something else in the meantime.
In the case of your examples, if there's a single object then there would be no gain. For multiple buffers there could be one, as the rendering could theoretically carry on while data is uploaded to other buffers. In general, however, you should keep multiple objects in one buffer until some metric is satisfied.
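
As a rough illustration of "multiple objects in a buffer until some metric is satisfied", the sketch below packs meshes into shared buffers greedily. The metric here is a byte budget per buffer, which is purely an assumption for the example; `Range` and `packMeshes` are hypothetical names, and no GL calls are made. The resulting offsets are what you would later pass as byte offsets when updating or drawing a sub-range.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Where one mesh lives: which shared buffer, and the byte range inside it.
struct Range {
    std::size_t buffer;
    std::size_t offset;
    std::size_t size;
};

// Greedy packing: keep appending meshes to the current buffer until adding
// the next one would exceed `budget` bytes, then start a new buffer.
// A mesh larger than the budget simply gets a buffer of its own.
std::vector<Range> packMeshes(const std::vector<std::size_t>& meshBytes,
                              std::size_t budget) {
    assert(budget > 0);
    std::vector<Range> out;
    std::size_t buffer = 0;
    std::size_t used = 0;
    for (std::size_t bytes : meshBytes) {
        if (used > 0 && used + bytes > budget) {
            ++buffer;
            used = 0;
        }
        out.push_back(Range{buffer, used, bytes});
        used += bytes;
    }
    return out;
}
```

The right budget depends on the hardware and the workload, so it would have to be measured rather than guessed.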

##### Share on other sites

I took your comments into consideration. So now what I do is:

    // Init buffers.
    glGenBuffers();
    glBindBuffer();
    glBufferData(size, data pointer, usage);

    while (true)
    {
        for (each object to render)
        {
            if (object dirty)
                glBufferSubData();
            glVertexPointer();
            glNormalPointer();
            // etc.
            glDrawElements();
        }
    }

However, I get exactly the same level of performance. From this I can deduce that the driver actually detects that I am stupidly reallocating with the same size and usage, so it reuses the buffer already created.

My characters are subdivided into several submeshes, so I guess the next strategy would be to issue all the glBufferSubData() calls at the beginning of the render, and hope that the first buffers will have made it to the GPU before it starts rendering them.

##### Share on other sites
Yes, sounds reasonable. However, you should have a metric of the complexity involved. If you're interleaving just a few draw calls of a thousand vertices each, it's a bit unlikely you'll see a decent gain (even when the fragment shader is really complex).

##### Share on other sites
No, you should use glBufferData instead of glBufferSubData.
There is a good reason for this and it is recommended by a pdf from nVidia (or ATI, don't really remember).

glBufferData tells the driver you will be refreshing the entire buffer so a new one will be allocated while the GPU sources the other one.
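
The idea in this post (commonly called buffer orphaning) can be pictured with a small toy model. This is only a sketch of the concept, not OpenGL code and not a description of any particular driver: `ToyBuffer`, `PendingDraw`, `respecify` and `draw` are hypothetical names. A queued draw keeps the storage it was issued with, while re-specifying the buffer attaches fresh storage instead of overwriting the old one.

```cpp
#include <cassert>
#include <memory>
#include <vector>

using Store = std::vector<float>;

// Toy buffer object: the handle stays the same, the storage behind it can
// be swapped out.
struct ToyBuffer {
    std::shared_ptr<Store> store;
};

// Models a draw call still queued on the GPU: it keeps the storage it was
// issued with alive until it is done.
struct PendingDraw {
    std::shared_ptr<const Store> source;
};

// Models re-specifying the whole buffer (glBufferData in the post above):
// attach new storage rather than overwrite storage a draw may still read.
void respecify(ToyBuffer& b, const Store& newData) {
    b.store = std::make_shared<Store>(newData);
}

// Models issuing a draw that sources the buffer's current storage.
PendingDraw draw(const ToyBuffer& b) {
    assert(b.store != nullptr);
    return PendingDraw{b.store};
}
```

In the model, a draw issued before `respecify` still sees the old vertices, so nothing has to wait. Whether a real driver behaves this way is implementation-dependent, which may explain why the two recommendations in this thread disagree.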

##### Share on other sites
!! SUCCESS !!

I was right, the strategy is all about where and when the calls are made. What I did is call glBufferSubData() as soon as the skinning of the vertex buffer on the CPU side is finished (glBufferData() would have been just as fast if the GL implementation is good enough not to reallocate).

By the time the mesh gets rendered, a lot of CPU time has elapsed since the upload was issued, so the transfer to the GPU is already complete and there is no waiting on it.

Here is my algorithm:

    // Init everything
    glGenBuffers();
    glBufferData();
    // [...]

    // Main loop
    while (true)
    {
        // Process skinning
        for (each object skinned)
        {
            DoSkinning(object);
            glBufferSubData(object);
        }

        // Do more stuff on CPU side if any.
        // [...]

        // Render the scene
        for (each object to render)
        {
            glDrawElements();
        }
    }

I believe that if you don't really have anything to do on the CPU between the skinning step and the rendering step, it does not matter much. What matters more is that all vertex data uploading is done in one go before starting to render, so the first vertex buffers will have made it to the GPU by the time you call the first glDrawElements(). This way the time the GPU spends waiting for the new vertex data is greatly reduced.

After doing this, the performance with or without skinning active is the same, which is about 16 to 20ms compared to 30 to 35ms with vertex arrays, or 40 to 45ms with VBOs badly used.
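
The ordering can be checked without a GL context using a small call-order sketch. Everything here is a hypothetical stand-in: `Mesh`, `doSkinning`, `upload` and `draw` only append to a log, where the real code would run the skinning, glBufferSubData() and glDrawElements().

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Mesh {
    std::string name;
    bool skinned = false;  // true if the CPU re-skins this mesh every frame
};

using CallLog = std::vector<std::string>;

// Stand-ins that only record the call order; no GL calls are made.
void doSkinning(const Mesh& m, CallLog& log) { log.push_back("skin " + m.name); }
void upload(const Mesh& m, CallLog& log) { log.push_back("upload " + m.name); }
void draw(const Mesh& m, CallLog& log) { log.push_back("draw " + m.name); }

// One frame: skin and upload every dynamic mesh first, then draw everything.
CallLog renderFrame(const std::vector<Mesh>& scene) {
    CallLog log;
    for (const Mesh& m : scene) {
        if (m.skinned) {
            doSkinning(m, log);
            upload(m, log);
        }
    }
    // Other CPU-side work for the frame would go here.
    for (const Mesh& m : scene) {
        draw(m, log);
    }
    assert(log.size() >= scene.size());  // every mesh was drawn once
    return log;
}
```

For a scene with two skinned characters and one static object, the log shows both uploads issued before the first draw, which is the property that made the difference above.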

-HappyGoLucky

##### Share on other sites
I had the same thing happen to me while I was implementing a terrain algorithm in my engine.

Performance was awful if I called the update function (which uploaded the index data to the GPU) right before calling the actual render function. If I called the update function early in each frame and let the CPU do some other stuff before calling the render function, the performance increase was quite large.

Just thought I would pass this on anyway, so you know your case was not one of those isolated flukes LOL
