Memory-Array vs Dynamic-VBO: which is better?

Started by
27 comments, last by christian h 17 years, 6 months ago
Quote:Original post by RPTD
So if I have a preallocated VBO, using glBufferSubData for the entire range is faster than glBufferData on a modern system?


glBufferSubData doesn't allocate. It updates an already existing buffer.
glBufferData allocates much like glTexImage2D.
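To make the difference concrete, here is a minimal sketch of the decision an engine might make each time it uploads dynamic data. If the new data fits the storage that glBufferData already allocated, it can be updated in place with glBufferSubData; otherwise the storage has to be reallocated. The names `plan_upload` and `upload_plan` and the grow-by-doubling policy are hypothetical illustrations, not part of OpenGL. The GL calls appear only in comments so the sketch compiles without a GL context.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical upload policy; not an OpenGL API. */
typedef enum { UPLOAD_SUB_DATA, UPLOAD_REALLOCATE } upload_kind;

typedef struct {
    upload_kind kind;
    size_t new_capacity; /* size of the buffer's storage after the upload */
} upload_plan;

/* Decide how to upload `size` bytes into a VBO whose storage currently
   holds `capacity` bytes. */
upload_plan plan_upload(size_t capacity, size_t size)
{
    upload_plan plan;
    if (size <= capacity) {
        /* Fits: glBufferSubData(GL_ARRAY_BUFFER, 0, size, data) overwrites
           the existing storage without allocating. */
        plan.kind = UPLOAD_SUB_DATA;
        plan.new_capacity = capacity;
        return plan;
    }
    /* Too small: only glBufferData can (re)allocate storage, e.g.
       glBufferData(GL_ARRAY_BUFFER, new_capacity, NULL, GL_DYNAMIC_DRAW)
       followed by glBufferSubData for the actual bytes. Growing by doubling
       keeps reallocations rare. */
    size_t cap = capacity ? capacity : 1;
    while (cap < size) {
        if (cap > SIZE_MAX / 2) { cap = size; break; } /* avoid overflow */
        cap *= 2;
    }
    plan.kind = UPLOAD_REALLOCATE;
    plan.new_capacity = cap;
    return plan;
}
```

Doubling is only one possible policy. An engine whose dynamic VBOs never change size, as described later in the thread, would simply always take the glBufferSubData branch.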

If you lose performance with glBufferSubData, that's not normal. Email the hardware vendor. I'd be disappointed if I lost performance.

I was accidentally calling glBufferData instead of glBufferSubData, but when I fixed it, it gave no improvement because other parts are keeping the GPU busy.
Sig: http://glhlib.sourceforge.net
an open source GLU replacement library. Much more modern than GLU.
/* Build a model matrix and its inverse with glhlib, then upload both as uniforms. */
float matrix[16], inverse_matrix[16];
glhLoadIdentityf2(matrix);
glhTranslatef2(matrix, 0.0, 0.0, 5.0);
glhRotateAboutXf2(matrix, angleInRadians);
glhScalef2(matrix, 1.0, 1.0, -1.0);
glhQuickInvertMatrixf2(matrix, inverse_matrix);
glUniformMatrix4fv(uniformLocation1, 1, GL_FALSE, matrix); /* GL_FALSE: no transpose */
glUniformMatrix4fv(uniformLocation2, 1, GL_FALSE, inverse_matrix);
Quote:Original post by _neutrin0_
Yes, glBufferSubData is faster on modern hardware with the latest drivers. The map method is older and generally slower. I have actually seen FPS drop because of mapping VBOs.


That's news to me :o I tried pretty much every variation on a GF6600GT, and mapping was the fastest one in the case where the data had to be rewritten every frame. It was faster than glBufferSubData, which in turn was faster than vertex arrays (VAs), as expected.

ch.

Quote:Original post by RPTD
What do you mean exactly by "batching" in this context? (Just to see if I have something similar to compare results.)


By batch I mean:

Create vertex buffer.
Create index buffer.

while (!gameEnd)
    Load mesh 1 vertices into VBO.
    Load mesh 2 vertices into VBO.
    Load mesh 1 faces into IBO.
    Load mesh 2 faces into IBO.
    Draw mesh 1.
    Draw mesh 2.

Destroy VBO.
Destroy IBO.

The VBO/IBO is statically allocated and not created and destroyed for every iteration. In fact it is not even resized. If the vertex buffer fills up, we draw the geometry and restart from the beginning of the buffer.
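The "draw and restart from the beginning when full" rule can be sketched as a write cursor over a buffer that is allocated once. This is a minimal sketch under assumptions: `batch_buffer` and `batch_reserve` are hypothetical names, the GL calls appear only in comments, and `flushes` simply counts how often the pending geometry would be drawn before wrapping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for a VBO allocated once with glBufferData. */
typedef struct {
    size_t capacity; /* bytes allocated once, never resized */
    size_t cursor;   /* next free byte */
    int flushes;     /* times the buffer filled up and writing wrapped */
} batch_buffer;

/* Reserve `size` bytes for the next mesh and return the byte offset to pass
   to glBufferSubData(GL_ARRAY_BUFFER, offset, size, vertices). */
size_t batch_reserve(batch_buffer *b, size_t size)
{
    /* A single mesh larger than the whole buffer would have to be split. */
    assert(size <= b->capacity);
    if (size > b->capacity - b->cursor) {
        /* Buffer full: real code draws everything written so far here,
           then restarts from the beginning of the buffer. */
        b->flushes++;
        b->cursor = 0;
    }
    size_t offset = b->cursor;
    b->cursor += size;
    return offset;
}
```

Overwriting the start of the buffer right after wrapping can make the driver wait until the GPU has finished reading the old contents; the "render and discard" trick mentioned later in the thread is one way around that.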

OK, I managed to do some more tests, and it seems that glBufferSubData is indeed faster than glBufferData.

The engine keeps chunks of vertex data in a contiguous array and loads them with calls to glBufferSubData. I tried using glBufferData here instead, and it was slower. The reason is that glBufferData replaces all the data in the buffer and may do some internal memory allocation/deallocation. glBufferData takes a performance hit as the size of the VBO increases (a big hit), so for a big VBO it is best to use glBufferSubData.
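A minimal sketch of that layout, assuming each chunk is simply copied to the next free position in one preallocated VBO: `pack_chunks` is a hypothetical helper that computes the byte offset of each chunk, which would then be passed to glBufferSubData. If the chunks no longer fit, the buffer would have to be reallocated with glBufferData, which is the expensive path described above for large buffers.

```c
#include <stddef.h>

/* Hypothetical helper: lay out `count` chunks back to back in a VBO of
   `capacity` bytes. On success, offsets[i] is where chunk i goes, i.e. the
   upload would be
     glBufferSubData(GL_ARRAY_BUFFER, offsets[i], sizes[i], chunk_data[i]);
   and *used receives the total number of bytes. Returns 1 on success and
   0 if the chunks do not fit (the buffer would need glBufferData again). */
int pack_chunks(const size_t *sizes, size_t count, size_t capacity,
                size_t *offsets, size_t *used)
{
    size_t total = 0;
    for (size_t i = 0; i < count; ++i) {
        if (sizes[i] > capacity - total) /* overflow-safe fit test */
            return 0;
        offsets[i] = total;
        total += sizes[i];
    }
    *used = total;
    return 1;
}
```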
++ My::Game ++
Quote:Original post by christian h
Quote:Original post by _neutrin0_
Yes, glBufferSubData is faster on modern hardware with the latest drivers. The map method is older and generally slower. I have actually seen FPS drop because of mapping VBOs.


That's news to me :o I tried pretty much every variation on a GF6600GT, and mapping was the fastest one in the case where the data had to be rewritten every frame. It was faster than glBufferSubData, which in turn was faster than vertex arrays (VAs), as expected.

ch.


When the vertex buffer is small, glMapBuffer might be faster.
Have you tried your method on large chunks of data and big VBOs?

The reason I am asking is that the VBO document here (pages 12 and 13) says that the value passed to glMapBuffer is just a hint. glMapBuffer will "map" the data into system RAM. In the worst-case scenario, the whole buffer might get mapped to system RAM. For small VBOs it might not matter, but if the VBO is large, there could be a performance issue.
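When the whole buffer is rewritten every frame through glMapBuffer, as in the GF6600GT test above, a common guideline is to map with GL_WRITE_ONLY, write every byte once and in order, and never read back through the pointer, since mapped memory can be uncached and slow to read. The sketch below assumes a hypothetical `vertex` layout and a simple translation as the per-frame update; a plain array stands in for the mapped pointer so it runs without a GL context.

```c
#include <stddef.h>

/* Hypothetical vertex layout (position only). */
typedef struct { float x, y, z; } vertex;

/* Per-frame update written straight into mapped memory. `dst` would be the
   pointer returned by glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY):
     vertex *dst = glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
     if (dst != NULL) {
         write_translated(dst, src, count, dx, dy, dz);
         if (glUnmapBuffer(GL_ARRAY_BUFFER) == GL_FALSE) {
             ... contents were lost while mapped: upload them again ...
         }
     }
   Each output vertex is computed from `src` and stored exactly once, in
   order; `dst` is never read. */
void write_translated(vertex *dst, const vertex *src, size_t count,
                      float dx, float dy, float dz)
{
    for (size_t i = 0; i < count; ++i) {
        vertex v = src[i];
        v.x += dx;
        v.y += dy;
        v.z += dz;
        dst[i] = v;
    }
}
```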

[Edited by - _neutrin0_ on October 9, 2006 2:10:44 PM]
++ My::Game ++
Quote:Original post by _neutrin0_
By batch I mean:

Create vertex buffer.
Create index buffer.

while (!gameEnd)
    Load mesh 1 vertices into VBO.
    Load mesh 2 vertices into VBO.
    Load mesh 1 faces into IBO.
    Load mesh 2 faces into IBO.
    Draw mesh 1.
    Draw mesh 2.

Destroy VBO.
Destroy IBO.

The VBO/IBO is statically allocated and not created and destroyed for every iteration. In fact it is not even resized. If the vertex buffer fills up, we draw the geometry and restart from the beginning of the buffer.

I see. That matches up with some parts of my engine.

Quote:The engine keeps chunks of vertex data in a contiguous array and loads them with calls to glBufferSubData. I tried using glBufferData here instead, and it was slower. The reason is that glBufferData replaces all the data in the buffer and may do some internal memory allocation/deallocation. glBufferData takes a performance hit as the size of the VBO increases (a big hit), so for a big VBO it is best to use glBufferSubData.

This makes sense. I'll do some testing tomorrow by replacing the calls at the appropriate places. For my models, for example, a VBO can easily hold 400 KB of dynamic data (but the size stays the same all the time).

Life's like a Hydra... cut off one problem just to have two more popping out.
Leader and Coder: Project Epsylon | Drag[en]gine Game Engine

I have now done some testing and switched one VBO (the large one) to the SubData call. That bumped the framerate at the worst place in the map from 32 up to 40 FPS. Still a long way to go, but that sounds much better.

Life's like a Hydra... cut off one problem just to have two more popping out.
Leader and Coder: Project Epsylon | Drag[en]gine Game Engine

I am just hazarding a guess here...

How big is your VBO? Very large VBOs may overflow GPU memory, especially if you are already loading large textures, other VBOs, and a bunch of other data.

In the worst case, OpenGL places the VBO in system RAM.

Maybe reducing the VBO size and sending the data in batches for very large meshes would help.

You need to verify this by actually doing it.
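For a large mesh, sending the data in batches could look like this minimal sketch. It assumes a non-indexed triangle list (GL_TRIANGLES) and a smaller staging VBO that holds at most `max_vertices` vertices; `batch_count` and `batch_range` are hypothetical helpers. Each batch would be uploaded with glBufferSubData and drawn with glDrawArrays before the next one is loaded. Rounding the batch size down to a multiple of 3 keeps every triangle inside a single batch.

```c
#include <assert.h>
#include <stddef.h>

/* Largest batch that fits and contains only whole triangles. */
static size_t vertices_per_batch(size_t max_vertices)
{
    size_t per = max_vertices - max_vertices % 3;
    assert(per > 0); /* the staging VBO must hold at least one triangle */
    return per;
}

/* Number of batches needed for `vertex_count` triangle-list vertices. */
size_t batch_count(size_t vertex_count, size_t max_vertices)
{
    size_t per = vertices_per_batch(max_vertices);
    return (vertex_count + per - 1) / per;
}

/* First vertex and vertex count of batch `index` (index < batch_count).
   Per batch, real code would do something like:
     glBufferSubData(GL_ARRAY_BUFFER, 0, count * stride,
                     (const char *)verts + first * stride);
     glDrawArrays(GL_TRIANGLES, 0, count);  */
void batch_range(size_t vertex_count, size_t max_vertices, size_t index,
                 size_t *first, size_t *count)
{
    size_t per = vertices_per_batch(max_vertices);
    size_t left;
    *first = index * per;
    left = vertex_count - *first;
    *count = left < per ? left : per;
}
```

Indexed meshes need more care, since index values refer to vertex positions in the buffer and would have to be rebased per batch; that case is not covered by this sketch.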
++ My::Game ++
If you are replacing a whole VBO, you could do what is known in D3D circles as 'render and discard'.

Basically, you fill your VBO with data; then, when it comes time to update that VBO, you rebind it and issue a glBufferData() call with NULL as the data pointer, then repeat the glBufferData() call with the pointer pointing to the new data to load into the buffer.

For both NV and ATI this sequence tells the driver 'I don't care about the data in that VBO, discard it when you are done rendering from it and create me a new buffer to put data in'.
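The discard sequence can be sketched as below, assuming the VBO is already bound and its size and usage do not change between updates. `orphan_and_upload`, `buffer_data_fn` and the recording fake are hypothetical names; the fake only records the data pointers so the call order can be checked without a GL context. In a real program, `buffer_data` would wrap glBufferData, with GL_ARRAY_BUFFER as the target and a usage hint such as GL_STREAM_DRAW or GL_DYNAMIC_DRAW.

```c
#include <stddef.h>

/* Same parameters as glBufferData(target, size, data, usage). Real code
   would pass a small wrapper that calls glBufferData, since GL's own entry
   point may use a different calling convention (APIENTRY) on some platforms. */
typedef void (*buffer_data_fn)(unsigned target, ptrdiff_t size,
                               const void *data, unsigned usage);

/* The discard sequence, with the VBO already bound: the NULL call tells the
   driver the old contents are no longer needed, so it can give the buffer
   fresh storage instead of waiting for the GPU to finish with the old one. */
void orphan_and_upload(buffer_data_fn buffer_data, unsigned target,
                       ptrdiff_t size, const void *data, unsigned usage)
{
    buffer_data(target, size, NULL, usage); /* discard old contents */
    buffer_data(target, size, data, usage); /* fill the new storage */
}

/* Recording stand-in for glBufferData so the sketch runs without OpenGL. */
static int recorded_calls = 0;
static const void *recorded_data[2];

static void record_buffer_data(unsigned target, ptrdiff_t size,
                               const void *data, unsigned usage)
{
    (void)target; (void)size; (void)usage;
    if (recorded_calls < 2)
        recorded_data[recorded_calls] = data;
    recorded_calls++;
}
```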

I'm pretty sure this is covered in one of the performance PDFs in the Forum FAQ.
@_neutrin0_:
As stated somewhere above, the dynamic VBO for the player character comes in at about 400 KB. This is more or less the size of the largest dynamic VBOs I expect to come across.

@phantom:
Is it the driver re-allocating the memory (through BufferData) that eats the time? I have now switched to SubData to avoid this reallocation, and it really does reduce the processing time. It just confuses me that it should be better the other way round.

Life's like a Hydra... cut off one problem just to have two more popping out.
Leader and Coder: Project Epsylon | Drag[en]gine Game Engine

Quote:Original post by RPTD
@phantom:
Is it the driver re-allocating the memory (through BufferData) that eats the time? I have now switched to SubData to avoid this reallocation, and it really does reduce the processing time. It just confuses me that it should be better the other way round.


Yup. You need to discard the data by calling glBufferData and passing a NULL pointer to it.

++ My::Game ++
