# VBO (Vertex Buffer Object) Performance Issue

Currently, rendering via compiled vertex arrays is about 3x faster than using VBOs. I have not noticed any graphical glitches with VBOs, but stability suffers as well: the program randomly crashes with VBOs enabled. If anyone could offer some hints on why the performance is so low, I would greatly appreciate it.

The PC specs: Athlon XP 3200+, 1024MB RAM, Radeon 9800 Pro DDR2 256MB, Catalyst 4.3.

Here is the initialization code:
    r_numVertexBufferObjects = VBO_ENDMARKER + glConfiguration.maxTextureUnits - 1;
    if( r_numVertexBufferObjects > MAX_VERTEX_BUFFER_OBJECTS )
        r_numVertexBufferObjects = MAX_VERTEX_BUFFER_OBJECTS;

    glGenBuffersARB( r_numVertexBufferObjects, r_vertexBufferObjects );

    for( i = 0; i < r_numVertexBufferObjects; i++ ) {
        if( i == VBO_INDEXES )
            glBindBufferARB( GL_ELEMENT_ARRAY_BUFFER_ARB, r_vertexBufferObjects[i] );
        else
            glBindBufferARB( GL_ARRAY_BUFFER_ARB, r_vertexBufferObjects[i] );

        if( i == VBO_VERTS ) {
            glVertexPointer( 3, GL_FLOAT, 0, 0 );
        } else if( i == VBO_NORMALS ) {
            glBufferDataARB( GL_ARRAY_BUFFER_ARB, MAX_ARRAY_VERTS * sizeof( vec3_t ), NULL, GL_STREAM_DRAW_ARB );
            glNormalPointer( GL_FLOAT, 12, 0 );
        } else if( i == VBO_COLORS ) {
            glBufferDataARB( GL_ARRAY_BUFFER_ARB, MAX_ARRAY_VERTS * sizeof( byte_vec4_t ), NULL, GL_STREAM_DRAW_ARB );
            glColorPointer( 4, GL_UNSIGNED_BYTE, 0, 0 );
        } else if( i == VBO_INDEXES ) {
            glBufferDataARB( GL_ELEMENT_ARRAY_BUFFER_ARB, MAX_ARRAY_INDEXES * sizeof( int ), NULL, GL_STREAM_DRAW_ARB );
        } else {
            glBufferDataARB( GL_ARRAY_BUFFER_ARB, MAX_ARRAY_VERTS * sizeof( vec2_t ), NULL, GL_STREAM_DRAW_ARB );

            GL_SelectTexture( i - VBO_TC0 );
            glTexCoordPointer( 2, GL_FLOAT, 0, 0 );
            if( i > VBO_TC0 )
                GL_SelectTexture( 0 );
        }
    }

    glBindBufferARB( GL_ARRAY_BUFFER_ARB, 0 );


Here is the rendering code:
    if( r_enableNormals ) {
        glEnableClientState( GL_NORMAL_ARRAY );
        glBindBufferARB( GL_ARRAY_BUFFER_ARB, r_vertexBufferObjects[VBO_NORMALS] );
        glBufferDataARB( GL_ARRAY_BUFFER_ARB, numVerts * sizeof( vec3_t ), normalsArray, GL_STREAM_DRAW_ARB );
        glBindBufferARB( GL_ARRAY_BUFFER_ARB, 0 );
    }

    GL_Bind( 0, r_texPointers[0] );
    glEnableClientState( GL_TEXTURE_COORD_ARRAY );

    if( numColors > 1 ) {
        glEnableClientState( GL_COLOR_ARRAY );
        glBindBufferARB( GL_ARRAY_BUFFER_ARB, r_vertexBufferObjects[VBO_COLORS] );
        glBufferDataARB( GL_ARRAY_BUFFER_ARB, numVerts * sizeof( byte_vec4_t ), colorArray, GL_STREAM_DRAW_ARB );
        glBindBufferARB( GL_ARRAY_BUFFER_ARB, 0 );
    } else if( numColors == 1 ) {
        glColor4ubv( colorArray[0] );
    }

    for( i = 1; i < r_numAccumPasses; i++ ) {
        GL_Bind( i, r_texPointers[i] );
        glEnable( GL_TEXTURE_2D );
        glEnableClientState( GL_TEXTURE_COORD_ARRAY );
    }

    glBindBufferARB( GL_ELEMENT_ARRAY_BUFFER_ARB, r_vertexBufferObjects[VBO_INDEXES] );
    if( glConfiguration.drawRangeElements )
        glDrawRangeElementsEXT( GL_TRIANGLES, 0, numVerts, numIndexes, GL_UNSIGNED_INT, 0 );
    else
        glDrawElements( GL_TRIANGLES, numIndexes, GL_UNSIGNED_INT, 0 );
    glBindBufferARB( GL_ELEMENT_ARRAY_BUFFER_ARB, 0 );

[edited by - Gumpngreen on April 13, 2004 3:23:19 PM]

##### Share on other sites
The driver might need some VBO optimization. I've read lots of folks complaining about the speed. From what I gather, one should create lots of small buffers in GL and the opposite in D3D. This is because D3D switches between kernel and user mode frequently, which kills speed, so switching less by using larger VBs is the way to go. Since GL doesn't take as much of a hit, you can switch more often and keep the verts in the internal caches, or something like that. I think VBO speed should be improved in the 56.x drivers, from what I read in the release 55 NVIDIA driver docs.

##### Share on other sites
For what it's worth, I have experienced similar issues with VBOs and eventually decided to avoid them.

##### Share on other sites
Forget the last reply. The bug is quite simple. You are NOT using VBO. Plus, you are sending data to the graphics card (glBufferDataARB) each frame. Look at the end of the extension specification for some examples of use (or at some tutorials). In general it goes something like this.

init:
- create a buffer for each object (glGenBuffersARB)
- fill the buffer with vertex data using glBufferDataARB (just once!); if you have static data, use GL_STATIC_DRAW_ARB
- (same for indices)

render:
- bind the VBO
- set the vertex pointers using offsets
- render
- (optional: bind VBO 0 to disable VBO)
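
The "set vertex pointers using offset" step can be sketched in C roughly as follows. This is only a minimal sketch, not code from this thread: the interleaved `vertex_t` layout, the `vertex_layout_t` helper, and the `BUFFER_OFFSET` macro are hypothetical names chosen for illustration, and the GL calls appear only in comments so that the block compiles without a GL context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interleaved vertex format (an assumption, not the poster's layout). */
typedef struct {
    float         position[3];
    float         normal[3];
    float         texcoord[2];
    unsigned char color[4];
} vertex_t;

/* While a buffer is bound, the last argument of gl*Pointer is read as a byte
   offset into that buffer rather than as a client memory address. */
#define BUFFER_OFFSET( bytes ) ( (const void *)(uintptr_t)( bytes ) )

/* Byte stride and per-attribute offsets for vertex_t. */
typedef struct {
    size_t stride, position, normal, texcoord, color;
} vertex_layout_t;

static vertex_layout_t vertex_layout( void ) {
    vertex_layout_t l;
    assert( sizeof( float ) == 4 );  /* GL_FLOAT is a 32-bit float */
    l.stride   = sizeof( vertex_t );
    l.position = offsetof( vertex_t, position );
    l.normal   = offsetof( vertex_t, normal );
    l.texcoord = offsetof( vertex_t, texcoord );
    l.color    = offsetof( vertex_t, color );
    return l;
}

/*
 * Init, once per object:
 *   glGenBuffersARB( 1, &vbo );
 *   glBindBufferARB( GL_ARRAY_BUFFER_ARB, vbo );
 *   glBufferDataARB( GL_ARRAY_BUFFER_ARB, numVerts * sizeof( vertex_t ),
 *                    verts, GL_STATIC_DRAW_ARB );
 *
 * Render, every frame (no upload):
 *   glBindBufferARB( GL_ARRAY_BUFFER_ARB, vbo );
 *   glVertexPointer( 3, GL_FLOAT, (GLsizei)l.stride, BUFFER_OFFSET( l.position ) );
 *   glNormalPointer( GL_FLOAT, (GLsizei)l.stride, BUFFER_OFFSET( l.normal ) );
 *   glTexCoordPointer( 2, GL_FLOAT, (GLsizei)l.stride, BUFFER_OFFSET( l.texcoord ) );
 *   glColorPointer( 4, GL_UNSIGNED_BYTE, (GLsizei)l.stride, BUFFER_OFFSET( l.color ) );
 */
```

Whether to interleave attributes or keep one buffer per attribute, as the original code does, is a separate choice; the offset idiom is the same either way.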

You should never let your fears become the boundaries of your dreams.

##### Share on other sites
As I said, for what it's worth.

DarkWing, do you have any experience with the effects of VBOs on fairly substantial datasets? I was experimenting with a dataset that was ~27 megs large. It ended up going a bit (not significantly) slower than using vertex arrays exclusively.

##### Share on other sites
quote:
Original post by haro
DarkWing, do you have any experience with the effects of VBOs on fairly substantial datasets? I was experimenting with a dataset that was ~27 megs large. It ended up going a bit (not significantly) slower than using vertex arrays exclusively.

I played with VBO sizes when the extension was released, because a VBO bind was quite expensive back then. I remember there was a limit on the size of a single buffer, but I don't remember the exact size. Once you reached the limit, the VBO dropped back to AGP memory (= SLOW). The good news is that the cost of binding a VBO is getting very small, so having lots of small buffers is not a problem anymore. Now I use mostly small buffers (<1MB). One thing to remember is that VBO will not speed up your rendering (much) if you are not transfer bound.
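
As a rough illustration of the "many small buffers" approach, the helper below works out how many buffers a dataset needs for a given per-buffer cap. It is a sketch under assumptions: the function names are hypothetical, the 1MB cap is just the figure mentioned above, and a real splitter would also have to cut on whole-vertex and whole-triangle boundaries.

```c
#include <assert.h>
#include <stddef.h>

/* Number of buffers needed so that no single buffer exceeds max_bytes. */
static size_t vbo_chunk_count( size_t total_bytes, size_t max_bytes ) {
    assert( max_bytes > 0 );
    return ( total_bytes + max_bytes - 1 ) / max_bytes;
}

/* Size in bytes of chunk number `index` (0-based); only the last chunk may be short. */
static size_t vbo_chunk_size( size_t total_bytes, size_t max_bytes, size_t index ) {
    size_t start, remaining;
    assert( index < vbo_chunk_count( total_bytes, max_bytes ) );
    start     = index * max_bytes;
    remaining = total_bytes - start;
    return remaining < max_bytes ? remaining : max_bytes;
}
```

With a 1MB cap, the ~27MB dataset mentioned earlier would come out as 27 buffers, each created once with its own glGenBuffersARB / glBufferDataARB pair.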

##### Share on other sites
27 megs is not a "fairly substantial dataset", it's a small dataset. At work I have datasets that are in the gigabyte range... but that's off-topic.

Tom Nuydens from delphi3d.net has a demo with occlusion culling on a medium dataset, around 200 MB of data, all put in static VBOs. It runs fine on a Radeon 9700+ (more than 50 MTris/sec). Maybe you should check it out if that interests you.

Y.

[edited by - Ysaneya on April 14, 2004 3:10:45 PM]

##### Share on other sites
EDIT: Flame - cut.

[edited by - haro on April 14, 2004 6:44:18 PM]

##### Share on other sites
Ysaneya: The point was how much data you can put in one VBO before suffering from thrashing/swapping. As far as I can remember (it was a long time ago), Tom's demo uses a bunch of "small" buffers.

haro: No need to start a childish flame war here... The size of a dataset is relative. 1M vertices may be a lot for a Quake3 level but is (or will be) very little for Unreal3.

This topic is getting way off-topic. Just try to help the OP (Gumpngreen) before flaming on...

[edited by - _DarkWIng_ on April 14, 2004 5:28:59 PM]

##### Share on other sites
Clarification: as the author of this code, I can tell you that:
1) static VBOs are not used at all (they don't work very well in a q3a environment)
2) calling glBufferDataARB each time you want to draw something is required (in fact, glBufferDataARB is a lot cheaper than glVertexPointer)

##### Share on other sites
Sorry, the Anonymous Poster above is me.

##### Share on other sites
Haro: I'm not really sure why you're so aggressive, I was just trying to help. Yeah, it was off-topic and it was a personal opinion, but there's no need to flame me so fast.

Darkwing: Tom's demo stores a few thousand polys per VBO, I think, but the total still weighs in at > 200 MB of data, all put in static VBOs.

The way ATI cards handle this is very different from NVidia's. ATI allocates first in video memory, then in AGP memory, then in system memory if it runs out of space. Once allocated, a VBO stays in the pool of memory it was created from and never moves. In practice this means that if you allocate your VBOs in a specific spatial order, say, from left to right in your world space, and if after frustum culling your view only displays the objects on the right, you might be rendering those VBOs from system memory. I tested this and it was confirmed by ATI a few months ago, but I'm not sure whether they've changed their "memory management philosophy" in their latest drivers.

Under the same test, NVidia cards seemed to behave much better. They seemed to use an LRU cache, but I got no confirmation from them. As long as the amount of data you're rendering per frame is low enough compared to your video memory size (even if the total amount of data in your scene is very high), they performed well.
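
The ATI allocation order described above (video memory, then AGP, then system memory, with no migration afterwards) can be pictured with a toy model. This is purely illustrative: the pool sizes, type names, and `place_buffer` function are hypothetical and say nothing about how any real driver is implemented.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { POOL_VIDEO, POOL_AGP, POOL_SYSTEM } pool_t;

/* Remaining space in the two fast pools; system memory is treated as unlimited. */
typedef struct {
    size_t video_free;
    size_t agp_free;
} pools_t;

/* Place one buffer at creation time: video first, then AGP, else system memory.
   In this model a buffer never moves once it has been placed. */
static pool_t place_buffer( pools_t *pools, size_t bytes ) {
    assert( pools != NULL );
    if( bytes <= pools->video_free ) {
        pools->video_free -= bytes;
        return POOL_VIDEO;
    }
    if( bytes <= pools->agp_free ) {
        pools->agp_free -= bytes;
        return POOL_AGP;
    }
    return POOL_SYSTEM;
}
```

In this model, placement depends only on creation order, so the buffers created last (the "right side" of the world in the example above) end up in the slowest pool regardless of how often they are drawn.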

Y.

##### Share on other sites
quote:
Original post by _Vic_
2) calling glBufferDataARB each time you want to draw something is required (in fact, glBufferDataARB is a lot cheaper than glVertexPointer)

That isn't true. BufferDataARB() requires the driver to copy the entire buffer's worth of data into driver-controlled memory (pcigart, AGP, or video). The exception would be if the data pointer is null, but in that case the data in the buffer is undefined until you later load it with valid data. VertexPointer(), on the other hand, is fairly cheap, since the driver only has to change a few pointers. What you said is equivalent to saying it's faster to do a texImage2d() every time you want to use a new texture because bindTexture() is too expensive.

Ysaneya, that is the way ATI's VBO buffers used to work a very long time ago, but I would say things haven't been that way for well over 6 months. Now ATI puts buffers into AGP and local memory, and if you run out of space the driver will bump out unused textures or object buffers and put the current one in there.

Here is a performance optimization I haven't seen mentioned elsewhere. If you update a buffer with bufferDataARB(), try to keep the size the same as the last time you used the buffer. Constantly changing the size leads to memory fragmentation and wastes CPU time on allocations. Think about it: you wouldn't want your app to constantly allocate big new chunks of memory in performance paths, and that is what you're asking the driver to do if you keep changing the size of an object buffer.
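
One way to follow this advice is to track a capacity per streaming buffer and only ever pass that capacity to glBufferDataARB, growing it rarely. The sketch below is an assumption about how an application might do that; `stream_buffer_t` and `stream_buffer_reserve` are hypothetical names, and doubling is just one possible growth policy.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t capacity;  /* the size in bytes last passed to glBufferDataARB */
} stream_buffer_t;

/* Returns the size to pass to glBufferDataARB for an upload of `needed` bytes.
   The capacity stays unchanged while the data fits and doubles otherwise, so
   most frames reuse the same size. */
static size_t stream_buffer_reserve( stream_buffer_t *sb, size_t needed ) {
    size_t cap;
    assert( sb != NULL );
    cap = sb->capacity ? sb->capacity : 1;
    while( cap < needed )
        cap *= 2;
    sb->capacity = cap;
    return cap;
}

/*
 * Per-frame use (GL calls shown as comments only):
 *   size_t cap = stream_buffer_reserve( &sb, numVerts * sizeof( vec3_t ) );
 *   glBindBufferARB( GL_ARRAY_BUFFER_ARB, vbo );
 *   glBufferDataARB( GL_ARRAY_BUFFER_ARB, cap, NULL, GL_STREAM_DRAW_ARB );
 *   glBufferSubDataARB( GL_ARRAY_BUFFER_ARB, 0, numVerts * sizeof( vec3_t ), normalsArray );
 */
```

In the code from the first post, the buffers are created with MAX_ARRAY_VERTS-sized storage but then re-specified with numVerts-sized storage on every draw, which is the size-changing pattern this tip warns about.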

##### Share on other sites
quote:
Original post by Ysaneya
Tom's demo stores a few thousand polys per VBO, I think, but the total still weighs in at > 200 MB of data, all put in static VBOs.

Yeah, I know. Creating a zillion small buffers is not a problem. But creating one big (200MB) chunk is.

@mribble: the optimization you described, and a few others, were posted in one of the NV optimization guides.

##### Share on other sites
To quote the NVidia doc "Using VBOs":

Avoid calling glVertexArray (I think it should be glVertexPointer, but it doesn't matter)
The glVertexArray function does a lot of setup in VBO, so to avoid redundancy, avoid calling it.
You might think the essentials of VBO management are done in glBindBufferARB, but it's the opposite. VBO systems wait for the next upcoming important function (like glVertexArray).

All of the above proved to be true in my own tests, and that's why my program calls glVertexPointer only once at startup.

##### Share on other sites
quote:
Original post by _Vic_
All of the above proved to be true in my own tests, and that's why my program calls glVertexPointer only once at startup.

And calls glBufferDataARB each frame? If so, you are totally missing the whole point of static VBOs. You are sending data to the graphics card each time. And the speed argument in that document is glVertexPointer vs. glBindBufferARB, not glVertexPointer vs. glBufferDataARB as you misunderstood.

##### Share on other sites
As I said, I'm not using static VBOs at all (too much dynamic geometry in q3a).

##### Share on other sites
quote:
Original post by _DarkWIng_
@mribble: the optimization you described, and a few others, were posted in one of the NV optimization guides.

I just reread part of it and it does hint at that. I guess I missed it the first time I read through it a while back. It does annoy me that they seem to pass off their implementation details as the way things always are. A good example is when they say that passing a size of zero to bufferDataARB will free the bound surface. I can tell you that is sometimes not the case with ATI's OGL drivers. So while there is a lot of good info in that document, be careful not to accept it all as gospel.

_Vic_, if you're only using the data once, you won't get a benefit from using VBO. However, if you're using it a few times (either for multipass or because it's persistent over a few frames), then VBO makes sense.

##### Share on other sites
quote:
Original post by mribble
I just reread part of it and it does hint at that. I guess I missed it the first time I read through it a while back. It does annoy me that they seem to pass off their implementation details as the way things always are. A good example is when they say that passing a size of zero to bufferDataARB will free the bound surface. I can tell you that is sometimes not the case with ATI's OGL drivers. So while there is a lot of good info in that document, be careful not to accept it all as gospel.

Yeah... Getting VBO right can still be tricky sometimes. Both nVidia and ATi have their ups and downs in their drivers, so you have to be careful about it (big buffers, small buffers, strategies, ...). Even though it's an ARB extension, you should try it on both cards once in a while to avoid unexpected errors or slowdowns.

##### Share on other sites
What's the reason for including glGetBufferPointerv()? Is it so that multiple clients can get the mapped pointer, since glMapBuffer() returns with an error if called more than once? Just checking.

##### Share on other sites
Vic:

"All of the above was proven to be true in my researches, that's why my program calls glVertexPointer only once at startup"

Which works, if you only have a single VBO format. If you use different VBO formats (for instance, if your first VBO contains data with coords, UV, and colors, and your second VBO contains only coords), then you will need to set up your vertex pointers whenever you switch VBOs.

I would love to be wrong, but I think DarkWing has it right.

##### Share on other sites
I would not mix different types in one VBO, just for clarity (plus, you get several smaller VBOs, which seems to be better than one huge VBO).

##### Share on other sites
BrianL:

Paper: http://developer.nvidia.com/attach/6115

I haven't implemented VBOs yet, but as I understand it, nvidia say in their paper that you have to set your vertex array pointer just once for each VBO. So you create your VBO, bind it, fill it with data (if it's static), and set up your vertex arrays. You can then refill your VBO (if it's dynamic) without changing or resetting your pointers. As I understand it, you don't have to set the pointer the next time you bind your VBO, as it uses the last settings you made with that VBO. So you could just create different VBOs with different vertex formats.

PS: Sorry, my English isn't that good.

Bye,
Oesi

##### Share on other sites
How compatible with ATI's drivers is that? That is the question...

##### Share on other sites
Sorry about the ambiguity. By 'mix', I meant multiple VBOs, not putting multiple types of data in a single VBO (I am just starting out with graphics too!).

Oesi: I made the same assumption you did when I started with vertex buffers. I assumed that the pointers/offsets were associated with the buffer. Then I started getting crashes once I used multiple VBOs with different offsets.

When I thought about the fact that the rest of GL is a state machine, it seemed to make sense that the VBO offsets would be as well. This would mean that the pointers have to be reset whenever a VBO with a new format is 'used'.

Again, this is all just a hunch based on my experience with them.
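
The hunch can be illustrated with a small toy model of the state involved. As far as I can tell from the ARB_vertex_buffer_object specification, the array pointer state belongs to the context and records whichever buffer was bound when gl*Pointer was called; it is not stored per buffer. The types and functions below are hypothetical stand-ins for illustration, not GL itself.

```c
#include <assert.h>
#include <stddef.h>

/* What one array pointer remembers: the buffer that was bound when it was set,
   plus the format and offset given in the call. */
typedef struct {
    unsigned buffer;
    int      size;
    size_t   stride;
    size_t   offset;
} array_state_t;

/* Toy context: one current binding and ONE vertex array slot for the whole
   context (not one per buffer). */
typedef struct {
    unsigned      bound_array_buffer;
    array_state_t vertex_array;
} toy_context_t;

/* Models glBindBufferARB( GL_ARRAY_BUFFER_ARB, buffer ): changes the binding only. */
static void toy_bind_buffer( toy_context_t *ctx, unsigned buffer ) {
    assert( ctx != NULL );
    ctx->bound_array_buffer = buffer;
}

/* Models glVertexPointer: latches the current binding together with the format. */
static void toy_vertex_pointer( toy_context_t *ctx, int size, size_t stride, size_t offset ) {
    assert( ctx != NULL );
    ctx->vertex_array.buffer = ctx->bound_array_buffer;
    ctx->vertex_array.size   = size;
    ctx->vertex_array.stride = stride;
    ctx->vertex_array.offset = offset;
}
```

In this model, binding a second buffer after setting the pointer leaves the vertex array still sourcing from the first buffer with the first format, so drawing from a buffer with a different layout needs another pointer call after the bind.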