# Terrible VBO performance



Now that my VBOs are finally working (thanks again, everyone), I have a question about their performance. If I run my program using display lists or plain old vertex arrays, it runs pretty fast. The display lists are faster than the VAs, but not by much, and both look pretty good. However, as soon as I enable the VBOs on a larger object, things start crawling: I go from something like 30 frames per second to one frame every 1 or 2 seconds.

Someone suggested it might be the size of the objects that is killing the performance, and going back and reading all of my old posts about VBOs, that seems to be the general consensus: VBOs don't like super big objects. To further support that theory, the VBOs work fine (no better, no worse than the display lists) on the two smaller objects I have. I tried it on a 36-polygon box and a 3,800-polygon cow (yes, a cow). However, as soon as I moved up to a model with about 35,000 polygons, it started crawling. Now, I know all of the evidence points to the size of the objects, and worry not, fellow board crawlers: while you're reading this very post I shall be splitting my object up into about 8 different VBOs to see if that helps. But the thing that gets me is that the vertex arrays have no problem with objects that size. Isn't a VBO just like a VA, only with the data already stored in speedy VRAM, free of AGP transfers? So shouldn't things get faster and not slower?

The video card I'm working with isn't the greatest, an MX440 with only 64MB of RAM, but is it even remotely possible that I'm actually using that much memory? The file for the biggest object is only about 5MB. So my questions, while I attempt the obvious solution: is there anything other than the size of the object that could be hindering the legendary performance of the VBO extension? Why can the VAs handle it just fine while the VBOs are dying? And as a matter of fact, don't display lists store the data in VRAM as well?
The display lists run even better than the VAs (the object's geometry is static, so that's not too unbelievable), with the same object stored in their memory as in the VBOs. Hmm... compelling. Thoughts?
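
To put the 64MB question in perspective, here is a rough sketch of how much memory the two buffers for a 35,000-triangle model would occupy. It assumes the interleaved layout implied by the rendering code later in the thread (a normal followed by a position, six floats or 24 bytes per vertex), 4-byte `unsigned int` indices, and the worst case of no shared vertices; `vbo_bytes` and `worst_case_bytes` are hypothetical helpers.

```c
#include <stddef.h>

/* Assumed vertex layout: a normal (3 floats) followed by a position
 * (3 floats), matching the offsets 0 and 12 used when drawing. */
struct Vertex {
    float nx, ny, nz;
    float x, y, z;
};

/* Hypothetical helper: bytes for one vertex VBO plus one index VBO. */
static size_t vbo_bytes(size_t num_verts, size_t num_indices, size_t index_size)
{
    return num_verts * sizeof(struct Vertex) + num_indices * index_size;
}

/* Worst case for an indexed GL_TRIANGLES mesh: three indices per triangle
 * and no shared vertices, so one vertex per index. */
static size_t worst_case_bytes(size_t triangles, size_t index_size)
{
    size_t indices = triangles * 3;
    return vbo_bytes(indices, indices, index_size);
}
```

Under those assumptions, `worst_case_bytes(35000, 4)` comes to 105,000 × 24 + 105,000 × 4 = 2,940,000 bytes, under 3MB, so this model alone is nowhere near filling a 64MB card.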

That *is* pretty strange.
I just wanted to let you know that 'size doesn't matter'.
I use VBOs for objects containing about 17k triangles.

How are you setting up your VBOs?

Two things come to mind from my own experience.
1 - Are you using mipmapping?
2 - Are you clearing the entire display using glClear(GL_COLOR_BUFFER_BIT...), or are you just clearing the depth buffer?

If you use mipmapping and only clear the depth buffer, you'll find there's a reasonable speed boost compared to rendering things using full-detail textures and clearing the entire display. At any rate, it seems strange that you're getting 1-2 frames per second, considering I get the same speeds on my GeForce2 MX 200 (64MB, like yours) when I'm rendering 2,093,058 triangles using vertex arrays, and when I cut it down to ~32k triangles, it runs perfectly (by my standards)...

Although size might generally matter once you get over the 'unsigned short' mark (video cards are generally optimal with unsigned short as the index type), you're not at that mark yet.
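
As a sketch of what switching to 16-bit indices involves: an existing 32-bit index array can be narrowed to unsigned shorts as long as every index fits, i.e. the mesh references no more than 65,536 vertices. The function name `narrow_indices` is hypothetical; the narrowed array would then be drawn with `GL_UNSIGNED_SHORT` instead of `GL_UNSIGNED_INT` in `glDrawElements`.

```c
#include <stddef.h>
#include <stdint.h>

/* Copies 32-bit indices into a 16-bit array.
 * Returns 1 on success, or 0 (leaving dst untouched) if any index is too
 * large for an unsigned short, in which case the mesh must be split first. */
static int narrow_indices(const uint32_t *src, size_t count, uint16_t *dst)
{
    size_t i;
    for (i = 0; i < count; ++i) {
        if (src[i] > UINT16_MAX)
            return 0;
    }
    for (i = 0; i < count; ++i)
        dst[i] = (uint16_t)src[i];
    return 1;
}
```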

Hmm, it can't be size, because I have rendered well over 300,000 triangles at 90-110 fps using VBOs. Can you post some relevant code? I doubt this, but I must ask: are you initialising the extension every frame?

EDIT: typos

[Edited by - DerAnged on March 16, 2005 11:17:01 AM]

Thank you for your responses everyone.

Quote:
 I just wanted to let you know that 'size doesn't matter'. I use VBOs for objects containing about 17k triangles.

Cool, though I don't run into trouble until I try the object with 32,000 triangles. That's the third smallest object I have; the other two have 36 and 3,800.

Quote:
 1 - Are you using mipmapping?
 2 - Are you clearing the entire display using glClear(GL_COLOR_BUFFER_BIT...), or are you just clearing the depth buffer?

1 - Nope
2 - Yeah, I clear the entire display each frame: glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT)

Disabling the clear on the color buffer doesn't help any (and really trips out my display).

Quote:
 Although size generally might matter once you get over the 'unsigned short' mark (videocards are generally optimal with unsigned short as index type) you're not at that mark yet.

Actually I kick the crap out of the unsigned short limit on most of my models. Only a handful are sub 65,000 triangles, so I'm using unsigned ints. Still, the slowdown takes place at 32,000 (and probably lower), which is well within the range of an unsigned short (though it's still being called with an unsigned int).

Quote:
 Hmm, it can't be size, because I have rendered well over 300,000 triangles at 90-110 fps, using VBOs.

That's certainly good to know. I might hit that range eventually.

Quote:
 Can you post some relevant code?

VBO setup (done at initialization):

    pglGenBuffersARB(2, buffList);
    /* Vertex buffer: interleaved normal (offset 0) and position (offset 12). */
    pglBindBufferARB(GL_ARRAY_BUFFER_ARB, buffList[0]);
    pglBufferDataARB(GL_ARRAY_BUFFER_ARB, sizeof(struct _Vertex) * numVerts, vertList, GL_STATIC_DRAW);
    /* Index buffer: 32-bit indices. */
    pglBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, buffList[1]);
    pglBufferDataARB(GL_ELEMENT_ARRAY_BUFFER_ARB, sizeof(unsigned int) * numInd, index, GL_STATIC_DRAW);

Rendering:

    pglBindBufferARB(GL_ARRAY_BUFFER_ARB, buffList[0]);
    glNormalPointer(GL_FLOAT, sizeof(struct _Vertex), BUFFER_OFFSET(0));
    glVertexPointer(3, GL_FLOAT, sizeof(struct _Vertex), BUFFER_OFFSET(12));
    pglBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, buffList[1]);
    glDrawElements(GL_TRIANGLES, numInd, GL_UNSIGNED_INT, BUFFER_OFFSET(0));
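
The thread doesn't show `BUFFER_OFFSET` or the vertex struct. Below is a sketch of a common way to define the macro, which turns a byte offset into the pointer-typed last argument that `glNormalPointer` and `glVertexPointer` take when a buffer is bound. The struct is an assumption standing in for the thread's `struct _Vertex`, inferred from the offsets 0 and 12 in the rendering code above; deriving the offsets with `offsetof` instead of hard-coding 12 keeps them correct if the layout changes.

```c
#include <stddef.h>
#include <stdint.h>

/* With a VBO bound, the last argument of the gl*Pointer calls is a byte
 * offset into the buffer rather than a real address. */
#define BUFFER_OFFSET(i) ((const void *)(uintptr_t)(i))

/* Assumed layout: normal first (offset 0), then position (offset 12). */
struct Vertex {
    float normal[3];
    float position[3];
};

/* Offsets derived from the struct instead of hard-coded numbers. */
#define NORMAL_OFFSET   BUFFER_OFFSET(offsetof(struct Vertex, normal))
#define POSITION_OFFSET BUFFER_OFFSET(offsetof(struct Vertex, position))
```

The pointer calls would then read `glNormalPointer(GL_FLOAT, sizeof(struct Vertex), NORMAL_OFFSET)` and `glVertexPointer(3, GL_FLOAT, sizeof(struct Vertex), POSITION_OFFSET)`, with `glEnableClientState(GL_NORMAL_ARRAY)` and `glEnableClientState(GL_VERTEX_ARRAY)` in effect before drawing (the snippet above doesn't show where those are enabled).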

Quote:
 I doubt this, but I must ask, are you initialising the extension every frame?

Heh, nope.

When would someone use GL_STATIC_READ, by the way? I figured it'd be a good thing to use for the index VBO as it doesn't contain actual geometry. And what about the GL_DYNAMIC_DRAW and whatever the other one is? Is that for if your vertex array's values are going to change (like with animation)?

Thanks for everyone's help so far. It's pretty much agreed upon that this is not a size thing, then?

GL_STATIC_READ would be used if you planned to read back the information; this could cause the driver to keep the data in AGP or even system RAM to allow for the readback, so it's probably not advised.

A quick look over the spec suggests that GL_STATIC_DRAW is probably your best bet, as you won't be reading the data back or respecifying it. You could try GL_STREAM_DRAW, but the spec intends that hint for data that is specified once and drawn only a few times, so it is unlikely to do better for static geometry.
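
The nine usage hints in ARB_vertex_buffer_object combine how often the data is specified (STREAM: once, used at most a few times; STATIC: once, used many times; DYNAMIC: respecified repeatedly, used many times) with who writes and consumes it (DRAW: the application writes it and GL draws from it; READ: GL writes it and the application reads it back; COPY: GL writes it and GL draws from it). The hypothetical helper below makes that mapping explicit; the numeric values are the ones defined in glext.h and are only defined here if a GL header hasn't already done so.

```c
/* Token values as defined in glext.h; only defined here if no GL header did. */
#ifndef GL_STREAM_DRAW_ARB
#define GL_STREAM_DRAW_ARB  0x88E0
#define GL_STREAM_READ_ARB  0x88E1
#define GL_STREAM_COPY_ARB  0x88E2
#define GL_STATIC_DRAW_ARB  0x88E4
#define GL_STATIC_READ_ARB  0x88E5
#define GL_STATIC_COPY_ARB  0x88E6
#define GL_DYNAMIC_DRAW_ARB 0x88E8
#define GL_DYNAMIC_READ_ARB 0x88E9
#define GL_DYNAMIC_COPY_ARB 0x88EA
#endif

/* How often the application (re)specifies the buffer contents. */
enum update_rate { UPDATE_ONCE_FEW_DRAWS, UPDATE_ONCE_MANY_DRAWS, UPDATE_OFTEN };

/* Who writes the data and who consumes it. */
enum data_flow { APP_WRITES_GL_DRAWS, GL_WRITES_APP_READS, GL_WRITES_GL_DRAWS };

/* Hypothetical helper mapping the two choices to a usage hint. */
static unsigned int choose_usage(enum update_rate rate, enum data_flow flow)
{
    static const unsigned int table[3][3] = {
        { GL_STREAM_DRAW_ARB,  GL_STREAM_READ_ARB,  GL_STREAM_COPY_ARB  },
        { GL_STATIC_DRAW_ARB,  GL_STATIC_READ_ARB,  GL_STATIC_COPY_ARB  },
        { GL_DYNAMIC_DRAW_ARB, GL_DYNAMIC_READ_ARB, GL_DYNAMIC_COPY_ARB },
    };
    return table[rate][flow];
}
```

Static model geometry, loaded once and drawn every frame, maps to `GL_STATIC_DRAW_ARB` for both the vertex and the index buffer. For animation, vertices that are rewritten and then drawn many times before the next update map to `GL_DYNAMIC_DRAW_ARB`, while vertices rewritten every frame and drawn once map to `GL_STREAM_DRAW_ARB`.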

What graphics card?
How's your speed with just vertex arrays?

Quote:
 Original post by DerAnged
 What graphics card?
 How's your speed with just vertex arrays?

GeForce MX440 64MB RAM

Display lists are currently the fastest, but not by much. Vertex arrays are almost as fast as the display lists, hardly noticeably slower: around 25-30 frames per second, I would guess.

Quote:
Original post by CyberSlag5k
Quote:
 Original post by DerAnged
 What graphics card?
 How's your speed with just vertex arrays?

GeForce MX440 64MB RAM

Boom, I had numerous problems with that card! That's why my code has a test: if VAs are faster than VBOs, it uses them instead, just because of my experience with that card.
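
A runtime check like that can be sketched as: render the same object for a fixed number of frames with each path, time both, and keep whichever is faster. The callbacks, the millisecond clock, and all function names here are hypothetical; a real program would pass its own draw functions and a wall-clock timer.

```c
typedef void (*render_fn)(void);   /* draws one frame using one path */
typedef double (*clock_fn)(void);  /* returns wall-clock time in ms */

enum render_path { PATH_VERTEX_ARRAY, PATH_VBO };

/* Average milliseconds per frame for `render` over `frames` frames. */
static double time_path(render_fn render, int frames, clock_fn now_ms)
{
    int i;
    double start = now_ms();
    for (i = 0; i < frames; ++i)
        render();
    /* Real GL code should call glFinish() here: GL commands return before the
     * GPU has finished, so without it only the submission cost is measured. */
    return (now_ms() - start) / frames;
}

/* Keep VBOs unless plain vertex arrays measured strictly faster. */
static enum render_path pick_faster(double va_ms, double vbo_ms)
{
    return (va_ms < vbo_ms) ? PATH_VERTEX_ARRAY : PATH_VBO;
}

/* Times both paths once (e.g. at startup) and returns the one to use. */
static enum render_path choose_render_path(render_fn draw_va, render_fn draw_vbo,
                                           int frames, clock_fn now_ms)
{
    return pick_faster(time_path(draw_va, frames, now_ms),
                       time_path(draw_vbo, frames, now_ms));
}
```

Running such a check once at startup would likely have caught the MX440 case in this thread, where VBOs were far slower than plain vertex arrays.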

Quote:
Original post by DerAnged
Quote:
Original post by CyberSlag5k
Quote:
 Original post by DerAnged
 What graphics card?
 How's your speed with just vertex arrays?

GeForce MX440 64MB RAM

Boom, I had numerous problems with that card! That's why my code has a test: if VAs are faster than VBOs, it uses them instead, just because of my experience with that card.

No kidding? I'll swap it out for an FX5200 after lunch and see if that helps. The system that this code is going in will be sporting a Ti 4800, so a 5200 shouldn't be too far off.

Thanks DerAnged.

What kind of 5200 is that? I've heard some bad things about some of them, but my 5200 Ultra (PNY) has no problems. In particular, I've heard that the Chaintech 5200 is just a Ti with shader support. Well, it doesn't really matter; anything is better than the card you have in now.

Quote:
 Original post by DerAnged
 What kind of 5200 is that? I've heard some bad things about some of them, but my 5200 Ultra (PNY) has no problems. In particular, I've heard that the Chaintech 5200 is just a Ti with shader support. Well, it doesn't really matter; anything is better than the card you have in now.

Heh, I think they're chaintechs. Oh well, I can always attempt to steal something better if that doesn't work.

It should work just fine; actually, if it doesn't, I would have been completely wrong, and ... ... humiliated.

The unsigned short limit refers to the number of vertices. If you're rendering triangles, this means you can only do about 20k triangles per batch. Especially on an older card like a GF4MX I wouldn't be surprised if this is what's causing the slowness. Using a triangle strip allows you to squeeze more triangles into a batch (if the data is good for stripping).

Also, in my experience it's not necessary to store the indices in a VBO, or VRAM in general. A list of indices (2-4 bytes per vertex) isn't much of a bandwidth hog compared to, say, sending vertex data over the bus (12-32+ bytes per vertex).

Quote:
 Original post by Fingers_
 The unsigned short limit refers to the number of vertices. If you're rendering triangles, this means you can only do about 20k triangles per batch.

Oops, I was going to contradict what you were saying, but it is in alignment with something I said earlier:

Quote:
 Original post by CyberSlag5k
 Actually I kick the crap out of the unsigned short limit on most of my models. Only a handful are sub 65,000 triangles, so I'm using unsigned ints. Still, the slowdown takes place at 32,000 (and probably lower), which is well within the range of an unsigned short (though it's still being called with an unsigned int).

We're both in error. You're correct in your first sentence, "The unsigned short limit refers to the number of vertices." But it has nothing at all to do with the number of triangles. If I had only 3 vertices, I could in theory draw 10,000,000 triangles (one on top of each other, so they would essentially be the same triangle) as long as I could allocate enough memory for it.

The upper limit of the unsigned short applies only to the number of vertices, as you said. Therefore, a lot more than 20,000 triangles could be drawn, provided they share vertices (which in a mesh they all do).
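
The arithmetic behind both positions can be written out. With 16-bit indices a batch can address at most 65,536 distinct vertices. If no vertex is shared, every triangle needs three of its own, which is where the figure of roughly 20k triangles comes from. If the batch is a closed triangle mesh (assumed here: a watertight, genus-0 surface, where Euler's formula holds and every edge borders exactly two triangles), shared vertices let the same vertex budget cover roughly twice as many triangles as vertices:

$$
T_{\max}^{\text{unshared}} = \left\lfloor \frac{65\,536}{3} \right\rfloor = 21\,845
$$

$$
V - E + T = 2,\quad E = \tfrac{3}{2}T \;\Rightarrow\; V = \tfrac{T}{2} + 2 \;\Rightarrow\; T_{\max}^{\text{shared}} = 2\,(65\,536 - 2) = 131\,068
$$

Real meshes with open edges, or with seams where differing normals or texture coordinates force duplicated vertices, land somewhere between these two bounds.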

Alrighty. I swapped out that MX440 for an FX5200. The VBOs are much, much quicker. However, the display lists still appear to be the fastest, with regular vertex arrays coming in second and VBOs trailing just behind. All three are barely noticeably different, and all three run decently. I should probably add some sort of benchmarking utility (perhaps tomorrow), but I'm rather pleased with the results of all three.

The thing that still bothers me, though, is that the regular vertex arrays are still slightly faster than the vertex buffer objects. That's so weird. There really shouldn't be any difference between the two except where the data is stored. Can anybody explain this? It's not that big a deal; I'll probably just end up using display lists (they are the easiest, after all), but I'd still like to know.

Thanks for everyone's support and input!

Hmm, here VBOs are up to 30 fps faster than VAs; I think there is still a small bottleneck in your VBO code.

Some points:

Most new drivers optimise display lists very well, which is why, if you don't know how to (or can't) get good performance from VBOs, you should stick with display lists for static data. Display lists will always give optimal performance because the driver has full control over what it does with your data.

Always use unsigned shorts for indices: even if unsigned ints would work, stick to unsigned shorts whenever the model has no more than 65,536 vertices (indices 0 to 65,535). If you have models with more vertices than that, break them down.
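
Breaking a large model down for 16-bit indices can be sketched as follows: walk the triangle list, remap each referenced vertex to a batch-local index, and start a new batch whenever the next triangle could push the batch past 65,536 vertices. This is a minimal sketch under stated assumptions: the input is a `GL_TRIANGLES` index list with 32-bit indices, the caller supplies scratch buffers, and the function and callback names are hypothetical. Vertices used by triangles in several batches are copied into each of those batches, and the split is conservative (it flushes whenever a triangle *might* add three new vertices).

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_BATCH_VERTS 65536u   /* indices 0..65535 fit in an unsigned short */
#define UNMAPPED UINT32_MAX

/* Hypothetical callback, invoked once per finished batch:
 * src_verts[k] is the source vertex for batch-local vertex k,
 * indices holds batch-local 16-bit indices. */
typedef void (*emit_batch_fn)(const uint32_t *src_verts, size_t nverts,
                              const uint16_t *indices, size_t nindices,
                              void *user);

/* Splits a triangle list into batches addressable with 16-bit indices.
 * remap needs num_src_verts entries, verts_buf MAX_BATCH_VERTS entries,
 * idx_buf num_indices entries. emit may be NULL. Returns the batch count. */
static size_t split_into_batches(const uint32_t *tri_indices, size_t num_indices,
                                 uint32_t *remap, size_t num_src_verts,
                                 uint32_t *verts_buf, uint16_t *idx_buf,
                                 emit_batch_fn emit, void *user)
{
    size_t i, k, nverts = 0, nidx = 0, batches = 0;

    for (i = 0; i < num_src_verts; ++i)
        remap[i] = UNMAPPED;

    for (i = 0; i + 2 < num_indices; i += 3) {
        /* A triangle adds at most 3 new vertices; flush first if needed. */
        if (nverts + 3 > MAX_BATCH_VERTS) {
            if (emit)
                emit(verts_buf, nverts, idx_buf, nidx, user);
            ++batches;
            for (k = 0; k < nverts; ++k)
                remap[verts_buf[k]] = UNMAPPED;  /* reset only touched entries */
            nverts = 0;
            nidx = 0;
        }
        for (k = 0; k < 3; ++k) {
            uint32_t v = tri_indices[i + k];
            if (remap[v] == UNMAPPED) {
                remap[v] = (uint32_t)nverts;
                verts_buf[nverts++] = v;
            }
            idx_buf[nidx++] = (uint16_t)remap[v];
        }
    }
    if (nidx > 0) {
        if (emit)
            emit(verts_buf, nverts, idx_buf, nidx, user);
        ++batches;
    }
    return batches;
}
```

Each emitted batch would go into its own pair of buffers and be drawn with `glDrawElements(GL_TRIANGLES, nindices, GL_UNSIGNED_SHORT, ...)`.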

Quote:
 Original post by GamerSg
 Most new drivers optimise display lists very well, which is why, if you don't know how to (or can't) get good performance from VBOs, you should stick with display lists for static data. Display lists will always give optimal performance because the driver has full control over what it does with your data.

Yeah, it's looking like I'll just stick with display lists for this project. It was good to learn VAs and VBOs, though.

Quote:
 Always use unsigned shorts for indices: even if unsigned ints would work, stick to unsigned shorts whenever the model has no more than 65,536 vertices (indices 0 to 65,535). If you have models with more vertices than that, break them down.

I might give that a try, I need to look at what else needs to get done for this and by when.

Thank you for your response, GamerSg :)

As long as the VA is all you're doing with OpenGL, and the model is 'small' enough that it can be sent over the AGP bus every frame, there is no reason VBOs should be faster. It's only when there is too much data to send that storing the data on the card makes a difference.
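
Whether a model is 'small enough' can be estimated with back-of-the-envelope arithmetic: bytes sent per frame times frames per second. The helper below is a hypothetical sketch that assumes the full vertex and index arrays cross the bus every frame and ignores driver overhead and any other traffic.

```c
#include <stddef.h>

/* Approximate bytes per second sent over the bus when an indexed vertex
 * array is submitted from system memory every frame (hypothetical helper). */
static double va_bytes_per_second(size_t num_verts, size_t vertex_stride,
                                  size_t num_indices, size_t index_size,
                                  double frames_per_second)
{
    double per_frame = (double)num_verts * (double)vertex_stride
                     + (double)num_indices * (double)index_size;
    return per_frame * frames_per_second;
}
```

For a worst-case 35,000-triangle model (105,000 vertices of 24 bytes and 105,000 four-byte indices) at 30 frames per second, `va_bytes_per_second(105000, 24, 105000, 4, 30.0)` is 88,200,000 bytes per second, about 88 MB/s, which is within the nominal 266 MB/s peak of even AGP 1x. That is consistent with plain vertex arrays keeping up on a model of this size.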

1. Keep your indices in a system memory array. The MX doesn't support indices in video memory, and glDrawElements requires read access to the index array. Your driver should automatically place indices in system memory when you use VBOs, but you never know...

2. Are you on an AGP or PCI bus? Are you updating the VBOs dynamically at some point, or are they purely static?

Y.