lots of VBO's vs just a few/one

Started by
9 comments, last by python_regious 19 years, 1 month ago
Hi all,

Ah, another question on VBOs (maybe we can sweet-talk someone into writing a VBO FAQ). At any rate, my question (or actually several):

1) If the geometry is dynamic, what is the point of using VBOs then? A typical example is interpolating a model between frames... is there really any gain from using VBOs instead of vertex arrays?

2) Is it wiser to have one HUGE VBO or several smaller VBOs?

I was considering the following system: for each frame of each mesh, give it a VBO [STATIC_DRAW] (then delete the vertex memory that I allocated). Then to draw, just bind the correct VBO(s), and to do interpolation between frames, use a vertex shader. My thinking in deleting my vertex memory is that the OpenGL driver has the vertex data stored somehow anyway, be it on the card or swapped to system memory, depending on the decision of the driver.

Now, is 2) a good idea? It has the advantage of being easy to implement, and it lets me free all that awful vertex memory and let the OpenGL driver handle it. Or would it be better to have one VBO per mesh (and when I draw the correct frame, I fiddle with offset stuff)?

Best Regards
Close this Gamedev account, I have outgrown Gamedev.
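For reference, the per-vertex blend that a vertex shader would perform for this kind of keyframe interpolation can be sketched on the CPU in C. This is only an illustration: the function name `lerp_frames` and the flat array-of-floats layout are assumptions, not something taken from the posts.

```c
#include <assert.h>
#include <stddef.h>

/* Blend two keyframes of the same mesh into `out`.
 * frame_a, frame_b, out: arrays of `count` floats (e.g. x, y, z per vertex).
 * t: blend factor, 0.0 gives frame_a, 1.0 gives frame_b.
 * Hypothetical helper for illustration only. */
void lerp_frames(const float *frame_a, const float *frame_b,
                 float *out, size_t count, float t)
{
    assert(t >= 0.0f && t <= 1.0f);
    for (size_t i = 0; i < count; ++i) {
        /* Linear interpolation, the same result GLSL's mix(a, b, t) gives. */
        out[i] = frame_a[i] + t * (frame_b[i] - frame_a[i]);
    }
}
```

In the scheme described above, both keyframes would stay in static VBOs and only `t` would change per draw, so no vertex data needs to be rewritten each frame.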
1) Maybe. In theory, the graphics card/driver can transfer the data to faster RAM while the card is doing something else (either directly to VRAM or from system memory to AGP memory), which will make the resulting processing faster. At worst you'll see no gain, as the data will have to be transferred over the bus anyway (and you still might see some gain here, as it will be a block transfer of all the data vs the GPU asking for it).
As for interpolation between model frames, in another post I put forward the idea of caching. Now, I don't know how practical this is, but as well as sorting by shader, texture and material, you also sort by frame. The idea is that if you've got a load of models between frames 1 and 2 and a load between frames 2 and 3, you upload frames 1 and 2, draw those models, then replace frame 1 with frame 3, draw those, and so on. This means you are only transferring one frame of data over the bus in the best case, and the worst case is no worse than normal vertex array rendering (i.e. all the data has to be transferred). Of course, this assumes you are doing your own memory management and streaming things into a few VBOs; if you do as you suggest (put all the data into VBOs for each frame and let the driver sort it out), then you've already got a gain from VBOs over vertex arrays.
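The frame-caching idea can be sketched as a tiny two-slot cache that only counts an upload when a required keyframe is not already resident. The names (`FrameCache`, `cache_require`) are hypothetical, and the actual buffer upload is left as a comment; this only models the bookkeeping.

```c
#include <assert.h>
#include <stddef.h>

/* Two VBO-sized slots, each holding one keyframe index (-1 = empty). */
typedef struct {
    int slot[2];
    int uploads;   /* number of frame uploads performed so far */
} FrameCache;

void cache_init(FrameCache *c)
{
    c->slot[0] = c->slot[1] = -1;
    c->uploads = 0;
}

static int cache_has(const FrameCache *c, int frame)
{
    return c->slot[0] == frame || c->slot[1] == frame;
}

/* Make keyframes `a` and `b` resident, uploading only what is missing. */
void cache_require(FrameCache *c, int a, int b)
{
    assert(a != b);
    if (!cache_has(c, a)) {
        /* Evict the slot that does not hold b. */
        int victim = (c->slot[0] == b) ? 1 : 0;
        c->slot[victim] = a;   /* a real renderer would upload frame a here */
        c->uploads++;
    }
    if (!cache_has(c, b)) {
        /* Evict the slot that does not hold a. */
        int victim = (c->slot[0] == a) ? 1 : 0;
        c->slot[victim] = b;   /* a real renderer would upload frame b here */
        c->uploads++;
    }
}
```

With draw batches sorted by frame pair, requiring (1, 2), then (2, 3), then (3, 4) costs four uploads in this model instead of six, which is the "one frame over the bus in the best case" behaviour described above.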

2) One huge VBO is bad and loads of small VBOs are also bad. You need a few mid-sized ones. As to what mid-size is, you've got me there [grin].

As for your idea, well, it might well work; it really depends on how good the driver's memory manager is and on the size of each frame/VBO, so I'd try it and see [smile]. I wouldn't go for one huge VBO with all the frames in it, as this can end up being HUGE and the drivers don't like huge VBOs.
hmm... maybe just one VBO per "mesh", and that VBO holds all the frames for that mesh.. would that be good at all?
Depends on the size. Keyframed animation takes up A LOT of space if there are a lot of frames, and you don't want to be trying to throw around VBOs of much more than, say, 6 MB at a time (I pulled that out of my head, but I'm sure I've seen this number before).
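For the one-VBO-per-mesh layout, the "offset stuff" and the size check can be sketched as below. The 12 bytes per vertex (three 32-bit floats, position only) is an assumption, as are the helper names; the 6 MB budget is the rough figure from the post above, not a hard limit.

```c
#include <assert.h>
#include <stddef.h>

#define BYTES_PER_VERTEX 12u                    /* assumption: 3 x 32-bit float */
#define VBO_BUDGET_BYTES (6u * 1024u * 1024u)   /* rough ~6 MB figure from the thread */

/* Byte offset of keyframe `frame` when all frames are stored back to back. */
size_t frame_offset_bytes(size_t verts_per_frame, size_t frame)
{
    return frame * verts_per_frame * BYTES_PER_VERTEX;
}

/* Total size of one mesh's VBO holding `frame_count` keyframes. */
size_t mesh_vbo_bytes(size_t verts_per_frame, size_t frame_count)
{
    assert(frame_count > 0);
    return frame_offset_bytes(verts_per_frame, frame_count);
}

/* Nonzero if the whole animation fits within the suggested budget. */
int fits_budget(size_t verts_per_frame, size_t frame_count)
{
    return mesh_vbo_bytes(verts_per_frame, frame_count) <= VBO_BUDGET_BYTES;
}
```

With a buffer bound, the offset returned by `frame_offset_bytes` is what would be passed (cast to a pointer) as the pointer argument of the gl*Pointer() call for that frame.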
Do you mean 6 MB per VBO or 6 MB of VBOs in total? Right now I am just loading/playing with old-school Quake 2 and Quake 3 models, and they are not so big. As an example, the Xaero md3 model from Quake 3 gives me:

Lower.md3 has 3 meshes, of sizes 274, 13 and 25 vertices per frame (216 frames)
Upper.md3 has 3 meshes as well of sizes 71, 249 and 160 verts per frame (173 frames)
head.md3 has 1 mesh with 73 vertices (1 frame)
and the RailGun has 5(!) meshes of sizes 391, 32, 16, 14, 32 vertices per frame (1 frame)

so for this one .md3 model there is a total of:

67,392 vertices for the lower body --> 804KB
83,040 vertices for upper body -----> 996KB
73 vertices for head < 1KB
485 vertices for the gun <5KB

for the upper and lower bodies, are they about the right size to use as a VBO?
thus for that model I would have 2 VBOs, each of about 1MB in size
(the head and gun I think I make a part of another VBO to hold all of the meshes that do not animate)
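The totals above can be reproduced with a few lines of C. The 12 bytes per vertex (three floats, position only) is an assumption inferred from the upper-body figure; the function names are made up for this sketch, and real MD3 data may be stored differently.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of per-frame vertex counts over all meshes, times the frame count. */
size_t total_vertices(const size_t *verts_per_mesh, size_t mesh_count,
                      size_t frame_count)
{
    size_t per_frame = 0;
    for (size_t i = 0; i < mesh_count; ++i)
        per_frame += verts_per_mesh[i];
    return per_frame * frame_count;
}

/* Storage for `vertex_count` vertices; bytes_per_vertex is an assumption
 * (12 = three 32-bit floats for position only). */
size_t vertex_bytes(size_t vertex_count, size_t bytes_per_vertex)
{
    assert(bytes_per_vertex > 0);
    return vertex_count * bytes_per_vertex;
}
```

Calling `total_vertices` with the lower-body mesh sizes {274, 13, 25} and 216 frames gives 67,392 vertices, and with the upper-body sizes {71, 249, 160} and 173 frames gives 83,040; at 12 bytes each that is 808,704 and 996,480 bytes, roughly in line with the sizes quoted.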

Now, I _intend_ to have around 20 or so different models active at any time, so I am probably talking in the neighborhood of 40MB worth of vertex data for all my models... and then for my world, I am not so sure how I will do that (yet), as I imagine that would be pretty big too...

Quote:Original post by kRogue
do you mean 6megs per VBO or do you mean 6megs of total VBO?

6 MB per VBO.

Quote:Original post by kRogue
Now, I _intend_ to have around 20 or so different models active at any time, so I am probably talking in the neighborhood of 40MB worth of vertex data for all my models... and then for my world, I am not so sure how I will do that (yet), as I imagine that would be pretty big too...

You will either have to do your own caching or let OpenGL do it for you.
You should never let your fears become the boundaries of your dreams.
Regarding the question of one vs. multiple VBOs, I've been doing a lot of tweaking and experimenting on this lately, trying to squeeze the last bit of performance out of our engine.

The results are very mixed. Basically, the performance behaviour is unfortunately extremely vendor and driver dependent. My current tests are done on a GF6800 ultra, using the 66.93 drivers. These are my conclusions for optimal performance:

* Adding many small VBOs will not impact performance too badly, except for the obvious increased overhead of the gl*Pointer() calls. This holds true until some magical limit is reached, where performance will suddenly drop exponentially. The exact value of this limit is very context dependent, but the drop usually started somewhere around 1500 to 2000 VBOs. So keep it under that limit if you can.

* Very large VBOs should be avoided, because they really impact on performance when they're swapped into fragmented AGP or VRAM. Delays of a second and more should be expected, if many large VBOs are in use, combined with large textures and a big framebuffer. Try to keep the size of a single VBO under 8 megs.

* I got the best performance readings when combining smaller meshes into medium-sized VBOs, a little like you would combine image tiles into larger textures. The caveat is sync issues when you want to update the geometry of one mesh within a larger VBO, so you have to be careful. From my tests, it seems optimal to combine 8 to 32 smaller meshes into a single VBO, without exceeding the 8 MB limit on the VBO itself.

* When updating data in a VBO, forget about mapping/unmapping it, even with a prior NULL call to glBufferData. The fastest way seems to be using glBufferSubData. The newest drivers are doing a good job avoiding sync issues here, if you don't write to overlapping areas.

* Creating and destroying VBOs on the fly isn't as bad as it used to be, as long as you don't do it every frame. My revised cache system now adapts to the geometry by creating new VBOs on demand, and recycling unused ones from time to time in the style of a garbage collector. The performance is surprisingly good.

The performance readings will be different on ATI. However, from my experience, ATI's VBO implementation is a lot more efficient and robust than NVIDIA's, and a lot more tolerant towards abuse.
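One way to follow the "8 to 32 meshes per VBO, under 8 MB" guideline is a simple next-fit packer, sketched below in C. The packing strategy and the names are assumptions chosen for illustration; the post does not say how the meshes were actually grouped.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VBO_BYTES  (8u * 1024u * 1024u)  /* 8 MB cap suggested in the post */
#define MAX_MESHES_VBO 32u                   /* upper end of "8 to 32 meshes" */

/* Assign each mesh to a VBO index, filling VBOs in order (next-fit).
 * mesh_bytes[i] is the size of mesh i; vbo_of[i] receives its VBO index.
 * Returns the number of VBOs used. A mesh larger than the cap ends up
 * alone in its own VBO. Hypothetical helper for illustration. */
size_t pack_meshes(const size_t *mesh_bytes, size_t mesh_count, size_t *vbo_of)
{
    assert(mesh_count == 0 || (mesh_bytes != NULL && vbo_of != NULL));
    size_t vbo = 0, used = 0, held = 0;
    for (size_t i = 0; i < mesh_count; ++i) {
        int full = held > 0 &&
                   (used + mesh_bytes[i] > MAX_VBO_BYTES ||
                    held == MAX_MESHES_VBO);
        if (full) {            /* current VBO cannot take this mesh */
            ++vbo;
            used = 0;
            held = 0;
        }
        vbo_of[i] = vbo;
        used += mesh_bytes[i];
        ++held;
    }
    return mesh_count ? vbo + 1 : 0;
}
```

Meshes that are updated often could be kept apart from static ones before packing, to limit the sync issues mentioned above when rewriting part of a shared VBO.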
Hmm... sounds like I am fine even if I have a fair number of small VBOs, as for what I am doing I do not think I will need even 1000 unique VBOs. So I'll probably just pack my data into VBOs by "model piece", since chances are that if one of the meshes of that model needs to get drawn, they all will.

thanks to all for the advice!
Quote:Original post by Yann L
* When updating data in a VBO, forget about mapping/unmapping it, even with a prior NULL call to glBufferData. The fastest way seems to be using glBufferSubData. The newest drivers are doing a good job avoiding sync issues here, if you don't write to overlapping areas.


I have a 6600GT, and I was rendering a single model that had about 14 submeshes and a total of ~14,000 polygons. I stored each mesh in a VBO. I was just playing with all the options:
First I was updating the 's' coordinate on the CPU (I was twiddling with toon shading), and before doing so I mapped the buffers one by one as write-only and calculated the s-coord from information in RAM.
I also tried glBufferSubData, but in my case that was slower than mapping the buffer. The exception was when I mapped them in read/write mode with no backup in RAM to read from; that was even slower than immediate mode.
But that was just my case, and it also depends on the mode in which you lock the VBO.
Oh yeah, the last test was to move the calculations into the shader, with no need to map the buffer. FPS went from 120 to 520... not bad.
It's also good to give a hint about dynamic usage when creating the VBO.

ch.
Quote:Original post by kRogue
1) If the geometry is dynamic, what is the point of using VBOs then? A typical example is interpolating a model between frames... is there really any gain from using VBOs instead of vertex arrays?


If you use straight vertex arrays without VBOs, the driver must make sure that rendering works correctly even when you rewrite the contents of the vertex arrays directly after the DrawArrays/DrawElements call. There are two ways that it can do this:
1) Block in DrawArrays/DrawElements until the GPU has finished rendering. This is obviously very bad.
2) Create a temporary copy of the vertex arrays, and send that temporary copy to the GPU.

In theory, this additional temporary copy can be avoided using VBO.

cu,
Prefect
Widelands - laid back, free software strategy

This topic is closed to new replies.
