Niwak

Vertex/Index buffer policy

I have a working engine and I would like to improve its geometry VRAM management. I have tried the following approaches:

1. Let my meshes share memory blocks and allocate a VBO for each memory block (i.e. the user defines the scope of the VBO).
   Pro: very versatile.
   Con: the engine user has to specify everything without knowing how the engine will handle the data.

2. Allocate a VBO per mesh.
   Pro: very easy to implement, nearly no buffer management overhead.
   Con: rather inefficient.

3. Allocate a VBO per frame, fill it with the frame data and reuse it the next frame, creating another VBO for any data that does not fit. The chunks are reorganized according to a heuristic (number of free chunks in the VBO, number of VBOs per frame, total used size of the VBOs), and the indices are generated according to this allocation scheme.
   Pro: works very well with static scenes.
   Con: some overhead for buffer management, and it does not work that well with dynamic scenes.

I was wondering which geometry VRAM management policy you are using in your engine. Any clue would be greatly appreciated.

Thanks,
Vincent
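Edit: to make option 3 a bit more concrete, here is roughly the kind of sub-allocation I have in mind. This is only a sketch: the reorganization heuristic is not shown, the names (FrameVBO, allocChunk) are just placeholders, and I assume an OpenGL 1.5 context where the buffer object entry points are available (otherwise the ARB-suffixed versions apply).

#include <GL/gl.h>   // assumes the GL 1.5 buffer object entry points are exposed
#include <cstddef>

// One large buffer shared by everything rendered this frame (option 3).
struct FrameVBO
{
    GLuint      id;
    std::size_t capacity;  // total size in bytes
    std::size_t used;      // bytes handed out so far
};

FrameVBO createFrameVBO(std::size_t capacity)
{
    FrameVBO vbo;
    vbo.capacity = capacity;
    vbo.used     = 0;
    glGenBuffers(1, &vbo.id);
    glBindBuffer(GL_ARRAY_BUFFER, vbo.id);
    // Allocate the storage once; the contents are refilled as meshes come and go.
    glBufferData(GL_ARRAY_BUFFER, capacity, 0, GL_STATIC_DRAW);
    return vbo;
}

// Try to place 'size' bytes of vertex data into the shared buffer.
// Returns the byte offset of the chunk, or (std::size_t)-1 if it does not fit,
// in which case the engine falls back to another VBO.
std::size_t allocChunk(FrameVBO& vbo, const void* vertices, std::size_t size)
{
    if (vbo.used + size > vbo.capacity)
        return static_cast<std::size_t>(-1);

    const std::size_t offset = vbo.used;
    glBindBuffer(GL_ARRAY_BUFFER, vbo.id);
    glBufferSubData(GL_ARRAY_BUFFER, offset, size, vertices);
    vbo.used += size;
    return offset;  // indices are then generated relative to this offset
}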

Hi...

Just some thoughts.
From what I've read on the subject, I know that:
1) Binding a VBO takes almost no time. Of course, don't start binding buffers without a reason, or you will see a performance drop.
2) Changing the pointers (vertex, texcoord, etc.) for a VBO is the expensive part.

I haven't done any benchmarking myself, so I can't comment on what is true or false. I'm using static VBOs for all static meshes (1 VBO per mesh) and regular VAs for animated meshes (if the animation is going to be computed on the CPU).

About your suggested ways of doing it: I think 3 is going to be slow, because if you declare your VBO as STREAM_DRAW, the driver might allocate it in system or AGP memory instead of video memory. And if you declare it as STATIC_DRAW and the driver does allocate the buffer in video memory, the memory transfer may be slow if done every frame.
About 2, as I said, this is what I'm using. I have no particular reason for that, except that, as you said, it is very easy to implement.
Finally, in terms of performance, I think 1 is going to perform the same as 2, because even if we assume that binding a buffer takes zero time, you still have to specify the pointers multiple times for multiple meshes. This would have made sense when VAR was the only way to use video/AGP memory and the state change was too heavy to be done multiple times per frame.
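To show what I mean by a static VBO per mesh, here is roughly what the load and draw paths look like. It's only a sketch (made-up names, an interleaved position + texcoord format assumed, and a GL 1.5 context; on older drivers you'd use the ARB-suffixed calls). Binding the buffers is the cheap part; the gl*Pointer calls still have to be re-issued per mesh.

#include <GL/gl.h>

struct StaticMesh
{
    GLuint  vbo;        // interleaved vertex data: position (3 floats) + texcoord (2 floats)
    GLuint  ibo;        // unsigned short indices
    GLsizei indexCount;
};

// Upload once at load time.  GL_STATIC_DRAW tells the driver the data will not
// change, so it is free to keep the buffer in video memory.
void uploadStaticMesh(StaticMesh& mesh,
                      const float* vertices, GLsizeiptr vertexBytes,
                      const unsigned short* indices, GLsizei indexCount)
{
    glGenBuffers(1, &mesh.vbo);
    glBindBuffer(GL_ARRAY_BUFFER, mesh.vbo);
    glBufferData(GL_ARRAY_BUFFER, vertexBytes, vertices, GL_STATIC_DRAW);

    glGenBuffers(1, &mesh.ibo);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, mesh.ibo);
    glBufferData(GL_ELEMENT_ARRAY_BUFFER, indexCount * sizeof(unsigned short),
                 indices, GL_STATIC_DRAW);
    mesh.indexCount = indexCount;
}

// Per frame: binding is cheap, but the pointer setup below is repeated for
// every mesh, which is the expensive part mentioned above.
void drawStaticMesh(const StaticMesh& mesh)
{
    const GLsizei stride = 5 * sizeof(float);

    glBindBuffer(GL_ARRAY_BUFFER, mesh.vbo);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, mesh.ibo);

    glEnableClientState(GL_VERTEX_ARRAY);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);
    glVertexPointer(3, GL_FLOAT, stride, 0);
    glTexCoordPointer(2, GL_FLOAT, stride, (const char*)0 + 3 * sizeof(float));

    glDrawElements(GL_TRIANGLES, mesh.indexCount, GL_UNSIGNED_SHORT, 0);
}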

I'm also interested in others' opinions on the subject, because as I said my implementation is rather simple.

HellRaiZer

Just a small note on the third solution, to make myself clearer:

The buffers are declared as STATIC_DRAW and are therefore allocated in VRAM.

The main advantage of solution 3 is that it allows very good batching: when generating the indices, I generate them per stateId and therefore reduce the number of DrawPrimitive calls. Since a lot of the data is static and remains from one frame to the next, the update process (generating indices, grouping by stateId) is only triggered for a few meshes (the dynamic ones and the ones entering the rendered frame).
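To make the batching step more concrete, it boils down to something like this (a very simplified sketch: the chunk management in the shared VBO is not shown, applyState is only a placeholder for whatever sets up a renderer state, and the indices are assumed to have already been rebased against their chunk's offset):

#include <GL/gl.h>
#include <algorithm>
#include <cstddef>
#include <vector>

// One renderable chunk living somewhere inside the shared vertex buffer;
// its indices are already rebased against the chunk's offset in that buffer.
struct DrawItem
{
    unsigned int              stateId;
    std::vector<unsigned int> indices;
};

static bool byStateId(const DrawItem& a, const DrawItem& b)
{
    return a.stateId < b.stateId;
}

// Concatenate the indices of all items sharing a stateId and issue one draw
// call per state instead of one per mesh.  Assumes the shared VBO is already
// bound and its vertex pointers are set.
void drawBatched(std::vector<DrawItem>& items, void (*applyState)(unsigned int))
{
    std::sort(items.begin(), items.end(), byStateId);

    std::vector<unsigned int> batch;
    for (std::size_t i = 0; i < items.size(); )
    {
        const unsigned int state = items[i].stateId;
        batch.clear();
        for (; i < items.size() && items[i].stateId == state; ++i)
            batch.insert(batch.end(), items[i].indices.begin(), items[i].indices.end());

        if (batch.empty())
            continue;

        applyState(state);  // transformations, lights, environment, ...
        glDrawElements(GL_TRIANGLES, (GLsizei)batch.size(), GL_UNSIGNED_INT, &batch[0]);
    }
}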

Anyway, I find this solution difficult to implement efficiently; there is a lot of documentation on memory managers around the internet (solution 3 is a classic memory manager with a heuristic dedicated to a 3D engine), but I did not find any dedicated to VRAM management.

I would add that I did not obtain any serious measurable improvement when implementing this, but that may be because I did not optimize my implementation and therefore pay a lot of overhead for creating the batches and managing the VBOs.

Has anyone tried a different approach? I would be very interested.

I have a VBO for each unique mesh; duplicate meshes share the same buffer.
This means each mesh can have its own unique vertex format.

For any dynamic geometry I don't use VBOs, but then again I try to avoid dynamic geometry as much as possible.
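For the dynamic stuff it is just plain client-side vertex arrays, roughly like this (nothing engine-specific, the function name is made up, and the positions are assumed to be recomputed on the CPU every frame):

#include <GL/gl.h>

// CPU-animated geometry: no VBO at all, just point GL at system memory
// that gets rewritten every frame.
void drawDynamicMesh(const float* positions,            // 3 floats per vertex
                     const unsigned short* indices,
                     int indexCount)
{
    // Make sure no buffer object is bound, otherwise the pointers below
    // would be interpreted as offsets into that buffer.
    glBindBuffer(GL_ARRAY_BUFFER, 0);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, 0);

    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, positions);

    glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_SHORT, indices);

    glDisableClientState(GL_VERTEX_ARRAY);
}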

Quote:

The main advantage of solution 3 is that it allows very good batching: when generating the indices, I generate them per stateId and therefore reduce the number of DrawPrimitive calls. Since a lot of the data is static and remains from one frame to the next, the update process (generating indices, grouping by stateId) is only triggered for a few meshes (the dynamic ones and the ones entering the rendered frame).


Do you mean that if you have two identical meshes (e.g. cubes) at different positions, the VB will hold the geometry for both of them? Unless those meshes are going to be static all the time, there is the performance penalty of doing the transformations on the CPU.
Otherwise I can't see a way to reduce the number of draw calls.

Another problem with the big VB is that you will end up using indices outside the [0, 65535] range (unsigned short), which may have a greater impact on performance than multiple draw calls. This depends a lot on your geometry resolution, but if, for example, we take a simple effect like dot3 normal mapping, which most games tend to use these days for nearly all objects, then the VB used for that stateId will be large and you are going to have indices outside the unsigned short range. This holds if you are going to set the pointers only once per VB.
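In practice that means the batch builder has to look at how many vertices it references before picking an index type, something like this (just a sketch; in a real engine you would store the indices in the right type up front instead of converting at draw time):

#include <GL/gl.h>
#include <vector>

// Pick the index type from the number of vertices referenced by the batch:
// up to 65536 vertices fit in unsigned shorts, anything above needs 32-bit
// indices (or the batch has to be split).
void drawBatchIndices(const std::vector<unsigned int>& indices, unsigned int vertexCount)
{
    if (indices.empty())
        return;

    if (vertexCount <= 65536)
    {
        std::vector<unsigned short> shortIndices(indices.begin(), indices.end());
        glDrawElements(GL_TRIANGLES, (GLsizei)shortIndices.size(),
                       GL_UNSIGNED_SHORT, &shortIndices[0]);
    }
    else
    {
        glDrawElements(GL_TRIANGLES, (GLsizei)indices.size(),
                       GL_UNSIGNED_INT, &indices[0]);
    }
}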

If you are going to set the pointers multiple times per VB in order to keep the indices low, you'll end up with something as fast as 1 and 2 (plus the overhead of maintaining such a structure), because the only difference is that you don't bind a buffer as often, and binding is a fast operation anyway.

When you say stateId, are you referring to shader data too (i.e. a material) or to the shader id only? If you are referring to the shader only, then it is easy to exceed the [0, 65535] range. If you are referring to a material (a stateId based on shader, shader data and textures), then I think it would be better to use a single VB for each stateId and lose the overhead of maintaining the whole thing every frame.

When I said that I'm using a VB for every mesh, I meant a VB for every unique mesh. E.g. if I have to render 1500 identical cubes and 2000 identical spheres, I generate only two buffers, one for each mesh. Of course, for animated (skinned) models, where the animation may differ from instance to instance depending on the current time, I keep a separate VB for each instance.
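So for the 1500 cubes it is the same pair of buffers bound once and drawn 1500 times with a different modelview matrix each time, roughly like this (fixed-function style, made-up names, positions only):

#include <GL/gl.h>

// One buffer per unique mesh, drawn once per instance; only the modelview
// matrix changes between the draw calls.
void drawInstances(GLuint vbo, GLuint ibo, GLsizei indexCount,
                   const float* matrices,     // one 4x4 column-major matrix per instance
                   int instanceCount)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);

    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, 0);  // tightly packed positions

    for (int i = 0; i < instanceCount; ++i)
    {
        glPushMatrix();
        glMultMatrixf(matrices + 16 * i);
        glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_SHORT, 0);
        glPopMatrix();
    }
}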

Hope that helps

HellRaiZer

StateId is in fact a "renderer state id", which uniquely identifies a "renderer state": an object describing the context a node should take into account for rendering (transformations, other nodes influencing this one such as lights, environment definition nodes, ...). Therefore, creating a batch only consists of aggregating the data together.
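Roughly, the object behind a stateId looks like this (heavily simplified, names only illustrative; the real object references the environment definition nodes directly rather than through an id):

#include <vector>

// Everything a node needs from its context in order to be rendered.  Two
// nodes sharing the same renderer state (hence the same stateId) can be
// aggregated into a single batch.
struct RendererState
{
    unsigned int              stateId;        // unique id assigned when the state is created
    float                     transform[16];  // world transformation applied to the node
    std::vector<unsigned int> lightIds;       // lights influencing the node
    unsigned int              environmentId;  // environment definition (fog, sky, ...)
};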

My tests only used rather small meshes, so they did not exceed the 65535 limit for word-sized indices. I had not thought about this problem; thanks for pointing it out.

I think you are quite right in saying that the 3rd solution will not lead to any real optimization. For the moment, I will stick with the 1st or 2nd solution and postpone any optimization on this side.

Thanks
