VBOs and polygon batching

This topic is 4249 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

Consider the following scenario: an octree with hundreds of thousands of polygons and tens of textures is traversed, and its visible parts are sent to the renderer. The renderer sorts these by shader and material to achieve optimal batching. The way I see it, for each batch the geometric data (vertices, normals, texture coordinates) has to be pooled into a single buffer and the indices into another. Both are then dispatched for rendering. It can be assumed that the visible geometry will change on almost every frame, so the buffers will need to be updated very often. Are VBOs the way to go for this kind of data transfer scheme? The reason I ask is that I recently switched from vertex arrays (VAs) to VBOs with GL_DYNAMIC_DRAW and got a performance decrease. I then tried GL_STREAM_DRAW and got a further slight decrease. Note that I have read the nVidia paper on VBOs and followed their advice on invalidating the buffer with glBufferDataARB(..., ..., NULL, ...) before glMapBufferARB. Can anyone shed some light on this?

The design you describe forces the data to be re-sent to the GPU every frame. This is not the way to go. If you just disable your pooling, you should get a performance increase.

I think your octree system should look something like this (nothing to do with VBOs):
- traverse your octree, collecting the visible nodes.
- sort the visible nodes by shader / texture / back to front, ...
- render the sorted visible nodes.
And the VBO policy should be something very simple, like 'all the data for the static geometry of an octree cell goes into a single VBO, and each dynamic model has its own VBO'. The difficulty will be balancing a small octree cell size (good for culling, but more VBO switches and draw calls when rendering) against a big octree cell size (fewer switches and draw calls, but coarser culling).
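
The collect / sort / render steps above can be sketched roughly as follows. This is only an illustration under assumptions: the `VisibleNode` record, its field names, and both helper functions are hypothetical and not from any engine discussed here. It sorts by shader first, then texture, then back to front, and counts how many shader/texture groups the renderer would have to bind.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical record for one visible octree cell (names are illustrative).
struct VisibleNode {
    std::uint32_t shaderId;   // shader used by the cell's static geometry
    std::uint32_t textureId;  // texture used by the cell's static geometry
    float viewDepth;          // distance from the camera
};

// Sort so that shader changes are rarest, texture changes come next, and
// nodes sharing both are ordered back to front (farthest first).
inline void sortForRendering(std::vector<VisibleNode>& nodes) {
    std::sort(nodes.begin(), nodes.end(),
              [](const VisibleNode& a, const VisibleNode& b) {
                  return std::make_tuple(a.shaderId, a.textureId, -a.viewDepth) <
                         std::make_tuple(b.shaderId, b.textureId, -b.viewDepth);
              });
}

// Count runs of consecutive nodes that share shader and texture; each run
// needs only one shader/texture bind when the list is rendered in order.
inline std::size_t countStateGroups(const std::vector<VisibleNode>& nodes) {
    std::size_t groups = 0;
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        if (i == 0 || nodes[i].shaderId != nodes[i - 1].shaderId ||
            nodes[i].textureId != nodes[i - 1].textureId) {
            ++groups;
        }
    }
    return groups;
}
```

Each node would keep drawing from its own static VBO; only the draw order changes from frame to frame, so no vertex data has to be re-pooled.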

When I added VBOs to my engine, the only situation where they improved performance was when they were used with the STATIC_DRAW flag.

Hi Niwak.

The problem is, as stated in the thread title, that batching is being performed. This means that the static geometry within an octree cell may be batched with that of other cells if they share properties and are both visible. Thus, they must exist within the same VBO (1 VBO per batch).

If they did not, then multiple draw calls would be required and that would defeat one big goal of the batching process.

While batching is indeed important, it has more to do with switching shaders and textures than with switching VBOs. Switching VBOs isn't free, but if you arrange the data so that a reasonable amount is rendered per switch, that cost will be masked.

So, in your octree example, if you have static data that shares the same properties across various nodes, you'll want to group as much of it together as is logical and then use index buffers to select the correct data.

So, if your octree has 4 levels, you might only want to split your vertex data for levels 0, 1, 2 and 3, with the fourth-level cells sharing a VBO vertex pool but having their own index pool, or even sharing an index pool. You then issue multiple glDrawRangeElements() calls to render the sectors you can see.

So, if you can see ALL of a level 3 node, you could draw it all with one glDrawRangeElements() call, but if you can only see 3/4 of a 4th level cell, then you'd have to issue two glDrawRangeElements() calls (one which deals with one half of the index data and one which picks up the rest).
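
To make the "one call versus two calls" point concrete, here is a rough sketch. It assumes the sub-cells of a node store their indices back to back in one shared index pool; `IndexRange` and `buildDrawRanges` are hypothetical names invented for this illustration. Visible sub-cells that are adjacent in the pool are merged, and each resulting range would correspond to one glDrawRangeElements() call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of a block of indices inside a shared index pool.
struct IndexRange {
    std::uint32_t first;  // offset of the first index in the pool
    std::uint32_t count;  // number of indices in the block
};

// cells:   index blocks of the sub-cells, in the order they sit in the pool.
// visible: visible[i] is true when sub-cell i passed culling.
// Returns the merged ranges; adjacent visible blocks collapse into one range.
inline std::vector<IndexRange> buildDrawRanges(const std::vector<IndexRange>& cells,
                                               const std::vector<bool>& visible) {
    assert(cells.size() == visible.size());
    std::vector<IndexRange> ranges;
    for (std::size_t i = 0; i < cells.size(); ++i) {
        if (!visible[i]) {
            continue;  // culled sub-cell: leaves a gap in the pool
        }
        if (!ranges.empty() &&
            ranges.back().first + ranges.back().count == cells[i].first) {
            ranges.back().count += cells[i].count;  // extend the previous range
        } else {
            ranges.push_back(cells[i]);  // start a new range after a gap
        }
    }
    return ranges;
}
```

With four sub-cells of 300 indices each, a fully visible node yields a single range of 1200 indices, while hiding the third sub-cell yields two ranges: 600 indices starting at 0 and 300 indices starting at 900. The vertex and index data themselves never move.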

You won't want to go too deep on your drawing octree anyway, because batch sizes get too small, but at the same time you won't want to go above 65K verts per batch either, as that's the limit of GL_UNSIGNED_SHORT, which is the best index type to use.
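
The 65K figure comes from the range of a 16-bit index: 2^16 = 65536 distinct vertices (indices 0 to 65535). A small sketch of picking the index width per batch follows; the enum and function names are made up for this example, and the two enumerators are meant to stand for GL_UNSIGNED_SHORT and GL_UNSIGNED_INT.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for GL_UNSIGNED_SHORT and GL_UNSIGNED_INT.
enum class IndexType { UInt16, UInt32 };

// A 16-bit index can address vertices 0..65535, i.e. 65536 vertices.
constexpr std::size_t kMaxVerticesFor16Bit = 65536;

// Pick the narrowest index type that can address every vertex in the batch.
constexpr IndexType chooseIndexType(std::size_t vertexCount) {
    return vertexCount <= kMaxVerticesFor16Bit ? IndexType::UInt16
                                               : IndexType::UInt32;
}

// Size in bytes of an index buffer holding indexCount indices of type t.
constexpr std::size_t indexBufferBytes(IndexType t, std::size_t indexCount) {
    return indexCount * (t == IndexType::UInt16 ? 2u : 4u);
}
```

Staying at or below 65536 vertices per batch keeps the indices at two bytes each, which halves the index buffer size compared with 32-bit indices.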

Quote:
Original post by phantom
You won't want to go too deep on your drawing octree anyway, because batch sizes get too small, but at the same time you won't want to go above 65K verts per batch either, as that's the limit of GL_UNSIGNED_SHORT, which is the best index type to use.


Excuse my ignorance, I am always learning...
There is a concrete upper bound (65K) for the batch, but how small is a "small" batch?
Furthermore, if you indeed don't go too deep down the octree, will you not end up sending invisible parts of the geometry for rendering? Won't, say, expensive pixel shaders make this the bane of your performance?

Well, it's not really a concrete upper bound; above that you just need 32-bit indices, and some cards don't play too well with them.

At the lower end there isn't a limit as such; the smaller the batches get, the more function call overhead plays a role. According to an oldish ATI doc, however, the point where a D3D app becomes CPU bound is less than 130 tris per batch, but I wouldn't worry about having a few batches that small.

Sending the data won't matter: the GPU will cull it and it will never reach the pixel shaders, so it's better to cut at a sane level and send a bit of extra geometry than to waste CPU time and memory bandwidth shuffling things around. Of course, 'too deep' is a relative term and depends on what data you have to work with.

OK, one last question:

"Shuffling things around" to pool the perfect batches might be expending CPU time and bandwidth, but it is minimizing the number of glDraw calls that are required.
Isn't that one of the main points of batching (the other being minimizing state changes)?

Yes, but you have to balance things: a few glDrawRangeElements() calls with a VBO that is already driver side are going to cost you less than trying to construct the 'perfect' VBO.

There are limits to what you can do, so strive for the biggest batches with the least amount of effort (I direct you to the ideas in my first post). While you might send a little more than you need with each draw call, you'll be doing less work CPU side.
