Large vertex buffers [d3d]

[just learning d3d] The documentation mentions that changing the vertex buffer in use is an expensive operation. If all my vertices use the same FVF, is it more efficient to make one massive vertex buffer of size x, where x > the sum of vertices needed by all objects [it could be a lot larger, maybe 30K vertices or so per buffer], or to make a separate vertex buffer for each handful of objects, so that the size of each buffer fits the needed vertex count exactly? Using just one huge buffer would need a lot more bookkeeping when making new objects, but is it worth it? Especially after several dynamically created objects [like bits of terrain] are destroyed and recreated, possibly with different vertex counts, the segments of free vertices may become quite fragmented, possibly leading to a huge amount of wasted space.

Just to recap, the questions are: What is more efficient - a few huge banks of vertices, where large numbers of objects share the space and each newly created object just cuts off more of a bank [where available] or starts another huge bank, or small banks shared by only a few objects [or maybe even one object per bank for objects that will get created/destroyed often, like terrain]? In the case of large banks of vertices, is there a reasonable way to deal with fragmentation that doesn't involve extra per-frame slowdown [no switching around vertex buffers; maybe switching vertex indices, if that's cheaper]? And does switching vertex indices affect performance negatively the way switching vertex buffers does?
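For illustration, here is a rough sketch of the kind of bookkeeping such a shared buffer needs (hypothetical names, not code from this thread): a free list of vertex ranges that objects carve pieces out of on creation and return on destruction, with adjacent free ranges merged to limit fragmentation.

#include <cstddef>
#include <list>

struct VertexRange { size_t start; size_t count; };   // measured in vertices

class VertexBank {
public:
    explicit VertexBank(size_t totalVertices) {
        VertexRange whole = { 0, totalVertices };
        free_.push_back(whole);
    }

    // First-fit allocation; returns false if no single free range is big enough.
    bool allocate(size_t count, VertexRange& out) {
        for (std::list<VertexRange>::iterator it = free_.begin(); it != free_.end(); ++it) {
            if (it->count >= count) {
                out.start = it->start;
                out.count = count;
                it->start += count;
                it->count -= count;
                if (it->count == 0) free_.erase(it);
                return true;
            }
        }
        return false;   // caller falls back to another bank, or creates a new one
    }

    // Return a range to the free list, merging with neighbours to limit fragmentation.
    void release(const VertexRange& r) {
        std::list<VertexRange>::iterator it = free_.begin();
        while (it != free_.end() && it->start < r.start) ++it;
        it = free_.insert(it, r);

        std::list<VertexRange>::iterator next = it; ++next;
        if (next != free_.end() && it->start + it->count == next->start) {
            it->count += next->count;
            free_.erase(next);
        }
        if (it != free_.begin()) {
            std::list<VertexRange>::iterator prev = it; --prev;
            if (prev->start + prev->count == it->start) {
                prev->count += it->count;
                free_.erase(it);
            }
        }
    }

private:
    std::list<VertexRange> free_;   // kept sorted by start offset
};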
Good question - the exact specifics depend on the individual hardware. ATI says the optimal size for a static VB is 4 MB, while the optimal size for a dynamic VB is 512 KB - 1024 KB. So yeah, you are going to have to split them up at some point - generally, I make mine bigger than that, since those stats came out a while ago.

Note that the vertices don't even have to be in the same format to be in the same VB. If you just think of the buffer as a chunk of memory, you can do whatever you want with it. If you are doing instancing in software, you may wish to make multiple copies of the same data, whereas if you aren't, you can pack different geometry into the same buffer.
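As a rough sketch of that idea (meshA, meshB, and their fields are made up for illustration; assumes d3d9.h and a valid IDirect3DDevice9* device), two meshes with different formats can be packed into one buffer and selected by byte offset at draw time:

// One buffer, two meshes with different vertex formats.
IDirect3DVertexBuffer9* sharedVB = 0;
const UINT totalBytes = meshA.sizeInBytes + meshB.sizeInBytes;

device->CreateVertexBuffer(totalBytes, D3DUSAGE_WRITEONLY, 0,   // FVF 0: treat it as raw memory
                           D3DPOOL_DEFAULT, &sharedVB, NULL);

void* p = 0;
sharedVB->Lock(0, 0, &p, 0);
memcpy(p, meshA.vertices, meshA.sizeInBytes);                               // mesh A at offset 0
memcpy((BYTE*)p + meshA.sizeInBytes, meshB.vertices, meshB.sizeInBytes);    // mesh B right after it
sharedVB->Unlock();

// Drawing mesh B: point stream 0 at its byte offset with its own stride and FVF.
device->SetFVF(meshB.fvf);
device->SetStreamSource(0, sharedVB, meshA.sizeInBytes, meshB.stride);
device->DrawPrimitive(D3DPT_TRIANGLELIST, 0, meshB.triangleCount);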
Dustin Franklin ( circlesoft :: KBase :: Mystic GD :: ApolloNL )
Recently I've been dealing with the same issue while optimizing my screensaver application to also run on very old non-shader hardware. I was surprised to see that a GF6800 got the same performance increase as a GF2MX did when I put all my vertices into a single VB. Then I was just switching IBs during rendering. This was the first method that brought a massive performance increase.
The second method, with a single IB for all objects, did bring some performance increase, but it was very small compared to the first method. Still, it raised performance by about 8% since I was no longer switching IBs.

Thus, definitely try to avoid switching VBs. Switching IBs from the same VB is much less expensive.
However, your situation with lots of dynamic VBs is different, so you might not spot the same performance increase as with static VBs.
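A minimal sketch of the kind of render loop described here (the Object fields and names are assumed, not VladR's actual code): bind the big VB once, then switch only the index buffer per object.

// Bind the shared static VB and FVF once per frame.
device->SetFVF(D3DFVF_XYZ | D3DFVF_NORMAL | D3DFVF_TEX1);
device->SetStreamSource(0, sharedVB, 0, sizeof(Vertex));

for (size_t i = 0; i < objects.size(); ++i) {
    const Object& obj = objects[i];
    device->SetIndices(obj.indexBuffer);          // cheap compared to a VB switch
    device->DrawIndexedPrimitive(D3DPT_TRIANGLELIST,
                                 obj.baseVertex,  // where this object's vertices start in the shared VB
                                 0, obj.vertexCount,
                                 0, obj.triangleCount);
}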

VladR My 3rd person action RPG on GreenLight: http://steamcommunity.com/sharedfiles/filedetails/?id=92951596

Yes, you are correct. I use large per-type vertex buffers exclusively.

In the case of a static VB in D3DPOOL_DEFAULT, you must call vertex_buf->Lock() in order to change its contents. This is an expensive call to make, as I understand it. I believe index buffers work this way as well, so changing the indices would be equally expensive.

One other (guaranteed to be slow) method would be to draw the vb from a pointer to CPU RAM. d3d_device->DrawPrimitiveUP() is the method.

Then you can make your updates in CPU RAM and draw from it as well. It may be equivalent to or better in speed than making the changes in CPU RAM, transferring them to GPU RAM, and then drawing from GPU RAM.

If you are worried about fragmentation, you could rely on the STL std::vector for a memory-managed container. Use reinterpret_cast<VOID *>(&vector_name[0]) to obtain the direct pointer to CPU RAM in a format that DrawPrimitive*() will accept.
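A short sketch of that approach (the Vertex struct and names here are made up): keep the geometry in a std::vector and pass its contiguous storage straight to DrawPrimitiveUP.

#include <vector>

struct Vertex { float x, y, z; DWORD color; };
std::vector<Vertex> verts;   // resized and rewritten freely on the CPU side

// ... build or update verts ...

if (!verts.empty()) {
    device->SetFVF(D3DFVF_XYZ | D3DFVF_DIFFUSE);
    device->DrawPrimitiveUP(D3DPT_TRIANGLELIST,
                            (UINT)(verts.size() / 3),   // primitive count for a triangle list
                            &verts[0],                  // contiguous CPU memory
                            sizeof(Vertex));
}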

I always use vector. Never a VOID* / new[], because there's nothing I hate more than debugging memory leaks and whatnot. I'm sure you would agree. :)

An empty destructor is always preferred. It means that the containers are taking care of themselves, as good little OO objects should. :)

[Edited by - taby on April 24, 2006 10:29:21 AM]
DrawPrimitiveUP is implemented internally as a dynamic VB lock, fill, and unlock anyway. It'll be slower than doing it yourself though, since the GPU needs the vertices immediately. It has to wait for the data to arrive on the GPU before it can draw. Doing it yourself, you can leave a gap between Unlock() and SetStreamSource(), which gives the driver more time to get the VB into video memory.
Quote: Original post by Evil Steve
DrawPrimitiveUP is implemented internally as a dynamic VB lock, fill, and unlock anyway. It'll be slower than doing it yourself though, since the GPU needs the vertices immediately. It has to wait for the data to arrive on the GPU before it can draw. Doing it yourself, you can leave a gap between Unlock() and SetStreamSource(), which gives the driver more time to get the VB into video memory.


Normally a driver places a dynamic vertex buffer in the AGP range of system memory, which makes a copy to video RAM unnecessary. It's more that you have to pay for the dynamic vertex buffer anyway, but with Draw*UP there is even more overhead for the transfer from the user-pointer data into the dynamic buffer.
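For illustration, a sketch of the dynamic-VB pattern being compared here (names are assumed; the buffer is created with D3DUSAGE_DYNAMIC): fill it early in the frame, then draw later so the driver has time to get the data where it needs to be.

// Fill the dynamic VB early.
void* p = 0;
dynamicVB->Lock(0, bytesToWrite, &p, D3DLOCK_DISCARD);   // discard the old contents, no stall
memcpy(p, cpuVertices, bytesToWrite);
dynamicVB->Unlock();

// ... submit other work here (state changes, other draw calls) ...

// Draw later, giving the driver a gap between Unlock() and the draw.
device->SetFVF(vertexFVF);
device->SetStreamSource(0, dynamicVB, 0, vertexStride);
device->DrawPrimitive(D3DPT_TRIANGLELIST, 0, primitiveCount);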
Oh, if it were done in CPU RAM it would be no problem whatsoever [downright easy, as a matter of fact], but I don't want the slowdown involved in the DrawPrimitiveUP call. I was mostly talking about the vertex buffers: if I make object A, then B, then C, they sit in the VB memory like ABC, but if I delete B and try to add D with size B+n, it obviously can't fit between A and C, so it ends up in memory as A _ C D [with a hole where B was]. If I can split D up, though, I can squeeze a piece the size of B into B's spot, put the rest after C, and clean it all up with the indices [since I don't want to change the indices after they are created, I don't want to move C down into B's spot the way a vector would - that seems like an awful lot of wasted effort pushed across the bus to the GPU]. All this method does is shift the fragmentation problem from the vertex buffers to the index buffers, which as I understand are less of a performance hit to change [so the space lost to fragmentation won't hurt as much as it would with vertex buffers, since changing the index buffer doesn't hit as hard].
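A small sketch of that stitching idea (made-up names, reusing the VertexRange bookkeeping sketched earlier): object D's triangles index into two separate ranges of the shared VB, so nothing already in the buffer has to move.

VertexRange holeB;   // old B's slot; holds the first holeB.count vertices of D
VertexRange tailD;   // free range after C; holds the remaining vertices of D

std::vector<WORD> stitched;
for (size_t i = 0; i < dMesh.indices.size(); ++i) {
    WORD local = dMesh.indices[i];                         // index into D's own vertex array
    WORD global = (local < holeB.count)
        ? (WORD)(holeB.start + local)                      // first chunk sits in B's old spot
        : (WORD)(tailD.start + (local - holeB.count));     // remainder sits after C
    stitched.push_back(global);
}
// Upload 'stitched' into D's index buffer; the vertex buffer itself never changes.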

Just trying to get my brain around all of this before I sit down and actually tap it all out, only to find that my vertex/index manager is a piece of poo. The memory management systems will be a breeze to write, that's not the problem at all; I'm just trying to figure out exactly where the performance can be saved.

A vector is nice and all for some things, but its tendency to rearrange elements sounds like a killer in this case if it's doing it in GPU memory.

[The only really strangely sized bits are going to be the map sections, which are derived from a tile/height map. While every section has the same number of tiles, the ones with tiles that vary widely in height also have cliffs separating them, which is where the extra vertices are picked up that make the sizing unpredictable.]

Oh, and thanks for all the input.
