Small static buffers or big dynamic buffers?

Started by
7 comments, last by paic 18 years, 4 months ago
Hi, I'm wondering which is best, performance-wise:
- Using small static buffers (1 vertex / index buffer per texture set, around 100 triangles per buffer) -> lots of DIP calls, but no locking / memcpy / unlocking.
- Using big dynamic buffers which are filled from the smaller buffers and then rendered -> far fewer DIP calls and optimal buffer size, but every frame: lock / memcpy (many) / unlock.

How do you people do it? (Remember, my question is from a performance point of view.)

Edit: More simply put, my question is "what's the optimal way of rendering state-sorted geometry chunks?" (knowing that it depends on the scene, the context, etc., but there are always some "don't do that" / "do this" rules applicable in every case ^^)

[Edited by - paic on December 2, 2005 7:17:27 AM]
It could well have changed with later hardware revisions, but the received wisdom was a 1 MB VB for dynamic data and a 4 MB VB for static data.

Although, in all honesty, it's near enough impossible for anyone to come up with a good answer to this - finding the optimal balance tends to be empirical...

One thing I can spot... your "memcpy (many)": why not compose a larger buffer on the CPU-side and then do a single memcpy up to the GPU? Not entirely sure if it'll be any faster (depends what D3D does under the covers), but conceptually it sounds better [smile]
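A minimal sketch of that "compose on the CPU, copy once" idea. `StagingBuffer`, `append` and `upload` are hypothetical names, not D3D calls, and `dst` stands in for the pointer a vertex buffer lock would hand back. Appending still copies on the CPU side, so whether this wins overall is something to measure rather than assume.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical CPU-side staging area: chunks are appended here first,
// then copied to the destination in a single memcpy.
struct StagingBuffer {
    std::vector<unsigned char> bytes;

    // Append one chunk and return its byte offset inside the staging area.
    std::size_t append(const void* data, std::size_t size) {
        const std::size_t offset = bytes.size();
        const unsigned char* src = static_cast<const unsigned char*>(data);
        bytes.insert(bytes.end(), src, src + size);
        return offset;
    }

    // Copy everything to `dst` in one call and return the byte count.
    // In a real renderer `dst` would be the locked buffer pointer;
    // here it is just caller-provided memory.
    std::size_t upload(void* dst, std::size_t dstCapacity) const {
        assert(bytes.size() <= dstCapacity);
        if (!bytes.empty()) {
            std::memcpy(dst, bytes.data(), bytes.size());
        }
        return bytes.size();
    }
};
```

The offsets returned by `append` are what you would later turn into per-chunk start positions when drawing.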

hth
Jack

Jack Hoxley [ Forum FAQ | Revised FAQ | MVP Profile | Developer Journal ]

My current implementation uses big VB / IB with subsets. So, for a given mesh, I have 1 big VB/IB and many subsets with different texture sets. The problem is that I want to use a render queue and sort by shaders / texture sets / etc.
So, for that, I need to split my VB/IB by subset, which will create many smaller VB/IBs.

The basic approach is to not care and just draw them. As said, that means drawing a lot of small VB/IBs, which is not really optimal.

So I thought about creating big dynamic buffers. During a frame, I would do the following :

1) start of the frame : lock the dynamic buffers
2) for each data chunk, memcpy it to the right dynamic buffers (depending on materials, etc.)
2') if a dynamic buffer fills up: unlock, draw and re-lock. Or use another one; I don't know yet how I will handle that part.
3) end of the frame : unlock the dynamic buffers and draw them
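The steps above can be sketched as follows. `DynamicBatch` is a stand-in that only counts flushes, not a D3D9 type; a real version would lock, unlock and draw where the comments say so. It handles step 2' by flushing and refilling the same buffer, which is only one of the options mentioned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Stand-in for one dynamic vertex buffer (hypothetical, not a D3D9 type).
class DynamicBatch {
public:
    explicit DynamicBatch(std::size_t capacity)
        : storage_(capacity), used_(0), draws_(0) {}

    // Step 2: copy a chunk in. Step 2': if it does not fit, flush first.
    void add(const void* chunk, std::size_t size) {
        if (size == 0) return;
        assert(size <= storage_.size());   // a chunk must fit in an empty buffer
        if (used_ + size > storage_.size()) {
            flush();
        }
        std::memcpy(storage_.data() + used_, chunk, size);
        used_ += size;
    }

    // Step 3 (and 2'): "unlock and draw", then refill from offset 0.
    void flush() {
        if (used_ == 0) return;            // nothing pending, nothing to draw
        ++draws_;                          // a real version would issue the draw here
        used_ = 0;
    }

    std::size_t drawCalls() const { return draws_; }
    std::size_t bytesPending() const { return used_; }

private:
    std::vector<unsigned char> storage_;
    std::size_t used_;
    std::size_t draws_;
};
```

With D3D9 dynamic buffers, the refill after a flush is the situation the `D3DLOCK_DISCARD` lock flag is intended for, so it may be worth trying there.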


But because of the overhead introduced by the memcpy, I'm not sure I will gain speed over the basic approach. That's why I asked: I'm sure someone around here has already implemented such a system and has an idea of the performance of both methods ^^


Note : I need all this to be dynamic, because the small geometry chunks drawn on screen depend on the result of the culling algorithm.
Okay, I think I get you...

First off, with your current algorithm it's good practice to keep locks active for the minimum possible time. If your design allows it, try to keep the lock and unlock strictly either side of a memcpy() call.

Quote:Original post by paic
Note : I need all this to be dynamic, because the small geometry chunks drawn on screen depend on the result of the culling algorithm.

I forget where now, but this came up recently - I don't think this is how it *must* be done...

Sure, there are some similarities between vertex/triangle data and that fed into a culling algorithm, but they aren't necessarily identical. I think it would be a perfectly valid use of SYSRAM to keep a duplicate of the required parts of the geometry for culling.

That way the CPU can go about culling stuff however it sees fit, and then just use the results to index into the VB/IB on the GPU and despatch rendering. When you construct your VB/IB you should be able to define ranges that belong to particular models (same as ID3DXMesh's subsets), and these ranges can be quite easily fed into a DrawIndexedPrimitive() call.

For your culling data structure, just store the position (maybe a normal as well) and then group these by those subsets - such that when your culling algorithm accepts a particular group you can just pull out the start/finish/length properties and send it off to render. You never even need to touch the VB/IB contents.

hth
Jack


Ok, let's see if I understand correctly :

1) Load time
1a) I load all my geometry chunks with their texture sets, materials, etc.
1b) I build a big static VB / IB for each texture set, etc. (so that I can draw each VB / IB with only 1 SetTexture, 1 SetMaterial, etc.)

2) Runtime

2a) Loop 1
2a1) The culling algorithm works on its own data, and tells me which chunks are visible
2a2) Depending on the previous result, I retrieve the ranges of the geometry chunks in the big static VB / IB

2b) Loop 2
2b1) Set each big static VB / IB
2b2) Issue DIP calls with the correct startIndex, etc. values (computed in step 2a2)


Is that what you recommend? This way I have only 1 SetStreamSource and 1 SetIndices per batch, but many DIP calls. That's better than many SSS, SI and DIP calls.
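A rough way to picture that trade-off: sort the visible ranges by the buffer they live in, then count how many buffer binds and draw calls fall out. `QueueItem`, `SubmitStats` and `Submit` are hypothetical names; the sketch only counts what a real renderer would issue.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct QueueItem {
    int bufferId;                 // which shared VB/IB pair holds the range (hypothetical id)
    std::uint32_t startIndex;
    std::uint32_t faceCount;
};

struct SubmitStats {
    std::size_t bufferBinds = 0;  // one SetStreamSource + SetIndices pair each
    std::size_t drawCalls = 0;    // one DIP call each
};

// Sort by buffer so each shared buffer is bound once, then "draw" every range.
inline SubmitStats Submit(std::vector<QueueItem> items) {
    std::stable_sort(items.begin(), items.end(),
                     [](const QueueItem& a, const QueueItem& b) {
                         return a.bufferId < b.bufferId;
                     });
    SubmitStats stats;
    bool haveBound = false;
    int bound = 0;
    for (const QueueItem& item : items) {
        if (!haveBound || item.bufferId != bound) {
            bound = item.bufferId;   // a real version would bind the VB/IB here
            haveBound = true;
            ++stats.bufferBinds;
        }
        ++stats.drawCalls;           // a real version would call DIP here
    }
    return stats;
}
```

Four visible ranges spread over two buffers come out as two binds and four draws, regardless of the order they were queued in.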

Do you think the increased DIP call count would still be faster than the memcpy in my method?
Quote:Original post by paic
Ok, let's see if I understand correctly :

1) Load time
1a) I load all my geometry chunks with their texture sets, materials, etc.
1b) I build a big static VB / IB for each texture set, etc. (so that I can draw each VB / IB with only 1 SetTexture, 1 SetMaterial, etc.)

Yup, that sounds about right. If your vertex formats are the same you could go for one "super buffer" if you wanted.
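As a sketch of that load-time step, here is one way to pack chunks into shared arrays while recording each chunk's range. The types and `Pack` are illustrative assumptions (position-only vertices, 16-bit triangle-list indices); copying the result into the actual static VB / IB is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One source chunk: indices are relative to the chunk's own first vertex.
struct Chunk {
    std::vector<float> positions;          // 3 floats per vertex
    std::vector<std::uint16_t> indices;    // 3 indices per triangle
};

// Where a chunk ended up inside the shared buffers.
struct Range {
    std::uint32_t startVertex;
    std::uint32_t startIndex;
    std::uint32_t vertexCount;
    std::uint32_t faceCount;
};

struct PackedGeometry {
    std::vector<float> positions;
    std::vector<std::uint16_t> indices;
    std::vector<Range> ranges;             // ranges[i] describes chunks[i]
};

inline PackedGeometry Pack(const std::vector<Chunk>& chunks) {
    PackedGeometry out;
    for (const Chunk& c : chunks) {
        assert(c.positions.size() % 3 == 0 && c.indices.size() % 3 == 0);
        Range r;
        r.startVertex = static_cast<std::uint32_t>(out.positions.size() / 3);
        r.startIndex  = static_cast<std::uint32_t>(out.indices.size());
        r.vertexCount = static_cast<std::uint32_t>(c.positions.size() / 3);
        r.faceCount   = static_cast<std::uint32_t>(c.indices.size() / 3);
        out.positions.insert(out.positions.end(), c.positions.begin(), c.positions.end());
        // Indices stay chunk-relative; startVertex is meant to be supplied
        // as the base vertex offset at draw time.
        out.indices.insert(out.indices.end(), c.indices.begin(), c.indices.end());
        out.ranges.push_back(r);
    }
    return out;
}
```

Keeping indices chunk-relative works in D3D9 because DrawIndexedPrimitive() takes a BaseVertexIndex that is added to every index, so 16-bit indices remain usable even when the shared vertex buffer holds more than 65536 vertices.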

Quote:Original post by paic
2) Runtime

2a) Loop 1
2a1) The culling algorithm works on its own data, and tells me which chunks are visible
2a2) Depending on the previous result, I retrieve the ranges of the geometry chunks in the big static VB / IB

Pretty much... although for 2a2 this should be "free" given a suitable culling structure.

struct CullData
{
    // The following fields go directly to a DIP() call
    DWORD startVertex;
    DWORD startIndex;
    DWORD vertexCount;
    DWORD faceCount;

    // The following data is for culling
    D3DXVECTOR3 *positions; // probably vertexCount entries
};


That way, you might have something like:

for( every object we want to cull )
{
    bool result = Cull( object );
    if( result == false )
    {
        // don't cull this object => render it
        device->dip( object.startVertex, ... object.faceCount );
    }
}
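For something compilable along the same lines, here is a hedged sketch. `RecordingDevice` is a stand-in that just records the ranges a DrawIndexedPrimitive() call would receive, and the bounding sphere tested against a single plane is an assumption standing in for whatever culling data and test you actually use.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Range fields mirror the CullData struct above; the bounding sphere is an
// assumption replacing the per-vertex position list.
struct CullObject {
    std::uint32_t startVertex, startIndex, vertexCount, faceCount;
    float centerX, centerY, centerZ, radius;
};

struct DrawCall {
    std::uint32_t startVertex, startIndex, vertexCount, faceCount;
};

// Stand-in for the device: records what would be sent to the GPU.
struct RecordingDevice {
    std::vector<DrawCall> calls;
    void dip(const CullObject& o) {
        calls.push_back({o.startVertex, o.startIndex, o.vertexCount, o.faceCount});
    }
};

// Hypothetical cull test: true means "culled", i.e. the whole sphere
// lies behind the plane z = nearZ.
inline bool Cull(const CullObject& o, float nearZ) {
    return o.centerZ + o.radius < nearZ;
}

// Culling and drawing combined in one pass, as in the pseudocode above.
inline void RenderVisible(const std::vector<CullObject>& objects, float nearZ,
                          RecordingDevice& device) {
    for (const CullObject& o : objects) {
        if (!Cull(o, nearZ)) {
            device.dip(o);   // only the range is used; VB/IB contents are never touched
        }
    }
}
```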


Quote:Original post by paic
2b) Loop 2
2b1) Set each big static VB / IB
2b2) Issue DIP calls with the correct startIndex, etc. values (computed in step 2a2)

You could make it a multi-pass approach if you want, or (as my example above) you can combine them.

Quote:Original post by paic
Do you think the increased DIP call count would still be faster than the memcpy in my method?

Yes, I think it will be faster than resource modification. One of the "golden rules" I work by is to try, wherever possible, to leave a resource static/unchanged. Changing resources is painful.

However, I can't say for definite that it will be faster - it all depends on the characteristics of a typical "session".

The key point I was trying to make is that sharing the same data between rendering and culling might seem obvious, but it's not required. Breaking that link can allow you to express a number of different algorithms (it's a good step towards multithreading for the new multi-core CPUs).

Based on my example above... if you had a million vertices and 1000 objects, then you'd be chewing up an amazing 11.5 MB or so of system memory (at 12 bytes per position, i.e. three floats). I'd be very surprised if that extra storage makes much difference in a desktop scenario [smile]
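A quick way to sanity-check that kind of estimate for your own layout. `CullDataBytes` is a hypothetical helper, and the sizes are assumptions: four 4-byte range fields per object, plus whatever you store per vertex. The per-vertex size dominates: three floats (12 bytes) lands near 11.5 MB for a million vertices, while a figure near 3.8 MB corresponds to about 4 bytes per vertex.

```cpp
#include <cassert>
#include <cstddef>

// Bytes of culling data: per-vertex storage plus per-object range fields.
constexpr std::size_t CullDataBytes(std::size_t vertices, std::size_t bytesPerVertex,
                                    std::size_t objects, std::size_t bytesPerObject) {
    return vertices * bytesPerVertex + objects * bytesPerObject;
}

// 1,000,000 vertices and 1000 objects with 16 bytes of range data each:
// 12-byte positions give 12,016,000 bytes, 4-byte entries give 4,016,000 bytes.
```

Either way, the per-object range data is negligible next to the position copies.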

hth
Jack


Ok, thx a lot for the help.
I hoped I would get advice from more than 1 person, but your method seems fine, so I will try that ^^
Quote:Original post by paic
I hoped I would get advice from more than 1 person, but your method seems fine, so I will try that ^^

Hehe, maybe this thread will pick up some more replies over the next few hours/days - it's still a fairly new thread [smile]

ANY OTHER SUGGESTIONS?

You could always try this sort of thing over at GP&T (the Graphics Programming and Theory forum); they're a good bunch to throw more abstract/general questions at...

hth
Jack


Yup, I post in the Graphics Programming and Theory forum quite often too (currently in a thread on geometry clipmaps ^^), but this question is really tied to DirectX, so I don't want to post it there ^^

This topic is closed to new replies.
