mattm

Optimal Vertex buffer


I know similar questions have been asked before, but I have searched a lot and need some further advice.

I am going to be rendering a terrain and need to know the most efficient way of building my vertex buffers. I have looked into and implemented three different ways:

1) Brute force - all of the terrain is passed into a static buffer during initialisation and then left there. A separate buffer, containing height data, is created and then the two are melded within a vertex shader. This will involve zero lock/unlock per pass (except for the height buffer) and no set streams, but a lot of unneeded vertices will be passed.

2) Sections - each section of my terrain is placed into a different static vertex buffer during initialisation and then not touched. The sections that need to be rendered (determined from a quadtree) are streamed one at a time to produce the output. This will involve zero lock/unlock per pass (except for the height buffer) but will involve many setstreams, although there will be no redundant data going to the card.

3) Dynamic sections - the vertices that are required (determined by a quadtree) are placed into a single buffer during runtime and then streamed to the card. My implementation of this involves a single lock/unlock and single stream per pass.

Any help on which will be the most efficient, bearing in mind the game is an RTS (with moveable camera) and will have a detailed terrain with modifiable height, would be greatly appreciated. In addition, is there a maximum number of vertex buffers you can have created at any one time?

Thanks,

Matt

I'll do my best to cover your points here...

Quote:
I have looked into and implemented three different ways

Might sound silly, but if you've got working versions then surely you can find out which one is most optimal with some PIX benchmarking. PIX can tell you which is fastest, which requires the fewest state changes (often a big factor in performance) and whether any of them have differing memory usage patterns.

Quote:
2) Sections - each section of my terrain is placed into a different static vertex buffer during initialisation and then not touched. The sections that need to be rendered (determined from a quadtree) are streamed one at a time to produce the output. This will involve zero lock/unlock per pass (except for the height buffer) but will involve many setstreams, although there will be no redundant data going to the card.

There is an obvious improvement to this technique - one that I use whenever I've done terrain rendering.

Quadtrees are great - really great [smile]; but you can also get away with one huge static vertex buffer that is set at the start of terrain rendering, and then ignored. You can use index buffers to partition it up according to the quadtree nodes (the memory consumption for many index buffers is modest) and then just set the respective index buffer and off you go. In my engine, the cost of storing an IB for each quadtree node (read: extra VRAM) was worth paying for the extra performance.
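To make the "one vertex buffer, one index buffer per node" idea concrete, here is a minimal sketch in plain C++ rather than D3D. It assumes a square vertex grid stored row by row, 16-bit indices and a triangle list; `buildNodeIndices` and its parameters are made-up names for illustration, not anything from the engine described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds a triangle-list index run for the square block of cells that one
// quadtree node covers inside a larger, shared vertex grid.
// Assumptions (not from the thread): the grid is square, vertices are stored
// row by row, and the whole grid fits in 16-bit indices.
//   vertsPerRow  : vertices per row (and per column) of the shared grid
//   cellX, cellY : top-left cell covered by the node
//   cells        : width and height of the node, in cells
std::vector<std::uint16_t> buildNodeIndices(int vertsPerRow, int cellX,
                                            int cellY, int cells)
{
    assert(vertsPerRow >= 2 && cells >= 1);
    assert(cellX >= 0 && cellY >= 0);
    assert(cellX + cells < vertsPerRow && cellY + cells < vertsPerRow);
    assert(vertsPerRow * vertsPerRow <= 65536);

    std::vector<std::uint16_t> indices;
    indices.reserve(static_cast<std::size_t>(cells) * cells * 6);
    for (int y = cellY; y < cellY + cells; ++y) {
        for (int x = cellX; x < cellX + cells; ++x) {
            const auto topLeft = static_cast<std::uint16_t>(y * vertsPerRow + x);
            const auto topRight = static_cast<std::uint16_t>(topLeft + 1);
            const auto bottomLeft = static_cast<std::uint16_t>(topLeft + vertsPerRow);
            const auto bottomRight = static_cast<std::uint16_t>(bottomLeft + 1);
            // Two triangles per cell, all pointing into the same vertex buffer.
            indices.insert(indices.end(), {topLeft, bottomLeft, topRight,
                                           topRight, bottomLeft, bottomRight});
        }
    }
    return indices;
}
```

Each quadtree node would call this once at load time and keep the result in its own index buffer; the vertex data itself is never duplicated.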

Quote:
3) Dynamic sections - the vertices that are required (determined by a quadtree) are placed into a single buffer during runtime and then streamed to the card. My implementation of this involves a single lock/unlock and single stream per pass.

I would only use this method if you could guarantee that your PVS (potentially visible set) remains a constant (or near constant) size. That way you (as you say) need only a single lock/unlock. If the PVS can increase in size you risk having to re-create your single vertex buffer to accommodate all the new geometry...

Quote:
a detailed terrain with modifiable height

Regarding the modifiable height:
- Are we talking all heights modified every frame? Some sort of geomorphing animation system?
- Are we talking about changing heights according to the physics simulation (e.g. explosion craters)?

It would probably be beneficial to know which as it will change the way you want to implement things. A lot of advantage can be gained if you only modify the height values occasionally, and a few tricks can be used if you know your vertex data's height will change every frame.

Quote:
is there a maximum number of vertex buffers you can have created at any one time?

No limit that should bother you! I'm pretty sure that if you created hundreds of thousands (or millions) of buffers you'd have problems - but if you create that many, you really want to be questioning why [smile]. I'd hazard a guess that you're likely to run out of VRAM before you hit any ceiling on the number of buffers that can exist.

hth
Jack

Thanks for taking the time to answer.

Although I have implemented all 3 (to prove to myself I could do it more than anything), I am using a very limited terrain (20x20 vertices) while I implement the "back end", and therefore testing scalability was proving difficult.

I had already been recommended the idea of one static vertex buffer with index buffers to partition it. I implemented this as well, but then got worried about all the extra vertices I would be passing each time, so I backed away from it.

Height will be changed regularly, but not every frame; the explosion analogy is probably the best - sometimes many changes over a number of consecutive frames, sometimes none at all.

In fact, since working on this some more, I believe that I can also use a single vertex buffer to render all vertices that share a common FVF. As per the NVIDIA recommendation, I am filling up the buffer with data and, once the limit is reached, flushing it and starting again. This has led me to another question I am not sure of the answer to: I am using indexed primitives because I know these are faster to draw. However, to use these with this system I am required to "offset" my indices depending on how many vertices have already been drawn. I know it is hard to say, but do you feel the advantage of using indexed primitives outweighs the cost of updating nearly all my indices each frame?

Thanks,

Matt

Quote:
Height will be changed regularly, but not every frame; the explosion analogy is probably the best - sometimes many changes over a number of consecutive frames, sometimes none at all.

You can probably make a lot of use of this. Bear in mind that, at typical frame rates, a change every second or every few seconds might well mean that the terrain actually animates on only ~50 frames out of a few hundred.

I tend to look at it as writing the code for the common case (no changes) first, and then optimizing the exceptional case (changes) so that the exceptions don't "kill" execution completely...


Quote:
This has led me to another question I am not sure of the answer to: I am using indexed primitives because I know these are faster to draw. However, to use these with this system I am required to "offset" my indices depending on how many vertices have already been drawn. I know it is hard to say, but do you feel the advantage of using indexed primitives outweighs the cost of updating nearly all my indices each frame?

In this, and similar cases, I see an obvious "pattern" that is sometimes hard to correct. That is, in trying to solve the problems in one area (constant writing to your VB) you can end up translating the problem to somewhere else (constant writing to the IB).

If this translation is to a better solution, excellent, but in the case of resource manipulation and D3D any locking of any resource is not ideal - sure, one might be faster than the other, but a generally accepted rule of thumb is that you avoid it where possible.
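For what it's worth, the index "offset" in the question does not have to mean rewriting indices. The sketch below is plain C++ working on CPU-side arrays, and every name in it (`Batch`, `DrawRange`, `appendRebased`, `appendWithBase`) is made up for illustration. Option A rebases each index as it is copied, so the whole batch can go out in one draw. Option B copies the indices untouched and remembers a base vertex per mesh; in D3D9 that value could be passed as the BaseVertexIndex argument of DrawIndexedPrimitive, at the cost of one draw call per mesh.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vertex { float x, y, z; };

// Describes where one mesh ended up inside the shared batch.
struct DrawRange {
    std::size_t baseVertex;  // value to add to every index at draw time
    std::size_t startIndex;  // first index of this mesh in the batch
    std::size_t indexCount;  // number of indices belonging to this mesh
};

// CPU-side stand-in for one shared vertex buffer plus one shared index buffer.
struct Batch {
    std::vector<Vertex> vertices;
    std::vector<std::uint16_t> indices;
};

// Option A: rewrite ("rebase") every index while copying it.
DrawRange appendRebased(Batch& batch, const std::vector<Vertex>& verts,
                        const std::vector<std::uint16_t>& localIndices)
{
    const std::size_t base = batch.vertices.size();
    assert(base + verts.size() <= 65536);  // must still fit 16-bit indices
    DrawRange range{0, batch.indices.size(), localIndices.size()};
    batch.vertices.insert(batch.vertices.end(), verts.begin(), verts.end());
    for (std::uint16_t i : localIndices)
        batch.indices.push_back(static_cast<std::uint16_t>(base + i));
    return range;
}

// Option B: copy indices unchanged and record the base vertex instead.
DrawRange appendWithBase(Batch& batch, const std::vector<Vertex>& verts,
                         const std::vector<std::uint16_t>& localIndices)
{
    DrawRange range{batch.vertices.size(), batch.indices.size(),
                    localIndices.size()};
    batch.vertices.insert(batch.vertices.end(), verts.begin(), verts.end());
    batch.indices.insert(batch.indices.end(), localIndices.begin(),
                         localIndices.end());
    return range;
}
```

Which option wins depends on how many meshes go into a batch, so it is worth measuring both rather than assuming.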

Conventional terrain engines have a few great properties - they are often of a fixed size (determined at load time), they are often a regular 2D grid, they don't often need to change (even with LOD geomorphing) and they fit perfectly into hierarchical culling systems.

The following is an idea that I'd probably come up with - might not be great, but I think it solves a few problems of yours. The numbers are just ones that seemed convenient to me, so change them as you see fit [smile]

I would go for a number of large VB "patches" - say a 4x4 grid (16 cells). Treat these as separate entities (except the special case where you have to cross the borders) and you can then manipulate 1/16th of your terrain without harming the remaining 15/16ths (which, if needed, the GPU can access with no stalling).

You can also treat this 4x4 grid as your top-level quadtree nodes, and subdivide them accordingly using INDEX buffers - lots of them. Then, as you traverse this node of the quadtree, as soon as you get a completely accepted quad you throw its index buffer in the render queue.
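The traversal described above can be sketched roughly as follows. This is plain C++ with an axis-aligned rectangle standing in for a real frustum test, and all names (`Rect`, `Node`, `classify`, `collectVisible`, `buildTree`, the integer `indexBufferId` handle) are assumptions for illustration only.

```cpp
#include <cassert>
#include <vector>

struct Rect { float minX, minY, maxX, maxY; };

enum class Overlap { Outside, Partial, Inside };

// Classifies a node against the visible region. A 2D rectangle is used here
// purely as a stand-in for a proper view-frustum test.
Overlap classify(const Rect& node, const Rect& view)
{
    if (node.maxX <= view.minX || node.minX >= view.maxX ||
        node.maxY <= view.minY || node.minY >= view.maxY)
        return Overlap::Outside;
    if (node.minX >= view.minX && node.maxX <= view.maxX &&
        node.minY >= view.minY && node.maxY <= view.maxY)
        return Overlap::Inside;
    return Overlap::Partial;
}

struct Node {
    Rect bounds;
    int indexBufferId;           // hypothetical handle to this node's index buffer
    std::vector<Node> children;  // empty for a leaf, otherwise four quadrants
};

// Queues a node's index buffer as soon as the node is completely accepted.
// Partially visible leaves are queued whole, since they cannot be split further.
void collectVisible(const Node& node, const Rect& view,
                    std::vector<int>& renderQueue)
{
    const Overlap overlap = classify(node.bounds, view);
    if (overlap == Overlap::Outside)
        return;
    if (overlap == Overlap::Inside || node.children.empty()) {
        renderQueue.push_back(node.indexBufferId);
        return;
    }
    for (const Node& child : node.children)
        collectVisible(child, view, renderQueue);
}

// Builds a full quadtree of the given depth, handing out ids in visit order.
Node buildTree(const Rect& bounds, int depth, int& nextId)
{
    assert(depth >= 0);
    Node node{bounds, nextId++, {}};
    if (depth > 0) {
        const float midX = 0.5f * (bounds.minX + bounds.maxX);
        const float midY = 0.5f * (bounds.minY + bounds.maxY);
        node.children.push_back(buildTree({bounds.minX, bounds.minY, midX, midY}, depth - 1, nextId));
        node.children.push_back(buildTree({midX, bounds.minY, bounds.maxX, midY}, depth - 1, nextId));
        node.children.push_back(buildTree({bounds.minX, midY, midX, bounds.maxY}, depth - 1, nextId));
        node.children.push_back(buildTree({midX, midY, bounds.maxX, bounds.maxY}, depth - 1, nextId));
    }
    return node;
}
```

When the whole patch is visible only the root's index buffer is queued, which is where storing an index buffer at every level pays off.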

In my previous terrain engine, the cost of storing a lot of extra index buffers for a patch wasn't too bad, and the performance gain made it worthwhile. For example:

1 - 8x8
4 - 4x4
16 - 2x2
64 - 1x1

that is, 4 levels, meaning that every actual vertex will have roughly 4 x 2 bytes of indices stored for it. 8x8 patch = 81 vertices (~2.5kb) + 324 indices (~0.65kb).

Okay, so that's your nice efficient rendering system (processor friendly, marginally memory unfriendly!). What you want is the deformable terrain.

In system memory you keep a copy of each of your 16 top-level patches (the 4x4 grid I mentioned). This should be an exact copy of what you originally put in them.

When you get your deformation information (your "explosion"), update the system memory array, as this can be done incredibly quickly. At the end of a given frame you will know which of the 16 patches have modifications and thus need to be re-uploaded. Lock these vertex buffers and do a quick memory copy to refresh them.

The key here is that you can pretty much do the update in 3 lines of code:
1. Lock
2. Copy
3. Unlock

This will work very nicely if you create your static vertex buffers with the write-only usage flag (D3DUSAGE_WRITEONLY in D3D9).
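Here is a rough sketch of that "system memory copy plus dirty flag" scheme in plain C++. `FakeVertexBuffer` is only a stand-in for a real vertex buffer and its Lock/Unlock calls, and `Patch`, `setHeight` and `uploadDirtyPatches` are hypothetical names, so treat this as an outline under those assumptions rather than working D3D code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

struct Vertex { float x, y, z; };

// Stand-in for a GPU vertex buffer; lock()/unlock() mimic Lock/Unlock.
class FakeVertexBuffer {
public:
    explicit FakeVertexBuffer(std::size_t count)
        : data_(count), locked_(false), lockCount_(0) {}
    Vertex* lock() { assert(!locked_); locked_ = true; ++lockCount_; return data_.data(); }
    void unlock() { assert(locked_); locked_ = false; }
    const Vertex& at(std::size_t i) const { return data_[i]; }
    int lockCount() const { return lockCount_; }
private:
    std::vector<Vertex> data_;
    bool locked_;
    int lockCount_;
};

struct Patch {
    std::vector<Vertex> shadow;  // system-memory copy of the patch
    FakeVertexBuffer gpu;        // what the card would render from
    bool dirty;                  // set when shadow and gpu differ
    explicit Patch(std::size_t vertexCount)
        : shadow(vertexCount), gpu(vertexCount), dirty(false) {}
};

// Deformation touches only the system-memory copy and marks the patch dirty.
void setHeight(Patch& patch, std::size_t vertex, float height)
{
    assert(vertex < patch.shadow.size());
    patch.shadow[vertex].y = height;
    patch.dirty = true;
}

// End of frame: lock, copy, unlock, but only for patches that changed.
// Returns how many patches were re-uploaded.
int uploadDirtyPatches(std::vector<Patch>& patches)
{
    int uploaded = 0;
    for (Patch& patch : patches) {
        if (!patch.dirty)
            continue;
        Vertex* dst = patch.gpu.lock();
        std::memcpy(dst, patch.shadow.data(), patch.shadow.size() * sizeof(Vertex));
        patch.gpu.unlock();
        patch.dirty = false;
        ++uploaded;
    }
    return uploaded;
}
```

On a frame with no explosions `uploadDirtyPatches` locks nothing at all, which is the common case the design is built around.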

There is a further optimization for this if you've got a reasonably high-performance engine (that is, frame rate isn't a huge problem): put a fixed interval between updates. That is, whatever the frame rate, fix the vertex buffer updates to 15 Hz. So if your game is clocking at 60 Hz, you only update the vertex buffer every 4th frame.

The above method worked really well for me in some other instances - if the per-frame change is relatively tiny (as it might be with a high frame rate and a simple interpolation animation) then the visual benefit of the animation isn't really worth the performance hit of locking/updating your buffer.
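A fixed update interval like this can be sketched with a small time accumulator. The class below is plain C++ with a made-up name (`UploadThrottle`); it only decides when an upload is due and knows nothing about vertex buffers.

```cpp
#include <cassert>

// Decides on which frames a buffer upload should happen so that uploads run
// at a fixed rate regardless of the frame rate.
class UploadThrottle {
public:
    explicit UploadThrottle(double updatesPerSecond)
        : interval_(1.0 / updatesPerSecond), accumulated_(0.0)
    {
        assert(updatesPerSecond > 0.0);
    }

    // Call once per frame with that frame's duration in seconds.
    // Returns true when enough time has passed for another upload.
    bool tick(double frameSeconds)
    {
        accumulated_ += frameSeconds;
        if (accumulated_ < interval_)
            return false;
        accumulated_ -= interval_;
        if (accumulated_ > interval_)
            accumulated_ = 0.0;  // after a long stall, drop the backlog
        return true;
    }

private:
    double interval_;
    double accumulated_;
};
```

In use, the frame loop would call `tick` every frame and only run the lock/copy/unlock pass on frames where it returns true; changes made in between simply wait in system memory.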


That all make sense? Sound like a reasonable idea to you? Hope so [smile]
Jack

Thanks for that, it does make sense and I really appreciate you taking the time to help. However, how do you think that will compare to the idea below:

Each sector (as in your idea) has its own array of indices and vertices. These are created and stored at initialisation.

When it comes to rendering, the following is performed:

1) Each sector is tested for visibility and, if visible, added to a visible list.

2) The vertex buffer and index buffer are locked.

3) The list is looped over and the arrays from each sector are copied to the buffers.

4) At the end of the list the buffers are unlocked and the object is drawn.

This will involve a single lock/unlock for each vertex and index buffer (assuming all data fits into a single buffer; if not two may be required) and a single setstream for rendering the object.

The disadvantage is more data is placed into a buffer each pass, BUT as these will be fast memcopies, this should be greatly mitigated.

Sorry for going over this again; I think after this round of thoughts I will move on to the next thing (and come back to it if I actually need to), but I would be very interested to hear what you think.

Thanks,

Matt

Quote:
Thanks for that, it does make sense and I really appreciate you taking the time to help.

Keep coming back with questions/thoughts - terrain rendering is something I've done a fair bit of work with and find fascinating. I like trying to solve these sorts of problems [smile].

Anyway, your idea - I can see that working very well. However, there are 2 things that I'd be a bit wary of:

1. With a suitably freeform camera, the size of the PVS could change from just a few small sectors to having most of the map visible. Maintaining a single vertex/index buffer that you fill each frame (when you've calculated your PVS) could either be constricting (requiring a resize and/or multiple buffers) or wasteful (locking/writing a few hundred triangles to a buffer sized for several thousand).

2. You're still going to copy/upload your array regardless of whether you've changed it or not. With my system, you could possibly end up locking/uploading only a small number of buffers - and only the ones that have changed. The question here is whether:
- multiple stream/state switches + couple of locks + small data transfer
or
- single state switch + single lock + large data transfer
is faster. On face value, yours sounds better - but I'd hedge my bets until some real-world data appeared [wink].


I'd be guessing that if you can find a suitable solution to #1 then the answer to #2 is a non-issue.

Cheers,
Jack

Guest Anonymous Poster
There are also level-of-detail (LOD) subsets that you can manipulate to cut down the triangle count (things you can do with index buffers to select subsets of one large vertex buffer).


Sometimes the sectional method can gain by drawing the nearest sections first, which cuts down the pixel overdraw (if the value already in the z-buffer is closer, the card doesn't have to put the rest of the current pixel through the shader...)
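Drawing the nearest sections first mostly comes down to sorting the visible list before submitting it. A minimal sketch in plain C++, assuming each section is represented by a centre point on the ground plane (`Section` and `sortFrontToBack` are made-up names):

```cpp
#include <algorithm>
#include <vector>

struct Section {
    int id;          // hypothetical handle to the section's buffers
    float centerX;   // centre of the section on the ground plane
    float centerZ;
};

// Orders the visible sections nearest-first relative to the camera position,
// so that closer terrain fills the z-buffer before farther terrain is drawn.
void sortFrontToBack(std::vector<Section>& visible, float camX, float camZ)
{
    auto distanceSquared = [camX, camZ](const Section& s) {
        const float dx = s.centerX - camX;
        const float dz = s.centerZ - camZ;
        return dx * dx + dz * dz;  // squared distance is enough for ordering
    };
    std::sort(visible.begin(), visible.end(),
              [&](const Section& a, const Section& b) {
                  return distanceSquared(a) < distanceSquared(b);
              });
}
```

Whether the saved overdraw outweighs the sort depends on how expensive the pixel shader is, so this is one to measure.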



When using regular sections of a heightmap it might be possible to do hidden terrain removal (sectional analysis that determines no part of a section is visible...)







Quote:
1. With a suitably freeform camera, the size of the PVS could change from just a few small sectors to having most of the map visible. Maintaining a single vertex/index buffer that you fill each frame (when you've calculated your PVS) could either be constricting (requiring a resize and/or multiple buffers) or wasteful (locking/writing a few hundred triangles to a buffer sized for several thousand).


I think this can be mitigated somewhat by making the buffer the optimal size for the card (or around that value); that way, if it does fill up, at least I know it was a good size to send!
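The "fill it up, flush, start again" approach with a fixed-size buffer might look roughly like this. It is a plain C++ sketch with a made-up `FlushingBatcher` class; `flush` only counts where a real unlock-and-draw would happen. With D3D9 dynamic buffers this pattern is usually paired with the D3DLOCK_NOOVERWRITE and D3DLOCK_DISCARD lock flags, which the sketch does not model.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vertex { float x, y, z; };

// Fixed-capacity batch: sectors are appended until the next one would not
// fit, at which point the batch is flushed (drawn) and filling starts again.
class FlushingBatcher {
public:
    explicit FlushingBatcher(std::size_t capacity)
        : capacity_(capacity), flushes_(0)
    {
        assert(capacity > 0);
        buffer_.reserve(capacity);
    }

    // Adds one sector's vertices, flushing first if they would overflow.
    void add(const std::vector<Vertex>& sector)
    {
        assert(sector.size() <= capacity_);  // a sector must fit an empty buffer
        if (buffer_.size() + sector.size() > capacity_)
            flush();
        buffer_.insert(buffer_.end(), sector.begin(), sector.end());
    }

    // Stand-in for "unlock, draw, start again"; a real draw call would go here.
    void flush()
    {
        if (buffer_.empty())
            return;
        ++flushes_;
        buffer_.clear();
    }

    int flushes() const { return flushes_; }
    std::size_t pending() const { return buffer_.size(); }

private:
    std::size_t capacity_;
    int flushes_;
    std::vector<Vertex> buffer_;
};
```

The frame would end with one final `flush()` to draw whatever is still pending, so a small PVS costs one draw and a large one costs a few, without ever resizing the buffer.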

Quote:
You're still going to copy/upload your array regardless of the fact that you've changed it or not. With my system, you could possibly only end up locking/uploading a small number of buffers - and only the ones that have changed. The question here is whether:
- multiple stream/state switches + couple of locks + small data transfer
or
- single state switch + single lock + large data transfer
On face value, yours sounds better - but I'd hedge my bets until some real-world data appeared.



My thoughts exactly. To be honest, if I have a system that has a chance of scaling well I am happy and should probably draw a line under this for now, otherwise I am in danger of having a hugely optimised terrain with nothing to do on it!

Thanks again for your time, it has been a great help.

Matt
