(Advanced) How many different FvF's in a 3d engine?

Hello,

according to my information, switching vertex buffers is expensive, especially when different formats are involved. I am just starting (nice, eh) a 3D engine and am now thinking about how to handle this. My options are:

(a) Have a small number of defined FVFs that are used by the engine, and organise rendering by FVF. I intend to use dynamic buffers extensively (to prevent stalls), by the way.
(b) Have a number of different but non-overlapping (!) FVFs that get mixed into the final format using different streams. I would order the rendering by stream, too (as in (a)), but have one FVF for 3D coordinates, one for textures, etc.

What is the optimal way to handle this? Certainly not every "part renderer" needs all the potential FVF information, so how do you guys handle this for optimum performance?


Regards

Thomas Tomiczek
THONA Consulting Ltd.
(Microsoft MVP C#/.NET)

You're basically saying the same thing in both options.

Either way, you come down to a fixed number of FVFs for your engine. Just determine them ahead of time and you'll be fine. The only place I might allow some flexibility is the number of textures. Otherwise you pretty much know what type of data you're going to be rendering. Determine it ahead of time, tell your artists to stick to it, and you're done.
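
To make the "fixed set of formats, render grouped by format" idea concrete, here is a minimal sketch that sorts a frame's draw list by vertex format and counts how many format switches remain. It does not touch D3D at all: `DrawItem`, `formatId` and both function names are made up for illustration, and `formatId` simply stands in for whatever FVF code the engine uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical draw record. formatId stands in for the engine's FVF code.
struct DrawItem {
    std::uint32_t formatId;
    int meshId;
};

// Counts how often the vertex format changes when items are drawn in order.
inline int countFormatSwitches(const std::vector<DrawItem>& items) {
    int switches = 0;
    for (std::size_t i = 1; i < items.size(); ++i) {
        if (items[i].formatId != items[i - 1].formatId) ++switches;
    }
    return switches;
}

// Groups items by format. stable_sort keeps the original order within a format.
inline void sortByFormat(std::vector<DrawItem>& items) {
    auto byFormat = [](const DrawItem& a, const DrawItem& b) {
        return a.formatId < b.formatId;
    };
    std::stable_sort(items.begin(), items.end(), byFormat);
    assert(std::is_sorted(items.begin(), items.end(), byFormat));
}
```

With five items that alternate between two formats, the unsorted list switches four times and the sorted one only once. A real engine would probably sort on a wider key (vertex buffer, texture, shader), which this sketch leaves out.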

Headaches are what make programming hard. ;-)

I realised that, too - it basically is a switch.

Now, how do you handle the problem with textures? OK, all tris will have at least one texture (or the engine is basically Gouraud shaded). But how do you handle the rest?

What is the best approach?


Regards

Thomas Tomiczek
THONA Consulting Ltd.
(Microsoft MVP C#/.NET)

I actually just create one VB per FVF, including one for each number of textures. I usually only have four or five different types of VBs in total. The fewer the better, so try to make decisions with that in mind.

I have an idea now that I think about it. I'm not sure if you want to try this right now, but it would be interesting to see what happens.

Set up one VB for dual textures and one VB for three textures. If you only need to render one texture, just set the second and/or third textures to null.

Let's say most of your objects are dual-textured, with just a few single-textured objects. Make only dual-texture VBs and have all objects use dual-texture vertices; just zero out the second set of UVs if the object only has one texture. When you go to render, your render primitive can set the second texture channel to NULL so it's not used. The renderer will think it's rendering dual-texture objects, but the second texture is null and not rendered.

This will probably blow something up, although it would be interesting to try.

Well, I posted this on a non-public newsgroup, and Philip Taylor from Microsoft actually gave me some very interesting advice:
quote:

the issue with texture coordinates, that's the entire reason
streams were introduced in DX 8. with streams, you can have
position and color in one stream, texcoord1 in a 2nd stream,
and texcoord2 in a 3rd. you only need map the 2nd and 3rd
stream when you are doing texturing, or multi-texturing.
this avoids keeping extra data in the vertex struct, or
duplicating data. we have been giving this advice since
pre-DX 8 days in 2000. nothing new here, (cut for NDA)



This basically means: do NOT use different FVFs, but use "complementary" FVFs. In your case:

One for xyz and color, one for maybe texture 1, one for texture 2, and then one for texture 3. That way you have one structure in your code and can basically add/remove streams. This keeps the engine design coherent.


In your suggestion this would be FVF 1: xyz, FVF 2: textures 1 and 2, FVF 3: textures 3 and 4.

I will continue thinking along these lines :-)


Regards

Thomas Tomiczek
THONA Consulting Ltd.
(Microsoft MVP C#/.NET)

thona gave some excellent advice.

I can't recommend using FVFs with unused parts, as you're just wasting bandwidth sending the stuff to the card. This is an extremely important issue with dynamic buffers especially. Try to use static vertex buffers as much as you possibly can, and keep the total amount of data to a minimum.
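
To put a rough number on the bandwidth point, here is a small back-of-the-envelope sketch. It assumes one texture-coordinate set is two 32-bit floats and that the dynamic buffer is refilled once per frame; the function names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>

// One texture-coordinate set is two 32-bit floats (u, v) in this sketch.
constexpr std::size_t kBytesPerTexCoordSet = 2 * sizeof(float);

// Bytes per buffer fill that carry no information because the vertex
// format reserves texture-coordinate sets the mesh never uses.
inline std::size_t paddedBytes(std::size_t vertexCount, std::size_t unusedSets) {
    return vertexCount * unusedSets * kBytesPerTexCoordSet;
}

// The same figure per second for a buffer that is refilled every frame.
inline std::size_t paddedBytesPerSecond(std::size_t vertexCount,
                                        std::size_t unusedSets,
                                        std::size_t framesPerSecond) {
    assert(framesPerSecond > 0);
    return paddedBytes(vertexCount, unusedSets) * framesPerSecond;
}
```

For 10,000 vertices with one unused set, that is 80,000 bytes per fill, or 4.8 million bytes per second at 60 frames per second.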

Edit:
Another thing. Often, you don't even need to modify all of the vertex data, just texture coordinates, for example. This is another point where streams are priceless.

- JQ
Full Speed Games. Coming soon.

[edited by - JonnyQuest on August 29, 2002 8:29:35 AM]

Static buffers?

How?

I mean, buffer switches are expensive, and you can hardly have a ton of small vertex buffers for the objects, can you?


Regards

Thomas Tomiczek
THONA Consulting Ltd.
(Microsoft MVP C#/.NET)

Just put them all in the same one. I doubt that you're using fundamentally different vertex formats for each object. You have, what, position, normal, texture 0 to texture n, or, instead of the normal, a diffuse and/or specular component, and you won't usually be varying those between models a lot. If you have more than one or two textures (remember, only GeForce 3+, Radeon 8500+ and Matrox Parhelia actually support more than 3 textures, Radeon+ supports 3, and all others, including the wildly popular GeForce2, support only two textures anyway), the additional textures are usually dynamically applied anyway, which makes them ideal for an additional dynamic stream or so.

- JQ
Full Speed Games. Coming soon.

OK, putting them all in the same one. Now,

(a) How do I handle world transformations for them? Let's say I have terrain, and an object is a tree. I have 150 of these trees (in a small area, so no culling), but they are rotated and at different positions. Do I apply the transformations manually to put them into one static VB?
(b) How do I handle this if I have to cull objects? Because the player is in a forest of 4000 trees, and I surely don't want to render them all. Just create a hundred patches?

I heard the advice to use dynamic VBs. Now you advise using static ones.


Regards

Thomas Tomiczek
THONA Consulting Ltd.
(Microsoft MVP C#/.NET)

quote:
Original post by thona
Ok, putting them all in the same one. Now,

(a) How do I handle world transformations for them? Let's say I have terrain, and an object is a tree. I have 150 of these trees (in a small area, so no culling), but they are rotated and at different positions. Do I apply the transformations manually to put them into one static VB?


Er, no. Just take that tree and render it 150 times using different world matrices.
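
A minimal sketch of that loop, using a mock device that only counts calls so the example runs without D3D. `MockDevice`, `drawInstances` and `translation` are made-up names; in D3D8 the corresponding calls would be `SetStreamSource`, `SetTransform(D3DTS_WORLD, ...)` and `DrawIndexedPrimitive`.

```cpp
#include <cassert>
#include <vector>

struct Mat4 { float m[16]; };

// Row-major translation matrix with the offset in the last row (D3D convention).
inline Mat4 translation(float x, float y, float z) {
    return Mat4{{1, 0, 0, 0,
                 0, 1, 0, 0,
                 0, 0, 1, 0,
                 x, y, z, 1}};
}

// Stand-in for the device: it only counts what a real device would be asked to do.
struct MockDevice {
    int boundBuffer = -1;
    int bufferBinds = 0;
    int worldUpdates = 0;
    int drawCalls = 0;

    void setVertexBuffer(int id) {
        if (id == boundBuffer) return;  // redundant bind, skip it
        boundBuffer = id;
        ++bufferBinds;
    }
    void setWorld(const Mat4&) { ++worldUpdates; }
    void draw() {
        assert(boundBuffer >= 0);  // a buffer must be bound before drawing
        ++drawCalls;
    }
};

// Binds the shared tree geometry once, then draws it once per world matrix.
inline void drawInstances(MockDevice& dev, int bufferId, const std::vector<Mat4>& worlds) {
    dev.setVertexBuffer(bufferId);
    for (const Mat4& world : worlds) {
        dev.setWorld(world);
        dev.draw();
    }
}
```

Drawing 150 trees this way produces 150 world-matrix updates and 150 draw calls, but only one vertex buffer bind.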

quote:
(b) How do I handle this if I have to cull objects? Because the player is in a forest of 4000 trees, and I surely don't want to render them all. Just create a hundred patches?


Culling has nothing to do with vertex buffers. That should happen in your application's pipeline (scene graph).
One way to do this is with trees (octrees, quadtrees), but there are others as well. See the hundreds of articles on this site about culling.
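
As one possible shape for the quadtree approach, here is a small sketch that stores object ids by 2D position and returns only those inside a query rectangle. It makes several simplifying assumptions: objects are points, the tree has a fixed depth, and an axis-aligned rectangle stands in for the view frustum. All names are illustrative.

```cpp
#include <array>
#include <cassert>
#include <memory>
#include <vector>

struct Point { float x, z; };

struct Rect {
    float minX, minZ, maxX, maxZ;  // half-open: min inclusive, max exclusive
    bool contains(Point p) const {
        return p.x >= minX && p.x < maxX && p.z >= minZ && p.z < maxZ;
    }
    bool overlaps(const Rect& o) const {
        return minX < o.maxX && o.minX < maxX && minZ < o.maxZ && o.minZ < maxZ;
    }
};

class QuadTree {
public:
    QuadTree(Rect bounds, int depth) : bounds_(bounds), depth_(depth) {
        assert(depth >= 0);
    }

    // Stores an object id in the leaf that contains its position.
    void insert(int id, Point p) {
        assert(bounds_.contains(p));
        if (depth_ == 0) {
            items_.push_back(Item{id, p});
            return;
        }
        if (!children_[0]) split();
        for (auto& child : children_) {
            if (child->bounds_.contains(p)) {
                child->insert(id, p);
                return;
            }
        }
    }

    // Appends the ids of all objects inside `view`; subtrees outside it are skipped.
    void query(const Rect& view, std::vector<int>& out) const {
        if (!bounds_.overlaps(view)) return;
        for (const Item& item : items_) {
            if (view.contains(item.pos)) out.push_back(item.id);
        }
        for (const auto& child : children_) {
            if (child) child->query(view, out);
        }
    }

private:
    struct Item { int id; Point pos; };

    void split() {
        const float midX = 0.5f * (bounds_.minX + bounds_.maxX);
        const float midZ = 0.5f * (bounds_.minZ + bounds_.maxZ);
        const Rect quads[4] = {
            {bounds_.minX, bounds_.minZ, midX, midZ},
            {midX, bounds_.minZ, bounds_.maxX, midZ},
            {bounds_.minX, midZ, midX, bounds_.maxZ},
            {midX, midZ, bounds_.maxX, bounds_.maxZ},
        };
        for (int i = 0; i < 4; ++i) {
            children_[i] = std::make_unique<QuadTree>(quads[i], depth_ - 1);
        }
    }

    Rect bounds_;
    int depth_;
    std::vector<Item> items_;
    std::array<std::unique_ptr<QuadTree>, 4> children_;
};
```

The ids that come back are then drawn from the shared static vertex buffer, each with its own world matrix, so the culling never touches the buffer contents.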

quote:

I heard the advice to use dynamic VBs.


You're obviously using the wrong sources. Both the Radeon SDK and the nVidia SDK, plus the respective documents, say in just about every second sentence that static vertex buffers are much, MUCH faster (for obvious reasons: you don't send megabytes of data to the graphics card thousands of times per second).

- JQ
Full Speed Games. Coming soon.

Well, are they really fast enough to handle, let's say, 500 vertex buffer changes every frame?

That's the point: I was thinking of using dynamic vertex buffers to reduce DrawPrimitive calls.


Regards

Thomas Tomiczek
THONA Consulting Ltd.
(Microsoft MVP C#/.NET)

quote:
Original post by thona
Well, are they really fast enough to handle, let's say, 500 vertex buffer changes every frame?


Why the hell would you need 500 vertex buffers? That's just plain crazy, I'm sorry.

EDIT: To give an example: the engine I was working on until a few weeks ago had 6 vertex buffers, if I remember correctly. The geometry consisted of about 1 million vertices. I had only 3 different FVF formats, which isn't very many of course, but even if you had more, I don't think you could even come up with 250 possibilities.
quote:

That's the point: I was thinking of using dynamic vertex buffers to reduce DrawPrimitive calls.


Why would they reduce the number of DrawPrimitive calls?

Either you're misunderstanding something here or I've misread all your posts so far... Could you clarify what you're trying to do, please?

- JQ
Full Speed Games. Coming soon.

[edited by - JonnyQuest on August 29, 2002 12:59:44 PM]

I completely agree with what you're saying. I just had the thought while I was writing things. I know it's a waste of bandwidth, it just sounded interesting.

If you're rendering 500 instances of a single tree, you call DrawPrimitive 500 times but only load the VB once. You don't have to change the VB, since the ONE tree is in the same VB you rendered from last time.

The difference between dynamic and static VBs is where the data "lives" in the system. Static VBs are guaranteed to be in video card memory. Dynamic VBs could be in either video or AGP memory; you don't get to choose. If they happen to be in AGP memory, the data must be transferred to the card before it can be rendered. This is the reason static VBs will be faster most of the time, but not always.

Perhaps Thomas is thinking about the same thing that I’ve also been wondering about lately. Say you have a game with a large area, like an RPG, in which you need to load terrain and objects as you get closer to them. You also need to unload them as you get farther away. Your aim is to have one VB so you don’t have to switch VBs when rendering. But then that VB has to be dynamic, because you need to add and remove objects as the player moves around.

On the other hand, if you say that dynamic VBs are too slow and you need to use static VBs, then you need to divide your game space into smaller “cells”. All the objects and terrain of each cell go into a smaller static VB which never changes. But you are going to need more static VBs. So in a world where things are loaded and unloaded, there seems to be a tradeoff between having one (or a very few) dynamic VBs and a higher number of static VBs.

I’m not sure what the number of static VBs should be. I’m still trying to figure out how all this should work. But I would think the minimum would be nine, since you’d have the cell that the player is in plus the surrounding eight cells (3x3 cell area). Perhaps more flexible would be to have a 5x5 area. But then you have 25 VB switches during the render and I don’t yet know what an optimal number would be.
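
The 3x3 versus 5x5 bookkeeping can be sketched as follows, assuming square cells of a fixed size on the ground plane. `CellCoord`, `cellOf` and `residentCells` are hypothetical names; each returned cell would correspond to one static VB that has to be resident.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct CellCoord { int x, z; };

// Maps a world position to the cell containing it.
// floor keeps negative coordinates consistent (for example -10 lands in cell -1).
inline CellCoord cellOf(float worldX, float worldZ, float cellSize) {
    assert(cellSize > 0.0f);
    return CellCoord{static_cast<int>(std::floor(worldX / cellSize)),
                     static_cast<int>(std::floor(worldZ / cellSize))};
}

// Cells that should be resident around the player:
// radius 1 gives 3x3 = 9 cells, radius 2 gives 5x5 = 25 cells.
inline std::vector<CellCoord> residentCells(CellCoord center, int radius) {
    assert(radius >= 0);
    std::vector<CellCoord> cells;
    for (int dz = -radius; dz <= radius; ++dz) {
        for (int dx = -radius; dx <= radius; ++dx) {
            cells.push_back(CellCoord{center.x + dx, center.z + dz});
        }
    }
    return cells;
}
```

Comparing the resident set before and after the player crosses a cell border gives the cells to load and unload, and the size of the set is an upper bound on the VB switches per frame for this scheme.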

There is another problem with having one large dynamic VB in a game where you load and unload things as the player moves. You have to somehow manage the memory of the VB. I don’t know any good way to do that yet.

quote:
Original post by thona
This basically means: do NOT use different FVFs, but use "complementary" FVFs. In your case:

One for xyz and color, one for maybe texture 1, one for texture 2, and then one for texture 3. That way you have one structure in your code and can basically add/remove streams. This keeps the engine design coherent.



I’m not sure I understand how this works. When you give D3D a set of indices just before DrawPrimitive, those indices refer to all streams. Therefore, the offset into the first VB for x, y, and z coordinates must correspond to the offset into the second VB for texture1 coordinates. So for vertices that don’t have a texture1 coordinate, you still have to leave space in the second VB for them anyway, right? Then you don’t save any space. Why not just go ahead and use one VB with the biggest FVF you’ll need?


As far as how many VBs to have, you should really only have a few: 1 or 2 static ones for things that get rendered all the time (weapon models, trees, non-moving parts of vehicles, etc.). It all depends on the type of game you're making.

Fill the dynamic VBs with data that changes often: characters, terrain, etc. Make the VBs as large as you can. NVidia suggests 128K or bigger. Since AGP transfers in 64K chunks, make sure the size is a multiple of 64K or you'll waste AGP bandwidth.
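
A minimal sketch of the sizing rule, taking the 64K figure from the post at face value rather than as a hardware guarantee. The constant and function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>

// 64K as quoted in the post above; treat it as a tunable.
constexpr std::size_t kChunkBytes = 64 * 1024;

// Rounds a requested buffer size up to the next multiple of the chunk size.
constexpr std::size_t roundUpToChunk(std::size_t bytes) {
    return (bytes + kChunkBytes - 1) / kChunkBytes * kChunkBytes;
}

// How many whole vertices of a given stride fit into a buffer of that size.
inline std::size_t vertexCapacity(std::size_t bufferBytes, std::size_t strideBytes) {
    assert(strideBytes > 0);
    return bufferBytes / strideBytes;
}
```

A request for 100,000 bytes becomes a 131,072-byte buffer, which holds 3,276 vertices of 40 bytes each.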

Dynamic VB memory management is a big issue. There are three ways to do it, ordered from easiest to hardest to implement:
1) Reload the entire buffer each time with the changed data.
2) Append the new data to the end, creating a circular buffer.
3) Garbage collection and repacking of the VB.
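
Option 2 can be sketched as a small allocator that hands out ranges of a dynamic VB and reports when it had to wrap around. This is only the bookkeeping, with no D3D calls: `RingVertexAllocator` and `LockMode` are made-up names, and the two modes are meant to mirror where a D3D8 caller would lock with `D3DLOCK_NOOVERWRITE` (append) or `D3DLOCK_DISCARD` (start over).

```cpp
#include <cassert>
#include <cstddef>

// Which lock flag the caller would use for the returned range.
enum class LockMode { NoOverwrite, Discard };

struct Allocation {
    std::size_t firstVertex;
    LockMode mode;
};

class RingVertexAllocator {
public:
    explicit RingVertexAllocator(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Reserves `count` vertices. Appends while there is room,
    // otherwise wraps to the start of the buffer.
    Allocation allocate(std::size_t count) {
        assert(count > 0 && count <= capacity_);
        if (head_ + count > capacity_) {
            head_ = count;
            return Allocation{0, LockMode::Discard};
        }
        const Allocation a{head_, LockMode::NoOverwrite};
        head_ += count;
        return a;
    }

private:
    std::size_t capacity_;  // total vertices the buffer can hold
    std::size_t head_ = 0;  // first vertex not yet handed out
};
```

With a capacity of 100 vertices, three requests of 40 land at offsets 0, 40 and then 0 again, the last one flagged as a discard. Anything that still referred to the old contents has to be refilled after a wrap, which is the price of this scheme compared with option 3.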

I'm actually implementing #3 right now. It hasn't been easy, but I'm getting a handle on it. The hard part is keeping it fast. I find I'm spreading a lot of the tasks over time to keep performance high.

Remember, dynamic VBs are not necessarily slow. They become slow if video card memory fills up and the system needs to dump the VB data back to AGP memory, or if you're changing the data all the time, but that should be an obvious performance hit. Otherwise they are comparable to static VBs in speed.

You should only have to make 3 to 5 different types of VBs. I'll list the most common types below. You tell me if there is particular data that doesn't fit.

1) XYZ, Normal, Tex1, Tex2
2) XYZ, Normal, Diffuse, Tex1, Tex2
3) Transformed, Diffuse, Tex1
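
As plain structs, the three layouts above might look like the sketch below. It assumes "Tex1, Tex2" means two sets of 2D texture coordinates and that diffuse is a packed 32-bit color; the struct names are made up. Members are ordered position, normal, diffuse, texture coordinates, which is the order FVF expects.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// 1) XYZ, Normal, Tex1, Tex2
struct VertexNormalTex2 {
    float x, y, z;
    float nx, ny, nz;
    float u0, v0;
    float u1, v1;
};

// 2) XYZ, Normal, Diffuse, Tex1, Tex2
struct VertexNormalDiffuseTex2 {
    float x, y, z;
    float nx, ny, nz;
    std::uint32_t diffuse;  // packed ARGB
    float u0, v0;
    float u1, v1;
};

// 3) Transformed (screen-space x, y, z plus rhw), Diffuse, Tex1
struct VertexTransformedDiffuseTex1 {
    float x, y, z, rhw;
    std::uint32_t diffuse;  // packed ARGB
    float u0, v0;
};

// All members are 4 bytes wide, so no padding is expected.
static_assert(sizeof(VertexNormalTex2) == 40, "unexpected padding");
static_assert(sizeof(VertexNormalDiffuseTex2) == 44, "unexpected padding");
static_assert(sizeof(VertexTransformedDiffuseTex1) == 28, "unexpected padding");

// Size in bytes of a buffer holding vertexCount vertices of the given type.
template <typename Vertex>
std::size_t bufferBytes(std::size_t vertexCount) {
    assert(vertexCount > 0);
    return vertexCount * sizeof(Vertex);
}
```

The sizes (40, 44 and 28 bytes) are what matters when budgeting buffers, for example 28,000 bytes for 1,000 pre-transformed vertices.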

That's all I use in my engine at the moment. I can't think of whether I might need more right now, I just ate.

I was going to suggest a billboarding FVF and a single-texture FVF, but those cases are covered by the 3 you posted, smanches.
If you were going to have a lot of single-texture meshes, you might want a dedicated FVF without the unused Tex2 coordinates.

If you were going to use blending/bones, wouldn't you need a format with XYZB1-B5? And there's that point-sprite format as well.

Moe:
Have a few big static vertex buffers, put everything you can in there, and then handle everything you can't put in there with dynamic vertex buffers.

- JQ
Full Speed Games. Coming soon.

Sorry if I am interfering in your discussion, but I think you guys can give me the answer I need.

I'm writing an aquarium simulation as my first major Direct3D8 application. When I started to test it on other people's computers, I noticed that it has a severe bottleneck. When I run it on my Duron @ 900 MHz / Matrox G450 system I get about 40-50 fps, and the numbers are just about the same on my friend's Thunderbird @ 1.2 GHz / GeForce2 GTS system. He can even run it at 1600x1200 with 4xAA and get the same results.

I use D3DXMESHes and their DrawSubset calls to render all my objects. As I read through this thread, it occurred to me that those meshes probably all have their own vertex and index buffers. So there must be at least 40 stream switches per frame.

How expensive are those stream switches really? Could they be causing the bottleneck?

Thank you.

You're right, Magmai, I forgot about particles. This is by far not a complete list and won't fit every engine. It's just the most common types used.

You would only have B1 - Bn if you were blending on the card. I do all animation on the CPU and don't need them. You have to make sure you're not using the GPU too much or the CPU will stall.

The best solution is to fit as much of your data as you can into static VBs. The rest goes into dynamic ones. Limit the number of different FVFs as much as possible; that will help with VB switches.

There is no ideal solution for every engine. It all depends on what type and amount of data you're using.

[edited by - smanches on August 30, 2002 2:00:00 PM]

And really, if you manage it right, it doesn't even matter if you have several dozen FVFs, so long as it's not several dozen stream changes per sprite.

I thought the only reason to use a dynamic VB was for meshes that you calculate, like waves, CLOD terrain, or CPU blending as smanches mentioned. Even character animations can use static buffers; you just need to set the correct local transform prior to rendering each joint-limb-joint segment.

Are there other reasons to choose dynamic VBs?

Sorry for not being involved in this thread for a day. I spent yesterday at the Games Convention, an exhibition in Germany. Pretty brutal, if you ask me. My feet hurt terribly.

Now, back to some explanations of my problem. Yes, I know that complex geometry can be stored in static buffers :-)

My main problem was basically along the lines of JimH's: what if your geometry is dynamic? Like a terrain renderer? You definitely do not want to have a static buffer for every little patch. That can turn nasty pretty fast, with hundreds of VB switches if you use static vertex buffers.

With a space renderer, or with complex geometry units (objects), using a static vertex buffer for them and then basically playing around with the world matrices is just fine. But some things would just go out of scope, and pushing them into a dynamic vertex buffer was, to my knowledge, the optimal solution.

Anyhow, this is giving me headaches :-)


Regards

Thomas Tomiczek
THONA Consulting Ltd.
(Microsoft MVP C#/.NET)

Like I said earlier, there is nothing wrong with dynamic VBs as long as you watch video memory. Dynamic VBs will be in video memory as long as there's room, and they are just as fast as static VBs there too. If you have more data than can fit in video memory, there is nothing you can do about having to fill a dynamic VB every frame.

When you have to fill VBs every frame, make sure you use a frame count reference to determine what data to throw out instead of a FIFO algorithm. That is, throw out the data that is rendered the least number of times per frame as well as frames per second. Make sure you use the VB swapping that D3D has as well. I forget the actual semantics of it right now.
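
One way to read the frame-count idea is sketched below. The exact policy is an assumption, not something the post spells out: each chunk of geometry records the last frame it was drawn in and how often it has been drawn, and the eviction candidate is the chunk drawn longest ago, with ties going to the one drawn least often. `ResidencyTracker` and its methods are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <utility>

class ResidencyTracker {
public:
    // Call once per draw of a chunk that lives in the dynamic buffer.
    void touch(int chunkId, std::uint64_t frame) {
        Usage& u = usage_[chunkId];
        u.lastFrame = frame;
        ++u.draws;
    }

    // Picks the chunk drawn longest ago; ties go to the chunk drawn
    // least often, then to the lowest id so the result is deterministic.
    int pickVictim() const {
        assert(!usage_.empty());
        auto best = usage_.begin();
        for (auto it = usage_.begin(); it != usage_.end(); ++it) {
            if (evictBefore(*it, *best)) best = it;
        }
        return best->first;
    }

    void evict(int chunkId) { usage_.erase(chunkId); }

private:
    struct Usage {
        std::uint64_t lastFrame = 0;
        std::uint32_t draws = 0;
    };
    using Entry = std::pair<const int, Usage>;

    // True if a is a better eviction candidate than b.
    static bool evictBefore(const Entry& a, const Entry& b) {
        if (a.second.lastFrame != b.second.lastFrame) {
            return a.second.lastFrame < b.second.lastFrame;
        }
        if (a.second.draws != b.second.draws) {
            return a.second.draws < b.second.draws;
        }
        return a.first < b.first;
    }

    std::unordered_map<int, Usage> usage_;
};
```

Unlike a FIFO, a chunk that was loaded early but is still drawn every frame never becomes the victim here. The linear scan is fine for a few hundred chunks; a larger set would want a priority structure.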

quote:
Original post by smanches
Dynamic VB memory management is a big issue. There are three ways to do it, ordered from easiest to hardest to implement:
1) Reload the entire buffer each time with the changed data.
2) Append the new data to the end, creating a circular buffer.
3) Garbage collection and repacking of the VB.


Another way that just came to mind is to treat the VB like a disk drive, with memory blocks all the same size. Assuming you are using indexed rendering, the vertices don’t have to all be contiguous in the VB. So if an object can’t fit into one block, you allocate another block somewhere else in the VB and then continue. Naturally, you’ll have some wasted space due to internal fragmentation, but that may be better than having to garbage collect. I’ll have to think about this a little more. One disadvantage is that you will probably need to do a lock for each block.
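
The disk-drive analogy can be sketched as a free list of equally sized blocks. This is only the bookkeeping side, under the assumption that the index buffer is remapped to wherever the blocks end up; `BlockAllocator` and its methods are made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class BlockAllocator {
public:
    BlockAllocator(std::size_t blockCount, std::size_t verticesPerBlock)
        : verticesPerBlock_(verticesPerBlock) {
        assert(blockCount > 0 && verticesPerBlock > 0);
        // Push in reverse so that block 0 is handed out first.
        for (std::size_t i = blockCount; i > 0; --i) free_.push_back(i - 1);
    }

    // Returns the block indices for an object, or an empty vector if it does not fit.
    std::vector<std::size_t> allocate(std::size_t vertexCount) {
        assert(vertexCount > 0);
        const std::size_t needed =
            (vertexCount + verticesPerBlock_ - 1) / verticesPerBlock_;
        std::vector<std::size_t> blocks;
        if (needed > free_.size()) return blocks;
        for (std::size_t i = 0; i < needed; ++i) {
            blocks.push_back(free_.back());
            free_.pop_back();
        }
        return blocks;
    }

    // Hands an object's blocks back; they need not be contiguous.
    void release(const std::vector<std::size_t>& blocks) {
        free_.insert(free_.end(), blocks.begin(), blocks.end());
    }

    std::size_t freeBlocks() const { return free_.size(); }

    // Unused vertex slots in an object's last block (internal fragmentation).
    std::size_t wastedSlots(std::size_t vertexCount) const {
        const std::size_t rem = vertexCount % verticesPerBlock_;
        return rem == 0 ? 0 : verticesPerBlock_ - rem;
    }

private:
    std::size_t verticesPerBlock_;
    std::vector<std::size_t> free_;  // stack of free block indices
};
```

With 8 blocks of 256 vertices, a 600-vertex object takes 3 blocks and wastes 168 slots. There is no repacking step, but each block is a separate range to lock and fill, which matches the disadvantage mentioned above.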

[edited by - JimH on September 1, 2002 2:41:19 PM]
