
Hypothetical VB batch size optimisation


Hi. Theoretically: if you have a map that requires the maximum allowable number of vertices, is it better to store all of those vertices in a single VB and then draw ranges of the VB according to view culling, or is it better to create multiple VBs that break the terrain up into smaller chunks of, say, 30k verts?

I figure that a single massive VB requires fewer SetStreamSource() calls, but it is a huge amount of data to shunt to the video card. On the other hand, using smaller chunks means you're not sending a massive VB to the card every frame, but you are using more stream sources.

I'm guessing the answer is linked to the hardware (amount of VRAM, AGP aperture size, etc.), but does anyone know of a general guideline for finding the sweet spot?

quote:
Original post by SoaringTortoise
Theoretically: if you have a map that requires the maximum allowable number of vertices, is it better to store all of those vertices in a single VB and then draw ranges of the VB according to view culling, or is it better to create multiple VBs that break the terrain up into smaller chunks of, say, 30k verts?

With hardware vertex processing and DrawIndexedPrimitive() (DIP), the hardware only processes the vertices you actually reference with the indices. So this gives you fewer SetStreamSource() calls and better performance.

With software vertex processing, *all* the vertices between the lowest and highest indices used are processed. So if a big percentage of the map is not visible at one time, you'll be wasting a lot of time.

quote:
I figure that a single massive VB requires fewer SetStreamSource() calls, but it is a huge amount of data to shunt to the video card.

Well, if the vertex buffer's already in video memory (static), I don't think SetStreamSource() will slow down in proportion to its size (if it's affected by the size at all, since there's mostly no copying involved).
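To make the single-big-VB case concrete, here is a minimal sketch of that kind of draw loop, assuming Direct3D 9; the ChunkRange struct and the DrawVisibleChunks/g_pTerrainVB/g_pTerrainIB names are made up for illustration. The whole terrain sits in one static VB/IB pair that is bound once, and each chunk that survives culling gets its own DrawIndexedPrimitive() range, so on hardware vertex processing only the referenced vertices are transformed.

#include <d3d9.h>

// Created elsewhere as static buffers (e.g. D3DUSAGE_WRITEONLY): one VB/IB pair for the whole terrain.
IDirect3DVertexBuffer9* g_pTerrainVB = NULL;
IDirect3DIndexBuffer9*  g_pTerrainIB = NULL;

struct ChunkRange                  // hypothetical per-chunk bookkeeping
{
    UINT minVertex;                // lowest vertex index the chunk references
    UINT numVertices;              // span of vertices the chunk references
    UINT startIndex;               // first index of the chunk within the IB
    UINT primitiveCount;           // triangle count
    bool visible;                  // result of view-frustum culling
};

void DrawVisibleChunks(IDirect3DDevice9* pDevice, const ChunkRange* chunks,
                       UINT chunkCount, UINT vertexStride)
{
    // Bind the stream and indices once for the whole terrain...
    pDevice->SetStreamSource(0, g_pTerrainVB, 0, vertexStride);
    pDevice->SetIndices(g_pTerrainIB);

    // ...then issue one DrawIndexedPrimitive() per visible chunk.
    for (UINT i = 0; i < chunkCount; ++i)
    {
        if (!chunks[i].visible)
            continue;

        pDevice->DrawIndexedPrimitive(D3DPT_TRIANGLELIST,
                                      0,                          // BaseVertexIndex
                                      chunks[i].minVertex,        // MinVertexIndex
                                      chunks[i].numVertices,      // NumVertices
                                      chunks[i].startIndex,       // StartIndex
                                      chunks[i].primitiveCount);  // PrimitiveCount
    }
}

With software vertex processing the MinVertexIndex/NumVertices range is exactly what gets run through the pipeline, so keeping each chunk's vertices contiguous in the big VB keeps that range tight.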

OK, that makes sense. But if your VB is at the maximum size, a lower-end video card won't be able to store the entire VB plus textures on the card, so there'll still be swapping.

FYI:

I just switched the program from using several VBs to using a single VB and drawing subsets of it according to the culling. My original VBs were around 2,000 polys each. The big VB contained around 200,000 polys.

The difference in performance was a grand total of 3 frames per second in favour of the big VB.

Not too much of an improvement. Using the multiple smaller VBs gives me the ability to do level loading on the fly, so I think I'll go back to it.

quote:
Original post by SoaringTortoise
I figure that a single massive VB requires fewer SetStreamSource() calls, but it is a huge amount of data to shunt to the video card.


That's actually a good thing. First of all, the data is probably already on the video card. If you send a lot of data for processing at once, the GPU can chew through it all while the CPU does more important things (like not waiting for a SetStreamSource() call to finish). You get a certain amount of parallelism between the two.

While your gains were not very significant (3 FPS), on a slower system than yours, or just one with a different video card, the gain could be significantly larger. Also, as you draw more and more primitives, the performance difference between the two methods will get a lot larger.

[edited by - glassJAw on August 19, 2003 5:36:23 PM]

quote:
Original post by glassJAw
While your gains were not very significant (3 FPS), on a slower system than yours, or just one with a different video card, the gain could be significantly larger. Also, as you draw more and more primitives, the performance difference between the two methods will get a lot larger.

Just stressing this:
Many people test things on apps that suffer from the "app too simple" syndrome. Benchmarking is an art, and not something to be taken lightly.

quote:
Using the multiple smaller VBs gives me the ability to do level loading on the fly, so I think I'll go back to it.

Sure, in such a case I'd go (and already did go) for multiple VBs. Just make sure you choose an appropriate size (a rough sketch of the on-the-fly loading follows below):
- Too small: you'll be loading and unloading things frequently.
- Too large: loading is slow.

Muhammad Haggag
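For the on-the-fly loading case, here is a rough sketch of the trade-off above, assuming Direct3D 9 with D3DX; TerrainChunk, UpdateChunkStreaming and LoadChunkVB are hypothetical names. Each chunk owns its own VB, created when it comes within a load radius of the camera and released when it leaves, and the chunk size is the knob that moves you between the two extremes.

#include <d3d9.h>
#include <d3dx9.h>

struct TerrainChunk
{
    IDirect3DVertexBuffer9* pVB;      // NULL while the chunk is unloaded
    D3DXVECTOR3             center;   // chunk centre in world space
};

// Hypothetical helper: creates the chunk's VB and fills it from disk/heightmap data.
void LoadChunkVB(IDirect3DDevice9* pDevice, TerrainChunk* pChunk);

void UpdateChunkStreaming(IDirect3DDevice9* pDevice, TerrainChunk* chunks, UINT count,
                          const D3DXVECTOR3& eye, float loadRadius)
{
    for (UINT i = 0; i < count; ++i)
    {
        D3DXVECTOR3 toChunk = chunks[i].center - eye;
        bool wanted = D3DXVec3LengthSq(&toChunk) < loadRadius * loadRadius;

        if (wanted && chunks[i].pVB == NULL)
        {
            LoadChunkVB(pDevice, &chunks[i]);   // too-large chunks make this load slow
        }
        else if (!wanted && chunks[i].pVB != NULL)
        {
            chunks[i].pVB->Release();           // too-small chunks make this churn
            chunks[i].pVB = NULL;
        }
    }
}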

Don't go crazy and make a 40-megabyte vertex buffer. What happens to the poor person with an older (but still very adequate) 32 MB card? It all gets stuck in system memory!

[edit] This is an extreme case, but it gets the point across. DirectX will manage swapping things in and out of video memory for you, so you need to give it room to do that efficiently.

[edited by - Raloth on August 19, 2003 9:02:35 PM]
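As a small sketch of giving the runtime that room, assuming Direct3D 9: a chunk-sized buffer created in the managed pool. TerrainVertex, kVertsPerChunk and CreateChunkVB are made-up names, with the ~30k-vert chunk size taken from the original post. Managed resources keep a system-memory copy and let DirectX page buffers into video memory as they are actually used, rather than pinning one oversized allocation there.

#include <d3d9.h>

struct TerrainVertex              // hypothetical vertex layout
{
    float x, y, z;                // position
    float nx, ny, nz;             // normal
    float u, v;                   // texture coordinates
};

const UINT kVertsPerChunk = 30000;    // roughly the chunk size discussed above

IDirect3DVertexBuffer9* CreateChunkVB(IDirect3DDevice9* pDevice)
{
    IDirect3DVertexBuffer9* pVB = NULL;
    HRESULT hr = pDevice->CreateVertexBuffer(
        kVertsPerChunk * sizeof(TerrainVertex),
        D3DUSAGE_WRITEONLY,           // static geometry, written once
        0,                            // no FVF; a vertex declaration is assumed
        D3DPOOL_MANAGED,              // let DirectX swap it in and out of video memory
        &pVB,
        NULL);
    return SUCCEEDED(hr) ? pVB : NULL;
}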

quote:
Original post by Coder
Just stressing this:
Many people test things on apps that suffer from the "app too simple" syndrome. Benchmarking is an art, and not something to be taken lightly.


That's true. I guess in this case a 200,000-vertex buffer (I'm assuming by poly he meant vertex) should probably be split up into a few smaller ones.

Though it must be taken into account that SetStreamSource() can be very expensive. And he DID get an FPS increase using a single, massive buffer.

Coder: I'm pretty close to finishing the rendering part of the game, so I don't think the app's too simple. It's got terrain, scenery meshes, multi-pass rendering with bump maps, envbump for the water, a fully functional particle system, a rudimentary AI, collision detection (treed down to the polygon, if necessary) and a home-grown scripting language. So the 3 FPS is pretty indicative of the final game speed. The thing keeps dipping below 40 FPS and I really need an extra 15 FPS to implement a fractal-generated sky. That extra 3 (which is fairly temperamental anyway) isn't nearly as big a boost as I was looking for.

(BTW: Dungeon Siege runs at around 30 FPS on my machine, so while 40 FPS may seem low, I'm happy to be generating similar detail at similar frame rates. I suspect the 10 FPS advantage is more down to the inventory, UI and other stuff that DS does that I'm not doing yet than to any significantly better rendering code on my part.)

glassJAw: Yeah, I did mean polys, not verts. Sorry.

Raloth: Sorry. I'm targeting at least 64 MB cards. You can toggle off the detail textures, the [env]bump maps and the number of particles, and adjust the multi-pass levels to get it working, but unless you crank the view distance down to a nearly suicidal level it's not going to run. I managed to get it all running on my old Voodoo3, so it'll probably run on whatever you've got. I'm just saying that it'll probably be more fun sticking toothpicks under your eyelids than playing this game with those settings.

