NiGoea

How many triangles per frame ?

This topic is 3233 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

Hi all,

I was thinking about how many triangles should be sent each frame, and how many texture changes we should perform per frame. I mean the practical limit.

I'm pretty disappointed with my deferred engine, which can now load .MAP files (Quake 1/2/3), because switching textures and issuing multiple 'DrawIndexedPrimitive' calls turned out to be too slow even for a small-to-medium map of 20k triangles. I used the E1M1 Quake 1 map. In particular, it turns out that doing twenty 'DrawIndexedPrimitive' calls is way slower than doing a single, much bigger one. But if one wants to use materials, many 'DrawIndexedPrimitive' calls have to be made. So how the hell does one resolve this problem? How many triangles do you send on average per frame? How many calls do you make?

---

My example: a 20k-triangle map has 20 materials, so I average 1k triangles per call. I have three passes that involve geometry (depth, normal and final pass), so I end up with 60k triangles sent in 60 calls. DAMNED SLOW. I don't even dare guess what would happen if I had shadow maps right now, since you render the scene again for each light.

---

I use 'SetTexture' every time the diffuse texture changes. Am I wrong? Do I have to pack many different textures into a single big one?

THANKS TO ALL

What hardware are you running this on?

Also, why are you doing 3 passes for your g-buffer? Why not bind 3 render targets at once and write out all three in one pass?

Quote:
Original post by adt7
What hardware are you running this on?

Also, why are you doing 3 passes for your g-buffer? Why not bind 3 render targets at once and write out all three in one pass?


It's a light pre-pass renderer, so you have at least three passes: one to make the G-Buffer, one to compute light values (but this doesn't involve geometry at all!) and the last to render geometry taking light values from the Light Buffer.

In my case there is an extra step: I first make the depth buffer, and only then do I make the normal buffer (which also contains other data), because it seems to me that this way I can take advantage of the z-buffer by skipping computations for invisible pixels.

---

Anyway, I have a GeForce 6800 Ultra. It's not new, but I can't accept that it cannot quickly render a 10-year-old FPS map.

Quote:
Original post by PolyVox
Basically your findings are correct: the number of DrawIndexedPrimitive calls is far more important than the number of triangles. Have a read of the following NVidia presentation:

http://developer.nvidia.com/docs/IO/8230/BatchBatchBatch.pdf


The article was extremely interesting, but what does it teach?
As far as I can tell, it suggests decreasing the number of calls... but how can one do that when using multiple materials? The only way is to pack many textures onto the same surface and update the texture coordinates... which doesn't seem so easy.
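
For readers unfamiliar with the atlas idea mentioned above, here is a minimal sketch of the coordinate update it requires. The names `AtlasRect`, `UV` and `remapToAtlas` are hypothetical and not taken from any engine in this thread; the sketch assumes the original coordinates lie in the 0..1 range.

```cpp
#include <cassert>

// Hypothetical description of where one source texture sits inside the atlas,
// expressed in normalized atlas coordinates (0..1).
struct AtlasRect {
    float u0, v0;         // corner of the sub-image inside the atlas
    float width, height;  // size of the sub-image inside the atlas
};

struct UV { float u, v; };

// Remap a coordinate authored against the original texture into the
// sub-rectangle that texture occupies in the atlas.
UV remapToAtlas(UV original, const AtlasRect& rect) {
    // Assumption: no tiling, so the input must already be inside 0..1.
    assert(original.u >= 0.0f && original.u <= 1.0f);
    assert(original.v >= 0.0f && original.v <= 1.0f);
    return { rect.u0 + original.u * rect.width,
             rect.v0 + original.v * rect.height };
}
```

The assertion marks the part that is "not so easy": level geometry often tiles its textures, so coordinates outside 0..1 would have to be wrapped in the pixel shader or the geometry split before a remap like this applies.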

Plus, if I wanna use a screen-space occlusion culling system, I can't send all the triangles contained in the frustum at once, since that would nullify the occlusion system... rather, I should send bunches of triangles => SLOW

I mean... WHAT THE HELL do you do ?!

Thanks !! :-D

Quote:
Original post by NiGoea
In my case there is an extra step: I first make the depth buffer, and only then do I make the normal buffer (which also contains other data), because it seems to me that this way I can take advantage of the z-buffer by skipping computations for invisible pixels.


Doing a depth-only pass only helps if you're using a heavy pixel shader, and your g-buffer pass for a light-prepass renderer should be very light (you're not doing any actual shading, after all). You're probably better off just doing depth+normals in one pass.

Quote:
Original post by MJP
Quote:
Original post by NiGoea
In my case there is an extra step: I first make the depth buffer, and only then do I make the normal buffer (which also contains other data), because it seems to me that this way I can take advantage of the z-buffer by skipping computations for invisible pixels.


Doing a depth-only pass only helps if you're using a heavy pixel shader, and your g-buffer pass for a light-prepass renderer should be very light (you're not doing any actual shading, after all). You're probably better off just doing depth+normals in one pass.


Well, you're right. Actually, my normal pass involves two samples, one cross product and one normalize per pixel...

You might save some texture changes by sorting your models by the material they use. That way you can render all objects that share a texture without having to change materials in between.
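
A minimal sketch of that sorting step, assuming a hypothetical `DrawItem` record per chunk of geometry (none of these names come from the thread). It only counts how often the bound texture would change; the actual 'SetTexture' and 'DrawIndexedPrimitive' calls are left out so the example runs without a device.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical draw record: one chunk of geometry drawn with one call.
struct DrawItem {
    int materialId;     // which diffuse texture / material the chunk needs
    int firstIndex;     // start offset into a shared index buffer
    int triangleCount;  // number of triangles in the chunk
};

// Order the items so that all chunks sharing a material are adjacent.
void sortByMaterial(std::vector<DrawItem>& items) {
    std::stable_sort(items.begin(), items.end(),
        [](const DrawItem& a, const DrawItem& b) {
            return a.materialId < b.materialId;
        });
}

// Count how many times the bound texture would change when the items
// are drawn in their current order (the first bind counts as a change).
std::size_t countTextureChanges(const std::vector<DrawItem>& items) {
    std::size_t changes = 0;
    for (std::size_t i = 0; i < items.size(); ++i) {
        assert(items[i].materialId >= 0);  // assumption: ids are non-negative
        if (i == 0 || items[i].materialId != items[i - 1].materialId) {
            ++changes;
        }
    }
    return changes;
}
```

After sorting, the number of texture changes equals the number of distinct materials, which is the floor this approach can reach on its own.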

Are you using any culling methods?

Quote:
Original post by NiGoea
The article was extremely interesting, but what does it teach?
As far as I can tell, it suggests decreasing the number of calls... but how can one do that when using multiple materials? The only way is to pack many textures onto the same surface and update the texture coordinates... which doesn't seem so easy.

...

WHAT THE HELL do you do ?!
You are right. It is not easy.
Depending on the shader complexity, there are various possibilities. The texture atlas approach you're describing is effective but quite involved to get right, as some texture-coordinate remapping is required, and texture coordinates, in today's shader-driven world, may be accessed in arbitrary ways.
A somewhat more robust way is to use spare sampler registers (don't tell me you're already using all 16) and discard each sampler's contribution depending on a per-vertex attribute value. Whether this is done through branching or by zeroing out with math is a nontrivial choice (also recall that dynamically indexing samplers is not allowed on D3D9 hardware). It is essentially an "ubershader" approach.
I am very unlucky since I don't like ubershaders at all... so I ended up writing a shader disassembler which walks the compiled code and modifies everything. I've lost count of the times I've shot myself in the foot with this beast, not to mention that I need D3DX to make it work, which I find rather ugly.
I strongly urge you to resist trying shader re-mangling, unless you don't care about your mental health, which I clearly haven't since the start!

If you can live with ubershaders, just modify the source assets to include the 'switching' per-vertex attrib and you'll be right at home with none of the above mentioned issues. Much better.
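
To make the "math zeroing-out" variant concrete, here is a CPU-side sketch of what such a pixel shader could compute. It is an illustration under assumptions, not the poster's shader: `Color`, `selectByAttribute` and the `slot` attribute are hypothetical names, and the weight formula is just one way to do the selection without branching.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

struct Color { float r, g, b; };

// Every sampler is read; a per-vertex "slot" attribute decides which sample
// survives. weight_i = saturate(1 - |slot - i|) is 1 for the matching sampler
// and 0 for all others when slot holds a whole number.
template <std::size_t N>
Color selectByAttribute(const std::array<Color, N>& samples, float slot) {
    static_assert(N > 0, "at least one sampler is required");
    assert(slot >= 0.0f && slot <= static_cast<float>(N - 1));
    Color result{0.0f, 0.0f, 0.0f};
    for (std::size_t i = 0; i < N; ++i) {
        float w = 1.0f - std::fabs(slot - static_cast<float>(i));
        w = std::fmax(0.0f, std::fmin(1.0f, w));  // saturate to 0..1
        result.r += w * samples[i].r;
        result.g += w * samples[i].g;
        result.b += w * samples[i].b;
    }
    return result;
}
```

The attribute has to be identical on all three vertices of a triangle; otherwise interpolation produces fractional slots and the textures blend across the face.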

Quote:
Original post by NiGoea
Plus, if I wanna use a screen-space occlusion culling system, I can't send all the triangles contained in the frustum at once, since that would nullify the occlusion system... rather, I should send bunches of triangles => SLOW
If you're sorting front-to-back for more z-reject, no, it is not slow. Sending large batches will outweigh by far the deficit of a worse z-buffer rejection ratio, and culling can still be performed on a per-batch basis. Yes, it will waste more fillrate, but I've had rather good experience with it so far.
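
A rough sketch of per-batch culling followed by a coarse front-to-back sort. The `Batch` layout, the precomputed `visible` flag and `buildDrawOrder` are assumptions made for illustration; a real engine would fill the flag from its own frustum or occlusion test.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical batch: many triangles drawn with one call, plus a bounding
// volume center used only for coarse depth ordering.
struct Batch {
    int id;
    Vec3 center;
    bool visible;  // assumed to be filled in by a per-batch culling test
};

float distanceSquared(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Drop culled batches, then order the rest nearest-first so that closer
// geometry fills the z-buffer before farther geometry is drawn.
std::vector<Batch> buildDrawOrder(const std::vector<Batch>& all, const Vec3& cameraPos) {
    std::vector<Batch> order;
    for (const Batch& b : all) {
        if (b.visible) order.push_back(b);
    }
    std::sort(order.begin(), order.end(),
        [&cameraPos](const Batch& a, const Batch& b) {
            return distanceSquared(a.center, cameraPos) <
                   distanceSquared(b.center, cameraPos);
        });
    return order;
}
```

Sorting whole batches is much coarser than sorting triangles, which is the trade described above: fewer, larger calls in exchange for a worse z-reject ratio.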

Anyway, 60 calls shouldn't be a problem: I think the render-target switches are what is really killing your GPU. Also, mixing a z-only prepass with deferred shading makes little sense to me, as you're essentially assuming that the per-pixel attributes are costly to compute... which they actually are, if you're doing parallax occlusion mapping or complex shading. It is my personal opinion that in those cases the benefits of deferred shading are nullified... is the problem looping back on itself?

Quote:
Original post by Darg
You might save some texture changes by sorting your models by the material they use. That way you can render all objects that share a texture without having to change materials in between.


I'm already doing this. But in an indoor map it's normal to have 10, 20 or 30 different materials.

Quote:

Are you using any culling methods?


Yes. An octree. But using DX is a pain in the ass anyway, because the more nodes the octree has, the more calls you have to make => SLOW.
On the other hand, an octree with only a few nodes doesn't make much sense.
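
One possible way around the "more nodes, more calls" trade-off, sketched here as an assumption rather than something proposed in the thread, is to cull per node but merge the surviving indices per material before drawing. `OctreeNode` and `mergeVisibleNodes` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical octree leaf: for each material, the vertex indices of the
// triangles the node owns.
struct OctreeNode {
    bool visible;  // assumed result of frustum culling for this node
    std::map<int, std::vector<std::uint32_t>> indicesByMaterial;
};

// Concatenate the indices of all visible nodes, grouped by material, so the
// number of draw calls depends on the material count, not the node count.
std::map<int, std::vector<std::uint32_t>>
mergeVisibleNodes(const std::vector<OctreeNode>& nodes) {
    std::map<int, std::vector<std::uint32_t>> merged;
    for (const OctreeNode& node : nodes) {
        if (!node.visible) continue;
        for (const auto& entry : node.indicesByMaterial) {
            std::vector<std::uint32_t>& dst = merged[entry.first];
            dst.insert(dst.end(), entry.second.begin(), entry.second.end());
        }
    }
    return merged;
}
```

Each merged list would then be copied into a dynamic index buffer and drawn with a single call per material. The per-frame copy has its own cost, so whether this is a net win would need measuring on the target hardware.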
