How many triangles per frame?

Started by
12 comments, last by janta 14 years, 8 months ago
Quote:Original post by NiGoea
My example: a 20k-triangle map has 20 materials, so I average 1k triangles per call. I have three passes that involve geometry (depth, normal and final pass), so I end up with 60k triangles sent in 60 calls. DAMNED SLOW.

How slow? There's no way 60k triangles or 60 calls to DIP (DrawIndexedPrimitive) should ruin performance that much. In addition, too many DIP calls would put a burden on your CPU, so what CPU do you have?
Finally, how do you know the 60 DIP calls are indeed the bottleneck?

And since you have nVidia hardware, you should learn to use NVPerfHUD if you haven't already.

Good luck
If you render everything with the same texture (say, a pure white texture), does the FPS increase? Remove all the SetTexture() calls in favor of one initial SetTexture() call. If you get a dramatic improvement, I would suggest looking at creating texture (UV) atlases in a pre-processing step.
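
For what it's worth, the UV side of such a pre-processing step could look roughly like the sketch below. It is not a full atlas packer: `UV`, `AtlasRect` and `RemapToAtlas` are hypothetical names made up for illustration, and it assumes each source texture has already been assigned a sub-rectangle of the atlas in normalized coordinates. It only handles UVs inside [0,1], so tiled textures cannot be remapped this way without extra work.

```cpp
#include <cassert>

// Hypothetical types for illustration; they are not part of Direct3D.
struct UV { float u, v; };

// Sub-rectangle that one source texture occupies inside the atlas,
// expressed in normalized [0,1] atlas coordinates.
struct AtlasRect { float u0, v0, width, height; };

// Remap a coordinate from the source texture's [0,1] range into the
// atlas sub-rectangle. Assumption: the UV does not tile (stays in [0,1]).
inline UV RemapToAtlas(UV uv, const AtlasRect& rect) {
    assert(uv.u >= 0.0f && uv.u <= 1.0f);
    assert(uv.v >= 0.0f && uv.v <= 1.0f);
    return UV{ rect.u0 + uv.u * rect.width,
               rect.v0 + uv.v * rect.height };
}
```

Every vertex that used the source texture would be passed through `RemapToAtlas` once, offline, after which all those triangles can share one texture and one draw call.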

Another good question that was raised: what kind of occlusion culling are you using? If you render your scene in wireframe mode, is the horizon line a smear of pure black? Can you see rooms through walls? I read that you are using an octree, but that isn't very good at culling geometry when there is a large distance between z-near and z-far. It just gives you fast access to the nodes in the frustum, and many of those nodes are still occluded.

Those Quake maps were meant for indoor rendering and predate batched, hardware-accelerated rendering. What I mean by that is that they inserted each and every triangle into a BSP and rendered only the visible triangles, always in front-to-back order from the camera. They likely did some batching after they determined the visible set, and they also likely merged all their textures onto as few pages of video memory as possible.
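
As a rough illustration of the front-to-back idea only (this is not the per-triangle BSP traversal described above), visible nodes can be sorted by distance from the camera before drawing. The sketch assumes each node has a center point; `Vec3`, `Node` and `SortFrontToBack` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical types for illustration.
struct Vec3 { float x, y, z; };
struct Node { int id; Vec3 center; };

// Squared distance is enough for ordering and avoids a square root.
inline float DistanceSquared(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x;
    const float dy = a.y - b.y;
    const float dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Order nodes so that the ones nearest to the camera come first.
void SortFrontToBack(std::vector<Node>& nodes, const Vec3& camera) {
    auto nearer = [&camera](const Node& a, const Node& b) {
        return DistanceSquared(a.center, camera) <
               DistanceSquared(b.center, camera);
    };
    std::sort(nodes.begin(), nodes.end(), nearer);
    // Debug sanity check: the list is now ordered by distance.
    assert(std::is_sorted(nodes.begin(), nodes.end(), nearer));
}
```

Sorting by node center is only approximate for large nodes, but drawing near geometry first gives the depth test a chance to reject hidden pixels early.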

To get equivalent results you would have to either implement that type of per-triangle BSP algorithm in software (and likely skip DirectX and write to the video buffer yourself) or figure out some form of portal culling system that you can implement (those usually require a little bit of input from the artists though: extra portal geometry and whatnot).

Other than more careful scene management, I can't think of any glaring error in what you are doing.
I will try to answer all of you.


** My situation in a very simple form **

Light pre-pass renderer made of three passes:
1- render GBuffer
2- render lights (light buffer)
3- render final scene

Steps 1 and 3 require rendering the entire scene by calling 'RenderScene'. Every call to 'RenderScene' involves:

- obtaining the visible octree nodes and doing other culling stuff (not important which)
- for each node, rendering the triangles it contains, sorted by material. Every material implies a single 'DrawIndexedPrimitive' call.
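
The per-node batching in the list above can be sketched as follows. This is a simplified model, not the actual engine code: `Triangle`, `DrawBatch` and `BuildBatches` are hypothetical names, and the sketch only counts triangles per material instead of touching real vertex or index buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-ins for the real scene data.
struct Triangle { int materialId; };
struct DrawBatch { int materialId; std::size_t triangleCount; };

// Group the visible triangles of a node by material. Each returned batch
// stands for one 'DrawIndexedPrimitive' call in the real renderer.
std::vector<DrawBatch> BuildBatches(const std::vector<Triangle>& visibleTriangles) {
    std::map<int, std::size_t> countByMaterial;
    for (const Triangle& tri : visibleTriangles) {
        assert(tri.materialId >= 0);  // assumption: ids are non-negative
        ++countByMaterial[tri.materialId];
    }

    std::vector<DrawBatch> batches;
    batches.reserve(countByMaterial.size());
    for (const auto& entry : countByMaterial) {
        batches.push_back(DrawBatch{entry.first, entry.second});
    }
    return batches;
}
```

The number of batches equals the number of distinct materials in the node, so many small nodes multiply the call count even when the triangle count stays the same.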


** The problem **

Since I want to take advantage of the occlusion algorithm called Hierarchical Z-Buffer Visibility, I have to use many fairly small octree nodes.

But for simplicity, let's suppose I have only ONE big node that represents the entire map.

So I have two passes, each of which renders the octree once. Every time the octree is rendered, there will be as many 'DrawIndexedPrimitive' calls as there are different materials.

The problem is: having 20 materials implies 20 draw calls for each pass => 40 'DrawIndexedPrimitive' calls.
20k triangles are visible on average, so every draw call will average 1k triangles.
=> 1k * 20 materials * 2 passes = 40k triangles in total (not so many)

RESULT: 15 fps on a 6800 Ultra.


** Without using any material **

The number of triangles is the same, but if I use only one material, I have one big call instead of 20 for each pass

=> 20k * 2 passes = 40k triangles as before... but with 100 fps.
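
To put the two measurements side by side, here is a back-of-the-envelope sketch. It assumes, purely for the sake of argument, that the whole frame-time difference comes from the extra draw calls; the function names are hypothetical.

```cpp
#include <cassert>

// Frame time in milliseconds for a given frame rate.
inline double FrameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Cost of each additional draw call in milliseconds, ASSUMING the entire
// frame-time difference between the two runs is caused by the extra calls.
inline double ImpliedCostPerExtraCallMs(double slowFps, int slowCalls,
                                        double fastFps, int fastCalls) {
    assert(slowCalls > fastCalls);
    return (FrameTimeMs(slowFps) - FrameTimeMs(fastFps)) /
           (slowCalls - fastCalls);
}
```

With the figures above, `ImpliedCostPerExtraCallMs(15.0, 40, 100.0, 2)` compares about 66.7 ms per frame against 10 ms, which works out to roughly 1.5 ms for each of the 38 extra calls.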



** Conclusion **

It seems that issuing many 'DrawIndexedPrimitive' calls slashes the performance.

I tried using a single SetTexture call, but nothing changed.

Moreover, I don't f****** know why, but even with a single big draw call of 20k triangles, things are slow... WHAT THE HELL ???


** Questions **

1- Is it possible to use texture atlases? I don't know... the number of texture combinations is very high. One time you need one texture, another time a different one, maybe a texture that resides in another atlas...
Wasn't it an old practice?

2- What happens if you have to handle hundreds of different materials ?

3- How can one carry out a good occlusion culling scheme if it seems that it's better to send all the triangles in a single bunch ??

4- Does it make sense to build the vertex buffer and the index buffer for the visible geometry on the fly??


I HATE 3D GRAPHICS WORLD :D!


-----

@ Krohm
I finished a complete software renderer engine last winter. I nearly went crazy many times, because no library was used in this project of mine. So I'm not scared of hard things... I'm only scared of wasting time on hard things that aren't worth it
:°(

@ Steve_Segreto
No, I saw no increase from using a single SetTexture.
An octree works perfectly with occlusion culling if you use Hierarchical Z-Buffer Visibility (Greene 1993). That's the one I used in the software engine. I'm discovering that maybe it's NOT a good approach with HW acceleration.



THANK YOU ALL
Quote:It seems that issuing many 'DrawIndexedPrimitive' calls slashes the performance.

Yes, but 60 is NOT many, even for a pretty old (a couple of years) CPU.
Quote:is it possible to use texture atlases?...

That's going to be a pain in the...
Quote:What happens if you have to handle hundreds of different materials ?

You do hundreds of DIP calls, which again is an acceptable number (a couple hundred, not thousands).
Quote:How can one carry out a good occlusion culling scheme if it seems that it's better to send all the triangles in a single bunch ??

You don't have to (you can't) send all of them in a single bunch. See above.
Quote:Does it make sense to build the vertex buffer and the index buffer for the visible geometry on the fly??

NO. Static geometry with a good balance of geometric size (for occlusion) vs. triangle count (for batching) is usually the most efficient approach.

Bottom line: use the available tools (NVPerfHUD, profilers...) to figure out what is wrong with your application.

This topic is closed to new replies.
