VAR + SwapBuffers performance problems

12 comments, last by HellRaiZer 20 years, 6 months ago
I just finished testing the sphere benchmark you suggested. Here are the results from some of the tests:

- Lighting is disabled in all tests.
- Rendering 3000 Balls with 950 triangles each, grouped in one triangle list for every ball.

**********************************************************************
	withOUT GL_NV_vertex_array_range (Simple VA) (800x600x32)
**********************************************************************
================================================
- Minimum Geometry : 2,905,622.25 tris/sec
- Maximum Geometry : 3,595,494.50 tris/sec
- Average Geometry : 3,509,296.25 tris/sec
================================================

**********************************************************************
	with GL_NV_vertex_array_range (2 textures) (800x600x32)
**********************************************************************
================================================
- Minimum Geometry : 7,029,515.00 tris/sec
- Maximum Geometry : 9,769,094.00 tris/sec
- Average Geometry : 9,260,820.00 tris/sec
================================================

**********************************************************************
	with GL_NV_vertex_array_range (1 texture) (800x600x32)
**********************************************************************
================================================
- Minimum Geometry : 8,782,627.00 tris/sec
- Maximum Geometry : 9,797,635.00 tris/sec
- Average Geometry : 9,347,072.00 tris/sec
================================================

**********************************************************************
	with GL_NV_vertex_array_range (No textures) (800x600x32)
**********************************************************************
================================================
- Minimum Geometry : 9,405,980.00 tris/sec
- Maximum Geometry : 9,830,263.00 tris/sec
- Average Geometry : 9,382,089.00 tris/sec
================================================

**********************************************************************
	with GL_NV_vertex_array_range (No textures) (8x8x32) (all enabled)
**********************************************************************
================================================
- Minimum Geometry : 9,450,366.00 tris/sec
- Maximum Geometry : 9,836,973.00 tris/sec
- Average Geometry : 9,388,133.00 tris/sec
================================================

**********************************************************************
	with GL_NV_vertex_array_range (No textures) (8x8x32) (nothing enabled)
**********************************************************************
================================================
- Minimum Geometry : 9,559,716.00 tris/sec
- Maximum Geometry : 10,462,466.00 tris/sec
- Average Geometry : 9,577,451.00 tris/sec
================================================


As you can see, I barely break the 10 Mtris/sec barrier, even when rendering to an 8x8 window with depth, stencil, color and texture all disabled. I'm not saying that the numbers aren't good enough: nVidia's demo on VAR (the wavy thing) gave similar results.

The 3000-sphere package is rendered by randomly translating the origin of every sphere, every frame, so that every ball lies inside an imaginary box. The balls are small, because I thought this way I could minimize fillrate. I placed the camera so that all the balls are visible every frame (the imaginary box is completely inside the frustum). Backface culling is enabled, and there is no hint for volume clipping.

Can you explain the above results? I'm a little confused by them, because there seems to be no big difference between the multitextured and untextured tests (except for the min counter).

quote:
You might want to consider a different spatial subdivision structure than an octree, in order to avoid the redundancy problem and minimize the required splits.


The reason I started (and stuck) with octrees is how simple they are to create. No tree is that hard to implement, but octrees are the simplest. I wanted octrees for another reason too. As I had read in some occlusion culling papers, their perfect cube node shape is friendlier to those algorithms than an arbitrary BBox. While I was implementing HOMs, I didn't see any advantage of perfect cubes over arbitrary boxes, but I was too focused on occlusion culling to find the time to change them. Now that my HOM implementation is stuck on the software rasterizer (a really hard part, I must admit, not only from the speed point of view, but also because it's hard to get OpenGL-like precise results), I'm too bored to change them. But occlusion culling is another topic, which deserves plenty of threads and posts!

[off topic]
(Sad memories came to his mind. "What the hell", he thinks, "I'll complete it someday.")

- Reminder (to myself) : Change Octrees.
- Question (to myself) : With what??????
[/off topic]

quote:
Ah, but you didn't mention you have a GF-1... But you didn't really mention how much performance you actually get either. First, get rid of those double triangles you mentioned above, they will suck away your precious fillrate like mad. I'm starting to suspect that you aren't really geometry limited, but fillrate limited. VAR won't help you very much in that case.

Please give the definition of performance! Do you mean tris/sec, pixels/sec (how can you measure that? is it the obvious way, the way to go?)? I thought FPS is enough as a performance result.

My opinion is that I'm both geometry and fillrate limited. But let's assume that I'm only fillrate limited: what can I do to overcome it? Changing the resolution, smaller textures, texture compression, texture filtering, minimizing blended polygons and dropping multitexturing are some possible solutions, I think. But what if I can't "implement" one of them, because I really want the functionality it gives? Then I guess the only solution is a newer card.

Thanks for the support, Yann. I'm really grateful.

I'll now try to eliminate the double-rendered triangles, and I'll be back with the final results.

HellRaiZer

PS:
quote:
Well, that is generally what is called profiling: measuring the exact impact of a specific piece of code, without external interference from other code parts. Things like parallel execution pipelines can hide the actual performance hit of a function, because it is delayed/overlapped. What you are talking about is benchmarking, which is basically a speed measure over the entire program. Profiling is micro-benchmarking, on subsystem or even instruction level.

My bad English made me think of benchmarking and profiling as the same thing. Thanks for the clear explanation.
HellRaiZer
quote: Original post by HellRaiZer
As you can see, I barely break the 10 Mtris/sec barrier, even when rendering to an 8x8 window with depth, stencil, color and texture all disabled.

Well, that's the maximum a GeForce1 can do. Actually, 10 Mtris/sec is a very good number for a GF1.

quote:
Can you explain the above results? I'm a little confused by them, because there seems to be no big difference between the multitextured and untextured tests (except for the min counter).

Texturing will make very little difference until you saturate the fragment pipeline, i.e. until you are fillrate limited. On your sphere test, you obviously aren't. Try to make your spheres much bigger, but without changing the face count. From a certain size on, you will hit the bandwidth limit of the fragment pipeline or texture memory. Then you'll see an extreme difference between both figures.
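A rough way to turn that test into a yes/no answer is to compare the throughput measured in a tiny window against the throughput at the real resolution. The following is only a sketch: `classifyBottleneck` and its 10% `tolerance` are made-up names and values for illustration, not something from a driver or from this thread.

```cpp
#include <cassert>

// Hypothetical helper, not part of OpenGL or any nVidia tool.
enum class Bottleneck { Geometry, Fillrate };

// Compares triangle throughput in a tiny window (fill cost close to zero)
// with throughput at the real resolution. If the full-size run loses more
// than `tolerance` (an assumed threshold, 10% by default) of the tiny-window
// rate, the fragment side is probably the limit.
inline Bottleneck classifyBottleneck(double trisPerSecTinyWindow,
                                     double trisPerSecFullWindow,
                                     double tolerance = 0.10) {
    assert(trisPerSecTinyWindow > 0.0 && trisPerSecFullWindow > 0.0);
    const double drop = 1.0 - trisPerSecFullWindow / trisPerSecTinyWindow;
    return drop > tolerance ? Bottleneck::Fillrate : Bottleneck::Geometry;
}
```

With the untextured averages posted earlier in the thread (9,388,133 tris/sec at 8x8 versus 9,382,089 tris/sec at 800x600), the drop is well under one percent, so this check reports `Bottleneck::Geometry` for the sphere test.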

quote:
- Reminder (to myself) : Change Octrees.
- Question (to myself) : With what??????

I would (of course) suggest ABTs, but I'm probably biased on that point.

quote:
Please give the definition of performance! Do you mean tris/sec, pixels/sec (how can you measure that? is it the obvious way, the way to go?)? I thought FPS is enough as a performance result.

FPS is an absolutely arbitrary performance unit, only valid for one single 3D scene, with a specific camera path, a specific shader setup, etc. It's unusable for general performance comparison. You need to provide at least tris/sec, and the median value of the triangle area. Better is to provide two numbers: one with minimized fillrate impact (i.e. a very small render window, no textures), and one with the fillrate impact.
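As an illustration of those two figures, here is a minimal sketch of how they could be computed from per-frame timings and per-triangle screen areas. The names `ThroughputStats`, `summarize` and `medianArea` are hypothetical, and how the frame times and areas are gathered is left to the application.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: triangle throughput in tris/sec.
struct ThroughputStats {
    double minTps;
    double maxTps;
    double avgTps;
};

// trisPerFrame: triangles submitted each frame (assumed constant).
// frameSeconds: measured duration of every frame, in seconds.
ThroughputStats summarize(std::size_t trisPerFrame,
                          const std::vector<double>& frameSeconds) {
    assert(!frameSeconds.empty());
    const double tris = static_cast<double>(trisPerFrame);
    ThroughputStats stats{tris / frameSeconds.front(),
                          tris / frameSeconds.front(), 0.0};
    double totalSeconds = 0.0;
    for (double dt : frameSeconds) {
        assert(dt > 0.0);
        const double tps = tris / dt;
        stats.minTps = std::min(stats.minTps, tps);
        stats.maxTps = std::max(stats.maxTps, tps);
        totalSeconds += dt;
    }
    // Average over the whole run: total triangles divided by total time.
    stats.avgTps = tris * static_cast<double>(frameSeconds.size()) / totalSeconds;
    return stats;
}

// Median of per-triangle screen-space areas (for example, in pixels).
double medianArea(std::vector<double> areas) {
    assert(!areas.empty());
    const auto mid = areas.begin() + static_cast<std::ptrdiff_t>(areas.size() / 2);
    std::nth_element(areas.begin(), mid, areas.end());
    if (areas.size() % 2 == 1) return *mid;
    // Even count: average the two middle values.
    const double lower = *std::max_element(areas.begin(), mid);
    return 0.5 * (lower + *mid);
}
```

For the sphere benchmark in this thread, `trisPerFrame` would be 3000 * 950 = 2,850,000. Computing the average as total triangles over total time keeps it between the minimum and maximum by construction.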

quote:
My opinion is that I'm both geometry and fillrate limited.

As you could see in your tests, the geometry limit is 10 Mtris/sec. You haven't reached that in your engine. Since you are not performing any special vertex processing (hardware lights, or texgen, for example), pretty much everything else comes from the fragment pipeline, texture memory and framebuffer accesses.

quote:
But let's assume that I'm only fillrate limited: what can I do to overcome it? Changing the resolution, smaller textures, texture compression, texture filtering, minimizing blended polygons and dropping multitexturing are some possible solutions, I think.

Correct.

quote:
But what if I can't "implement" one of them, because I really want the functionality it gives? Then I guess the only solution is a newer card.

Also correct.
Maybe you could overclock it to get better performance, but I don't think there will be much improvement.

quote:
i have sorted my triangle lists in a cache-friendly way


What do you mean by that, and how did you do it?

'I wish life was not so short,' he thought. 'Languages take such a time, and so do all the things one wants to know about.' - J.R.R. Tolkien
/*ilici*/
By GPU-friendly triangle lists, I mean that all the triangles are stored in a specific order. That is, you render adjacent triangles one after another, so that the cache already has (at least) 2 of the next triangle's indices in it. You can't achieve perfect continuity, but I do my best.

In a few words:
Sort triangles so adjacent triangles are rendered next to each other.
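To make the idea concrete, here is a minimal greedy sketch of such a sort. It is not the code from my plugin and not what NVTriStrip does internally; `Triangle`, `sharedVertices` and `orderByAdjacency` are hypothetical names, and the O(n^2) search is only meant for small meshes or offline preprocessing.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical triangle type: three vertex indices.
using Triangle = std::array<std::uint32_t, 3>;

// Number of vertex indices two (non-degenerate) triangles have in common.
static int sharedVertices(const Triangle& a, const Triangle& b) {
    int shared = 0;
    for (std::uint32_t va : a)
        for (std::uint32_t vb : b)
            if (va == vb) ++shared;
    return shared;
}

// Greedy reorder: start at triangle 0, then always emit the not-yet-emitted
// triangle that shares the most vertices with the one emitted last.
std::vector<Triangle> orderByAdjacency(const std::vector<Triangle>& tris) {
    std::vector<Triangle> ordered;
    ordered.reserve(tris.size());
    std::vector<bool> used(tris.size(), false);
    std::size_t current = 0;
    for (std::size_t emitted = 0; emitted < tris.size(); ++emitted) {
        used[current] = true;
        ordered.push_back(tris[current]);
        int bestShared = -1;
        std::size_t best = tris.size();  // sentinel: nothing left
        for (std::size_t i = 0; i < tris.size(); ++i) {
            if (used[i]) continue;
            const int shared = sharedVertices(tris[current], tris[i]);
            if (shared > bestShared) {
                bestShared = shared;
                best = i;
            }
        }
        if (best == tris.size()) break;
        current = best;
    }
    return ordered;
}
```

Given the list (0,1,2), (4,3,5), (2,1,3), (2,3,4), this emits (0,1,2), (2,1,3), (2,3,4), (4,3,5), so every triangle shares an edge with the one before it.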

NVTriStrip is a tool that can do that for you. Keep your card's cache size in mind (which I think NVTriStrip does for you), and try to group triangles.
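If you want to check how a given ordering behaves for a given cache size, you can count the misses offline. The sketch below assumes the vertex cache behaves like a plain FIFO, which is a simplification; `countCacheMisses` is a hypothetical helper and the cache size you pass in is your own guess for your card.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Counts vertex cache misses for an indexed triangle list, modelling the
// cache as a simple FIFO holding `cacheSize` vertex indices (an assumption;
// real hardware may use a different replacement policy).
std::size_t countCacheMisses(const std::vector<std::uint32_t>& indices,
                             std::size_t cacheSize) {
    assert(cacheSize > 0 && indices.size() % 3 == 0);
    std::deque<std::uint32_t> fifo;
    std::size_t misses = 0;
    for (std::uint32_t idx : indices) {
        if (std::find(fifo.begin(), fifo.end(), idx) != fifo.end())
            continue;  // hit: vertex is already in the cache
        ++misses;      // miss: vertex has to be fetched and transformed
        fifo.push_back(idx);
        if (fifo.size() > cacheSize) fifo.pop_front();
    }
    return misses;
}
```

With a 4-entry cache, the ordered list 0,1,2, 2,1,3, 2,3,4, 4,3,5 costs 6 misses (one per distinct vertex), while the same triangles in the order 0,1,2, 4,3,5, 2,1,3, 2,3,4 cost 9.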

I prefer to do it with my own code, because I don't want to mess with nVidia's stuff. I know this may not be as efficient (more cache misses will occur), but it is mine. I have also added this procedure to a plugin I wrote for Lightwave, so I can export to my own format. It's easier this way.

I have to go now. CU.

And Yann, thanks once again. I think this is over.

HellRaiZer

