VAR + SwapBuffers performance problems

Jim Drygiannakis · 2003-10-09T13:01:47

Hello,

Can somebody explain this weird behavior? Let me explain.

I'm working on a demo level with about 30k triangles, subdivided using octrees. Until yesterday I was using simple vertex arrays for rendering. Today I set up NV_VAR. I have minimized state changes (binding textures, changing blending) and I have sorted my triangle lists in a cache-friendly way (at least I did the best I could on that!).

Yesterday I wrote a simple in-code profiler (not an external one like VTune), based on the Enginuity series, and I tried to measure the time spent in various parts of my code. I tested the demo with several different code paths and found out a few things. First of all, most of the time (about 80 - 90%) was taken by Octree::Render() + SwapBuffers(). Everything else (text output, demo updating, and Octree::Update()) was extremely fast. I think you knew this without me telling you!

Here are the setups I use for testing (in order of execution):

Case 1: Individual triangles, manual backface culling, simple VAs, with compiled vertex arrays (CVA).
Case 2: Individual triangles, manual backface culling, simple VAs, without CVA.
Case 3: Bunch of triangles in one call, OGL backface culling, simple VAs, with CVA.
Case 4: Bunch of triangles in one call, OGL backface culling, simple VAs, no CVA.
Case 5: Individual triangles, manual backface culling, NV_VAR, no CVA.
Case 6: Bunch of triangles, OGL backface culling, NV_VAR, no CVA.

In all cases (except case 6) the minimum FPS I got was 38 (min FPS in case 3) and the maximum was 300 (max FPS in case 5). Everything looked normal. The FPS was smoothly going up and down and the average FPS was about 98 - 102. No significant performance changes between cases 1-5.

Now the weird stuff: the last setup (case 6). First of all, Octree::Render() time dropped from about 9 msec to 2 msec, but the total render time increased. To figure out what's happening, I placed a timer around the SwapBuffers() calls. Guess what: SwapBuffers took about 17 msec (average) to complete. Despite the fact that the whole render time had increased (from 131 secs to 255 secs for rendering a demo with 11000 frames), and despite the fact that the minimum FPS was 18, the average had increased from 102 to 110! My first thought was, "VAR has finally worked!". It did what it's supposed to do. But that's not what I'm looking for. To be more precise, I have to tell you that in case 6 the FPS counter was jumping between 30 and 700 all the time! And the whole system is completely "unstable".

Despite my disappointment, I tried to stress the system a little, to see if it fails (FPS drops). I placed a Sleep(15) before SwapBuffers, and I got the same results. Now SwapBuffers() took 2 msec to complete (17 - 15). I thought, "I can place more CPU work before swapping, in order to fill the gap!". But this isn't the case, is it?

I want to ask if there is something I can do to make this work in a more stable way. Triple buffering may be an option, but I don't know if it is possible with OpenGL. Do you have any suggestions for the above "weird" behavior? I don't think it is really weird. VAR is supposed to do this kind of thing. But how can I make it more stable?

Any feedback appreciated. Thanks in advance.

HellRaiZer

[edited by - HellRaiZer on October 7, 2003 12:48:29 PM]
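One possible reading of the timings in the opening post, offered as a sketch and not a diagnosis: with VAR the draw calls return before the GPU has finished, so the CPU only notices the GPU's work when SwapBuffers() blocks. The model below assumes (hypothetically) that the GPU needs a fixed time per frame, counted from the start of submission, and that SwapBuffers() waits for whatever is left. `swapWaitMs` and `frameMs` are illustrative names, not part of any API.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical model: the GPU needs gpuMs to draw the frame, and the CPU
// spends cpuMs (submission, game logic, Sleep, ...) before SwapBuffers().
// SwapBuffers() then blocks for whatever GPU time is still outstanding.
double swapWaitMs(double gpuMs, double cpuMs) {
    assert(gpuMs >= 0.0 && cpuMs >= 0.0);
    return std::max(0.0, gpuMs - cpuMs);
}

// Total frame time under the same assumption; this equals max(gpuMs, cpuMs).
double frameMs(double gpuMs, double cpuMs) {
    return cpuMs + swapWaitMs(gpuMs, cpuMs);
}
```

With the figures from the post (2 msec in Octree::Render() plus 17 msec in SwapBuffers(), so about 19 msec of GPU work), the model gives a 17 msec wait after 2 msec of CPU work and a 2 msec wait once Sleep(15) is added, which matches what was measured. Vertical sync could produce a similar pattern, so this is only one explanation.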


Started by HellRaiZer October 07, 2003 11:41 AM

12 comments, last by HellRaiZer 20 years, 6 months ago

HellRaiZer

1,001

Author

October 09, 2003 11:08 AM

I just finished testing the sphere benchmark you suggested. Results from some tests are:

- Lighting is disabled in all tests.
- Rendering 3000 balls with 950 triangles each, grouped in one triangle list for every ball.

withOUT GL_NV_vertex_array_range (Simple VA) (800x600x32)
- Minimum Geometry : 2,905,622.25 tris/sec
- Maximum Geometry : 3,595,494.50 tris/sec
- Average Geometry : 3,509,296.25 tris/sec

with GL_NV_vertex_array_range (2 textures) (800x600x32)
- Minimum Geometry : 7,029,515.00 tris/sec
- Maximum Geometry : 9,769,094.00 tris/sec
- Average Geometry : 9,260,820.00 tris/sec

with GL_NV_vertex_array_range (1 texture) (800x600x32)
- Minimum Geometry : 8,782,627.00 tris/sec
- Maximum Geometry : 9,797,635.00 tris/sec
- Average Geometry : 9,347,072.00 tris/sec

with GL_NV_vertex_array_range (No textures) (800x600x32)
- Minimum Geometry : 9,405,980.00 tris/sec
- Maximum Geometry : 9,830,263.00 tris/sec
- Average Geometry : 9,382,089.00 tris/sec

with GL_NV_vertex_array_range (No textures) (8x8x32) (all enabled)
- Minimum Geometry : 9,450,366.00 tris/sec
- Maximum Geometry : 9,836,973.00 tris/sec
- Average Geometry : 9,388,133.00 tris/sec

with GL_NV_vertex_array_range (No textures) (8x8x32) (nothing enabled)
- Minimum Geometry : 9,559,716.00 tris/sec
- Maximum Geometry : 10,462,466.00 tris/sec
- Average Geometry : 9,577,451.00 tris/sec

As you can see, I barely break the 10 Mtris/sec barrier, even when rendering in an 8x8 window with depth, stencil, color and texture disabled.

I'm not saying that the numbers aren't good enough. nVidia's VAR demo (the wavy thing) gave similar results.

The 3000-sphere set is rendered by randomly translating the origin of every sphere, every frame, so that every ball lies inside an imaginary box. The balls are small, because I thought this would minimize fillrate. I placed the camera so that all the balls are visible every frame (the imaginary box is completely inside the frustum). Backface culling is enabled, and there is no hint for volume clipping.

Can you explain the above results? I'm a little confused by them, because there seems to be no big difference between the multitextured and untextured tests (except for the min counter).

quote:
You might want to consider a different spatial subdivision structure than an octree, in order to avoid the redundancy problem and minimize the required splits.

The reason I started (and stuck) with octrees is the simplicity of creating them. No tree is that hard to implement, but octrees are the simplest. I wanted octrees for another reason too. As I had read in some occlusion culling papers, their perfect cube node shape is friendlier to those algorithms than an arbitrary BBox. While I was implementing HOMs, I didn't see any advantage of perfect cubes over arbitrary boxes, but I was too focused on occlusion culling to find time to change them. Now that my HOM implementation is stuck on the software rasterizer (a really hard part, I must admit, not only in terms of speed but also because it is hard to get OpenGL-like precise results), I'm too bored to change them. But occlusion culling is another topic, which deserves plenty of threads and posts!

[off topic]
(Sad memories came to his mind. "What the hell", he thinks, "I'll complete it someday.")

- Reminder (to myself) : Change Octrees.
- Question (to myself) : With what??????
[/off topic]

quote:
Ah, but you didn't mention you have a GF-1... But you didn't really mention how much performance you actually get either. First, get rid of those double triangles you mentioned above, they will suck away your precious fillrate like mad. I'm starting to suspect that you aren't really geometry limited, but fillrate limited. VAR won't help you very much in that case.

Please give the definition of performance! Do you mean tris/sec or pixels/sec (how can you measure that? Is the obvious way the way to go?)? I thought FPS was enough as a performance result.

My opinion is that I'm both geometry and fillrate limited. But let's assume that I'm only fillrate limited: what can I do to get past it? Changing resolution, smaller textures, texture compression, texture filtering, minimizing blended polygons and no multitexturing are some possible solutions, I think. But what if I can't "implement" one of them, because I really want the functionality it gives? Then I guess the only solution is a newer card.

Thanks for the support, Yann. I'm really grateful.

I'll now try to eliminate the double-rendered triangles, and I'll be back with the final results.

HellRaiZer

PS:

quote:
Well, that is generally what is called profiling: measuring the exact impact of a specific piece of code, without external interference from other code parts. Things like parallel execution pipelines can hide the actual performance hit of a function, because it is delayed/overlapped. What you are talking about is benchmarking, which is basically a speed measure over the entire program. Profiling is micro-benchmarking, on subsystem or even instruction level.

My bad English made me think of benchmarking and profiling as the same thing. Thanks for the clear explanation.

HellRaiZer
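To make the profiling/benchmarking distinction quoted above concrete, here is a minimal sketch of an in-code profiler of the kind mentioned in the opening post. It is not the poster's profiler: `Profiler` and `ScopedTimer` are hypothetical names, and the timing uses the standard `std::chrono::steady_clock`. It measures CPU wall-clock time only, so, as the quoted explanation warns, work that the GPU performs asynchronously will show up in whichever call happens to wait for it.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Accumulates wall-clock time per named section (all names are illustrative).
class Profiler {
public:
    void add(const std::string& section, double ms) {
        assert(ms >= 0.0);
        Entry& e = entries_[section];
        e.totalMs += ms;
        e.calls += 1;
    }
    double totalMs(const std::string& section) const {
        auto it = entries_.find(section);
        return it == entries_.end() ? 0.0 : it->second.totalMs;
    }
    int calls(const std::string& section) const {
        auto it = entries_.find(section);
        return it == entries_.end() ? 0 : it->second.calls;
    }
private:
    struct Entry { double totalMs = 0.0; int calls = 0; };
    std::map<std::string, Entry> entries_;
};

// Times the enclosing scope and reports to the profiler on destruction.
class ScopedTimer {
public:
    ScopedTimer(Profiler& profiler, std::string section)
        : profiler_(profiler), section_(std::move(section)),
          start_(std::chrono::steady_clock::now()) {}
    ~ScopedTimer() {
        std::chrono::duration<double, std::milli> elapsed =
            std::chrono::steady_clock::now() - start_;
        profiler_.add(section_, elapsed.count());
    }
private:
    Profiler& profiler_;
    std::string section_;
    std::chrono::steady_clock::time_point start_;
};
```

Wrapping one call per scope, for example `{ ScopedTimer t(profiler, "SwapBuffers"); SwapBuffers(hdc); }`, gives the per-section figures (profiling), while dividing the total frame count by the total run time gives the whole-program figure (benchmarking).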

Yann L

1,806

October 09, 2003 11:34 AM

quote:Original post by HellRaiZer
As you can see, I barely break the 10 Mtris/sec barrier, even when rendering in an 8x8 window with depth, stencil, color and texture disabled.

Well, that's the maximum a GeForce1 can do. Actually, 10 Mtris/sec is a very good number for a GF1.

quote:
Can you explain the above results? I'm a little confused by them, because there seems to be no big difference between the multitextured and untextured tests (except for the min counter).

Texturing will make very little difference until you saturate the fragment pipeline, i.e. until you are fillrate limited. On your sphere test, you obviously aren't. Try to make your spheres much bigger, but without changing the face count. From a certain size on, you will hit the bandwidth limit of the fragment pipeline or texture memory. Then you'll see an extreme difference between both figures.
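The reasoning above can be sketched as a two-stage estimate: the frame costs roughly as much as the slower of the geometry stage and the fragment stage. This is a simplification under stated assumptions, not a hardware model. `estimateFrame` is a hypothetical helper, and the pixel rate used with it below is a placeholder, not a GeForce1 specification.

```cpp
#include <cassert>

enum class Bottleneck { Geometry, Fill };

struct FrameEstimate {
    double geometryMs;  // time to push all triangles through vertex processing
    double fillMs;      // time to shade all covered pixels
    Bottleneck limit;   // whichever stage takes longer
};

// Assumes the two stages overlap fully, so the slower one sets the frame time.
FrameEstimate estimateFrame(double triangles, double trisPerSec,
                            double pixelsDrawn, double pixelsPerSec) {
    assert(trisPerSec > 0.0 && pixelsPerSec > 0.0);
    FrameEstimate e;
    e.geometryMs = 1000.0 * triangles / trisPerSec;
    e.fillMs = 1000.0 * pixelsDrawn / pixelsPerSec;
    e.limit = (e.fillMs > e.geometryMs) ? Bottleneck::Fill : Bottleneck::Geometry;
    return e;
}
```

For the sphere test (3000 balls of 950 triangles, about 10 Mtris/sec), a few pixels per triangle keeps the estimate geometry limited for any plausible pixel rate. Scaling the balls up so each triangle covers on the order of a hundred pixels leaves `geometryMs` unchanged and multiplies `fillMs`, which is the point at which the textured and untextured figures would be expected to diverge.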

quote:
- Reminder (to myself) : Change Octrees.
- Question (to myself) : With what??????

I would (of course) suggest ABTs, but I'm probably biased on that point.

quote:
Please give the definition of performance! Do you mean tris/sec or pixels/sec (how can you measure that? Is the obvious way the way to go?)? I thought FPS was enough as a performance result.

FPS is an absolutely arbitrary performance unit, only valid on one single 3D scene, with a specific camera path, a specific shader setup, etc. It's unusable for general performance comparison. You need to provide at least tris/sec, and the median value of the triangle area. Better is to provide two numbers: one with minimized fillrate impact (i.e. very small render window, no textures), and one with the fillrate impact.
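The two figures suggested above could be computed along these lines. This is a minimal sketch: `trisPerSecond` and `medianArea` are hypothetical helper names, and the triangle areas are assumed to be screen-space areas in pixels that the application has already gathered.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Triangle throughput from the triangles actually submitted in one frame.
double trisPerSecond(double trianglesPerFrame, double frameSeconds) {
    assert(frameSeconds > 0.0);
    return trianglesPerFrame / frameSeconds;
}

// Median of the triangle areas. Takes a copy because std::nth_element
// reorders its input.
double medianArea(std::vector<double> areas) {
    assert(!areas.empty());
    const std::size_t mid = areas.size() / 2;
    std::nth_element(areas.begin(), areas.begin() + mid, areas.end());
    const double upper = areas[mid];
    if (areas.size() % 2 == 1) return upper;
    // Even count: average the two middle values. After nth_element, every
    // element before mid is <= upper, so the largest of them is the lower one.
    const double lower = *std::max_element(areas.begin(), areas.begin() + mid);
    return 0.5 * (lower + upper);
}
```

As an illustration of why FPS alone says little: the sphere benchmark draws 3000 x 950 = 2,850,000 triangles per frame, so its average of 9,260,820 tris/sec corresponds to only about 3.2 frames per second, even though it is close to the card's geometry limit.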

quote:
My opinion is that I'm both geometry and fillrate limited.

As you could see in your tests, the geometry limit is 10 Mtris/sec. You haven't reached that in your engine. Since you are not performing any special vertex processing (hardware lights, or texgen, for example), pretty much everything else comes from the fragment pipeline, texture memory and framebuffer accesses.

quote:
But let's assume that I'm only fillrate limited: what can I do to get past it? Changing resolution, smaller textures, texture compression, texture filtering, minimizing blended polygons and no multitexturing are some possible solutions, I think.

Correct.

quote:
But what if I can't "implement" one of them, because I really want the functionality it gives? Then I guess the only solution is a newer card.

Also correct.

Ilici

862

October 09, 2003 12:07 PM

Maybe you could overclock it to get better performance, but I don't think there will be much improvement.

quote:
i have sorted my triangle lists in a cache-friendly way

What do you mean by that, and how did you do it?

[ My Site ]
'I wish life was not so short,' he thought. 'Languages take such a time, and so do all the things one wants to know about.' - J.R.R Tolkien
/*ilici*/

HellRaiZer

1,001

Author

October 09, 2003 01:01 PM

By GPU-friendly triangle lists, I mean that all the triangles are in a specific order. That is, you render adjacent triangles one after another, so that the cache already holds (at least) 2 of the next triangle's indices. You can't achieve perfect continuity, but I do my best.

In a few words:
Sort triangles so adjacent triangles are rendered next to each other.

NVTriStrip is a tool that can do this for you. Keep in mind your card's cache size (which I think NVTriStrip handles for you), and try to group triangles.

I prefer to do it with my own code, because I don't want to mess with nVidia's stuff.

I know this may not be as efficient (more cache misses will occur), but it is mine. Also, I have added this procedure to a plugin I wrote for Lightwave, so I can export to my own format. It's easier this way.
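One way to check how cache-friendly a given triangle order is, sketched under assumptions: simulate the card's vertex cache as a simple FIFO and count misses per triangle. This is not the poster's sorting code and not how NVTriStrip works internally. `cacheMissesPerTriangle` is a hypothetical helper, and both the FIFO policy and the cache size are assumptions that vary by card.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Returns vertex-cache misses per triangle for an indexed triangle list,
// simulating a FIFO cache holding cacheSize vertices. Lower is better; an
// unsorted list tends towards 3.0 (every vertex reprocessed).
double cacheMissesPerTriangle(const std::vector<unsigned>& indices,
                              std::size_t cacheSize) {
    assert(cacheSize > 0 && !indices.empty() && indices.size() % 3 == 0);
    std::deque<unsigned> fifo;
    std::size_t misses = 0;
    for (unsigned idx : indices) {
        if (std::find(fifo.begin(), fifo.end(), idx) != fifo.end()) {
            continue;  // hit: vertex is still in the cache
        }
        ++misses;  // miss: vertex must be processed again
        fifo.push_back(idx);
        if (fifo.size() > cacheSize) fifo.pop_front();  // evict the oldest
    }
    return static_cast<double>(misses) / (indices.size() / 3);
}
```

Running the same mesh through this before and after sorting gives a number to compare, which makes it possible to see how far a home-grown ordering is from a tool-generated one.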

I have to go now. CU.

And Yann, thanks once again. I think this is over.

HellRaiZer
