Drawing polygon by polygon faster than grouping by material and then drawing?!

8 comments, last by Deliverance 17 years, 5 months ago
Hi there guys. I noticed an interesting (for me at least) performance issue these days. I have an indoor/outdoor level (convex) that consists of exactly 6853 polygons (most of them are triangles, but there are larger polygons too). The level is textured with 20 textures, and the textures are distributed unevenly among the polygons (for example, Texture0 could be used by 75 polygons and Texture5 by 3400 polygons, and so on).

The polygon data is stored in big arrays: one big vertex array that contains all the vertices, one big texture coordinate array that contains all the texture coordinates, and likewise for the normals and the colors. The face indices are also stored in one big UINT array. There is also a polygon array which gives, for every polygon, its start indices in the vertex and face arrays and its number of faces and vertices (this way I can draw with both glDrawArrays and glDrawElements).

I tried two approaches to drawing the level:

Method 1 -> For every polygon call:
-> glBindTexture(...)
-> glDrawArrays(GL_TRIANGLE_FAN, ...)

Method 2 -> Loop through the whole polygon array and copy (append) the faces of each polygon into the array RenderFaces[PolygonTexture] (these arrays are preallocated, so this just updates a face count variable for each array and then performs a memcpy). At the end of this loop, start another loop from 0 to (NumTextures-1) (20-1 in this case) and make one call for each array like this:
-> glBindTexture(...)
-> glDrawElements(GL_TRIANGLES, ...)

Here is the performance I'm getting:

Video: GeForce2; Processor: Intel Pentium 800 MHz; Memory: 128 MB SDR RAM
FPS: Method 1 ~ 18-19
FPS: Method 2 ~ 18

Video: GeForce 6600GT; Processor: AMD Athlon 64 3500+ (real 2.20 GHz); Memory: 512 MB DDR RAM, KingMax, FSB 433 (the other FSBs are over 500)
FPS: Method 1 ~ 640
FPS: Method 2 ~ 610

Clearly the first method seems to be faster than the second. I do use GL_TRIANGLE_FAN in the first method, but it's per polygon.
I just wonder why the first method is faster, though. In the second method I call glDrawElements on a few large arrays. So who's the funny guy that pushes the wrong buttons inside the hardware and makes me feel dumb :) and not understand what's happening? :)
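For readers following along, the grouping step of Method 2 can be sketched roughly as below. This is only an illustration under assumptions, not the poster's actual code: the `Polygon` struct, its field names, and `buildBatches` are hypothetical, and the sketch assumes each polygon's vertices are stored contiguously in fan order so that a convex polygon can be expanded into independent triangles.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-polygon record, loosely modelled on the polygon array
// described in the post (names are assumptions, not from the original code).
struct Polygon {
    unsigned firstVertex;  // index of the polygon's first vertex in the big vertex array
    unsigned vertexCount;  // number of vertices, at least 3
    unsigned texture;      // index of the texture (material) the polygon uses
};

// Build one triangle index list per texture. Each convex polygon is expanded
// from fan order into (vertexCount - 2) independent triangles, so every list
// could later be drawn with a single glDrawElements(GL_TRIANGLES, ...) call.
std::vector<std::vector<unsigned>> buildBatches(const std::vector<Polygon>& polys,
                                                std::size_t numTextures) {
    std::vector<std::vector<unsigned>> batches(numTextures);
    for (const Polygon& p : polys) {
        assert(p.vertexCount >= 3 && p.texture < numTextures);
        std::vector<unsigned>& out = batches[p.texture];
        for (unsigned i = 1; i + 1 < p.vertexCount; ++i) {
            out.push_back(p.firstVertex);          // fan centre
            out.push_back(p.firstVertex + i);      // current edge start
            out.push_back(p.firstVertex + i + 1);  // current edge end
        }
    }
    return batches;
}
```

For static geometry this function would run once at load time; the render loop then only binds each texture and issues one draw call per non-empty list.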
My guess is that ~7000 polygons are not really much for any GPU, so the slowdown is in the CPU and the memcpy() call. Do you do that each frame, and if yes why? Isn't all the geometry static? Since the geometry data doesn't change, group by material just once *before* you enter the rendering loop.
"My guess is that ~7000 polygons are not really much for any GPU" -> They are for a GeForce2, considering that these polygons cover a large area of the screen and drag the fillrate down.
I do that every frame, but not with all the polygons. I also tested baking the whole level into these arrays (only once) and then drawing; the result was almost the same FPS as when filling the arrays on the fly.
The level uses PVS sets for rendering. So I gather all the polygons from the visible leaves (those that are also in the view frustum) and fill the arrays for faster rendering (or so I thought) by grouping the polygons by material.
Try benchmarking with a small window, e.g. 320x240, or else you probably won't see much difference between the two (which is what you seem to be seeing at the moment; remember that any difference of 1-2% can be accounted for by noise).
My guess is poly sorting is slowing you down. The polys should be sorted during program load or even during model export from your modeling program.
Basically, what you're saying is that you're getting nearly identical performance regardless of the way you sort the polygons. This is strong evidence that your bottleneck is either fillrate or the CPU, not the geometry. The difference between 640fps and 610fps is about 0.08 milliseconds per frame, which is beyond insignificant.

FYI: a GeForce2 can reach something like 20Mtris/sec (i.e. almost 3000fps with a 7ktri scene), so your scene isn't terribly taxing for the GPU in terms of geometry.
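The two figures in this reply are simple arithmetic and can be checked with a small sketch like the one below. The function names are made up for illustration, and the 20 Mtris/sec rate is the poster's rough figure, not a measured value.

```cpp
#include <cassert>

// Time spent on one frame, in milliseconds, at a given frame rate.
double frameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Extra time per frame at slowFps compared with fastFps, in milliseconds.
double frameTimeDeltaMs(double slowFps, double fastFps) {
    return frameTimeMs(slowFps) - frameTimeMs(fastFps);
}

// Upper bound on frame rate if triangle throughput were the only limit
// (ignores fillrate, CPU cost, and state changes).
double geometryBoundFps(double trisPerSecond, double trisPerFrame) {
    assert(trisPerFrame > 0.0);
    return trisPerSecond / trisPerFrame;
}
```

With these, `frameTimeDeltaMs(610.0, 640.0)` comes out at roughly 0.077 ms, and `geometryBoundFps(20.0e6, 7000.0)` at roughly 2857 frames per second, which is the "almost 3000fps" quoted above. Comparing frame times rather than FPS values is the more useful habit, since FPS differences look large at high frame rates even when the time difference is tiny.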
Thanks guys, fillrate explains it then.
But I'm intrigued Fingers_, how can a GF2 reach almost 3000FPS with a 7ktri scene?
My scene is almost 7000 poly's and it runs at about 20FPS on a GF2(I use vertex arrays). When you say it can reach 3000FPS is that in terms of geometry only, ignoring fillrate? I'm confused.
In almost all cases where a manufacturer states how many triangles per second their chip can draw, it's a complete lie in actual practice and is nowhere near the truth.

Quote:Original post by Deliverance
Thanks guys, fillrate explains it then.
But I'm intrigued Fingers_, how can a GF2 reach almost 3000FPS with a 7ktri scene?
My scene is almost 7000 poly's and it runs at about 20FPS on a GF2(I use vertex arrays). When you say it can reach 3000FPS is that in terms of geometry only, ignoring fillrate? I'm confused.
You're rendering in a staggeringly slow and naive way, that's the problem. A properly written version would call DrawElements once per material, pulling from data written into a VBO at load time. Interleaving the arrays might be a performance win as well. Additionally, you wouldn't call BindTexture except when the texture changed, which would also be a minor boost. IIRC I was pushing 50K tris per frame on a GeForce 2 MX without any noticeable perf hit -- I was running at 60fps vsync all the time. (I never did a proper analysis of throughput, since that was a temporary system.) So 18 fps on a 7K scene is just embarrassing.
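The point about calling BindTexture only when the texture changes can be illustrated with a small counting sketch. It is an assumption-laden illustration rather than anyone's actual renderer: `countTextureBinds` and `sortedByTexture` are hypothetical helpers, and the input is just the texture index used by each draw call in submission order.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Count how many texture binds are needed for a sequence of draws when a
// bind is issued only if the texture differs from the one currently bound.
std::size_t countTextureBinds(const std::vector<unsigned>& textureOfDraw) {
    std::size_t binds = 0;
    bool haveBound = false;
    unsigned bound = 0;
    for (unsigned t : textureOfDraw) {
        if (!haveBound || t != bound) {
            // In a real renderer, glBindTexture(GL_TEXTURE_2D, ...) goes here.
            bound = t;
            haveBound = true;
            ++binds;
        }
        // The draw call for this polygon or batch would follow here.
    }
    return binds;
}

// Return the same draws reordered so that equal textures are adjacent.
std::vector<unsigned> sortedByTexture(std::vector<unsigned> textureOfDraw) {
    std::sort(textureOfDraw.begin(), textureOfDraw.end());
    return textureOfDraw;
}
```

For the draw order {2, 0, 2, 1, 0, 0} this needs 5 binds as submitted and 3 once sorted by texture. With 20 textures, a sorted level never needs more than 20 binds per frame, compared with up to one per polygon in the per-polygon loop.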
Quote:So 18 fps on a 7K scene is just embarrassing.


I'm embarrassed then :) but I still want to understand where the glitch is. I made some modifications; here's how I'm rendering it now:
I call glDrawElements only once per material pulling the data out of VBO arrays, allocated at startup.
The face arrays are statically created(20 arrays for 20 materials), they are constructed at startup.

I have a performance win:
~40 FPS on GF2
~730 FPS on GF6600GT

Still under 60 FPS with 7000 polys on a GF2, BUT why? I cannot understand it. (I do check whether the GF2 supports the VBO extension, the ARB one, not the NVidia-specific one.) Where could the problem be? Why am I getting such low performance?
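One possible way to lay out the static per-material face arrays described in this post is to pack them into a single index buffer and remember a range per material. The sketch below is a guess at such a layout, not the poster's code; `DrawRange` and `packBatches` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Where one material's indices live inside the packed index buffer.
struct DrawRange {
    std::size_t firstIndex;  // position of the first index in the packed buffer
    std::size_t indexCount;  // number of indices belonging to this material
};

// Concatenate the per-material index lists into `packed` and return one
// range per material, in the same order as the input lists.
std::vector<DrawRange> packBatches(const std::vector<std::vector<unsigned>>& batches,
                                   std::vector<unsigned>& packed) {
    std::vector<DrawRange> ranges;
    ranges.reserve(batches.size());
    packed.clear();
    for (const std::vector<unsigned>& batch : batches) {
        ranges.push_back(DrawRange{packed.size(), batch.size()});
        packed.insert(packed.end(), batch.begin(), batch.end());
    }
    return ranges;
}
```

After uploading `packed` once into an element array buffer, each material with a non-zero `indexCount` can be drawn with one glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, ...) call whose last argument is the byte offset `firstIndex * sizeof(unsigned)`. This only reduces the draw-call and upload cost; if the scene is fillrate-bound, as suggested earlier in the thread, it will not raise the frame rate by itself.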
