fastest non-extension way to render

Started by
13 comments, last by Metus 19 years, 11 months ago
'ello mi friends. implementing VBO's in my project increased my average framerate from 1200fps to 2000fps with a generic game mesh (around 5000 polys). In the standard codepath i'm using a simple glGenLists and glCallLists to render my model, which results in the 1200fps mentioned above. Is there any faster way to render static meshes? I've been poking around with glDrawArrays and glDrawElements with no success; should i still mess around with these or just leave it as it is?
Ethereal
In OpenGL 1.1 (I guess that's what you mean by 'unextended'), the fastest method for static geometry is display lists. Standard vertex arrays (without VBO, VAR or VAO) will generally be slower. So keep your engine as it is.
Yann: That's surprising. I know in theory that should be the case, because the geometry could be stored in hardware memory, but it isn't what I've found in practice. I know it varies a lot between implementations though.

____________________________________________________________www.elf-stone.com | Automated GL Extension Loading: GLee 5.00 for Win32 and Linux

BB: I was assuming he meant performance on older hardware, since he talked about unextended OpenGL, and since it seemed to be a fallback path from VBOs. Any modern card supports VBOs, so the display list path would only be used on old hardware anyway. And generally speaking, display lists were (a lot) faster on old hardware than plain VAs.

But yes, it depends a lot on the implementation, and on the type of data you compile into the list. Display lists have been a little neglected in the most recent driver revisions; a lot of people see them as an obsolete feature now that we have VBO. That's why they might very well be slower on recent chipsets.

Also, driver manufacturers have made quite a lot of improvements and used quite a few tricks to get standard VAs faster. I suspect that many of them have seriously relaxed the strict client/server model (which was a major bottleneck due to sync issues), at least on consumer level hardware.

So theoretically display lists are faster, since in unextended OpenGL they are the only legal way to store geometry in VRAM - at least according to the specs. In practice, it will vary, depending on how the driver is implemented, on how strictly it follows the specs, and on how recent the chipset/driver is. Best bet is to profile, of course, but this might lead to unexpected results on different platforms.

If starting a new engine from scratch, I would personally recommend against using display lists for anything geometry related. But since Metus already has DLs running, as well as a VBO codepath, it doesn't really matter that much. He could have 3 codepaths (DL, standard VA, and VBO) and do some quick runtime profiling at startup to determine the best path on the hardware the system is running on. That would be the safest way.
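The startup profiling idea could be sketched roughly like this. It is a minimal sketch in C++, not anyone's actual engine code: RenderPath, pickFastestPath and the callbacks are hypothetical names, and the callbacks stand in for the display list, vertex array and VBO draw routines. In a real test each callback would probably need to end with glFinish() so that the GPU work is included in the measurement.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// One candidate render path: a name plus a callback that draws one frame.
// (Hypothetical type; the callback would issue the GL calls for that path.)
struct RenderPath {
    std::string name;
    std::function<void()> drawFrame;
};

// Times `frames` calls of each path and returns the index of the fastest one.
std::size_t pickFastestPath(const std::vector<RenderPath>& paths, int frames) {
    assert(!paths.empty() && frames > 0);
    using Clock = std::chrono::steady_clock;
    std::size_t best = 0;
    Clock::duration bestTime = Clock::duration::max();
    for (std::size_t i = 0; i < paths.size(); ++i) {
        paths[i].drawFrame();  // warm-up call, not timed
        const auto start = Clock::now();
        for (int f = 0; f < frames; ++f) {
            paths[i].drawFrame();
        }
        const auto elapsed = Clock::now() - start;
        if (elapsed < bestTime) {
            bestTime = elapsed;
            best = i;
        }
    }
    return best;
}
```

The returned index would then select the codepath for the rest of the session. A few dozen frames per path is likely enough, but the result only holds for the mesh sizes that were actually tested.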
yes, with "unextended" i mean version 1.1, but i have to say i got some very weird results with VBO's and display lists; sometimes the VBO version is faster and sometimes the lists are faster... i'd guess that the more polys, the more advantage i'll get from VBO's, but with around 3k polys, the lists are about 200fps faster (i know that fps are the wrong way of measuring things, but i'll get a rough overview)

edit: my engine will blow out 60k polygons on my laptop with an Intel I830 chipset.. and that's with simple lists..
what's the difference between regular lists and compiled vertex arrays?


[edited by - metus on May 30, 2004 4:20:42 PM]
Ethereal
quote:Original post by Metus
yes, with "unextended" i mean version 1.1, but i have to say i got some very weird results with VBO's and display lists; sometimes the VBO version is faster and sometimes the lists are faster...

Hmm. I have to admit that since VAR was released, I haven't really used display lists anymore (except for state change compiling). Still, it's weird that DLs are faster than VBOs, although everything is possible, of course. It entirely depends on how the drivers optimize both. On what hardware and drivers are you running your tests?

quote:Original post by Metus
i'd guess that the more polys, the more advantage i'll get from VBO's, but with around 3k polys, the lists are about 200fps faster (i know that fps are the wrong way of measuring things, but i'll get a rough overview)

VBOs have a break-even point and need a minimum number of faces to be efficient. That said, 3k faces in a single VBO is definitely far above that threshold, so I would expect pretty good performance.

Hmm, you aren't manipulating your arrays in any way at runtime, are you? Or mapping them? Also, I assume that you're using GL_STATIC_DRAW_ARB as usage hint?

quote:Original post by Metus
what's the difference between regular lists and compiled vertex arrays?

They have nothing in common. Compiled vertex arrays are an (old) extension to standard vertex arrays. The card was supposed to keep the transformed vertices in memory, so that subsequent drawing of the exact same geometry in the same frame would bypass the transformation pipeline. It's only effective if you do multiple passes over the same geometry. This extension was initially added at Carmack's request, but vendors weren't too hot about it. It was never really finalized and stayed very experimental. Today, I would consider it totally obsolete. It's a very problematic extension to implement on modern pipelines, and is often just implemented as a hack. I wouldn't touch it with a ten foot pole.


[edited by - Yann L on May 30, 2004 4:48:45 PM]
On my own computer (AMD Barton 2600+ and ATI 9700 Pro using Catalyst 4.5) the VBO's are waay faster on a 67k poly mesh, but on my friend's nVidia Gf4400 with the latest Detonators, i'd get 560fps using lists and 460 using VBOs.

My Object3DS has a list of meshes, each containing its own index buffer. When i'm loading the file, i compile one texcoord buffer and one vertex buffer, then loop through the mesh list, set the appropriate index buffer using glBindBufferARB and render the polygons with glDrawRangeElements(..., numVerts, numIndices)
However i'm using GL_UNSIGNED_INT, and the times i've changed it to GL_UNSIGNED_SHORT no speed increases were gained.

Ethereal
quote:Original post by Metus
On my own computer (AMD Barton 2600+ and ATI 9700 Pro using Catalyst 4.5) the VBO's are waay faster on a 67k poly mesh, but on my friend's nVidia Gf4400 with the latest Detonators, i'd get 560fps using lists and 460 using VBOs.

NVidia took a long time to get VBOs working as they should, and they still aren't 100% right. You can sometimes get very weird and unexpected results; that's why our engine uses VAR instead of VBO by default on NV hardware. Make sure that your friend installed the most recent drivers from nvidia. Old drivers will almost certainly mess up VBOs in various ways.
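Since FPS differences are hard to compare directly (as noted earlier in the thread), it may help to convert the numbers quoted so far into time per frame. A small sketch in C++; frameTimeMs and savedMs are made-up helper names, not part of any library:

```cpp
#include <cassert>

// Converts frames per second to milliseconds per frame.
double frameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Time saved per frame, in milliseconds, when going from slowFps to fastFps.
double savedMs(double slowFps, double fastFps) {
    return frameTimeMs(slowFps) - frameTimeMs(fastFps);
}
```

By this measure, going from 1200fps to 2000fps saves roughly 0.33 ms per frame, while the gap between 460fps and 560fps is roughly 0.39 ms per frame. The smaller-looking FPS difference is actually the larger one in absolute time.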

quote:Original post by Metus
My Object3DS has a list of meshes, each containing its own index buffer. When i'm loading the file, i compile one texcoord buffer and one vertex buffer, then loop through the mesh list, set the appropriate index buffer using glBindBufferARB and render the polygons with glDrawRangeElements(..., numVerts, numIndices)

Wait a second there, I didn't completely get what you mean. What kind of buffers exactly are you generating at load time, and how are you rendering them every frame?

quote:Original post by Metus
However i'm using GL_UNSIGNED_INT, and the times i've changed it to GL_UNSIGNED_SHORT no speed increases were gained.

Won't make a speed difference, unless you're VRAM bandwidth limited. Ushorts do take half the memory of uints though; one should keep that in mind.
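To put numbers on the memory point, here is a rough C++ sketch. The helper names are hypothetical, and it assumes triangle lists with 3 indices per polygon. Note that 16-bit indices can only address 65536 distinct vertices, so a large mesh may not be able to use GL_UNSIGNED_SHORT at all.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of distinct vertices a 16-bit index (GL_UNSIGNED_SHORT) can address.
constexpr std::size_t kMaxUShortVertices = 65536;

// True if every vertex of a mesh with `vertexCount` vertices can be
// referenced by a 16-bit index.
bool fitsInUShort(std::size_t vertexCount) {
    return vertexCount <= kMaxUShortVertices;
}

// Size in bytes of an index buffer holding `indexCount` indices.
std::size_t indexBufferBytes(std::size_t indexCount, bool useUShort) {
    const std::size_t bytesPerIndex =
        useUShort ? sizeof(std::uint16_t) : sizeof(std::uint32_t);
    return indexCount * bytesPerIndex;
}
```

For a 5000 polygon triangle mesh (15000 indices) this gives 30000 bytes with ushorts against 60000 bytes with uints.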
alright, i'll try to go over it again:
each Object3DS has a list of meshes or whatever you can call it. Each of these meshes has an individual vertexlist and texcoordlist that'll be filled when the model is loading.

Before the engine enters the loop, i'll copy all of these meshes' vertices and texcoords into two separate buffers so that i'll only have to bind two buffers per model instead of two buffers per mesh.

Then, while looping through the meshes, i'll only have to bind the indexbuffer associated with the right mesh...
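One detail worth checking with this layout: once all vertices live in one shared buffer, each mesh's indices have to be offset by the position where that mesh's vertices start. A minimal C++ sketch of the merge step, with made-up type and function names (Vec3, Mesh, MergedModel, mergeMeshes) and with texcoords left out for brevity, since they would be appended the same way:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical per-mesh data as loaded from the model file.
struct Mesh {
    std::vector<Vec3> vertices;
    std::vector<std::uint32_t> indices;  // relative to this mesh's own vertices
};

// One shared vertex array plus, per mesh, indices into that shared array.
struct MergedModel {
    std::vector<Vec3> vertices;
    std::vector<std::vector<std::uint32_t>> meshIndices;
};

MergedModel mergeMeshes(const std::vector<Mesh>& meshes) {
    MergedModel out;
    for (const Mesh& m : meshes) {
        // Where this mesh's vertices begin inside the shared array.
        const auto base = static_cast<std::uint32_t>(out.vertices.size());
        out.vertices.insert(out.vertices.end(),
                            m.vertices.begin(), m.vertices.end());
        std::vector<std::uint32_t> rebased;
        rebased.reserve(m.indices.size());
        for (std::uint32_t i : m.indices) {
            assert(i < m.vertices.size());
            rebased.push_back(base + i);  // shift into shared index space
        }
        out.meshIndices.push_back(std::move(rebased));
    }
    return out;
}
```

If the per-mesh index buffers are uploaded without this offset, every mesh after the first would end up drawing the first mesh's vertices.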
Ethereal
In my past experiments, i have always gotten the same performance from DLs and VBOs on nVidia cards with Detonator (45-something) drivers and above.

