fastest non-extension way to render

'ello mi friends. implementing VBOs in my project increased my average framerate from 1200fps to 2000fps with a generic game mesh (around 5000 polys). In the standard codepath i'm using a simple glGenLists and glCallLists to render my model, which results in the 1200fps mentioned above. Is there any faster way to render static meshes? I've been poking around with glDrawArrays and glDrawElements with no success; should i still mess around with these or just leave it as it is?

In OpenGL 1.1 (I guess that's what you mean by 'unextended'), the fastest method for static geometry is display lists. Standard vertex arrays (without VBO, VAR or VAO) will generally be slower. So keep your engine as it is.

BB: I was assuming he meant performance on older hardware, since he talked about unextended OpenGL, and since it seemed to be a fallback path from VBOs. Any modern card supports VBOs, so the display list path would only be used on old hardware anyway. And generally speaking, display lists were (a lot) faster on old hardware than plain VAs.

But yes, it depends a lot on the implementation, and on the type of data you compile into the list. Display lists have been a little neglected in the most recent driver revisions; a lot of people see them as an obsolete feature now that we have VBOs. That's why they might very well be slower on recent chipsets.

Also, driver manufacturers have added quite a lot of improvements and tricks to get standard VAs faster. I suspect that many of them have seriously relaxed the strict client/server model (which was a major bottleneck due to sync issues), at least on consumer level hardware.

So theoretically display lists are faster, since in unextended OpenGL they are the only legal way to store geometry in VRAM, at least according to the specs. In practice, it will vary, depending on how the driver is implemented, on how strictly it follows the specs, and on how recent the chipset/driver is. Best bet is to profile, of course, but this might lead to unexpected results on different platforms.

If starting a new engine from scratch, I would personally recommend against using display lists for anything geometry related. But since Metus already has DLs running, as well as a VBO codepath, it doesn't really matter that much. He could have 3 codepaths (DL, standard VA, and VBO) and do some quick runtime profiling at startup to determine the best path on the hardware the system is running on. That would be the safest way.
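
The startup profiling idea could look roughly like the sketch below. It is only an illustration: `RenderPath`, `now_seconds`, `time_path`, `fastest_index` and `pick_render_path` are made-up names, and each `draw` callback is assumed to render one frame through one codepath (DL, standard VA or VBO). With a real GL context you would probably also want a `glFinish()` before each clock read so that queued work is counted.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* One candidate codepath. `draw` is a hypothetical callback that issues
   the draw calls for a single frame. */
typedef struct {
    const char *name;
    void (*draw)(void);
} RenderPath;

/* Wall-clock time in seconds (C11 timespec_get). */
double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Time `frames` consecutive frames of one path. */
double time_path(const RenderPath *path, int frames)
{
    double start = now_seconds();
    for (int i = 0; i < frames; ++i)
        path->draw();
    return now_seconds() - start;
}

/* Index of the smallest timing; on a tie the earlier path wins. */
size_t fastest_index(const double *seconds, size_t count)
{
    assert(count > 0);
    size_t best = 0;
    for (size_t i = 1; i < count; ++i)
        if (seconds[i] < seconds[best])
            best = i;
    return best;
}

/* Benchmark every path into `seconds` (caller-provided, `count` entries)
   and return the index of the fastest one. */
size_t pick_render_path(const RenderPath *paths, double *seconds,
                        size_t count, int frames)
{
    for (size_t i = 0; i < count; ++i)
        seconds[i] = time_path(&paths[i], frames);
    return fastest_index(seconds, count);
}
```

Putting the preferred path first in the array means it wins ties, which keeps the choice stable when two paths measure the same.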

yes, with "unextended" i mean version 1.1, but i have to say i got some very weird results with VBOs and display lists; sometimes the VBO version is faster and sometimes the lists are faster... i'd guess that the more polys, the more advantage i'll get from VBOs, but with around 3k polys, the lists are about 200fps faster (i know that fps are the wrong way of measuring things, but i'll get a rough overview)

edit: my engine will blow out 60k polygons on my laptop with an Intel I830 chipset.. and that's with simple lists..
what's the difference between regular lists and compiled vertex arrays?


[edited by - metus on May 30, 2004 4:20:42 PM]

quote:
Original post by Metus
yes, with "unextended" i mean version 1.1, but i have to say i got some very weird results with VBOs and display lists; sometimes the VBO version is faster and sometimes the lists are faster...


Hmm. I have to admit that since VAR was released, I haven't really used display lists anymore (except for state change compiling). Still, it's weird that DLs are faster than VBOs, although anything is possible, of course. It entirely depends on how the drivers optimize both. On what hardware and drivers are you running your tests?

quote:
Original post by Metus
i'd guess that the more polys, the more advantage i'll get from VBOs, but with around 3k polys, the lists are about 200fps faster (i know that fps are the wrong way of measuring things, but i'll get a rough overview)


VBOs have a break-even point, and need a minimum number of faces to be efficient. But 3k faces in a single VBO is definitely far above that threshold, so I would expect pretty good performance.

Hmm, you aren't manipulating your arrays in any way at runtime, are you? Or mapping them? Also, I assume that you're using GL_STATIC_DRAW_ARB as the usage hint?

quote:
Original post by Metus
what's the difference between regular lists and compiled vertex arrays?


They have nothing in common. Compiled vertex arrays are an (old) extension to standard vertex arrays. The card was supposed to keep the transformed vertices in memory, so that subsequent drawing of the exact same geometry in the same frame would bypass the transformation pipeline. It's only effective if you do multiple passes over the same geometry. This extension was initially added at Carmack's request, but vendors weren't too hot about it. It was never really finalized and stayed very experimental. Today, I would consider it totally obsolete. It's a very problematic extension to implement on modern pipelines, and is often just implemented as a hack. I wouldn't touch it with a ten foot pole.


[edited by - Yann L on May 30, 2004 4:48:45 PM]

On my own computer (AMD Barton 2600+ and ATI 9700 Pro using Catalyst 4.5) the VBOs are waay faster on a 67k poly mesh, but on my friend's nVidia Gf4400 with the latest Detonators, i get 560fps using lists and 460 using VBOs.

My Object3DS has a list of meshes, each containing its own index buffer. When i'm loading the file, i compile one texcoord buffer and one vertex buffer, then loop through the mesh list, set the appropriate index buffer using glBindBufferARB and render the polygons with glDrawRangeElements(..., numVerts, numIndices)
However i'm using GL_UNSIGNED_INT, and the times i've changed it to GL_UNSIGNED_SHORT no speed increase was gained.
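
As noted earlier in the thread, fps is a nonlinear measure, so it can help to convert the quoted figures into time per frame before comparing them. The helper names below (`frame_ms`, `ms_saved`) are made up for this sketch; the arithmetic is just 1000 / fps.

```c
#include <assert.h>

/* Milliseconds spent on one frame at the given frame rate. */
double frame_ms(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

/* Milliseconds saved per frame when going from slow_fps to fast_fps. */
double ms_saved(double slow_fps, double fast_fps)
{
    return frame_ms(slow_fps) - frame_ms(fast_fps);
}
```

With the numbers from this thread, going from 1200fps to 2000fps saves about 0.33 ms per frame, while 460fps versus 560fps is a difference of about 0.39 ms per frame. The 800fps gap and the 100fps gap are therefore roughly the same amount of time.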

quote:
Original post by Metus
On my own computer (AMD Barton 2600+ and ATI 9700 Pro using Catalyst 4.5) the VBOs are waay faster on a 67k poly mesh, but on my friend's nVidia Gf4400 with the latest Detonators, i get 560fps using lists and 460 using VBOs.


NVidia took a long time to get VBOs working as they should, and they still aren't 100% right. You can sometimes get very weird and unexpected results; that's why our engine uses VAR instead of VBO by default on NV hardware. Make sure that your friend installed the most recent drivers from nvidia. Old drivers will almost certainly mess up VBOs in various ways.

quote:
Original post by Metus
My Object3DS has a list of meshes, each containing its own index buffer. When i'm loading the file, i compile one texcoord buffer and one vertex buffer, then loop through the mesh list, set the appropriate index buffer using glBindBufferARB and render the polygons with glDrawRangeElements(..., numVerts, numIndices)


Wait a second there, I didn't completely get what you mean. What kind of buffers exactly are you generating at load time, and how are you rendering them every frame?

quote:
Original post by Metus
However i'm using GL_UNSIGNED_INT, and the times i've changed it to GL_UNSIGNED_SHORT no speed increase was gained.


Won't make a speed difference, unless you're VRAM bandwidth limited. Keep in mind, though, that ushorts take half the memory of uints.
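
To put a number on the memory point, here is a small sketch. The function names (`fits_ushort`, `index_buffer_bytes`, `narrow_indices`) are made up for this example; the only GL facts assumed are that GL_UNSIGNED_SHORT indices are 16 bits wide and GL_UNSIGNED_INT indices are 32 bits wide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit indices can only address vertices 0..65535. */
int fits_ushort(size_t num_vertices)
{
    return num_vertices <= 65536u;
}

/* Bytes needed for an index buffer with `num_indices` entries. */
size_t index_buffer_bytes(size_t num_indices, int use_ushort)
{
    return num_indices * (use_ushort ? sizeof(uint16_t) : sizeof(uint32_t));
}

/* Convert 32-bit indices to 16-bit. The caller is expected to have
   checked fits_ushort() for the vertex buffer first. */
void narrow_indices(const uint32_t *src, uint16_t *dst, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        assert(src[i] <= UINT16_MAX);
        dst[i] = (uint16_t)src[i];
    }
}
```

For a 5000 polygon mesh (15000 indices) that works out to 30000 bytes with ushorts versus 60000 bytes with uints.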

alright, i'll try to go over it again:
each Object3DS has a list of meshes or whatever you can call it. Each of these meshes has an individual vertex list and texcoord list that'll be filled when the model is loading.

Before the engine enters the loop, i'll copy all of these meshes' vertices and texcoords into two separate buffers so that i'll only have to bind two buffers per model instead of two buffers per mesh.

Then, while looping through the meshes, i'll only have to bind the index buffer associated with the right mesh...
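
One detail that is easy to miss with this layout: if each mesh's indices were written relative to its own vertex list (an assumption here, the post does not say), they have to be offset once the vertex lists are concatenated. A minimal sketch, with a hypothetical `Mesh` struct and `merge_positions` helper; texcoords would be handled the same way with 2 floats per vertex.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-mesh data: xyz positions plus indices that are local
   to the mesh (0 .. num_vertices-1). */
typedef struct {
    const float *positions;   /* 3 floats per vertex */
    size_t num_vertices;
    unsigned int *indices;    /* rebased in place by merge_positions */
    size_t num_indices;
} Mesh;

/* Copy every mesh's positions into `out` (capacity given in vertices)
   and rebase each mesh's indices so they point into the shared buffer.
   Returns the total number of vertices written. */
size_t merge_positions(Mesh *meshes, size_t num_meshes,
                       float *out, size_t capacity)
{
    size_t base = 0;
    for (size_t m = 0; m < num_meshes; ++m) {
        Mesh *mesh = &meshes[m];
        assert(base + mesh->num_vertices <= capacity);
        memcpy(out + base * 3, mesh->positions,
               mesh->num_vertices * 3 * sizeof(float));
        for (size_t i = 0; i < mesh->num_indices; ++i)
            mesh->indices[i] += (unsigned int)base;
        base += mesh->num_vertices;
    }
    return base;
}
```

After this step the merged array can be uploaded once per model, and each mesh keeps only its own (rebased) index buffer to bind at draw time.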

now i'm pretty depressed; my hard work with VBOs is useless because my display lists are faster in 90 percent of the cases...

my models are based on several meshes that can be separately textured and to simplify the texturing stage, i've got one VBO per mesh (in my desired game, each mesh is AT LEAST 5000 polygons) and i generate one list per mesh as well.

pseudo:

init:
-----

for_each_mesh
    generate_vbo(num_vertices);
    generate_indices(num_polys * 3);

    glNewList
        glBegin
            for_each_mesh_face
                for_each_face_vertex
                    glVertex3f(current_face_index[0], current_face_index[1], current_face_index[2])
        glEnd
    glEndList

render:
-----
for_each_mesh
    if GL_ARB_vertex_buffer_object
        bind_mesh_vbo
        bind_mesh_indices
        DrawRangeElements
    else
        glCallList(mesh_list)


i've tried to compile each mesh's vertices into one huge buffer, and just bind the current mesh's index buffer, but that didn't work very well either.. can it just be that display lists are faster?

is there any way to calculate the ACTUAL datasize that's being transferred on the bus to the GPU?
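
What actually crosses the bus depends on the driver, so the sketch below does not answer that directly. It only computes the size of the arrays handed to GL for one mesh, which is a reasonable starting estimate for a one-time static upload. The layout is an assumption: xyz positions and uv texcoords as 32-bit floats, plus 32-bit indices, and `mesh_upload_bytes` is a made-up name.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes of client-side data for one mesh under the assumed layout:
   3 floats position + 2 floats texcoord per vertex, 3 indices per polygon. */
size_t mesh_upload_bytes(size_t num_vertices, size_t num_polys)
{
    /* The estimate assumes 4-byte floats and 4-byte unsigned ints. */
    assert(sizeof(float) == 4 && sizeof(unsigned int) == 4);
    size_t positions = num_vertices * 3 * sizeof(float);
    size_t texcoords = num_vertices * 2 * sizeof(float);
    size_t indices   = num_polys * 3 * sizeof(unsigned int);
    return positions + texcoords + indices;
}
```

For example, a 5000 polygon mesh with a hypothetical 3000 vertices comes to 120000 bytes, split evenly between vertex data and indices.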

I'm also a bit curious about the whole thing... When i use DL i have pretty good performance, but also some glBegin/glEnd artifacts (edges) ... When i use standard VA, performance is okay, and when i call my VBO everything works superb.

That's the shiny side of the medal. In our engine at work, we use standard DL. I have to implement my own code now into our engine. I did some tests with calling a VA from DL, it seems to work. Maybe someone also tried this?

Calling a VBO in a display list seems to fail (i get an exception... no time to trace that :/)

So maybe someone can make a summary?

I think it can be compared like this:

immediate < vertex array < display list < display list with VA < VBO

okay.... VBOs are the issue. But i think we can expect a more and more stable version of this extension.

-Thomas




yes, it is possible. as for performance gains i don't know, however it does allow the graphics card to optimise the state change code internally, so there is probably a slight increase for, let's face it, very little effort.

Tip: don't compile state change and geometry information into the same display list; apparently the driver can do a much better job of optimising it if you only have one or the other

With respect to vertex arrays within display lists... I have tried both vertex arrays and compiled vertex arrays in display lists (not VBOs yet) and found no performance improvement over plain old glBegin/glEnd in the display list. I can only assume that this means the geometry from the vertex array is "inlined" into the display list, so to speak. I can't see any reason you'd really want to put a VBO in a display list, since VBOs are intended to let you put the geometry in video memory, whereas display lists only might do this. Calling a VBO in a display list would probably cause the geometry to be inlined in the display list (like a normal vertex array) and would give either similar performance or a slowdown (I would assume). The only real answer is to try it :D
