# Is the Vertex Buffer extension much faster than Display Lists?

Well, I started learning OpenGL again after a few months away, and so far I'm using only display lists. I'm following the NeHe tutorials, and they seem to use only this method. I was wondering whether it is worth spending the next few weeks learning extensions and (if my video card supports them) vertex buffers in particular. I'm currently working on a TNT2 (!!), and the performance I get is quite poor (my fault, since many OpenGL games run quite well on it). For example, if I create a display list that draws a cube and call it 100 times from a for loop (without textures or lights), my FPS drops by about 70%. I guess this is not the best performance test one can do, but for the moment I couldn't come up with anything better. I'm using SDL to initialize the video mode, etc. Which method do you use to draw your stuff in OpenGL? I'm pretty clueless about OpenGL, and I think it would be cool if some of you could discuss the subject. Thanks in advance.

I doubt you have the VBO extension on a TNT2, since it doesn't perform geometry calculations on the card. If you do have it, it's emulated anyway, so you won't see any benefit over standard vertex arrays.

You might want to try vertex arrays though. I almost always find indexed VAs are faster than display lists. You could even combine the two, although you won't get much benefit.

The method I use is a general purpose Vertex Array class which uses VBO if available or falls back to vertex arrays otherwise. Since VBO uses the existing vertex arrays commands, this is relatively easy to do.
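As a rough illustration of the "use VBO if available, otherwise fall back" idea, here is a minimal sketch of only the selection step. It assumes you already have the space-separated extension string (the kind returned by `glGetString(GL_EXTENSIONS)`); the function names and the three path labels are made up for this example and are not the poster's actual class.

```python
def has_extension(extension_string: str, name: str) -> bool:
    """Return True if `name` is a whole token in a space-separated
    GL extension string. Splitting first avoids false positives from
    substring matches on longer extension names."""
    return name in extension_string.split()


def choose_vertex_path(extension_string: str) -> str:
    """Pick a rendering path label (hypothetical names) in order of preference."""
    if has_extension(extension_string, "GL_ARB_vertex_buffer_object"):
        return "vbo"
    if has_extension(extension_string, "GL_EXT_compiled_vertex_array"):
        # Plain vertex arrays plus lock/unlock.
        return "locked_vertex_arrays"
    return "vertex_arrays"


if __name__ == "__main__":
    # Example extension string, invented for the demo.
    sample = "GL_ARB_multitexture GL_EXT_compiled_vertex_array"
    print(choose_vertex_path(sample))
```

Because VBO reuses the vertex array entry points, the draw code behind each label can stay largely the same; only the setup (binding a buffer versus passing a client pointer) would differ.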

Here's an interesting thread from OpenGL.org:

http://www.opengl.org/discussion_boards/ubb/Forum3/HTML/010685.html

It's not totally about VBO, so I'll quote/lightly-paraphrase the relevant text:

quote:
Originally posted by cass:
[VBO is] a better interface for software T&L than compiled vertex arrays, and pretty much all the other attempts to solve the "vertex buffer" problem before.

So while VBO isn't necessarily "hardware accelerated" on all platforms, it should still provide opportunity for "better acceleration" over other APIs for vertex array management.

Thanks -
Cass

The guy works at NVIDIA, so I think it's safe to assume he knows what he's talking about. Anyway, my point is that there do appear to be some benefits to using VBO, even when it is just emulated in software.

[edited by - Ostsol on November 11, 2003 6:56:57 PM]

Interesting. Although, I'd be really surprised if there's a significant difference in performance compared to VAs.

From my experience I can tell you, VAs are faster! Another advantage is that if you have created a vertex manager and you use indexed VAs, you can save a large amount of memory (two points that are the same are only included once).
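To make the memory argument for indexing concrete, here is a small sketch that turns a flat triangle list into a unique-vertex list plus an index list. The function name and the tuple-based vertex format are assumptions for illustration; the same indexing idea applies whether the arrays end up in client memory or in a buffer object.

```python
def build_index_buffer(triangle_vertices):
    """Deduplicate a flat list of vertices (hashable tuples).

    Returns (unique_vertices, indices) where indices[i] is the position
    in unique_vertices of the i-th input vertex.
    """
    unique_vertices = []
    lookup = {}   # vertex tuple -> index into unique_vertices
    indices = []
    for vertex in triangle_vertices:
        index = lookup.get(vertex)
        if index is None:
            index = len(unique_vertices)
            lookup[vertex] = index
            unique_vertices.append(vertex)
        indices.append(index)
    return unique_vertices, indices


if __name__ == "__main__":
    # A quad drawn as two triangles: six vertices, only four distinct.
    quad = [
        (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0),
        (0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0),
    ]
    vertices, indices = build_index_buffer(quad)
    print(len(quad), "->", len(vertices), "vertices,", len(indices), "indices")
```

Note that in practice two vertices only count as "the same" if all their attributes match (position, normal, texture coordinates, and so on), so the saving depends on the mesh.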

AP: "if you use indexed VAs, you can save a large amount of memory! (two points who are the same are only included once)" ... and what would be different in the VBO case?

You should never let your fears become the boundaries of your dreams.

OK, I tried VAs and, at least, they look easier to code.

Can I store the vertices of all my objects in one VA and lock only that one array once per frame?

quote:
Original post by Anonymous Poster
From my experience I can tell you, VAs are faster! Another advantage is that if you have created a vertex manager and you use indexed VAs, you can save a large amount of memory (two points that are the same are only included once).

VAs are never going to be faster than VBO on a T&L card unless you're doing something very wrong, or possibly you're updating the geometry every frame (ie for skinning).

Here's my experience, from a report on skinning and mesh rendering performance that I wrote recently. The bottom 3 rows are without animation or skinning, so they're the most applicable to general mesh rendering. BTW, the skinning method I'm using here isn't exactly optimal yet, so the figures for that are somewhat distorted. All these tests were done with a Radeon 9700 pro.

| Animation | Smooth motion | Skinning | Render method | FPS | Tris/sec | Time/frame (ms) |
|---|---|---|---|---|---|---|
| Yes | Yes | Yes | IM | 27.77 | 1,329,405 | 36.01 |
| Yes | Yes | Yes | VA | 46.8 | 2,240,410 | 21.37 |
| Yes | Yes | Yes | VBO | 47.11 | 2,255,250 | 21.23 |
| Yes | No | Yes | IM | 28.91 | 1,383,980 | 34.59 |
| Yes | No | Yes | VA | 48.4 | 2,317,005 | 20.66 |
| Yes | No | Yes | VBO | 49.49 | 2,369,185 | 20.21 |
| Yes | Yes | No | IM | 53.39 | 2,555,886 | 18.73 |
| Yes | Yes | No | VA | 119.23 | 5,707,779 | 8.39 |
| Yes | Yes | No | VBO | 170.99 | 8,185,633 | 5.85 |
| No | No | No | IM | 63.08 | 3,019,766 | 15.85 |
| No | No | No | VA | 171.12 | 8,191,857 | 5.84 |
| No | No | No | VBO | 300.01 | 14,362,079 | 3.33 |

Abbreviations: IM: Immediate mode; VA: Vertex Arrays; VBO: ARB_vertex_buffer_object

[edit: fixed column headers]

[edited by - benjamin bunny on November 13, 2003 7:31:08 PM]

[edited by - benjamin bunny on November 13, 2003 7:32:13 PM]

It'd be interesting to see the kind of FPS you get when using VBOs and the matrix palette stuff (the proper name escapes me at the moment), as you could render each bone segment with a different call and matrix to allow for animation. Kind of the best of both worlds in a way: animation with the vertex data locked on the card
(this is assuming I've understood the extension right, hehe)

For some reason VBOs work a lot slower on my machine than vertex arrays :/ Must be something I'm doing wrong.

James Simmons
MindEngine Development
http://medev.sourceforge.net

quote:
Original post by benjamin bunny
That looks good. What is the configuration of the computer you ran that test on?

The computer is an Athlon 2100+ (1.733GHz), 512MB 333MHz DDR-RAM, Radeon 9700 Pro.

A few more details I should probably have mentioned: lighting was enabled, with a single light; the vertex data included colours, normals and texture coords. In the test I rendered 64 meshes at a time, with 748 triangles in each, and measured the frame rate over 100 frames. In the tests with skinning enabled, the vertex normals and positions for each mesh were updated every frame, which is why VBO wasn't that much quicker than VAs. Animation refers to bone animation. Smooth motion is an option to smoothly interpolate between frames. The executable was a release build with VC.net 2003 with the /G7 command line option set (optimise for P4 and above).
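The throughput figures reported earlier can be cross-checked against this setup: 64 meshes of 748 triangles each gives 47,872 triangles per frame, so triangles per second is just FPS times that, and frame time is 1000 divided by FPS. A minimal sketch of the arithmetic (the function and constant names are mine, not part of the original report):

```python
MESHES_PER_FRAME = 64      # from the test description
TRIANGLES_PER_MESH = 748   # from the test description


def triangles_per_second(fps: float) -> float:
    """Triangle throughput implied by a frame rate for this scene."""
    return fps * MESHES_PER_FRAME * TRIANGLES_PER_MESH


def frame_time_ms(fps: float) -> float:
    """Average time per frame in milliseconds."""
    return 1000.0 / fps


if __name__ == "__main__":
    # FPS values taken from the rows without animation or skinning.
    for method, fps in [("IM", 63.08), ("VA", 171.12), ("VBO", 300.01)]:
        print(method, round(triangles_per_second(fps)), round(frame_time_ms(fps), 2))
```

With the static-mesh numbers, VBO comes out at roughly 1.75 times the vertex array frame rate on that Radeon 9700 Pro.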

I think that about covers it

____________________________________________________________
www.elf-stone.com | Automated GL Extension Loading: GLee 2.00 for Win32 and Linux

[edited by - benjamin bunny on November 13, 2003 12:15:26 AM]

quote:
Original post by _the_phantom_
It'd be interesting to see the kind of FPS you get when using VBOs and the matrix palette stuff (the proper name escapes me at the moment), as you could render each bone segment with a different call and matrix to allow for animation. Kind of the best of both worlds in a way: animation with the vertex data locked on the card
(this is assuming I've understood the extension right, hehe)

The vertices are affected by multiple bones, so they have to be transformed with multiple bone matrices; the only option is to manually update the vertex positions, which is precisely where VBO falls down. I'm planning on putting all the skinning code into a vertex shader eventually, which should solve this problem.

[edited by - benjamin bunny on November 13, 2003 12:25:30 AM]

quote:
Original post by benjamin bunny
quote:
Original post by Anonymous Poster
From my experience I can tell you, VAs are faster! Another advantage is that if you have created a vertex manager and you use indexed VAs, you can save a large amount of memory (two points that are the same are only included once).

VAs are never going to be faster than VBO on a T&L card unless you're doing something very wrong, or possibly you're updating the geometry every frame (i.e. for skinning).

Here's my experience, from a report on skinning and mesh rendering performance that I wrote recently. The bottom 3 rows are without animation or skinning, so they're the most applicable to general mesh rendering. BTW, the skinning method I'm using here isn't exactly optimal yet, so the figures for that are somewhat distorted. All these tests were done with a Radeon 9700 Pro.

Something isn't right about that... I'm running an Athlon XP 1600 with a Radeon 9200, and drawing my terrain + models I am hitting over 20M triangles per second using just vertex arrays, so I have no clue how you're getting such low numbers with normal VAs and VBOs. Actually, I am using lock/unlock with the vertex arrays. It falls back to not using lock/unlock if not available. I haven't had a chance to implement VBOs yet (I think they're finally implemented for my card in the latest drivers; I have to check though, since last time I looked they weren't there).

first: it seems some drivers have weird vbo issues resulting in bad performance. if you either allocate too much or too little memory (depending on card and driver) you might just get system memory. you could for example try to allocate less than 6mb of video memory or more than 12 (even if you dont need it, just for testing).

about skinning. it required vertex programs to let me see an efficient solution to the problem i had with it (meaning: i do absolutely NOT want to touch the vertices of the skin myself). the ability of storing 12 matrices/bones should easily let you do the whole skinning in the vertex shader, though having a lot of work to do for each vertex i would definitely make sure to use the cache as good as possible. more complex models would require multiple calls and changing the matrices.

btw. if you use lock/unlock and do that maybe only once it might very well be that the driver is copying it to video memory which might be more or less as fast as vbo, depending on how they implemented it. anyways, you are not using "just vertex arrays".

I was thinking, does anything actually prevent an opengl implementation from making a VBO behind the scenes when you compile a display list?

quote:
Original post by Ready4Dis
Something isn't right about that.. I'm running an athlon xp 1600, radeon 9200, and drawing my terrain + models I am hitting over 20M triangles per second... using just vertex arrays, so no clue how you're getting such low numbers with normal VA's and VBO's. Actually, I am using lock/unlock with the vertex arrays. It falls back to not using lock/unlock if not available, I haven't had a chance to implement VBO's yet (I think it's finally implemented for my card in the latest drivers, i have to check though, last time I looked it wasn't there).

I suspect it's because I'm only sending 300 vertices per batch, which isn't particularly efficient with vertex arrays or VBO. I get much higher numbers with terrain, where I send data in blocks of at least 1039 vertices. Bear in mind also that lighting makes a big difference to frame rate.

quote:

about skinning. it required vertex programs to let me see an efficient solution to the problem i had with it (meaning: i do absolutely NOT want to touch the vertices of the skin myself). the ability of storing 12 matrices/bones should easily let you do the whole skinning in the vertex shader, though having a lot of work to do for each vertex i would definitely make sure to use the cache as good as possible. more complex models would require multiple calls and changing the matrices.

That's pretty much the solution I've come up with. My reason for implementing it in software first is as a fall-back where ARB_VP is not available.

[edited by - benjamin bunny on November 14, 2003 3:28:22 PM]

i admit i'm a feature bitch. if a solution doesn't look sleek and streamlined i put it aside until the right feature or extension allows me to do it the way i was hoping for. in other words, i would be too lazy to write a fallback, at least as long as i'm not paid for it ,)

I'm trying to get into the habit of coding fallbacks in case I'm not working on my own computer. I always have my stuff on an FTP server so I can work on it at school. Unfortunately, that means integrated Intel graphics or TNT2s... Yeah, you get the picture.

Well, it's not related to the topic, but since there is some mention of skinning, I would like to ask exactly how you go about doing it.

You got the bones, affected vertices and some matrices thrown up by 3d modelling software like Max.

1) Do you calculate the new positions of the vertices, then assemble them all into one array and pass it to the card? In this case, how do you calculate the new position with just matrices? Do you extract the rotation/translation values from the matrix? If you do, how do you extract them? (Sorry, I'm not good with matrix math.)

2) Do you render each bone, then apply the matrix to the modelview matrix, then render another bone, etc.? If you do it this way, how do you calculate vertices that are affected by more than one bone? I believe this method is faster since you only apply the matrix once per bone. But how would you do interpolation using this method?

I had this problem. I tried the second way, and I guess GL matrix calls are illegal between glBegin and glEnd. I'm no pro, but I'm pretty sure that what you do (and what I'm doing) is compile a vertex array manually:

vertexout = (vertexin * bonematrix1 * boneweight1) + (vertexin * bonematrix2 * boneweight2) .... if you have more affecting bones.

and then render that array. It seems so slow to have to process every vertex individually, but I guess it's the way to do it.

You might want to look into the GL_ARB_vertex_blend extension (ATI has a demo of its usage), as this does what you want, but basically in hardware.
Having had a quick look at the code, all the software work you have to do is lerp the bones; once that's done, you use the result to set up multiple modelview matrices, one for each bone as required.

http://www.ati.com/developer/vertexblend.html

this thread finally made me play around with it. matrix palette, weights, blending.. one would think everything you need is there but of course for some reason nvidia decided that their driver will not support vertex blending and matrix palette anymore, so im kind of clueless how to dynamically select the right program matrix from within the shader. maybe its time for a new card.

quote:

1) Do you calculate the new positions of the vertices, then assemble them all into one array and pass it to the card? In this case, how do you calculate the new position with just matrices? Do you extract the rotation/translation values from the matrix? If you do, how do you extract them? (Sorry, I'm not good with matrix math.)

Each bone contains a 4x4 local matrix which describes its position and rotation relative to its parent (well actually it uses a quaternion and position vector, but that's not relevant), and an absolute matrix which describes its global position. The bones (and other rigid meshes) are animated by interpolating between positions and rotations which are sampled at about 30fps.
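For anyone wondering what "interpolating between positions and rotations" might look like, here is one common approach, sketched under assumptions: positions are linearly interpolated, rotations are unit quaternions in (w, x, y, z) order blended with spherical linear interpolation, and keyframes are evenly spaced at 30 samples per second. The names and data layout are invented for this example and are not taken from the poster's engine.

```python
import math

SAMPLE_RATE = 30.0  # assumed keyframes per second


def lerp(a, b, t):
    """Component-wise linear interpolation between two tuples."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:
        # q and -q are the same rotation; flip to take the shorter arc.
        q1 = tuple(-c for c in q1)
        dot = -dot
    if dot > 0.9995:
        # Nearly identical rotations: fall back to lerp to avoid dividing by ~0.
        q = lerp(q0, q1, t)
    else:
        theta = math.acos(dot)
        s0 = math.sin((1.0 - t) * theta) / math.sin(theta)
        s1 = math.sin(t * theta) / math.sin(theta)
        q = tuple(s0 * a + s1 * b for a, b in zip(q0, q1))
    norm = math.sqrt(sum(c * c for c in q))
    return tuple(c / norm for c in q)


def sample_bone(keyframes, time_seconds):
    """Interpolate a bone's (position, rotation) at a given time.

    keyframes: list of (position, quaternion) pairs, at least two entries,
    one per 1/SAMPLE_RATE seconds. Times past the end clamp to the last key.
    """
    frame = time_seconds * SAMPLE_RATE
    i = min(int(frame), len(keyframes) - 2)
    t = min(frame - i, 1.0)
    (p0, q0), (p1, q1) = keyframes[i], keyframes[i + 1]
    return lerp(p0, p1, t), slerp(q0, q1, t)
```

The interpolated position and quaternion would then be converted to the bone's local matrix and combined with its parent's absolute matrix.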

Bones are attached to a skin, a skin being a single mesh, with each vertex affected by one or more bones' absolute position/rotation matrices. When the bones are updated, the vertices affected by that bone are updated too.

If you want to know specifics about matrices and how they're used to transform shapes, there's plenty of material on the web and in maths text books. The Matrix and Quaternion FAQ is a good place to start.

quote:

vertexout = (vertexin * bonematrix1 * boneweight1) + (vertexin * bonematrix2 * boneweight2) .... if you have more affecting bones.

Almost. Each vertex has an offset vector for each bone affecting it, which describes the local position of the vertex in relation to the bone. You'd transform that offset vector by the absolute bone matrix.
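Putting the weighted-sum formula and the per-bone offset vectors together, a software skinning step might look roughly like the sketch below. It is only an illustration under stated assumptions: matrices are 4x4 nested lists in row-major layout with the translation in the last column (OpenGL itself stores matrices column-major), weights are assumed to sum to 1, and the function names and the `(bone_index, weight, offset)` layout are hypothetical.

```python
def transform_point(matrix, point):
    """Apply a 4x4 row-major matrix to a 3D point (implicit w = 1)."""
    x, y, z = point
    return tuple(
        matrix[row][0] * x + matrix[row][1] * y + matrix[row][2] * z + matrix[row][3]
        for row in range(3)
    )


def skin_vertex(influences, bone_matrices):
    """Blend one vertex from its bone influences.

    influences: list of (bone_index, weight, offset), where offset is the
    vertex position expressed in that bone's local space.
    bone_matrices: absolute (world-space) matrix for each bone.
    """
    result = [0.0, 0.0, 0.0]
    for bone_index, weight, offset in influences:
        transformed = transform_point(bone_matrices[bone_index], offset)
        for axis in range(3):
            result[axis] += weight * transformed[axis]
    return tuple(result)
```

Running `skin_vertex` over every vertex each frame and writing the results into the vertex array is the per-frame update that limits what VBO can gain in the skinned tests; doing the same sum in a vertex program keeps the vertex data static on the card.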

quote:
Original post by Trienco
first: it seems some drivers have weird vbo issues resulting in bad performance. if you either allocate too much or too little memory (depending on card and driver) you might just get system memory. you could for example try to allocate less than 6mb of video memory or more than 12 (even if you dont need it, just for testing).

about skinning. it required vertex programs to let me see an efficient solution to the problem i had with it (meaning: i do absolutely NOT want to touch the vertices of the skin myself). the ability of storing 12 matrices/bones should easily let you do the whole skinning in the vertex shader, though having a lot of work to do for each vertex i would definitely make sure to use the cache as good as possible. more complex models would require multiple calls and changing the matrices.

btw. if you use lock/unlock and do that maybe only once it might very well be that the driver is copying it to video memory which might be more or less as fast as vbo, depending on how they implemented it. anyways, you are not using "just vertex arrays".

Yes, but lock/unlock only gave me about a 10fps gain in speed, which still puts my app well over 19M triangles/s with just normal old vertex arrays (triangle strips of course).