Tree Penguin

Display List vs VBO?

Hi, I tested VBOs, VAs and display lists for speed. Display lists turned out to be almost twice as fast as VBOs. Did I do something wrong (I thought VBOs would be faster)? I know the advantages of VBOs are fast loading and low memory consumption; are there any others?

Quote:
Original post by Tree Penguin
Did I do something wrong

Most probably, yes :)

Quote:

I know the advantages of VBOs are the fast loading and low memory consumption, are there any others?

They're supposed to be the fastest way to render geometry.

Post the code you used to set up the VBO. Also, your hardware and driver revision.

OK, here are the framerates (FPS):

Immediate mode (glVertex3f and glTexCoord2d calls): 2
Display List: 93
Vertex Arrays: 6
VBO: 54

That's when I render a 512*512-point heightmap (with a 512*512 texture) using just simple triangles (so 511*511*3*2=1566726 vertices) in windowed mode, 640x480x32, on an ATI Radeon 9600XT 128MB.
All triangles are visible (and take up about two thirds of the screen), culling is off, lighting is off.
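To put that vertex count in perspective, here is a small sketch (helper names are hypothetical) comparing what a plain GL_TRIANGLES list submits with the number of distinct grid points. For a 512*512 heightmap that is 1566726 submitted vertices versus 262144 distinct points, so each point is sent roughly six times; indexed drawing lets each point be stored once.

```c
/* Vertices submitted when an n x n point heightmap is drawn as a plain
   GL_TRIANGLES list: (n-1)^2 grid cells, 2 triangles per cell,
   3 vertices per triangle. */
unsigned long triangle_list_vertices(unsigned long n)
{
    return (n - 1) * (n - 1) * 2 * 3;
}

/* Distinct grid points: what an indexed draw (glDrawElements) stores once. */
unsigned long unique_grid_vertices(unsigned long n)
{
    return n * n;
}
```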

#############################################################
VBO drawing code:

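/* Enable the client-side arrays that will be fed from the buffers */
glEnableClientState( GL_VERTEX_ARRAY );
if(dataTexCoords) glEnableClientState( GL_TEXTURE_COORD_ARRAY );

/* With a buffer bound, the last argument is a byte offset into it, not a pointer */
glBindBufferARB( GL_ARRAY_BUFFER_ARB, VerticesID );
glVertexPointer( 3, GL_FLOAT, 0, (char *) NULL );

if(dataTexCoords){
    glBindBufferARB( GL_ARRAY_BUFFER_ARB, TexCoordsID );
    glTexCoordPointer( 2, GL_FLOAT, 0, (char *) NULL );
}

glDrawArrays( DrawingPrimitive, 0, nVertices );

glDisableClientState( GL_VERTEX_ARRAY );
if(dataTexCoords) glDisableClientState( GL_TEXTURE_COORD_ARRAY );

/* Unbind, so later plain vertex array code is not read as offsets into this VBO */
glBindBufferARB( GL_ARRAY_BUFFER_ARB, 0 );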

#############################################################
VBO setup code:

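/* Create the position buffer and copy the data into it
   (GL_STATIC_DRAW_ARB: written once, drawn many times) */
glGenBuffersARB( 1, &VerticesID );
glBindBufferARB( GL_ARRAY_BUFFER_ARB, VerticesID );
glBufferDataARB( GL_ARRAY_BUFFER_ARB, nVertices*3*sizeof(float), dataVertices, GL_STATIC_DRAW_ARB );

if(dataTexCoords){
    /* Same for the texture coordinates, in a second buffer */
    glGenBuffersARB( 1, &TexCoordsID );
    glBindBufferARB( GL_ARRAY_BUFFER_ARB, TexCoordsID );
    glBufferDataARB( GL_ARRAY_BUFFER_ARB, nVertices*2*sizeof(float), dataTexCoords, GL_STATIC_DRAW_ARB );
}

/* Unbind so later client-side vertex array calls are not treated as buffer offsets */
glBindBufferARB( GL_ARRAY_BUFFER_ARB, 0 );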

############################################################

I might even have copy-pasted these lines from the NeHe VBO tutorial.

BTW, DrawingPrimitive is an unsigned int variable that holds values like GL_TRIANGLES (in this case) or GL_TRIANGLE_STRIP.

Quote:
Original post by Ysaneya
Switch to glDrawElements and use an IBO - i'm guessing the DL is faster because it's been optimized.

Y.

IBO? Index Buffer Object or something like that?
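In this context an IBO is an index buffer: a buffer object bound to GL_ELEMENT_ARRAY_BUFFER_ARB that holds the indices passed to glDrawElements. A minimal sketch of building such an index list for the heightmap, assuming the points are stored row by row; the function name and the triangle winding are assumptions (winding only matters once culling is on). Because 512*512 points exceed 65535, the indices have to be 32-bit (GL_UNSIGNED_INT).

```c
#include <stdlib.h>

/* Builds a GL_TRIANGLES index list for an n x n grid of points stored row
   by row (point (x, z) at index z*n + x). Each cell becomes two triangles.
   Returns a malloc'd array (caller frees) and its length in *count_out,
   or NULL for n < 2 or on allocation failure. */
unsigned int *build_grid_indices(unsigned int n, size_t *count_out)
{
    size_t count, k = 0;
    unsigned int *idx;

    *count_out = 0;
    if (n < 2)
        return NULL;
    count = (size_t)(n - 1) * (n - 1) * 6;   /* 2 triangles x 3 indices per cell */
    idx = malloc(count * sizeof *idx);
    if (idx == NULL)
        return NULL;

    for (unsigned int z = 0; z + 1 < n; ++z) {
        for (unsigned int x = 0; x + 1 < n; ++x) {
            unsigned int tl = z * n + x;   /* top-left corner of the cell */
            unsigned int tr = tl + 1;      /* top-right */
            unsigned int bl = tl + n;      /* bottom-left (next row) */
            unsigned int br = bl + 1;      /* bottom-right */
            idx[k++] = tl; idx[k++] = bl; idx[k++] = tr;   /* first triangle */
            idx[k++] = tr; idx[k++] = bl; idx[k++] = br;   /* second triangle */
        }
    }
    *count_out = count;
    return idx;
}
```

The array would then be uploaded with glBufferDataARB( GL_ELEMENT_ARRAY_BUFFER_ARB, ... ) and drawn with glDrawElements( GL_TRIANGLES, count, GL_UNSIGNED_INT, NULL ), with the vertex VBO holding only the 262144 distinct points.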

The only reason I can see for the DL being faster than the VBO is that the texcoords and vertices are stored together in the DL, so the card can read straight through the data instead of jumping back and forth between two arrays. I cannot think of another reason (unless the latest ATI drivers handle VBOs very badly for some reason, but I don't think that's the case :]).
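If you want to test the interleaving theory with a VBO, the two arrays can be merged into one. A minimal sketch, assuming the 3-float position and 2-float texcoord layout from the setup code above; the function name is hypothetical.

```c
#include <stddef.h>

/* Interleaves separate position (x,y,z) and texcoord (s,t) arrays into one
   array laid out as x,y,z,s,t per vertex, so each vertex's attributes sit
   next to each other in memory. `out` must hold 5 * n_vertices floats. */
void interleave_pos_tex(const float *pos, const float *tex,
                        size_t n_vertices, float *out)
{
    for (size_t i = 0; i < n_vertices; ++i) {
        out[i * 5 + 0] = pos[i * 3 + 0];
        out[i * 5 + 1] = pos[i * 3 + 1];
        out[i * 5 + 2] = pos[i * 3 + 2];
        out[i * 5 + 3] = tex[i * 2 + 0];
        out[i * 5 + 4] = tex[i * 2 + 1];
    }
}
```

With the result uploaded into a single VBO, both pointers use a stride of 5*sizeof(float): glVertexPointer at offset 0 and glTexCoordPointer at offset 3*sizeof(float).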

>>511*511*3*2=1566726 vertices<<

This is way too much to draw in one call.
Try

GLint max_elements_indices, max_elements_vertices;
glGetIntegerv( GL_MAX_ELEMENTS_INDICES, &max_elements_indices );
glGetIntegerv( GL_MAX_ELEMENTS_VERTICES, &max_elements_vertices );

to give you an idea of the batch sizes you should be using.
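Note that these two values are defined as the recommended maximum vertex and index counts for glDrawRangeElements (OpenGL 1.2), not hard limits, and whether splitting a glDrawArrays call helps is driver dependent. If you do want to split the heightmap, a small sketch (helper names hypothetical) of how many GL_TRIANGLES calls a given limit implies, keeping every batch a multiple of 3 so no triangle is cut in half:

```c
#include <stddef.h>

/* Largest batch that fits in max_vertices without splitting a triangle:
   max_vertices rounded down to a multiple of 3 (0 if max_vertices < 3). */
size_t triangle_batch_size(size_t max_vertices)
{
    return max_vertices - max_vertices % 3;
}

/* Number of draw calls needed for `total` GL_TRIANGLES vertices.
   The draw loop would then look like:
     for (first = 0; first < total; first += batch)
         glDrawArrays(GL_TRIANGLES, first, min(batch, total - first));  */
size_t triangle_batch_count(size_t total, size_t max_vertices)
{
    size_t batch = triangle_batch_size(max_vertices);
    if (batch == 0)
        return 0;
    return (total + batch - 1) / batch;   /* round up */
}
```

For example, with a hypothetical limit of 4096, the 1566726-vertex heightmap becomes 383 calls of at most 4095 vertices.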

I tried it on my Ti4200 (I'm back home) and it returned 1243784 (vertices), so I am just a little over the limit (or maybe under it on the Radeon).
Anyway, I tested display lists, VAs and VBOs on the Ti and got these averages, each measured several times with the same results (the averages may not be exact, but the order is right):

DL: 120
VA: 115
VBO: 110

VSync is off, and I used NeHe tutorial 45 unmodified; I only added display list drawing code.

I am a little confused: everyone tells me VBOs are much faster than display lists if the card supports them. I have never seen that myself, not even with the NeHe tutorial, which claimed VBOs give the FPS boost everyone dreamed of. They did beat VAs on the Radeon, but it still seems strange to me that a simple DL beats them, especially by the margin it did on the Radeon.

I guess the VBOs just hate me [bawling].

Any suggestions/questions/corrections/blessings are welcome.

>>Any suggestions/questions/corrections/blessings are welcome.<<

Read my last post, and also check out the PDFs from NVIDIA.
VBOs are more difficult to set up correctly than DLs.

Also, a lot of the code on NeHe's site is just plain bad (i.e. don't take it as gospel).

Quote:
Original post by Tree Penguin
DL: 120
VA: 115
VBO: 110

This also indicates that you are not transfer limited: you get almost the same performance using data from VRAM as from normal RAM. Render 10x as much and then post the results. Also make sure you are not fillrate limited.
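One way to judge whether a run is transfer limited is to work out how many bytes of vertex data per second each result implies. A tiny helper (name hypothetical), assuming 20 bytes per vertex for the layout in the VBO code earlier in the thread (3 position floats plus 2 texcoord floats, 4 bytes each):

```c
/* Bytes of vertex data read per second at a given frame rate. */
double vertex_bytes_per_second(double vertices_per_frame,
                               double bytes_per_vertex, double fps)
{
    return vertices_per_frame * bytes_per_vertex * fps;
}
```

For the Radeon test (1566726 vertices, i.e. 31334520 bytes per frame), 6 FPS with plain vertex arrays works out to about 188 MB/s, 54 FPS with the VBO to about 1.69 GB/s, and 93 FPS with the DL to about 2.91 GB/s. The thread doesn't give the mesh size for the Ti4200 runs, so the same calculation can't be done for those numbers.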

VBOs are surely a nice extension, but their implementation, and therefore their performance, changes from driver to driver...

VBOs are a good choice if you want to store your geometry on the card, so you can save a lot of AGP transfers.

I've played around with them for some time now. What I experienced: you can gain a lot of performance by optimizing your data for specific vertex caches. Caching is everything. Also try to order your data as triangle strips; this also results in a great speedup.
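The triangle strip suggestion can be sketched for the heightmap grid like this, assuming row-by-row point storage; the function name is hypothetical, and joining rows with degenerate triangles is one common way to draw the whole grid as a single strip, not necessarily what was used here.

```c
#include <stdlib.h>

/* Builds indices that draw an n x n grid of points (stored row by row,
   point (x, z) at index z*n + x) as a single GL_TRIANGLE_STRIP.
   Each row of cells zig-zags between rows z and z+1. Rows are joined by
   repeating the last index of one row and the first index of the next,
   which creates zero-area (degenerate) triangles that draw nothing; the join
   adds an even number of indices, so every row keeps the same winding.
   Returns a malloc'd array (caller frees) and its length in *count_out. */
unsigned int *build_grid_strip(unsigned int n, size_t *count_out)
{
    size_t rows, count, k = 0;
    unsigned int *idx;

    *count_out = 0;
    if (n < 2)
        return NULL;
    rows = n - 1;
    count = rows * 2 * n + (rows - 1) * 2;   /* 2n per row + 2 per join */
    idx = malloc(count * sizeof *idx);
    if (idx == NULL)
        return NULL;

    for (unsigned int z = 0; z + 1 < n; ++z) {
        if (z > 0) {
            idx[k] = idx[k - 1];   /* repeat last index of previous row */
            ++k;
            idx[k++] = z * n;      /* repeat first index of this row */
        }
        for (unsigned int x = 0; x < n; ++x) {
            idx[k++] = z * n + x;        /* point on row z */
            idx[k++] = (z + 1) * n + x;  /* point below it on row z+1 */
        }
    }
    *count_out = count;
    return idx;
}
```

For a 512*512 grid this gives 524284 indices instead of the 1566726 vertices of the plain triangle list, drawn with glDrawElements( GL_TRIANGLE_STRIP, count, GL_UNSIGNED_INT, ... ).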

But VBOs often totally mess up the performance when you switch to another graphics board, driver, etc.

If you have totally static geometry, I would (for now) recommend another solution:

Use a compiled display list together with standard vertex arrays (with the optimizations mentioned above) and you will get good, stable performance. (I'm really satisfied with it in our project.)

------------
Speculation: display lists + VAs are handled driver-internally as VBOs. I don't know if this is right, it's just my idea (based on the nice performance). Another plus: you don't have to mess around with VBO extensions and fallback code.
------------
------------

my 2c,

thomas

Ok, thanks for the replies!

Rendering 20 times as much geometry still gave the same results (the DL was still a little faster than VBOs). I was fillrate limited; I assume scaling what I draw down to about 10x10 pixels fixes that.

VAs inside DLs were just as fast as glVertex and glTexCoord calls inside the DL (I think that must be my gfx card, I will try it on the Radeon later this week).

Anyway, loading a VBO is much faster than compiling a DL containing the same data, so I guess I'll stick with VBOs for certain purposes.

I will look up the NVIDIA papers, and probably some others too.

Thanks for your help everyone.

Quote:
Original post by Tree Penguin
VAs inside DLs were just as fast as glVertex and glTexCoord calls inside the DL (I think that must be my gfx card, I will try it on the Radeon later this week).


I haven't optimized my engine with DLs yet, so I might be wrong. But I could swear that glVertexPointer and similar VA calls aren't compiled into a DL.

Quote:
Original post by okonomiyaki
Quote:
Original post by Tree Penguin
VAs inside DLs were just as fast as glVertex and glTexCoord calls inside the DL (I think that must be my gfx card, I will try it on the Radeon later this week).


I haven't optimized my engine with DLs yet, so I might be wrong. But I could swear that glVertexPointer and similar VA calls aren't compiled into a DL.


I tried clearing the data (I set every value to 0.0f, to make sure the driver can't use it anymore) and deleting it after compiling the display list, and it still works fine, so either the driver made a copy of the data in system memory (I don't think so) or it's placed in VRAM.

Quote:

I tried clearing the data (I set every value to 0.0f, to make sure the driver can't use it anymore) and deleting it after compiling the display list, and it still works fine, so either the driver made a copy of the data in system memory (I don't think so) or it's placed in VRAM.


You may be right; like I said, I haven't played around with DLs extensively yet. It's strange that it still worked. This is straight from the red book:

Quote:

Certain commands, when called while compiling a display list, are not compiled into the display list but are executed immediately. These are: GenLists, DeleteLists, FeedbackBuffer, SelectBuffer, RenderMode, ColorPointer, FogCoordPointer, EdgeFlagPointer, IndexPointer, NormalPointer, TexCoordPointer, SecondaryColorPointer, VertexPointer, ClientActiveTexture, InterleavedArrays, EnableClientState, DisableClientState, PushClientAttrib, PopClientAttrib, ReadPixels, PixelStore, GenTextures, DeleteTextures, AreTexturesResident, GenQueries, DeleteQueries, BindBuffer, DeleteBuffers, GenBuffers, BufferData, BufferSubData, MapBuffer, UnmapBuffer, Flush, Finish, as well as all of the Get and Is commands (see Chapter 6).


I would like to clear this up so that I understand how and where to use DLs.

Strange... I fear it's driver specific. If so, that sucks: it would mean that to get optimal performance you have to test every possible way and see which is fastest, as it might save you 50%.

Quote:
I haven't optimized my engine with DLs yet, so I might be wrong. But I could swear that glVertexPointer and similar VA calls aren't compiled into a DL.

It's compiled when you draw the VA using glDrawArrays or another "draw" call. At that point the data is copied, and the DL will not use the pointer after compilation.

In theory a DL could be faster than VBOs, because the driver can do whatever optimizations it wants at compile time. It really depends on how good the drivers are.

Quote:

It's compiled when you draw the VA using glDrawArrays or another "draw" call. At that point the data is copied, and the DL will not use the pointer after compilation.


Ah, that makes sense. Cool. So I can upload data without VBOs.
I think I'll implement both VBOs and vertex arrays compiled into display lists, and let the user choose (VBO as default, because I still have more faith in them). But if the user gets better performance with vertex arrays, well, I don't want to stop them from choosing it.

If you delete a compiled DL, does that also delete all the copied data?

Yes, all the data is compiled into the display list, so when you delete the DL the data is deleted too.

I think letting gamers choose between VBOs and VAs in DLs would be unwise; most of them won't even know what those are. I think you should pick the fastest one yourself by checking which is fastest at startup.
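A sketch of the bookkeeping for such a startup check, under these assumptions: each candidate path (for example immediate mode, VA, VA in a DL, VBO) is rendered for a fixed number of frames, glFinish() is called before each timer reading because GL calls can return before the GPU has finished the work, and the readings come from a monotonic clock such as clock_gettime(CLOCK_MONOTONIC, ...) on POSIX or QueryPerformanceCounter on Windows. The helper names are hypothetical.

```c
#include <stddef.h>

/* Average frames per second for a path that rendered `frames` frames
   between two timer readings given in seconds. */
double measured_fps(double start_s, double end_s, int frames)
{
    double elapsed = end_s - start_s;
    return elapsed > 0.0 ? frames / elapsed : 0.0;
}

/* Index of the highest measured FPS, i.e. the fastest render path. */
size_t index_of_fastest(const double *fps, size_t n_paths)
{
    size_t best = 0;
    for (size_t i = 1; i < n_paths; ++i)
        if (fps[i] > fps[best])
            best = i;
    return best;
}
```

Fed the Radeon results from earlier in the thread (VBO 54, DL 93, VA 6 FPS), index_of_fastest would pick the display list path.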

Quote:
Original post by Tree Penguin
Thank you! That changed my view on VBOs entirely :).


Indeed, you can use VBOs to construct a cache system in video memory, a very useful system which you cannot build with display lists...
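The thread doesn't say how such a cache would be organised. One possible scheme, sketched under assumed choices (fixed-size chunks such as terrain tiles, a hypothetical slot count, least-recently-used eviction): a single large VBO is divided into equal slots, and only the CPU-side bookkeeping is shown. On a miss, the caller would upload the chunk into the returned slot with glBufferSubDataARB at offset slot * slot_size.

```c
#define CACHE_SLOTS 4   /* hypothetical: number of equal-sized slots in one VBO */

typedef struct {
    int chunk_id[CACHE_SLOTS];             /* -1 marks an empty slot */
    unsigned long last_used[CACHE_SLOTS];  /* 0 = never used */
    unsigned long clock;                   /* incremented on every lookup */
} vbo_cache;

void vbo_cache_init(vbo_cache *c)
{
    for (int i = 0; i < CACHE_SLOTS; ++i) {
        c->chunk_id[i] = -1;
        c->last_used[i] = 0;
    }
    c->clock = 0;
}

/* Returns the slot holding `chunk`. Sets *needs_upload to 1 when the chunk
   was not cached and has just been given the least recently used slot
   (empty slots count as least recently used, so they are filled first). */
int vbo_cache_lookup(vbo_cache *c, int chunk, int *needs_upload)
{
    int victim = 0;
    c->clock++;
    for (int i = 0; i < CACHE_SLOTS; ++i) {
        if (c->chunk_id[i] == chunk) {     /* hit: just refresh its age */
            c->last_used[i] = c->clock;
            *needs_upload = 0;
            return i;
        }
        if (c->last_used[i] < c->last_used[victim])
            victim = i;
    }
    c->chunk_id[victim] = chunk;           /* miss: evict the oldest slot */
    c->last_used[victim] = c->clock;
    *needs_upload = 1;
    return victim;
}
```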
