VBO confusion (please test!)


Hello,

First, I have used VBOs before (successfully, in performance terms). Today, however, I started writing an OBJ loader and at the same time compared VBO performance with vertex arrays and immediate mode. I can't find what I've done wrong, but VBOs are slower than both vertex arrays and immediate mode.

I would be very grateful if you could download this and check whether it's the same for you. There's a Windows executable plus the GLUT DLL and a makefile (called Makefile.win), and also a Linux makefile (called Makefile). If, as I fear, you get the same result, could you please see whether you can spot some silly mistake in the usage (it's all in main.cpp)? Thanks a million! [smile]

The mesh is a simple one: 1980 vertices, 3872 triangles. I am using indexed arrays and drawing with glDrawElements. You can press these keyboard keys:

1 - Use vertex arrays
2 - Use VBO
3 - Use immediate mode
4 - Use a precompiled display list

/edit: Sorry, I haven't included much error checking. If it crashes when you choose VBOs, it probably can't find the buffer object functions in the video card driver ...

/edit2: Added option 4.

[Edited by - deavik on April 8, 2006 6:24:22 AM]

Nope, for me (Linux/fglrx/FireGL X2-256) VBO is the fastest at a tad over 1400 fps. Vertex arrays come second at ~1000 fps and immediate mode is, not surprisingly, slowest at ~400 fps.

Btw, with that Makefile it doesn't compile. Add -lGL and -lGLU to the linker options.

With my Radeon 9700 and AMD64 3200: VA 2450 fps, VBO 2550 fps, and IM 1150 fps. VBO is slightly faster than VA, by about 100 fps.

I don't see any mistakes in your VBO code, but try loading a high-poly model. You may see different results.
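
A side note on reading numbers like these: at rates this high, an fps gap can look larger than it really is, so it may help to convert to milliseconds per frame before comparing. A minimal sketch in plain C++; the helper names frameTimeMs and savedMsPerFrame are made up for this example and are not part of the demo:

```cpp
#include <cassert>

// Convert a frames-per-second reading into milliseconds per frame.
// frameTimeMs is a hypothetical helper name, not something from main.cpp.
double frameTimeMs(double fps) {
    assert(fps > 0.0);  // a zero or negative rate has no meaningful frame time
    return 1000.0 / fps;
}

// Time saved per frame when moving from the slower rate to the faster one.
double savedMsPerFrame(double slowerFps, double fasterFps) {
    return frameTimeMs(slowerFps) - frameTimeMs(fasterFps);
}
```

With the figures above, savedMsPerFrame(2450, 2550) comes to roughly 0.016 ms, so the 100 fps lead for VBO is a very small per-frame difference, while savedMsPerFrame(1150, 2450) is roughly 0.46 ms.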

Hi,
my results:

1 - VA 1360 fps
2 - VBO 1320 fps
3 - IM 1280 fps

On an Nvidia 6600, P4 3.2 GHz, WinXP.

VBOs would be by far the fastest with larger meshes, ranging from 50k to 1 million triangles. However, I couldn't test that with your sample code, because it crashed on all my OBJ files.

Steve

Quote:
Original post by SteveXP
Hi,
my results:

1 - VA 1360 fps
2 - VBO 1320 fps
3 - IM 1280 fps

On an Nvidia 6600, P4 3.2 GHz, WinXP.

VBOs would be by far the fastest with larger meshes, ranging from 50k to 1 million triangles. However, I couldn't test that with your sample code, because it crashed on all my OBJ files.

Steve

Yes, I've just started working on it and it's far from complete, but it shouldn't have crashed ... I don't know, I'll have to check with different OBJ files. Since all of you are saying that VBOs normally outstrip the others with larger meshes, I'm relieved!

I was very confused by the results I was getting; maybe it's because of my card. I'm using moderately recent drivers (78.01 on Windows, 77.xx on Linux). With the same code I get ~230 fps (vertex array and immediate) and ~170 fps (VBO) on a GeForce4 MX.

Thanks very much for trying it out!

@oggialli: That was the Makefile I used to compile the code, but maybe the glut supplied by my distro does something to link with -lGL and -lGLU.

@songho: As I wrote before, I think I'll have to take your word for it (which is good enough for me [smile]). I had earlier tried a ~15K poly model with that code, and immediate mode had been fastest. :p

Just out of curiosity, what would be the speed gain (loss?) if you used pre-compiled DisplayLists to render your static models?

Quote:
Original post by deavik

@oggialli: That was the Makefile I used to compile the code, but maybe the glut supplied by my distro does something to link with -lGL and -lGLU.


Might be. For reference, I'm using Gentoo with glut-3.7.1 (not freeglut).

Quote:
Original post by HolyFish
Just out of curiosity, what would be the speed gain (loss?) if you used pre-compiled DisplayLists to render your static models?

Good question. I did that on my computer and it was a little faster than the VBO, but VA and immediate mode were still faster. I decided it was time to try a higher-poly model (17466 vertices, 15488 triangles) and, heaven help me: immediate > VA > DL > VBO!

If anyone wants to try, the download is here. Because of the time it takes to detect shared vertices in the Wavefront OBJ (for glDrawElements), it takes ~5 seconds to load that model. Press keyboard key

4 - Display List
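
For anyone hitting a similar load time: one common way to detect shared vertices is to key each face corner by its (position, texcoord, normal) index triple and look it up in a map, instead of comparing against every vertex seen so far. This is only a sketch under assumptions, and not necessarily how the demo does it; the names ObjCorner, IndexedMesh and buildIndices are made up here and are not taken from main.cpp.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

// One corner of an OBJ face: indices into the position, texcoord and normal lists.
// (Hypothetical type for this example.)
struct ObjCorner { int v; int vt; int vn; };

struct IndexedMesh {
    std::vector<ObjCorner> uniqueCorners;  // one entry per output vertex
    std::vector<std::uint32_t> indices;    // the index array glDrawElements would read
};

// Collapse repeated corners so each distinct (v, vt, vn) triple gets one index.
IndexedMesh buildIndices(const std::vector<ObjCorner>& corners) {
    assert(corners.size() % 3 == 0);  // assumes faces were already triangulated
    IndexedMesh mesh;
    std::map<std::tuple<int, int, int>, std::uint32_t> seen;
    for (const ObjCorner& c : corners) {
        const auto key = std::make_tuple(c.v, c.vt, c.vn);
        auto it = seen.find(key);
        if (it == seen.end()) {
            // First time this triple appears: give it the next free index.
            const auto idx = static_cast<std::uint32_t>(mesh.uniqueCorners.size());
            it = seen.emplace(key, idx).first;
            mesh.uniqueCorners.push_back(c);
        }
        mesh.indices.push_back(it->second);
    }
    return mesh;
}
```

Each lookup is logarithmic in the number of unique corners, so the whole pass stays far below the cost of a pairwise comparison on a mesh of this size.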

@oggialli: Ubuntu 5.10, freeglut 3.something ...

@Myopic Rhino: Thanks for trying it out and posting, but (as with some of the others) I guess it's quite a shame to have tested with such a low complexity model. [wink]

/edit: @Dave, I have a question. Since I don't plan to change my video card any time soon, can you suggest what is causing the slowdown for VBOs on my hardware, and whether there is anything I can do to improve this performance? As I wrote above, it's a PCI GeForce4 MX 440 with the 78.01 Windows drivers (the 77.xx Linux drivers give almost identical performance).

/edit2: I was just reading through the posts, and for everyone with an nVidia card the performance is almost equal for the 3 modes, whereas for everyone with an ATi, VBO >> VA >> Immediate. Hmmm... interesting!

[Edited by - deavik on April 7, 2006 8:14:27 PM]

For your newest version, I get 60 FPS across the board.

AMD 3200+ @2.2GHz
256MB Geforce 6800 GT (PCIe)
1GB RAM

Results from my PC (WinXP, A64 @ 2500 MHz, 6600 GT, 1GB RAM)

1 - VA 950 fps
2 - VBO 2100 fps
3 - IM 450 fps

That seems really odd that VBOs are that much faster...


Quote:
Original post by JamesKilton
For your newest version, I get 60 FPS across the board.


I had to force vsync off in the drivers.

Quote:
Original post by Expandable
A64 3500+, 6800GT@FW 84.20

Vertex Arrays: 930
Immediate: 470
VBOs: 2720 ;o)

Woohoo! The people who posted near the top were spot on - with a more complex model, nVidia's VBO implementation really rocks! Thanks also to Sr Guapo and James Kilton.

Plus, I guess something good comes out of everything: I'll keep this app as a VBO vs. the rest benchmark (not that it's definitive, but still ...)

Now I just have to figure out what on earth is wrong with my card ...

By the way, glDrawRangeElements should be faster... at least it was the last time I experimented with VBOs.

I tried replacing DrawElements with DrawRangeElements and at least on monkey2.obj on my system there was no performance difference whatsoever.

Really? Well, maybe it was a driver bug then. glDrawRangeElements was twice as fast for me as glDrawElements.

@Expandable: You've also read the nVidia "Using VBOs" paper haven't you? [wink] The reason I didn't use DrawRangeElements there is that you need to load its proc address for Windows, but not for Linux. I was getting some weird crashes because of that (before I decided to use GLee) so I dropped it out.

Quote:
Original post by oggialli
I tried replacing DrawElements with DrawRangeElements and at least on monkey2.obj on my system there was no performance difference whatsoever.

Same for me for the VBOs ... did you use [0, tmp->vert_count-1] as the range (since all vertices are drawn)? DrawRangeElements somehow killed the regular vertex arrays though; I don't know what's going on there. Performance drops by half (measured in frames per second).

There are other ways to improve performance, such as NvTriStrip. Using an optimized triangle strip as generated by it improved fps ~50% for me.
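
To illustrate why a strip can help: n connected triangles need only n + 2 indices as a strip, against 3n as a plain triangle list. The sketch below is plain C++ and does not use NvTriStrip's API; the function names are made up for this example. It expands a strip back into a list using the usual OpenGL winding rule (every odd triangle has its first two vertices swapped) and skips degenerate triangles, which stripifiers commonly insert to join strips.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Indices needed to draw `triangleCount` connected triangles as one strip.
// (Hypothetical helper for this example.)
std::size_t stripIndexCount(std::size_t triangleCount) {
    assert(triangleCount > 0);
    return triangleCount + 2;
}

// Expand a triangle strip into an equivalent triangle list.
std::vector<std::uint32_t> stripToTriangles(const std::vector<std::uint32_t>& strip) {
    std::vector<std::uint32_t> tris;
    for (std::size_t i = 0; i + 2 < strip.size(); ++i) {
        const std::uint32_t a = strip[i];
        const std::uint32_t b = strip[i + 1];
        const std::uint32_t c = strip[i + 2];
        if (a == b || b == c || a == c) {
            continue;  // degenerate triangle, draws nothing
        }
        if (i % 2 == 0) {
            tris.push_back(a); tris.push_back(b); tris.push_back(c);
        } else {
            // Odd triangles swap the first two vertices to keep the winding consistent.
            tris.push_back(b); tris.push_back(a); tris.push_back(c);
        }
    }
    return tris;
}
```

For the 15488-triangle model, a single ideal strip would need 15490 indices instead of 46464, although a real mesh rarely reduces to one strip.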

IM: ~440fps
VAR:~785fps
VBO:~3150fps
DL: ~3200fps

nothing unusual :) (using the default window size)

Athlon 3200 / Quadro fx 4000 / Linux Kernel 2.6.11 / nv 87xx driver

Quote:
Original post by deavik
Same for me for the VBOs ... did you use [0, tmp->vert_count-1] as the range (since all vertices are drawn)?


Yes, I did. I think that is exactly why the performance doesn't improve - we are drawing the whole of the VBO anyway. Now if we had many small batches in one large VBO, it would make a difference with proper drivers.
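
For the many-small-batches case, the start and end arguments of glDrawRangeElements are the smallest and largest vertex index that the batch references. A rough sketch of computing them for one slice of a shared index array; batchIndexRange is a made-up name, and the batch layout (a first offset and a count into one index vector) is an assumption for this example:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Smallest and largest vertex index referenced by indices[first .. first + count).
// These would be passed as `start` and `end` to glDrawRangeElements for that batch.
std::pair<std::uint32_t, std::uint32_t>
batchIndexRange(const std::vector<std::uint32_t>& indices,
                std::size_t first, std::size_t count) {
    assert(count > 0 && first + count <= indices.size());
    const auto begin = indices.begin() + static_cast<std::ptrdiff_t>(first);
    const auto end = begin + static_cast<std::ptrdiff_t>(count);
    const auto mm = std::minmax_element(begin, end);
    return {*mm.first, *mm.second};
}
```

When a batch covers the whole buffer, the result is simply [0, vertex count - 1], which matches the range discussed above and gives the driver no extra information.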

Guest Anonymous Poster
Hi Deavik,

I have exactly the same problem with my code and it is driving me crazy. VBO is supposed to be faster than any other mode especially for large chunks of data:

http://www.spec.org/gpc/opc.static/vbo_whitepaper.html

In my test case it is actually even 5 times slower than a compiled display list (I have around 600k vertices).

I am not sure whether this has something to do with the driver implementation, since that extension only recently (last summer, I think) became available for my card, a Radeon 9800 Pro.

Anyway, these are my results with your test:
VAr: ~230
VBO: ~2050
Imm: ~380
DiL: ~2080

Quote:
Original post by Anonymous Poster
Hi Deavik,

I have exactly the same problem with my code and it is driving me crazy. VBO is supposed to be faster than any other mode especially for large chunks of data:

http://www.spec.org/gpc/opc.static/vbo_whitepaper.html

In my test case it is actually even 5 times slower than a compiled display list (I have around 600k vertices).

I am not sure whether this has something to do with the driver implementation, since that extension only recently (last summer, I think) became available for my card, a Radeon 9800 Pro.

Anyway, these are my results with your test:
VAr: ~230
VBO: ~2050
Imm: ~380
DiL: ~2080

Well, you must remember that DLs do some things that VBOs can't (for example, precalculating a number of material changes). Apart from that, VBOs should be just as fast, if not faster. Since you've already taken the trouble to download my little demo (thanks!), and many people on this thread agreed that the usage is OK, you can compare it to your own code to see if there's anything silly you may have overlooked.

My demo is only STATIC_DRAW data though, but since you mention comparing with a Display List it seems like that's the case for you too ...

EDIT: also make sure you read NVidia's (or ATI's if they have it) performance docs / whitepapers related to VBO. There are some points which may matter a lot, such as using an indexed array, laying data out in a cache friendly pattern, etc.
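
One possible reading of "cache friendly" is an interleaved layout, where all attributes of a vertex sit next to each other in the buffer. The sketch below only shows the layout arithmetic in plain C++; the Vertex struct and the constant names are made up for this example, it assumes 4-byte floats, and it is not taken from the demo or from any vendor document.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Interleaved vertex: position, normal and texcoord stored back to back.
// (Hypothetical layout for this example.)
struct Vertex {
    float position[3];
    float normal[3];
    float texcoord[2];
};

// Assumes 4-byte floats and no padding, which gives a 32-byte vertex.
static_assert(sizeof(Vertex) == 32, "expected a tightly packed 32-byte vertex");

// Stride and byte offsets of the kind passed to the gl*Pointer calls
// (with a VBO bound, the pointer argument is a byte offset into the buffer).
constexpr std::size_t kStride = sizeof(Vertex);
constexpr std::size_t kPositionOffset = offsetof(Vertex, position);
constexpr std::size_t kNormalOffset = offsetof(Vertex, normal);
constexpr std::size_t kTexcoordOffset = offsetof(Vertex, texcoord);

// Size in bytes of a buffer holding `vertexCount` interleaved vertices.
std::size_t vertexBufferBytes(std::size_t vertexCount) {
    assert(vertexCount <= std::numeric_limits<std::size_t>::max() / sizeof(Vertex));
    return vertexCount * sizeof(Vertex);
}
```

With this layout, the 17466-vertex model from earlier in the thread would take 558912 bytes of vertex data.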

hello,
ATI Radeon X600, P4 2.5
VA 25x fps
VBO 12xx fps
IMM 44x fps
I guess my card is sick or something x.x;
cya,
