eldeann

loop of glVertex3fv calls and perf


Hello, I'm displaying several large meshes (1.000.000 triangles in total). The loop that sends data to the graphics card calls glVertex3fv on each vertex, using triangle strips. I'm also using high-quality textures (2048*2048 on both the ATI and NVidia cards). I know this is far from the fastest way to do it, but that's how it's done for now...

I'm running the program on two machines:
- a fairly old one with an ATI FireGL 3100 (128 MB)
- a recent one with an NVidia Quadro FX 3400 (256 MB of RAM), which is one of the latest NVidia workstation cards

The very strange thing is that I get a lower fps on the second one... Can you guess any reason why the cutting-edge card is slower in this case? One possibility is that the NVidia card uses bigger textures, but there is a check in the code that caps textures at 2048*2048.

Thanks for the help,
Adrien

- First thing you should do is convert immediate mode (glVertex3fv) to vertex arrays and then to VBOs. The overhead of 1 million function calls per frame is killing your performance big-time.
- Try using compressed textures to further reduce the bandwidth needed.

Quote:
Original post by eldeann
I'm running the program on two machines:
- a fairly old one with an ATI FireGL 3100 (128 MB)
- a recent one with an NVidia Quadro FX 3400 (256 MB of RAM), which is one of the latest NVidia workstation cards


Are the two machines otherwise identical, spec for spec?

Older cards were never designed with the VBO extensions in mind, so they don't suffer as much from the inefficiency of immediate mode. Newer cards, however, practically assume that you will be using VBOs, and suffer when you don't.

I'll use vertex arrays and VBOs as soon as possible, but this is inside a large program, and the change needs careful design since the displayed points can be filtered in real time...

For now, I'm just trying to explain why it's slower with a very expensive card...

After a few tests, it appears that the bottleneck is the glVertex calls rather than the textures.

I'm writing a little benchmark to test this precisely.


Promit:
could what you said explain why a recent card is slower than an older one on a huge glVertex loop?


And thanks a lot for these answers.

It could be that the FireGL card is optimized for professional CAD applications, which make more use of glVertex calls than of VBOs, while the NVidia card was probably designed for maximum speed in games. I would also check your driver settings and see what the defaults are for OpenGL (if they are exposed).

Quote:
Original post by Codemonger
It could be that the FireGL card is optimized for professional CAD applications, which make more use of glVertex calls than of VBOs, while the NVidia card was probably designed for maximum speed in games.


A Quadro FX is optimized along similar lines.


Basically, the VBO extension came into existence fairly recently. Before that, it was considered entirely normal to send data with vertex arrays, or even glVertex. In the early days, the only disadvantage of glVertex over vertex arrays was the extra CPU cost of all those calls; the card and driver were designed to process one vertex at a time.

As GPU architectures have become massively parallel and cached, however, this has changed. Reading from main memory is more costly, and drivers perform all sorts of optimizations on VBOs to make things maximally efficient for the new cards, which are designed around them. New cards therefore do much better when the data sits in video RAM, where it can be read quickly and in parallel: these architectures can fetch several vertices at a time (six on a GeForce 6800) and process them simultaneously. Older cards simply didn't do this... before the GeForce, all vertex processing was done in software, one vertex at a time.

I've finished my benchmark and ran it on a few machines (not yet on the Quadro and FireGL cards), and the results are very interesting.

First, a loop of glVertex3fv calls (values are fps):

nb of points   radeon 9600   geforce 5700   geforce 6800
100.000        0.34          0.06           0.1
1.000.000      0.034         0.02           0.03
10.000.000     0.003         0.002          0.003

Then I used VBOs, and the change is impressive:

nb of points   radeon 9600   geforce 5700   geforce 6800
100.000        81.59         62.54          76
1.000.000      22.21         12.78          20


So I'll think about using VBOs. But the strange point is that for a loop of glVertex3fv, a Radeon 9600 seems faster than a GeForce 6800, which is supposed to be much better. And for two roughly equivalent cards (the Radeon 9600 and the GeForce 5700), the ATI is always faster on the glVertex loop. Hard to explain to someone who has just replaced their Radeon 9600 with an expensive GeForce 6800... and gets the job done slower...

By the way, I checked the drivers to be sure of what I'm saying, and they are recent ones from ATI and NVidia.
Quote:
nb of points   radeon 9600   geforce 5700   geforce 6800
100.000        0.34          0.06           0.1
1.000.000      0.034         0.02           0.03
10.000.000     0.003         0.002          0.003

(btw, you're using the "." to mark both thousands and fractions)

1,000,000 * 0.034 = 34,000 tris/points a second in immediate mode. This is WAY too low;
any half-decent card can do 10+ million/sec in immediate mode (same goes for the VBO numbers).
