Ineffective use of VBO?
I ran your test app. I get 650 fps in Direct3D and 560 fps in OpenGL on an NVidia 9600GT with the newest drivers.
Thank you for your test.
According to what Yann L stated:
Quote:
You are definitely doing something that you shouldn't be doing somewhere. The large performance difference is not normal. In fact, from my own experience, NVidias OpenGL tends to be slightly (but consistently) faster than D3D9, if (and only if) you perform the exact same operations in both APIs.
I believe the difference between 650 and 560 is not satisfying for OpenGL. This is reinforced by the fact that the OGL renderer uses only 3 GL calls per rendered object - roughly 80 GL calls per frame. I've also found in PIX that D3D is doing much more work, creating and releasing a lot of resources, about 2-3 per shader parameter change (this is done by the Cg runtime).
This whole situation is getting more and more interesting. I've managed to update my drivers to some of the newest beta drivers intended for notebooks, and D3D9 performance dropped from 350 to 210 FPS. OGL stays untouched.
I'm also wondering whether this is really an OGL problem. I've just checked one of my "normal" applications running on my renderer. It does quite a lot of pixel processing and renders one huge box (a few faces), and I get 100 FPS for D3D and 60-80 for OGL. I'm also not sure how important this is, but my simple PerformanceTest, which I keep testing with barely 4 boxes of 20k faces each, reached 999 FPS for D3D and 450 FPS for OGL. When I render nothing (to a small viewport) I get 9999 FPS for D3D and 999 FPS for OGL. Maybe I'm overreacting, but shouldn't I get similar results from both APIs in that situation? (They both just clear a small viewport, after all.) Maybe the problem is somewhere in SDL and I've configured something the wrong way?
Quote: Original post by maxest
When I render nothing (to a small viewport) I get 9999 FPS for D3D and 999 FPS for OGL. Maybe I'm overreacting, but shouldn't I get similar results from both APIs in that situation? (They both just clear a small viewport, after all.)
That is a difference of well under a millisecond per frame - hardly something to worry about.
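To put rough numbers on that: 9999 FPS is about 0.1 ms per frame and 999 FPS is about 1 ms per frame, so the whole gap is under a millisecond of per-frame overhead - it only looks dramatic because the frames are otherwise empty. A trivial sketch of the conversion (plain C++, nothing API-specific):
double frameTimeMs(double fps) { return 1000.0 / fps; } // FPS -> milliseconds per frame
// frameTimeMs(9999.0) ~= 0.10 ms, frameTimeMs(999.0) ~= 1.00 ms
// difference ~= 0.9 ms - lost in the noise once a real scene takes 5-10 ms per frame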
I also wouldn't be surprised if SDL managed to initialise a less-than-optimal OpenGL context - it is a pretty ancient code base, unless you are using the 1.3 branch.
The more I read of this, the more strongly I suspect Cg as the source of this problem. I'm not that familiar with Cg; could you simply have it return the shader program name after it's been linked so that you can use strictly GL calls for binding programs/attributes/uniforms as needed? It just seems like Cg is doing a lot of bounds checking and maybe name lookups that shouldn't be necessary on a per-frame basis (or at least, that's my guess as to what's going on).
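Something along these lines, purely as a sketch - I'm assuming a GLSL profile and that cgGLGetProgramID (or whatever the equivalent is in your Cg version) hands back the GL program object; the uniform name and variables here are just illustrative:
// once, after the Cg program has been compiled/linked:
GLuint prog = cgGLGetProgramID(cgProgram);                  // check that this returns a usable GL program object for your profile
GLint mvpLoc = glGetUniformLocation(prog, "modelViewProj"); // cache the location, don't look it up per frame
// per frame, plain GL only:
glUseProgram(prog);
glUniformMatrix4fv(mvpLoc, 1, GL_FALSE, mvpMatrix);
That way the Cg runtime is only involved at load time, and whatever per-parameter bookkeeping it does never runs inside the frame loop.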
>>In my first post I wrote that I'm rendering to a small viewport, and in the one where I posted glIntercept's code you can clearly see I use a 100x100 viewport, which is small enough to neglect pixel processing.<<
sorry, missed it (*)
though to truly disregard the pixel stuff use
glCullFace( GL_FRONT_AND_BACK ); // needs GL_CULL_FACE enabled; I believe the vertex data still gets transformed, but nothing is rasterized
the reason this can influence things is that the windowing setup mightn't be the same between the two APIs, eg one might be rendering at a higher quality than the other
(*) though you've made a mistake I think, from a quick look at your code
CRenderer::setViewport(50,50,100,100);
CRenderer::clear(true, true, false, CVector3(0.5f, 0.5f, 0.5f)); <- this will clear the whole screen, not just the area of the viewport; to restrict the cleared area use glScissor
or, like I said in my other post, use a small window
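eg something like this (same 100x100 rectangle as your setViewport call; glClear ignores the viewport but does respect the scissor box):
glViewport(50, 50, 100, 100);
glEnable(GL_SCISSOR_TEST);
glScissor(50, 50, 100, 100);
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
glDisable(GL_SCISSOR_TEST); // turn it off again if the rest of the frame needs the full buffer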
Quote:
Are you calling glBindBuffer(0)
glBindTexture(0)
glDisableClientState(...)
and shit like that?
Nope. Just clearing, setting vertex parameters and glDrawElements. Truly, I do nothing more.
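Per object it boils down to roughly this (a simplified sketch - the names are made up, and the exact three calls in my code may differ slightly, eg depending on whether the indices come from a bound element-array VBO or from client memory):
glBindBuffer(GL_ARRAY_BUFFER, object.vbo);                                           // select the object's VBO
glVertexPointer(3, GL_FLOAT, sizeof(Vertex), 0);                                     // vertex data lives in the bound VBO
glDrawElements(GL_TRIANGLES, object.indexCount, GL_UNSIGNED_SHORT, object.indices);  // indices from client memory in this sketch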
Quote:
I also wouldn't be surprised if SDL managed to initialise a less-than-optimal OpenGL context - it is a pretty ancient code base, unless you are using the 1.3 branch.
Just checked that - it's not SDL's fault.
Quote:
The more I read of this, the more strongly I suspect Cg as the source of this problem. I'm not that familiar with Cg; could you simply have it return the shader program name after it's been linked so that you can use strictly GL calls for binding programs/attributes/uniforms as needed? It just seems like Cg is doing a lot of bounds checking and maybe name lookups that shouldn't be necessary on a per-frame basis (or at least, that's my guess as to what's going on).
I'd also blame Cg for now. I'll check this tomorrow because it will require reorganizing the code a bit. As for the additional work done by Cg: it's probably not making more GL calls, since glIntercept would catch that. Maybe it's running some code of its own on the side, but I doubt that alone would make me CPU-bound.
Quote:
CRenderer::setViewport(50,50,100,100);
CRenderer::clear(true, true, false, CVector3(0.5f, 0.5f, 0.5f)); <- this will clear the whole screen, not just the area of the viewport; to restrict the cleared area use glScissor
or, like I said in my other post, use a small window
Mhm, right, I missed that. Thanks.
After the change the OGL framerate went up from 120 to 140 FPS. For D3D9 it has no impact, of course. But after upgrading my drivers D3D9 is now at 200-220 FPS (a drop from 350 FPS - is this normal...?)
I've just realized how "fragile" this whole situation is. By accident I reverted my drivers to some even older than the ones I had before - they're from 2007, so really old. And I found out that on these old drivers, when the application (one of my test apps, a different one than the one we've been discussing throughout this thread) calls all these glGetProgramiv functions, the FPS is 45. After getting rid of them I get 115 FPS. That's interesting, but not as much as this: after upgrading to the very newest drivers the FPS was 80! So on drivers from 2007 I get 115 FPS, and on those from 2009 I get 80 FPS. I'm now wondering how authoritative all of my tests are. The only configurations I've been testing are my notebook with a GF8400 (not such an old GPU, but it's a notebook, and notebooks are usually pretty badly supported compared to desktops) and a desktop with a GF6600 (which by now is quite an ancient GPU). So maybe the drivers are just not well optimized for them, and to get reliable results I should check some GF8k-series card in a desktop.
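For reference, "getting rid of them" just means not querying the program object every frame - something like this (illustrative only; GL_ACTIVE_UNIFORMS is just an example of what might have been queried):
// at load/link time, once per program:
GLint activeUniforms = 0;
glGetProgramiv(prog, GL_ACTIVE_UNIFORMS, &activeUniforms); // cache the result
// per frame: use the cached value, no glGetProgramiv calls at all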
i_luv_cplusplus showed his results for my test application - 650 FPS for D3D and 560 FPS for OGL. Maybe these values would be even more "accurate" if they were lower, i.e. measured under a heavier GPU load.
I'm sorry for writing a third post in a row (this is starting to look like I'm writing a blog here :)), but I want to keep you up to date with what I'm doing.
So, I decided to once again give the new stable NVIDIA notebook drivers a try. And now I've noticed that D3D9's performance has increased... in a way.
Again: I continuously test two applications: the one with the geometry abuse that I've been testing throughout this whole thread (let's call it the performance test), and my game (let's consider it a medium-sized application, something that can reliably be tested for real-world performance).
These are the FPS numbers I got on the *old* drivers (in a fixed scenario of the game, to keep the various FPS readings comparable):
my game, D3D9 - 145 FPS
my game, OGL - 115 FPS
performance test, D3D9 - 350 FPS
performance test, OGL - 140 FPS
Now the FPS numbers on the *new* drivers:
my game, D3D9 - 155 FPS
my game, OGL - 85 FPS
performance test, D3D9 - 210 FPS
performance test, OGL - 140 FPS
So in my game D3D9 now works a bit better, but OGL performance has decreased a lot. And in the performance test, D3D9's performance dropped while OGL's stayed the same.
I've also decided to give AMD's CodeAnalyst a try (I have an AMD Turion 2x 1.8GHz). And the results are *very* interesting:
http://maxest.fm.interia.pl/gct-game-profile.JPG
http://maxest.fm.interia.pl/performance-test-profile.JPG
The first row shows the list of processes sampled by CodeAnalyst *within the application*. The second shows all processes running *within the system*. The first row shows nothing particularly interesting - the results are similar for both versions (D3D9 and OGL). But in the second row you can see that the CPU spends twice as much time in the OGL version of my game as in the D3D9 one. However, this seems like nothing compared to my very simple performance test application - there the CPU spends 3.5% of its time in the D3D9 version and 89% in the OGL one! At this point I'm not even sure whether I'm interpreting these values correctly. I hope someone more experienced in profiling than me can contribute here and say something about these results. I'd really like to know what's going on.
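Maybe to cross-check CodeAnalyst I could also bracket the frame with a simple CPU timer - a rough sketch (renderScene() stands in for my actual per-frame code; the split is only approximate, since the driver can buffer work and pay for it later in the swap):
LARGE_INTEGER freq, t0, t1, t2;
QueryPerformanceFrequency(&freq);
QueryPerformanceCounter(&t0);
renderScene();                 // all the gl*/Cg calls for the frame (hypothetical function)
QueryPerformanceCounter(&t1);
SDL_GL_SwapBuffers();          // present; a lot of driver-side CPU time can hide here
QueryPerformanceCounter(&t2);
double submitMs = 1000.0 * (t1.QuadPart - t0.QuadPart) / freq.QuadPart;
double swapMs   = 1000.0 * (t2.QuadPart - t1.QuadPart) / freq.QuadPart;
If most of the time showed up in swapMs for the OGL build but not for the D3D one, that would point at the driver rather than at my own code.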