ATI / NVIDIA Performance Issues

Started by weasalmongler
11 comments, last by weasalmongler 18 years, 9 months ago
Hi everyone,

I recently upgraded my graphics card from an ATI Radeon 9800 Pro to a GeForce 6800 GT, and my game has suddenly taken a big performance hit. I was getting about 300 FPS with the ATI card, and now I am getting 90-100 FPS with the NVIDIA one.

I'm using most of the standard DirectX optimisation techniques. I write all my game objects to a dynamic buffer created with D3DUSAGE_DYNAMIC and D3DUSAGE_WRITEONLY, sorted by texture, so there is basically one vertex buffer going to Direct3D for each texture in the game world. I lock the vertex buffer with D3DLOCK_DISCARD, and I draw everything as indexed primitives. I perform frustum culling on the world objects to cut some of them out. Rendering front to back won't really make much difference, as I am making a space sim game; batching by texture rather than sorting by distance from the player should give a much better speed increase.

I'm not doing anything complicated with my rendering system yet. I'm using standard DirectX 9, not fiddling around with any vertex or pixel shaders. I basically just use textures and lighting, and that's about it.

So my question is: does anyone have any idea why there should be such a big difference between ATI and NVIDIA cards? In my experience, speed differences are usually minimal between similarly powered cards, so why a faster card should render slower is a bit beyond me.

Thanks for any help in advance,

- James
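[Edit] To make the setup concrete, the per-texture buffer handling I'm describing looks roughly like this (a simplified sketch; the vertex format and variable names are just illustrative):

// Vertex layout used throughout (illustrative).
struct Vertex
{
    D3DXVECTOR3 pos;
    D3DXVECTOR3 normal;
    float u, v;
};

// One dynamic, write-only buffer per texture batch; dynamic buffers
// must live in D3DPOOL_DEFAULT.
IDirect3DVertexBuffer9* vb = NULL;
device->CreateVertexBuffer(
    maxVerts * sizeof(Vertex),
    D3DUSAGE_DYNAMIC | D3DUSAGE_WRITEONLY,
    D3DFVF_XYZ | D3DFVF_NORMAL | D3DFVF_TEX1,
    D3DPOOL_DEFAULT,
    &vb,
    NULL);

// Each frame: DISCARD throws away the old contents so the driver can
// hand back fresh memory instead of stalling until the GPU is done.
void* data = NULL;
if (SUCCEEDED(vb->Lock(0, 0, &data, D3DLOCK_DISCARD)))
{
    memcpy(data, batchVerts, batchVertCount * sizeof(Vertex));
    vb->Unlock();
}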
If I may ask, why are you writing all of your game objects to a dynamic buffer? You should be using static vertex buffers for the vast majority of rendering operations.

Maybe the ATI drivers handle the large amount of AGP transfers that all-dynamic vertex buffers necessitate better than the NVIDIA drivers do... but the point may be moot, since you shouldn't be using all dynamic vertex buffers in the first place.
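For comparison, creating a static buffer once at load time would look something like this (a sketch, reusing the illustrative Vertex struct from the first post; the pool and format choices are just one reasonable option):

// Static geometry: created and filled once at load time, then never locked again.
IDirect3DVertexBuffer9* staticVB = NULL;
device->CreateVertexBuffer(
    numVerts * sizeof(Vertex),
    D3DUSAGE_WRITEONLY,                        // note: no D3DUSAGE_DYNAMIC
    D3DFVF_XYZ | D3DFVF_NORMAL | D3DFVF_TEX1,
    D3DPOOL_MANAGED,                           // runtime keeps a system copy and uploads as needed
    &staticVB,
    NULL);

void* data = NULL;
if (SUCCEEDED(staticVB->Lock(0, 0, &data, 0))) // D3DLOCK_DISCARD is not valid here
{
    memcpy(data, loadTimeVerts, numVerts * sizeof(Vertex));
    staticVB->Unlock();
}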
Hello weasalmongler,

I can only give you one possibility.

Do you do any mesh skinning? If so, do you use indexed skinning?

If your program is set up to use indexed skinning similar to the SkinnedMesh or MView samples from the DirectX SDK, then it's possible you are being switched to a software mode.

The GeForce cards do not support the hardware matrix palette that indexed skinning requires. SkinnedMesh and MView detect the card's inadequacy and fall back to a software palette, which is far slower.

That's the only large difference that I am currently aware of.
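If you want to verify this, you can query the device caps at startup (a minimal sketch; how much palette support counts as "enough" depends on how many bones your meshes index):

// Query how much hardware matrix-palette support the device reports.
D3DCAPS9 caps;
device->GetDeviceCaps(&caps);

// MaxVertexBlendMatrixIndex is the highest palette index the hardware
// can address with indexed vertex blending; 0 means no hardware
// indexed palette, so D3D falls back to software vertex processing.
if (caps.MaxVertexBlendMatrixIndex == 0)
{
    // Expect the software-skinning path (much slower).
}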

Good Luck,
-------------------------------------------------------------------
Life is short so go on and live it, cause the chicks dig it.
- Kahsm
The reason I am not using static buffers is that it would require one call to DrawIndexedPrimitive per game object. Hundreds of calls to DrawIndexedPrimitive per frame are a huge bottleneck, which is what I initially had. When I switched to my new method I gained about 70-100 FPS, as it effectively reduces my engine to one DrawIndexedPrimitive per texture in the game world rather than per object, roughly as sketched below.
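Something like this (a simplified sketch of the idea; the Batch record and names are made up for illustration):

// Illustrative batch record: everything needed for one draw call.
struct Batch
{
    IDirect3DTexture9*      texture;
    IDirect3DVertexBuffer9* vb;
    IDirect3DIndexBuffer9*  ib;
    UINT numVerts;
    UINT numTris;
};

// One DrawIndexedPrimitive per texture batch instead of per object.
for (size_t i = 0; i < batches.size(); ++i)
{
    const Batch& b = batches[i];
    device->SetTexture(0, b.texture);
    device->SetStreamSource(0, b.vb, 0, sizeof(Vertex));
    device->SetIndices(b.ib);
    device->DrawIndexedPrimitive(D3DPT_TRIANGLELIST,
                                 0,            // BaseVertexIndex
                                 0,            // MinVertexIndex
                                 b.numVerts,   // NumVertices
                                 0,            // StartIndex
                                 b.numTris);   // PrimitiveCount
}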

I have also tried it without setting the buffers to dynamic and don't see much difference either way; there are so few calls now that I don't suppose it matters.

I don't believe that I do any mesh skinning, but I will look it up and see what I can find.

Thanks for the help so far. Does anyone else have any other ideas?

- James
Quote:Original post by weasalmongler
The reason I am not using static buffers is that it would require one call to DrawIndexedPrimitive per game object.
- James


Why not just put all of the vertex data into a single static buffer? That would still be better than copying every frame.
Quote:Original post by ganchmaster

Why not just put all of the vertex data into a single static buffer? That would still be better than copying every frame.

That would mean he couldn't cull the triangles... if I understood that right, then he culls the triangles and adds every triangle that isn't culled to his dynamic vertex buffer.

regards,
m4gnus
"There are 10 types of people in the world... those who understand binary and those who don't."
First, you're absolutely positive that this isn't a v-sync thing? I know you are, but I have to make sure :)

Quote:Original post by weasalmongler
The reason I am not using static buffers is that it would require one call to DrawIndexedPrimitive per game object.

What you're looking at here is an instancing problem. There are several ways to solve it, a few of which involve dynamic vertex buffers as you have noted. There's a great article in GPU Gems 2 about it, but even if you don't have that, DirectX has some built-in instancing support that you should check out. There's probably plenty of material to be found on Google as well.
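If you do look into it, the D3D9 hooks for hardware instancing go roughly like this (a sketch only, with placeholder names; note that it requires a vertex shader and SM3.0-class hardware, so it wouldn't drop straight into a fixed-function renderer):

// Stream 0: the shared mesh geometry, replayed once per instance.
device->SetStreamSourceFreq(0, D3DSTREAMSOURCE_INDEXEDDATA | numInstances);
device->SetStreamSource(0, meshVB, 0, sizeof(MeshVertex));

// Stream 1: per-instance data (e.g. a world matrix), stepped once per instance.
device->SetStreamSourceFreq(1, D3DSTREAMSOURCE_INSTANCEDATA | 1u);
device->SetStreamSource(1, instanceVB, 0, sizeof(InstanceData));

device->SetIndices(meshIB);
device->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, numMeshVerts, 0, numMeshTris);

// Reset the stream frequencies afterwards so ordinary draws behave normally.
device->SetStreamSourceFreq(0, 1);
device->SetStreamSourceFreq(1, 1);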

I'm not convinced that this is the cause of your problems though... we'd probably need more information to make a better guess.

If I put them into a static buffer then I am unable to move them around; I would have to make one call to DrawIndexedPrimitive per object, as I would have to change the world matrix for each one. By performing the transformations myself on the CPU, I eliminate this problem.

I am aware that this way I do have to copy the data per object, but from the tests I ran, this is much better than the overhead of 300 or so calls to DrawIndexedPrimitive and a lot of world-matrix state changes.
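For what it's worth, the CPU transform is just the usual D3DX call applied while filling the buffer, something like this (sketch; assumes a Vertex struct with D3DXVECTOR3 pos and normal members):

// Append one object's vertices, transformed by its world matrix, into
// the shared buffer. Rigid transforms only, so normals can use the
// same matrix.
void AppendObject(Vertex* dst, const Vertex* src, size_t count,
                  const D3DXMATRIX& world)
{
    for (size_t i = 0; i < count; ++i)
    {
        dst[i] = src[i];  // copies UVs and anything else unchanged
        D3DXVec3TransformCoord(&dst[i].pos, &src[i].pos, &world);
        D3DXVec3TransformNormal(&dst[i].normal, &src[i].normal, &world);
    }
}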

Yeah, don't worry about v-sync; that is switched off.

What sort of information do you think would help? I know that it isn't fill-rate bound; I've shrunk the window down to a tiny size and it still runs at the same speed.

Thanks again,

-James
OK, I ran a profiling tool on my application, and it appears that 50.3% of my program's time is spent in something called "WaitForMultipleObjects". Does this mean anything to anyone? I have a feeling it lies in Kernel32.dll, as over 50% of the program's time was spent in there somewhere. Has anyone seen anything like this before?

- James
Actually... I've just looked at it a bit closer, and CreateVertexBuffer under ATI takes 577.1 while under NVIDIA it takes 448240.2! That's roughly 800 times longer! Why would that be? How can it be so expensive under NVIDIA? On top of that, locking the buffers takes about twice as long.

Does anyone know if this is fixable, or is it just down to the difference between the cards?
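If it turns out the buffers are being created every frame, I guess the fix would be to create one big dynamic buffer at startup and cycle it with NOOVERWRITE/DISCARD locks instead of recreating it, something like this (sketch; the offsets and names are illustrative):

// Buffer created once at startup (never per frame):
//   device->CreateVertexBuffer(bufferSize,
//       D3DUSAGE_DYNAMIC | D3DUSAGE_WRITEONLY,
//       fvf, D3DPOOL_DEFAULT, &vb, NULL);

// Per batch: append with NOOVERWRITE, wrapping with a single DISCARD
// when the buffer fills, so neither create nor lock has to stall.
UINT  bytes = numVerts * sizeof(Vertex);
DWORD flags = D3DLOCK_NOOVERWRITE;
if (writeOffset + bytes > bufferSize)
{
    flags = D3DLOCK_DISCARD;   // buffer full: driver hands back a fresh one
    writeOffset = 0;
}

void* p = NULL;
if (SUCCEEDED(vb->Lock(writeOffset, bytes, &p, flags)))
{
    memcpy(p, verts, bytes);
    vb->Unlock();
    writeOffset += bytes;
}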

- J
