
ATI / NVIDIA Performance Issues

This topic is 4531 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.


Hi everyone,

I recently upgraded my graphics card from an ATI Radeon 9800 Pro to a GeForce 6800 GT, and I have suddenly taken a big performance hit in my game. I was getting about 300 FPS with the ATI card, and now I am getting 90 - 100 FPS with the NVIDIA one.

I'm using many of the usual DirectX optimisation techniques. I write all my game objects into a dynamic vertex buffer created with D3DUSAGE_DYNAMIC and D3DUSAGE_WRITEONLY, sorted by texture, so there is basically one vertex buffer going to Direct3D for each texture in the game world. I lock the vertex buffer with D3DLOCK_DISCARD, and I draw everything as indexed primitives. I perform frustum culling on the world objects to reject some of them. Rendering front to back won't make much difference, as I am making a space sim, and batching by texture rather than by distance from the player should give a much bigger speed increase.

I'm not doing anything complicated with my rendering system yet. I'm using standard DirectX 9 with no vertex or pixel shaders; I basically just use textures and lighting.

So my question is: does anyone have any idea why there should be such a big difference between ATI and NVIDIA cards? In my experience, speed differences between similarly powered cards are usually minimal, so why a faster card should render slower is a bit beyond me.

Thanks in advance for any help,
- James
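For readers unfamiliar with batching by texture, here is a minimal sketch in plain C++ (no Direct3D calls) of merging already-transformed objects into one vertex/index list per texture, so each texture needs only one DrawIndexedPrimitive call. TextureId, Vertex, GameObject, Batch and BuildBatches are hypothetical names, and 16-bit indices are assumed, which caps each batch at 65536 vertices.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-ins; these are not Direct3D types.
using TextureId = std::uint32_t;

struct Vertex { float x, y, z; float u, v; };

struct GameObject {
    TextureId texture;
    std::vector<Vertex> vertices;        // assumed already in world space
    std::vector<std::uint16_t> indices;  // relative to this object's vertices
};

struct Batch {
    std::vector<Vertex> vertices;
    std::vector<std::uint16_t> indices;
};

// Merge objects into one vertex/index list per texture. Each object's
// indices are rebased by the number of vertices already in its batch.
std::map<TextureId, Batch> BuildBatches(const std::vector<GameObject>& objects) {
    std::map<TextureId, Batch> batches;
    for (const GameObject& obj : objects) {
        Batch& b = batches[obj.texture];
        const std::size_t base = b.vertices.size();
        // 16-bit indices can only address 65536 vertices per batch.
        assert(base + obj.vertices.size() <= 65536);
        b.vertices.insert(b.vertices.end(), obj.vertices.begin(), obj.vertices.end());
        for (std::uint16_t i : obj.indices)
            b.indices.push_back(static_cast<std::uint16_t>(base + i));
    }
    return batches;
}
```

In a real renderer each batch would then be copied into the locked dynamic buffer and drawn with a single call after setting its texture.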

If I may ask, why are you writing all of your game objects to a dynamic buffer? You should be using static vertex buffers for the vast majority of rendering operations.

Maybe the ATI drivers cope better than the NVIDIA drivers with the heavy AGP traffic that dynamic vertex buffers generate... but the point may be moot, since you should not be using dynamic vertex buffers for everything.

Hello weasalmongler,

I can only give you one possibility.

Do you do any mesh skinning? If so do you use indexed skinning?

If your program is set up to use indexed skinning, similar to SkinnedMesh or MView from the DirectX SDK, then it's possible you are being switched to a software mode.

The GeForce cards do not support the hardware matrix palette that is required for indexed skinning. SkinnedMesh and MView detect the card's inadequacy and switch to a software palette, which is far slower.
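The kind of capability check described above can be sketched as follows. D3DCAPS9 really does expose a MaxVertexBlendMatrixIndex field (0 means indexed vertex blending is unsupported), but the BlendCaps struct here is a hypothetical stand-in so the code runs without Direct3D, and the "palette must cover the most bones per face" rule is a simplified assumption, not a copy of the SDK samples' exact logic.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the one D3DCAPS9 field that matters here;
// a real program would read it via IDirect3DDevice9::GetDeviceCaps().
struct BlendCaps {
    std::uint32_t MaxVertexBlendMatrixIndex;  // 0 = no indexed vertex blending
};

enum class SkinningPath { HardwareIndexed, SoftwareIndexed };

// Indexed skinning addresses palette entries 0..MaxVertexBlendMatrixIndex.
// Hardware is only usable if indexed blending is supported at all and the
// palette can hold every bone that influences a single face.
SkinningPath ChooseIndexedSkinning(const BlendCaps& caps,
                                   std::uint32_t maxFaceInfluences) {
    assert(maxFaceInfluences > 0);
    if (caps.MaxVertexBlendMatrixIndex == 0)
        return SkinningPath::SoftwareIndexed;
    if (caps.MaxVertexBlendMatrixIndex + 1 >= maxFaceInfluences)
        return SkinningPath::HardwareIndexed;
    return SkinningPath::SoftwareIndexed;  // CPU palette: far slower
}
```

If a check like this silently picks the software path, the frame rate drops without any error being reported, which is why it is worth ruling out.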

That's the only large difference that I am currently aware of.

Good Luck,

The reason I am not using static buffers is that it would require one call to DrawIndexedPrimitive per game object. Hundreds of DrawIndexedPrimitive calls per frame is a huge bottleneck; that is what I used initially. When I switched to my new method I got an increase of about 70 - 100 FPS, as it effectively reduces my engine to one DrawIndexedPrimitive call per texture in the game world rather than one per object.

I have also tried it without setting them to dynamic and don't see much difference either way; there are so few calls now that I don't suppose it matters.

I don't believe that I do any mesh skinning, I will look it up and see what I can find.

Thanks for the help so far. Does anyone else have any other ideas?

- James

Quote:
Original post by weasalmongler
The reason I am not using static buffers is that it would require 1 call to DrawIndexedPrimitive per game object.
- James


Why not just put all of the vertex data into a single static buffer? That would still be better than copying every frame.

Quote:
Original post by ganchmaster

Why not just put all of the vertex data into a single static buffer? That would still be better than copying every frame.



That would mean he cannot cull the triangles... if I understood that right, then he culls the triangles and adds every triangle that isn't culled to his dynamic vertex buffer.
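The culling step can be sketched like this. James's first post says he culls whole world objects, so this sketch tests one bounding sphere per object against six frustum planes before the object is copied into the dynamic buffer; the per-object bounding sphere and all names (Plane, Sphere, IsVisible, CullObjects) are assumptions for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Plane n.p + d = 0, with the normal pointing into the frustum.
struct Plane { Vec3 n; float d; };

struct Sphere { Vec3 center; float radius; };

using Frustum = std::array<Plane, 6>;

float SignedDistance(const Plane& p, const Vec3& v) {
    return p.n.x * v.x + p.n.y * v.y + p.n.z * v.z + p.d;
}

// A sphere is culled if it lies entirely behind any single plane.
bool IsVisible(const Frustum& f, const Sphere& s) {
    for (const Plane& p : f)
        if (SignedDistance(p, s.center) < -s.radius) return false;
    return true;
}

// Returns the indices of objects worth copying into the dynamic buffer.
std::vector<std::size_t> CullObjects(const Frustum& f,
                                     const std::vector<Sphere>& bounds) {
    std::vector<std::size_t> visible;
    for (std::size_t i = 0; i < bounds.size(); ++i)
        if (IsVisible(f, bounds[i])) visible.push_back(i);
    return visible;
}
```

Spheres that straddle a plane are kept, so the test is conservative: it never rejects anything visible, it just occasionally copies a few extra triangles.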

regards,
m4gnus

First, you're absolutely positive that this isn't a v-sync thing? I know, you are, but I have to make sure :)

Quote:
Original post by weasalmongler
The reason I am not using static buffers is that it would require 1 call to DrawIndexedPrimitive per game object.

What you're looking at here is an instancing problem. There are several ways to solve it, a few of which involve dynamic vertex buffers as you have noted. There's a great article in GPU Gems 2 about it, but even if you don't have that, DirectX has some built-in instancing stuff that you should check out. There's probably lots of material to be found in Google as well.

I'm not convinced that this is the cause of your problems though... we'd probably need more information to make a better guess.

If I put them into a static buffer then I am unable to move them around independently: I would have to make one DrawIndexedPrimitive call per object, because the modelview matrix has to change between objects. By performing the transformations myself on the CPU, I eliminate this problem.

I am aware that this way I have to copy the data for every object, but in the tests I ran this is much better than the overhead of 300 or so calls to DrawIndexedPrimitive and a lot of modelview matrix state changes.
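Transforming on the CPU as described above amounts to multiplying each model-space vertex by the object's world matrix while appending it to the frame's batch. This sketch uses a plain row-major 4x4 matrix with Direct3D's row-vector convention (v' = v * M, translation in the last row); Matrix4, TransformPoint and AppendTransformed are hypothetical names, and only positions are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Row-major 4x4 matrix, row-vector convention as in Direct3D:
// v' = v * M, with the translation stored in m[3][0..2].
struct Matrix4 { float m[4][4]; };

// Assumes an affine matrix (last column 0,0,0,1), so no divide by w.
Vec3 TransformPoint(const Vec3& v, const Matrix4& M) {
    return {
        v.x * M.m[0][0] + v.y * M.m[1][0] + v.z * M.m[2][0] + M.m[3][0],
        v.x * M.m[0][1] + v.y * M.m[1][1] + v.z * M.m[2][1] + M.m[3][1],
        v.x * M.m[0][2] + v.y * M.m[1][2] + v.z * M.m[2][2] + M.m[3][2],
    };
}

// Append an object's model-space positions to the batch in world space,
// so no per-object matrix change or draw call is needed.
void AppendTransformed(const std::vector<Vec3>& model, const Matrix4& world,
                       std::vector<Vec3>& batch) {
    assert(&model != &batch);  // appending to itself would invalidate iterators
    batch.reserve(batch.size() + model.size());
    for (const Vec3& v : model) batch.push_back(TransformPoint(v, world));
}
```

Since the game uses lighting, normals would also need transforming; for rigid rotations plus translation that means applying only the upper 3x3 part, and for non-uniform scaling the inverse-transpose of that part.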

Yeah, don't worry about v-sync, that is switched off.

What sort of information do you think would help? I know that it isn't fill-rate bound: I've shrunk the window down to a tiny size and it still runs at the same speed.

Thanks again,

-James

OK, I ran a profiling tool on my application, and it appears that 50.3% of my program's time is spent in something called "WaitForMultipleObjects". Does this mean anything to anyone? I suspect it lies in Kernel32.dll, as over 50% of the program's time was spent somewhere in there. Has anyone seen anything like this before?

- James

Actually... I've just looked at it a bit closer, and CreateVertexBuffer takes 577.1 under ATI but 448240.2 under NVIDIA!! That's nearly 800 times longer!! Why would that be? How can that be so expensive under NVIDIA? On top of that, locking the buffers takes about twice as long.

Does anyone know if this is fixable, or is it just due to the difference between the cards?

- J

Quote:
Original post by weasalmongler
If I put them into a static buffer then I am unable to move them around independently: I would have to make one DrawIndexedPrimitive call per object, because the modelview matrix has to change between objects. By performing the transformations myself on the CPU, I eliminate this problem.


I see. Like AndyTX said, it sounds like you need to look into instancing techniques, if you really want to draw a large number of independently moving objects with a single DrawPrim call. With the right instancing scheme, you should theoretically be able to make a good optimization by saving a lot of the overhead of the dynamic buffer.

But also, even if you are using a dynamic buffer, why are you calling CreateVertexBuffer every frame? You can create it once and fill it over and over. Maybe that is the source of your problems?

You should never create a VB every frame. As Brian Fellows would say, 'THAT's CRAZY!!'. ;)

Using a dynamic buffer itself shouldn't be a big hit, but creating index and/or vertex buffers each frame is a big hit - and there are no guarantees how long this might take.

Let me guess: you aren't sure how big the buffer needs to be because you don't know how many items will be in view?

If so, treat your dynamic VB as a staging area: for each object, lock(), write, unlock(), and then draw() when the buffer is full, when you need to change texture, or when the frame is over. When you draw because it's full, you need to re-lock with discard and keep going.

I just rewrote the way the program allocates the dynamic vertex buffer, so now I create a large one at the start of the program and deallocate it at the end. Any time in between, it is simply locked and unlocked to write data to it.

This has given me a huge performance enhancement. It has gone from pretty much a constant 70 - 100 FPS to an amazing 150 - 400 FPS with literally loads of enemies and other ships all over the place. I still have a bit of optimising to do but I think I can sort it now that I have established the cause of the problem.

Many thanks to everyone who helped with this issue.

- James
