90 DrawPrimitive calls @ 3 verts each == 33 fps on GF3! 49 on GF4 Ti 4200

15 comments, last by Wicked Ewok 20 years, 1 month ago
In a sense, your tests are so far outside the norm that it's difficult to see if they really prove anything at all other than: when used badly, the driver will behave badly.

I don't have any great answer for why re-rendering a VB would be costlier than rendering several different VBs, but I wouldn't be too surprised if it has something to do with the fact that the driver is making some assumptions that you'll never do what your test does. For instance, you would usually render a much larger buffer and then a different much larger buffer. Or, you might render the same buffer twice, but with different transformations. For instance, some of my 2D stuff that re-renders the same 4 vertices with different matrices is about an order of magnitude faster than your numbers on a GF2MX. That particular app is more fill limited than anything else.

Basically, if I'm reading it correctly, you are creating a test for a situation that will never happen. One could argue that a driver tuned for your test would do poorly on a real task.

nVidia has presentations about efficient buffer usage. The answer might be there; alternatively, ask nVidia directly and post your findings here.

Author, "Real Time Rendering Tricks and Techniques in DirectX", "Focus on Curves and Surfaces", A third book on advanced lighting and materials

[edited by - crazedgenius on February 23, 2004 2:29:43 AM]
Author, "Real Time Rendering Tricks and Techniques in DirectX", "Focus on Curves and Surfaces", A third book on advanced lighting and materials
If your triangles are "big" on screen, then it's very likely to be fillrate. You don't even need fullscreen triangles to reach the fillrate limit of your card. In addition, if your Z test function was poorly chosen, rendering the same triangle over and over could have resulted in rewriting the Z-buffer values every time, too.

Y.
How many times does SetTexture get called? It's a slow call. You have a SetTexture in your outer loop, and a call to something that implies it sets even more textures in the inner loop. You're also setting render states 90 times, which is bad.

Using one VB 90 times isn't a problem. It's certainly better than using 90 VBs once each.

Outside the code you gave, are you locking and filling the VB each frame? ie: Are you doing lock, draw * 90, lock, draw * 90, etc?

Do you have any stats on how it performs on an ATI card? It might be something odd in nVidia's driver.
I don't think this has been mentioned, but I think your interpretation of your profiling results is wrong. Just because a lot of time is spent in the display driver doesn't mean anything by itself; a game spends a lot of time rendering, and the display driver could be taking so much time because it's blocking when you call Present(), waiting for the card to finish rendering.
One thing I am doing wrong, but just for the sake of testing, was leaving the Z comparison function at its default, LESSEQUAL. I use the same set of vertices over and over, even if they are in a list of 90 VBs (actually, I have a VB size limit, and depending on how many vertices that is, it goes into about 20 actual VBs). Profiling with the latest nVidia beta developer drivers (just installed them now), I get most of my time consumption in the HAL driver, and just a little less in the nv4 display driver.

I am going to release the code for my rendering system. I've been working on it for a few weeks now, but I just can't get the triangle rates that my GF3 should be able to pump out, especially with just one texture. If anyone wants to help, it's here:

http://www.mojo9.com/~wickedewok/DXRenderer_source.zip

Use what you want from it, too; it's supposed to batch certain VB elements during rendering (though that part isn't entirely finished). I think it's well commented. The main driver function is the only really messy section, I think.

I have a UML diagram of it somewhere but I can't find it right now. If anyone would be willing to play around with this for me, I would be most thankful. Thanks.

-Marv


[edited by - Wicked Ewok on February 23, 2004 3:33:20 PM]
-=~'^'~=-.,_Wicked_,.-=~'^'~=-
I'm also curious how much of the screen the primitive was taking up. Do you have a percentage? Also, are you culling the back face, or is it drawing both sides? And what are you setting your width and height to when you create the device? A high resolution could certainly be a problem.

Chris
Chris Byers, Microsoft DirectX MVP - 2005
Hmm, thanks supernat, I think you found part of the problem. The backbuffer's width and height were bigger than what I set the device to. I rescaled it to 600x800 and changed the Z comparison function to LESS instead of LESSEQUAL, and it works a lot better now. I've gotten a throughput of around 900,000 triangles/second (triangle list) with every triangle actually on screen and covering at least 3/4 of the whole screen. This rate of course drops when the Z func is at its default LESSEQUAL, since I'm drawing fixed-depth triangles. Tristrips give me a lot more throughput. This is all with about 2 static VBs of 32k vertices each, and 80 DrawPrimitive calls to many 801-vertex sections of each VB. Seems to have fixed it. Cool, thanks.

-Marv
-=~'^'~=-.,_Wicked_,.-=~'^'~=-

This topic is closed to new replies.
