VladR

How can ONE triangle cause the fps to drop by half (150 to 75)?

Hi. While optimizing my game to run on different gfx cards, I noticed a really weird performance drop when I plugged an nVidia TNT1 8 MB card into my rig. After switching off almost everything, I was left with only the floor, which is rendered as a non-indexed triangle list from one vertex buffer. Without the floor, the fps is 150; with the floor (264 triangles) it falls to 75 fps. After experimenting with different triangle counts I found the threshold: it is 36 triangles. With 36 or fewer triangles the fps is 150; with 37 or more it falls to 75 fps, which, strangely, is exactly half. It shouldn't be a fillrate problem, since whether I set the camera to show all 36 triangles or just half of them doesn't change the fps at all (not even by 0.1 fps). But when I add the 37th, it falls from 150 to 75 fps, i.e. exactly to half. Here is the render code:
        
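	// Floor: non-indexed triangle list drawn from a single vertex buffer.
	fm_pd3dDevice->SetStreamSource( 0, BigFloorVB_Handle, sizeof(FLOORCUSTOMVERTEX) );
	fm_pd3dDevice->SetVertexShader( D3DFVF_FLOORCUSTOMVERTEX );
	int rows = 4;
	int cols = 4;
	CGrid * G = &Grid[0];
	// Each grid cell is 2 triangles (6 vertices). Draw the first 'cols' cells
	// of each of the first 'rows' rows: 4 * 8 = 32 triangles.
	for (int cRow = 0; cRow < rows; cRow++)
		fm_pd3dDevice->DrawPrimitive( D3DPT_TRIANGLELIST, cRow*G->Dimension_Columns*2*3, cols*2 );
	// 5 more triangles from the next row: 32 + 5 = 37, the count at which the fps halves.
	fm_pd3dDevice->DrawPrimitive( D3DPT_TRIANGLELIST, rows*G->Dimension_Columns*2*3, 2*2+1 );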
        
As you can see, there are no state changes between the DrawPrimitive calls. The vertex buffer for the floor is created with D3DUSAGE_WRITEONLY, D3DFVF_FLOORCUSTOMVERTEX and D3DPOOL_DEFAULT, locked just once during startup, and never locked again during rendering.

Also please note that it makes no difference whether I draw the triangles with one call to DrawPrimitive (as in the current version, since it is faster to send ~200 triangles to render than to play with their visibility) or with several calls (which is just for debugging, of course). I also can't reproduce it in windowed mode, which would help with debugging; it shows up only in fullscreen mode. Unfortunately, the DirectX debug output didn't show anything when rendering that last triangle.

I will of course use indexed triangle lists (or strips, in the case of the floor) in the final build. But what on Earth causes that halving?

And I just noticed that 150 fps is exactly my monitor refresh rate. Why does it ignore my VSync (off) setting for DirectX in Display Properties and aTuner? I haven't noticed any variable related to VSync at window creation. (Though I'm gonna search these boards for refresh stuff; it gets asked a lot, I think.)

BTW, the rig is: AMD XP2100, 256 MB, gfx (TNT1-GF2MX, GF2GTS, GF3, GF4MX), Debug DX 8.1, Dets 30.00

EDIT: If I render 1600 additional triangles of my low-poly characters, the fps still stays at those magic 75 fps (the same as with only 37 floor triangles).

EDIT2: I plugged in the GF2MX this time to see what the situation is there. It can naturally render a much higher number of tris with no problems at all (compared to the TNT1). Now the GF2MX renders the whole floor at the full 150 fps (gotta search the docs for those presentation parameters; thanks, Anon). What is strange, however, is that it stops at round fps values: either 150.0, 75.0 or 50.0 fps.

There is obviously some correlation between fps and VSync that doesn't depend on the number of tris, because I get those 75.0 fps here whether I render 10,000 tris or 4,000 tris per frame (it changes when walking from one floor quad onto another), still on the GF2MX. The drivers seem to play it their way here, not caring whether it's 10k or 5k polys ... Any idea is appreciated...

VladR
Avenger game

[edited by - VladR on May 26, 2003 4:59:44 PM]

Guest Anonymous Poster
The variable controlling whether you have vsync is in your D3D presentation parameters. It's called FullScreen_PresentationInterval. You want to use D3DPRESENT_INTERVAL_IMMEDIATE rather than D3DPRESENT_INTERVAL_ONE.

Too lazy to look up an exact block of code for you. Hope it helps.

quote:
Too lazy to look up an exact block of code for you. Hope it helps.
Sure, it helped a lot. Spoonfeeding always helps.
Now I can watch my GF2MX range from 50 fps to 273 fps, which is sort of funny. I wonder what it will be like with the TNT1 (I'll try this evening, but now I gotta go to sleep since I'm getting up for my day job in a few hours).


VladR
Avenger game

Now that was a link! I was reading it and slapping myself all the time. How come I didn't realize this issue sooner?

Unfortunately, the article can't explain why I get such a difference in FRAME TIME when rendering that single 37th triangle.

I still think it has to be something in the drivers, some batching stuff, i.e. no matter how many triangles within a given boundary I render, that particular frame will still take, for example, 1 ms (with 37 OR even 50 tris rendered).

EDIT: That frame time linearity could explain why I am getting exactly 75.0 or 50.0 or 25.0 fps, right?

Do the drivers work that way? Or am I asking the wrong question?

VladR
Avenger game

[edited by - VladR on May 26, 2003 7:12:12 PM]

When computing FRAME TIMES you must take the time BEFORE you call the flip, because once you include the time spent waiting for the flip, both FPS and FRAME TIME can of course only move in steps of the refresh rate ... i.e. 150, 75, 50, 37.5, 30, 25 ...

If you take the time before the flip, then the difference between 36 and 37 triangles should be reasonable ... please check and post the numbers if it is not ... as then there is a real difference besides just "it crossed the threshold" ...
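
The suggestion above can be sketched roughly as follows. This is only a minimal illustration, not the original poster's code: `FrameTiming` and `timeFrame` are hypothetical names, `render` stands in for the game's draw calls, and `present` stands in for the flip. Note that it captures CPU-side time only; the hardware may still be working after `render` returns.

```cpp
#include <chrono>

// Hypothetical helper, not part of Direct3D: times one frame in two ways.
struct FrameTiming {
    double workMs;   // time spent issuing the frame, taken BEFORE the flip
    double totalMs;  // the same frame including any wait inside the flip
};

// `render` and `present` are placeholders supplied by the caller.
template <typename RenderFn, typename PresentFn>
FrameTiming timeFrame(RenderFn render, PresentFn present) {
    using Clock = std::chrono::steady_clock;
    using Ms = std::chrono::duration<double, std::milli>;

    const Clock::time_point start = Clock::now();
    render();
    const Clock::time_point beforeFlip = Clock::now();
    present();  // with vsync on, the wait for the retrace happens in here
    const Clock::time_point afterFlip = Clock::now();

    return FrameTiming{ Ms(beforeFlip - start).count(),
                        Ms(afterFlip - start).count() };
}
```

Comparing `workMs` for 36 and 37 triangles should show a small difference, while `totalMs` is the value that snaps to multiples of the refresh period.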

quote:
FILLRATE. Drawing a huge tri on an old card is a big issue.
No, it's not fillrate. And those tris are not huge; all the triangles have exactly the same size, and from the gameplay camera position you usually see 130 triangles AND the fps stays the same (75 fps) whether half of the screen is black and the other half full of tris, or there are just 37 tris. So fillrate is not the reason.
quote:
when computing FRAME TIMES you must compute the time BEFORE you call the flip, because once you include the time waiting to flip, then of course both FPS and FRAME TIME can only go in intervals of the refresh rate ... ie 150, 75, 50, 37.5, 30, 25 ...
Hm, could you please explain this a little bit more? What does it have to do with round fps numbers (with VSync finally OFF in DirectX), which are either 150.0 or 75.0 (nothing between them, i.e. no 123.45 fps or such)? And it probably has something to do with the HW, since when I plugged in the GF2MX (after last night's post) I got fps in the range from 40.0 to 300.0, and it was anywhere within that range, not just a clean 150 or 75 or 50.
I still find it hard to believe that the TNT1 HW works that way. Did anybody notice the same issue when testing their game on different HW?




VladR
Avenger game

This may well be a driver limitation, but if you think about it, there will always be some minor addition to the loop that causes the frame time to exceed the retrace time, and suddenly the FPS is halved (assuming vsync is on).

This addition may not even be graphics-related code: the loop may already be so close to the limit that adding just one more math operation means the retrace is missed.

Okay, a slightly exaggerated example, but you get the idea!
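
The halving described above can be illustrated with a small model. This is a sketch under a simplifying assumption (vsync on, each frame shown on the first retrace after its work is finished), not a description of what any particular driver does; `vsyncedFps` is a hypothetical name.

```cpp
#include <algorithm>
#include <cmath>

// Simplified model (an assumption, not driver behaviour): with vsync on, a
// frame is displayed on the first retrace after its work is done, so the
// displayed frame time is always a whole number of refresh periods.
double vsyncedFps(double workMs, double refreshHz) {
    // Number of refresh periods the frame's work spans, rounded up (at least 1).
    const double periods = std::max(1.0, std::ceil(workMs * refreshHz / 1000.0));
    return refreshHz / periods;
}
```

At 150 Hz the refresh period is about 6.67 ms, so in this model `vsyncedFps(6.6, 150.0)` gives 150 while `vsyncedFps(6.7, 150.0)` gives 75: a 0.1 ms increase in work, from any source, is enough to halve the displayed frame rate.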

quote:
when computing FRAME TIMES you must compute the time BEFORE you call the flip, because once you include the time waiting to flip, then of course both FPS and FRAME TIME can only go in intervals of the refresh rate ... ie 150, 75, 50, 37.5, 30, 25 ...

If you compute the time before the flip, then the diff between 36 and 37 should be reasonable ... please check and post the numbers if they are not ... as then there is a real difference besides just "it crossed the threshold" ...


In most cases, profiling individual API calls is not useful. The CPU and GPU operate in parallel and DrawPrimitive will return well before the triangles are actually rendered by the hardware. Typically you want to profile the entire frame time, and over the course of many frames (because the driver is allowed to queue up as many as 3 frames of rendering commands).

As another poster said, use D3DPRESENT_INTERVAL_IMMEDIATE to take the refresh rate out of the equation.
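
Profiling the whole frame over many frames could look something like the sketch below. `FrameAverager` is a hypothetical helper, and the timestamps are assumed to be in milliseconds and taken at the same point of every frame (for example right after the present call) from whatever timer the game already uses.

```cpp
#include <cstddef>

// Hypothetical helper: averages whole-frame time over a fixed number of
// frames, so per-frame jitter and driver-side queuing even out.
class FrameAverager {
public:
    explicit FrameAverager(std::size_t windowFrames) : window_(windowFrames) {}

    // Call once per frame with the current time in milliseconds.
    // Returns true when a new average has just been computed.
    bool tick(double nowMs) {
        if (!started_) {              // first call only marks the window start
            started_ = true;
            windowStartMs_ = nowMs;
            return false;
        }
        if (++frames_ < window_) return false;
        averageMs_ = (nowMs - windowStartMs_) / static_cast<double>(frames_);
        frames_ = 0;                  // begin the next window at this timestamp
        windowStartMs_ = nowMs;
        return true;
    }

    double averageFrameMs() const { return averageMs_; }
    double averageFps() const { return averageMs_ > 0.0 ? 1000.0 / averageMs_ : 0.0; }

private:
    std::size_t window_;
    std::size_t frames_ = 0;
    bool started_ = false;
    double windowStartMs_ = 0.0;
    double averageMs_ = 0.0;
};
```

A window of a few hundred frames is a reasonable starting point; with vsync disabled, the averaged value is no longer forced onto multiples of the refresh period.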

quote:
This addition may not even be graphics-related code: the loop may already be so close to the limit that adding just one more math operation means the retrace is missed.

Okay, a slightly exaggerated example, but you get the idea!
Far from exaggerated. I got the idea; it's good to be reminded of the obvious fact that frame time is influenced not just by API calls, but also by my own math/processing operations. Thanks.
quote:
because the driver is allowed to queue up as many as 3 frames of rendering commands
Where in the world is one supposed to find such info about the way drivers work? I haven't noticed that on the nVidia developer pages. Or did you communicate directly with the nVidia folks? Either way, thanks.


VladR
Avenger game

quote:
Where in the world is one supposed to find such info regarding the way that drivers work ?


I found THIS info (queueing of three frames) in the DirectX documentation, clearly labelled. Have you ever read it from beginning to end? There is stuff like this in the API description.

quote:
Have you ever read it from the beginning to the end? There is stuff like this in the API description.
I always read just separate chapters when dealing with some stuff. But no, I never read it from beginning to end - shall have to do it though since it seems I missed some really neat stuff.



VladR
Avenger game

quote:
I always read just separate chapters when dealing with some stuff.


I always read the complete documentation front to back ONCE. The difference? I know the API. You don't.

After going through it once, I normally know where to look for details when I need them :-)

quote:
I always read the complete documentation front to back ONCE.
I'm curious. Have you actually printed out all those 400 pages? Because EVERY TIME I have Microsoft Visual C++ AND Microsoft Word with the DX docs open, the system becomes unstable and runs out of system resources, so fonts, menus and the like stop being displayed. Naturally you can't save anything in C++ then; you can only reset the PC. Even 3dsMax with a model of tens of thousands of triangles can run in another window. I just can't have the whole docs open.
That's why I divided them into separate parts, which means you have to keep opening file after file if you'd like to read it all. Which I'll do, of course, since I apparently missed some stuff during my first (and only) complete reading of the DX docs about a year and a half ago, which obviously was of little use since I knew nothing about DX at that time.
Time to read it again, oh well...

quote:
The difference? I know the API. You don't.
I never stated that I know the API completely. I am progressing gradually and so far have a game in a pretty finished state. I'll probably upload the new demo this weekend, with a low-detail option for slower PCs plus many small finished details.

BTW, Thona, I'm curious: what does your engine's output currently look like, with your API knowledge?



VladR
Avenger game
