Performance and Overhead

I’ve been doing some terrain rendering and trying to improve its performance. I was getting only about 150 fps drawing 2000-2500 polygons (2.2 GHz with GeForce 2 Pro, 640x480). That seems slow compared to other things I’ve read here.

Since the general advice is to put all your vertices in one vertex buffer, I tried that, but the speed didn’t change. Then I commented out my DrawIndexedPrimitive() call and the speed only increased to 320 fps. I was surprised because there’s not a whole lot of software going on otherwise. I reduced the processing so that I was just doing the fps calculation, Clear (the buffer), Begin/EndScene, and Present. The speed still stayed at 320 fps. I commented out Begin/EndScene and Present, and the speed rose to 500 fps. I commented out the Clear function and the speed rose to 700 fps.

Does this make sense? Does the Clear function really take that much time? Do Begin/EndScene and Present really take that much time even when there is no rendering to do? What am I doing wrong? Even 700 fps seems slow for a program that’s doing nothing except calculating fps and displaying it in the title bar.

Edit: Forgot to mention I'm using DirectX 8.1.

[edited by - JimH on September 11, 2002 6:31:24 PM]

Make sure you are using QueryPerformanceCounter to calculate the FPS. The other timer functions are less than accurate (and basically don't work under 5 milliseconds), so you could be hitting some flaw where the timers are giving funky results.
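For anyone wanting to see the arithmetic behind a counter-based FPS figure, here is a minimal sketch. It is kept portable, so it does not call the Windows API itself: the tick values stand in for QueryPerformanceCounter readings and ticks_per_second stands in for the value QueryPerformanceFrequency reports. The function and parameter names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Frames per second from two readings of a high-resolution tick counter.
// start_ticks / end_ticks: counter readings taken before and after the
// measured frames (hypothetical stand-ins for QueryPerformanceCounter).
// ticks_per_second: counter frequency (stand-in for QueryPerformanceFrequency).
// frames: number of frames rendered between the two readings.
double fps_from_ticks(std::int64_t start_ticks, std::int64_t end_ticks,
                      std::int64_t ticks_per_second, std::int64_t frames)
{
    assert(ticks_per_second > 0);
    assert(end_ticks > start_ticks);

    // Convert the tick delta to seconds before dividing, so the result
    // does not depend on the counter's resolution.
    const double seconds = static_cast<double>(end_ticks - start_ticks) /
                           static_cast<double>(ticks_per_second);
    return static_cast<double>(frames) / seconds;
}
```

Measuring over many frames rather than a single one keeps the result meaningful even when an individual frame is shorter than the timer can resolve.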

----

I'm getting 3700 fps with a basic engine (this includes GUI processing with no GUI being rendered, input, text rendering, and all the BeginScene/EndScene/Present/Clear calls) on a P4-1700 with a GeForce3 at 800x600.

1) Is this in windowed mode? And is your Present VSYNC'd? Try performing your tests in windowed mode without any VSYNC. The figures you've posted may be your monitor's refresh rate multiplied by the number of frames which some drivers will (naughtily) buffer up for later display (WHQL drivers only allow them to do 1.5 and present an option if they want to do more).

An example of what I mean above: a monitor refresh (VSYNC) of 70 Hz * 10 frames buffered would give 700 fps output.


2) Also take a close look at your message pump. If you stop pumping messages for a while, Windows often assumes it's OK to start doing housekeeping such as reorganising the page file, running the index service, etc. In a game you usually want it to think that you're never idle.


3) Check what the SDK samples get. If they get similar figures, then it's likely IMO to be #1. If the video card drivers are capping the frame rate (sometimes they do it even if you don't ask for VSYNC!), then try adding some more polygons. You shouldn't be worried until you go below the monitor refresh rate! Try stressing your code to see how many extra polys it takes to get the frame rate to go DOWN. (Up doesn't matter and isn't worth profiling: above the monitor refresh rate the figure you get is meaningless!)


4) Ensure your hardware is functioning correctly. Run DXDIAG and perform all the tests. In particular you need to check that AGP is working as expected. Also download BENMARK5 from the nVidia website and check that you get (roughly) the correct figures for your card. If either of these shows below-expected results, download new motherboard, AGP, and CPU drivers. They can make a HUGE difference.

--
Simon O'Connor
Creative Asylum Ltd
www.creative-asylum.com

Joey, I had been using timeGetTime(). I switched to QueryPerformanceCounter() but it was the same. Thanks for that anyway; I’ll use QueryPerformanceCounter() from now on.

Simon, let me take it out of order:

1) Yes it is in windowed mode with no VSYNC.

4) I ran DXDIAG and everything looks ok. I downloaded BENMARK5 and ran the “ribbons” test. I got 23 million triangles/sec on my GF2 Pro, which claims 25 million. So that sounds fine.

3) Hard to get a great idea here since I don’t know what the SDK sample should run at. The progressive mesh sample ranges from 610 to 660 fps depending on the number of vertices. (minimum is 4 vertices) That is in windowed mode at 640x480x32. By covering up most of the window with another application, I can get that up to 1000fps, which means that the copy-to-window is taking some big chunk of the time. Doing the same thing in my program didn’t yield such an increase.

But the fact that this SDK sample can draw a mesh with 4 vertices plus put some text on the screen faster than my program can do nothing but clear-beginscene-endscene-present means I need to look at the sample more and see what the differences are.

The reason I started worrying about fps above the monitor refresh rate is because I wanted to identify any performance problem not related to the graphics rendering. So that’s when I commented out the rendering to see the base game loop speed. (I don’t have any profiling software. If there is any free profiling software I can use, let me know.)

2) OK, here’s the part where I confess something I hadn’t mentioned. I wanted to work with some user interface elements, so I used MFC for the Windows shell part of the program. (I didn’t want people jumping to the conclusion that MFC was the problem without even knowing if it was.) I use the CWinApp::OnIdle() function to do the work in. I didn’t expect that this would slow anything down if I am not doing anything with the UI but letting the program run. But now I think I’ll have to consider that it might. I’ll have to stick the game into a minimal Windows shell program with a message loop and see what happens.

I also considered that perhaps the main game loop should run in its own thread. I don’t know if people do it this way or not. Though I don’t expect it would be any faster if you still have the Windows message loop going anyway.

-----

For some reason, my program is faster today even though I didn’t change anything. Today I’m getting 410 fps with only clear-beginscene-endscene-present with no geometry being rendered. Then when I comment those 4 functions out I get 900 fps. Now 900 is starting to get closer to what I’d expect, but I don’t think those 4 functions should make a difference of 500 fps with no rendering.

So it does seem like there are problems in two places. One could be the message loop, because even without clear-beginscene-endscene-present and only the fps calculation I’m only getting 900 fps (on a 2.2 GHz machine). The other problem is that those 4 functions seem to take up too much overhead. Maybe this has something to do with some DirectX setting I’m making. I’ll have to look at SDK samples more closely.
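A side note on reading these numbers: differences in fps are hard to compare directly because fps is the reciprocal of frame time. The small sketch below converts the quoted figures into milliseconds per frame; the function names are just for illustration.

```cpp
#include <cassert>

// Milliseconds spent on one frame at a given frame rate.
double frame_time_ms(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Per-frame cost (in ms) of whatever was enabled in the slower run,
// given the frame rate with it enabled and with it commented out.
double added_cost_ms(double fps_with, double fps_without)
{
    return frame_time_ms(fps_with) - frame_time_ms(fps_without);
}
```

With the figures above, added_cost_ms(410.0, 900.0) comes to roughly 1.33 ms per frame for clear-beginscene-endscene-present. The same drop of about 500 fps starting from a lower base rate would correspond to a much larger per-frame cost, which is why the millisecond figure is the more useful one to track.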

Sorry for such a long post.

[edited by - JimH on September 13, 2002 2:35:37 PM]

How, may I ask, are you displaying your FPS? Perhaps it's the displaying of the FPS that's causing the overhead you're looking to eliminate. And if it's not the FPS, then I'd say you'll have to ditch MFC to get the extra umfff you're looking for.

Oh, and by the way, you'll eventually want to keep the Clear() call out of your game loop, as it's a costly process to clear the entire frame buffer. Eventually, when you get enough stuff in your scene, the clear is redundant (clearing the target buffer, that is; you'll still need to clear the Z-buffer and stencil buffer if you use them).

[edited by - Bretttido on September 13, 2002 4:51:20 PM]

JimH,

Try downloading my engine to compare against. It's 640x480x32 in windowed mode...

http://www.visi.com/~brandonb/guitest.zip

Just close down the 2 sample windows inside the app... That should give you a good sample to compare to. It's about as plain as it can be (minus the overhead of input/GUI/etc.)

QueryPerformanceCounter is the slowest of the timers under Windows; it's also buggy.

To JimH: fillrate limitations are what you're seeing. To prove this, make the window 320x240 instead of 640x480. If the fps goes up by quite a bit then you're fillrate bound, and there's not much you can do. Some ideas though:
1/ don't clear the colour buffer (the depth buffer trick has fallen out of favour over the last few years)
2/ use lesser quality everything
3/ turn off everything that's not needed, e.g. blending
4/ make the window smaller
5/ minimize overdraw
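
A rough sketch of the reasoning behind the window-size test, assuming the intended smaller window is 320x240: it has a quarter of the pixels of 640x480, so a fill-rate-bound program should speed up noticeably when shrunk, while a program limited elsewhere should barely change. The helper names and the 1.5 threshold are arbitrary choices for illustration, not an established rule.

```cpp
#include <cassert>

// Number of pixels the card has to fill for one full-window pass.
long pixel_count(int width, int height)
{
    assert(width > 0 && height > 0);
    return static_cast<long>(width) * height;
}

// How many times fewer pixels the small window has than the large one.
double pixel_ratio(int big_w, int big_h, int small_w, int small_h)
{
    return static_cast<double>(pixel_count(big_w, big_h)) /
           static_cast<double>(pixel_count(small_w, small_h));
}

// Hypothetical rule of thumb: treat the program as fill-rate bound when
// shrinking the window raises the frame rate by at least min_gain.
// The default of 1.5 is an assumption chosen only for this example.
bool looks_fill_bound(double fps_large_window, double fps_small_window,
                      double min_gain = 1.5)
{
    assert(fps_large_window > 0.0);
    return fps_small_window / fps_large_window >= min_gain;
}
```

Here pixel_ratio(640, 480, 320, 240) is 4, which is the upper bound on the speed-up you could expect from the smaller window if filling pixels were the only cost.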

http://uk.geocities.com/sloppyturds/gotterdammerung.html

Well, if there is one thing you can be sure about, it’s that if you make a long public post detailing a problem you’re having, that problem will invariably turn out to be something dumb that you did.

While I was recalculating my FPS only every second, I was inadvertently displaying FPS every frame using SetWindowText(). SetWindowText() is not a cheap function. As soon as I fixed it so that I call it only once a second, the idle FPS went way up. So in a way, Bretttido was right, although MFC is not a problem. And thanks for the tip about only clearing the z-buffer. That speeds things up a lot.
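
For anyone hitting the same thing, the fix described above can be sketched roughly like this. It is portable C++, not the actual program: the set_title callback stands in for whatever wraps SetWindowText(), and the class and member names are invented for the example.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Counts frames and pushes an FPS string to the title bar at most once
// per interval, instead of once per frame.
class FpsTitleUpdater {
public:
    // set_title: hypothetical callback that would wrap SetWindowText().
    explicit FpsTitleUpdater(std::function<void(const std::string&)> set_title,
                             double interval_seconds = 1.0)
        : set_title_(std::move(set_title)), interval_(interval_seconds)
    {
        assert(interval_ > 0.0);
    }

    // Call once per frame with the time the last frame took, in seconds.
    void on_frame(double dt_seconds)
    {
        ++frames_;
        elapsed_ += dt_seconds;
        if (elapsed_ >= interval_) {
            // Only here do we pay for the (expensive) title update.
            const double fps = static_cast<double>(frames_) / elapsed_;
            set_title_("FPS: " + std::to_string(static_cast<int>(fps + 0.5)));
            frames_ = 0;
            elapsed_ = 0.0;
        }
    }

private:
    std::function<void(const std::string&)> set_title_;
    double interval_;
    long frames_ = 0;
    double elapsed_ = 0.0;
};
```

The point of the design is that the per-frame path is just an increment and an add; the string formatting and the window call only happen when the interval has elapsed.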

But I still think there is something not quite right about the time it takes me to do beginscene-endscene-present with no rendering, so I’ll be looking into that more next. I don’t get near the 3700 that JoeyBlow2 gets when those are being called.

Zedzeek, fillrate is not a problem because what I’m trying to do is get a fast FPS with no rendering at all. I did that just to identify other mistakes I might have. It seems to me that just beginscene-endscene-present was a problem with no rendering.

Actually, I’ve just realized that the slowdown is in the call to Present() and it’s only due to the copying that occurs from the buffer to the window in windowed mode. So all problems solved. Sorry for the dumb mistake, but thanks for the replies because they were helpful and I did get stuff out of them.

quote:

Oh, and by the way, you'll eventually want to keep the clear() command out of your game loop as it's a costly process to clear the entire frame buffer. Eventually, when you get enough stuff in your scene, the clear is redundant (clearing the target buffer that is, you'll still need to clear the Zbuffer and stencil buffer if you use them).



I'd like a little more information on this, please. Won't the frame buffer have random garbage in it if you do not clear it?

It will, but you won't see it because your scene has replaced it all.

If you render enough stuff (terrain, sky, etcetera), you'll end up filling the entire frame buffer with your rendered scene. In other words, there's not a pixel that won't be rendered on, and therefore, all the random stuff will be replaced by your scene.

If I rendered some terrain filling the bottom half of the screen, for instance, you'd see the terrain in the bottom half, and lots of random garbage in the top half of the screen. However, if I rendered my terrain in the bottom half and some sky in the top half, the sky's pixels will have replaced the garbage pixels. Ergo, no need to clear.
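
To make the coverage argument concrete, here is a tiny software simulation. It is not Direct3D code: the frame buffer is just an array pre-filled with a made-up garbage value, and the "terrain" and "sky" are solid row fills. All names and colour values are invented for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Marker value standing in for whatever was left in an uncleared buffer.
constexpr std::uint32_t kGarbage = 0xDEADBEEF;

struct Framebuffer {
    int width;
    int height;
    std::vector<std::uint32_t> pixels;
};

// A frame buffer that was never cleared: every pixel holds garbage.
Framebuffer make_uncleared(int width, int height)
{
    assert(width > 0 && height > 0);
    return Framebuffer{width, height,
        std::vector<std::uint32_t>(static_cast<std::size_t>(width) * height, kGarbage)};
}

// "Render" a solid band covering rows [y_begin, y_end).
void fill_rows(Framebuffer& fb, int y_begin, int y_end, std::uint32_t color)
{
    assert(0 <= y_begin && y_begin <= y_end && y_end <= fb.height);
    for (int y = y_begin; y < y_end; ++y)
        for (int x = 0; x < fb.width; ++x)
            fb.pixels[static_cast<std::size_t>(y) * fb.width + x] = color;
}

// Number of pixels still showing the uncleared contents.
int count_garbage(const Framebuffer& fb)
{
    int n = 0;
    for (std::uint32_t p : fb.pixels)
        if (p == kGarbage) ++n;
    return n;
}
```

Filling only the bottom half of an 8x6 buffer leaves the 24 top-half pixels as garbage; adding a sky fill for the top half brings the garbage count to zero, which is the situation where clearing the colour buffer buys nothing.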

[edited by - Bas Paap on September 15, 2002 2:47:26 PM]
