App folder influencing SwapBuffers behaviour?

For the first time I just noticed something confusing. I ran my old terrain renderer after recompiling it with VC++ Express and was somewhat "impressed" by the well over 1000 fps it showed me. I started Fraps to confirm and was about to take a five-minute giggling break when I ran it again, this time starting the binary directly, and it was back at a somewhat more believable 200 fps. So whenever I run the release build from within VC++, something is not just screwing with the internal timing but throwing off Fraps as well. The app is using OpenGL, rdtsc for timing, and somehow Denmark is smelling funny today. What could it possibly be doing to cause that kind of weird behaviour? Especially since it makes profiling a real pain (switch to Explorer, copy the binary to the working directory, run it... compared to pressing F5).

[Edited by - Trienco on May 18, 2006 12:26:39 AM]
Quote: Original post by Trienco
Especially since it makes profiling a real pain (switch to Explorer, copy the binary to the working directory, run it... compared to pressing F5).


aside:

This part is trivial to fix. You can have VS put the EXEs directly in the working directory instead of in the /Debug or /Release directories; just change the output file setting in the project's linker properties. Usually I just name the debug EXE <myAppName>D.exe and leave the name normal for the release version. That way you don't need two copies of the data directories either (which I've found you do if you're using relative paths for assets).
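A related workaround for the asset-path problem, purely as a sketch (not something from this thread): resolve asset paths against the EXE's own location instead of the current working directory, so data files are found no matter which folder the binary is launched from. GetModuleFileName is the Win32 call used; the "data" subdirectory in the usage comment is hypothetical.

    #include <windows.h>
    #include <string>

    // Build a path relative to the EXE location rather than the current
    // working directory, so assets load the same from any start folder.
    std::string AssetPath(const std::string& relative)
    {
        char exePath[MAX_PATH] = {0};
        GetModuleFileNameA(NULL, exePath, MAX_PATH);    // full path of the running EXE
        std::string dir(exePath);
        std::string::size_type slash = dir.find_last_of("\\/");
        if (slash != std::string::npos)
            dir.erase(slash + 1);                       // keep the trailing separator
        return dir + relative;
    }

    // usage (hypothetical file name): std::string heightmap = AssetPath("data\\terrain.raw");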

No idea what the timing deal is, though. My guess is that when Fraps is running you no longer get 100% of the CPU and your framerate takes a nosedive. This could easily happen on a low-RAM system, where running the second app makes you start paging to virtual memory. You should have an accurate framerate timer in your app; there's no need for a third-party program to measure framerate.
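A minimal in-app frame counter along those lines might look like this; it is only a sketch using QueryPerformanceCounter/QueryPerformanceFrequency, not anyone's actual code from the thread.

    #include <windows.h>
    #include <cstdio>

    // Call once per frame; prints the average frame rate roughly once per second.
    void CountFrame()
    {
        static LARGE_INTEGER freq = {0};
        static LARGE_INTEGER last = {0};
        static int frames = 0;

        if (freq.QuadPart == 0)
        {
            QueryPerformanceFrequency(&freq);   // counts per second
            QueryPerformanceCounter(&last);
        }

        ++frames;

        LARGE_INTEGER now;
        QueryPerformanceCounter(&now);
        double elapsed = double(now.QuadPart - last.QuadPart) / double(freq.QuadPart);
        if (elapsed >= 1.0)
        {
            printf("%.1f fps\n", frames / elapsed);
            frames = 0;
            last = now;
        }
    }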

-me
Quote: Original post by Palidine
My guess is that when Fraps is running you no longer get 100% of the CPU and your framerate takes a nosedive.


That's the problem: Fraps doesn't change anything, it just "confirms" the absurd number (1000 fps for a 4096x4096 terrain and 5 blended textures) that my internal counter is showing. I played around some more and noticed that with the debug build I get the same silly results even when starting it manually. I tried to bring the numbers down by increasing the LOD to insane levels and compared: at some point both go down to the same levels, so by the time I can "feel" whether the displayed fps are right, they are identical.

Then I tried my favorite game: calculate a lot of square roots and print the result to keep it from being optimized away. Adding 100k sqrts per frame made it 300 fps from within VC and 100 fps otherwise (including both ways of running the debug build). You can also tell that the output in the console really IS much faster (so it's not just the timer or fps counter). With a round million sqrts, both are down to 42 fps.
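Roughly the kind of busy-work loop meant here, as a sketch (the counts are the ones mentioned above; everything else is made up): summing the results and printing the total is what keeps the compiler from optimizing the work away.

    #include <cmath>
    #include <cstdio>

    // Burn a fixed amount of CPU per frame; printing the sum prevents the
    // optimizer from removing the loop entirely.
    void BusyWork(int count)    // e.g. 100000 or 1000000 per frame
    {
        double sum = 0.0;
        for (int i = 1; i <= count; ++i)
            sum += std::sqrt(double(i));
        printf("%f\n", sum);
    }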

I also changed the process priority, to see if maybe VC runs it at a higher priority than usual, but even at realtime it makes no difference. If the numbers didn't become identical at lower fps I could at least figure out whether it still happens with vsync enabled. That still wouldn't explain one build running out of sync in a 1200 fps way and the other in a 200 fps way, though.
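For reference, changing the priority from code amounts to a single Win32 call; a sketch (HIGH_PRIORITY_CLASS shown, REALTIME_PRIORITY_CLASS works the same way but can starve the rest of the system):

    #include <windows.h>
    #include <cstdio>

    // Raise the priority class of the current process.
    void RaisePriority()
    {
        if (!SetPriorityClass(GetCurrentProcess(), HIGH_PRIORITY_CLASS))
            printf("SetPriorityClass failed: %lu\n", GetLastError());
    }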


edit:

Ok, NOW it's just trying to be mean. I packed it into a zip and uploaded it (here). When I ran it from within the zip (using WinRAR, which I'm sure just dumps it into the temp folder and runs it from there), the release build was at 700 fps instead of 300, the debug build still at 1200 fps. Then I unpacked it to a different folder and got the same results. Then I copied the release binary BACK to the regular folder and... it only ran at 300 fps. (outburst ahead) HOW THE §%&§& CAN THE FOLDER YOU RUN IT FROM MAKE THAT KIND OF FRIGGING DIFFERENCE (outburst end).

AMD64 3200+, 7900 GT, 1 GB RAM. If you have too much time and a somewhat current graphics card, I'd be curious whether the debug build runs almost twice as fast on your machine, too. Is it some vsync-related freak behaviour? An AMD oddity? Even if nVidia found a magical way to completely decouple CPU and GPU and have the CPU do 100 loops while the GPU does one (and somehow ignore those 99 frames of render calls), it shouldn't make a debug build twice as fast as a release build (or make the same release build run at three different speeds depending on where the binary is placed).
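One way to rule vsync in or out for a test like this, assuming the driver exposes the WGL_EXT_swap_control extension (a sketch, not code from the thread):

    #include <windows.h>
    #include <GL/gl.h>

    typedef BOOL (WINAPI *PFNWGLSWAPINTERVALEXTPROC)(int interval);

    // interval 0 = vsync off, 1 = swap locked to the monitor refresh.
    // Needs a current OpenGL context; returns false if the extension is missing.
    bool SetVSync(int interval)
    {
        PFNWGLSWAPINTERVALEXTPROC wglSwapIntervalEXT =
            (PFNWGLSWAPINTERVALEXTPROC)wglGetProcAddress("wglSwapIntervalEXT");
        if (!wglSwapIntervalEXT)
            return false;
        return wglSwapIntervalEXT(interval) != FALSE;
    }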

[Edited by - Trienco on May 17, 2006 1:55:25 PM]
Quote: Original post by Trienco
[...]rdtsc for timing[...]
This is your problem. RDTSC is dependent on the CPU speed, so you must be using some other timing function to figure out the number of cycles per second. If you're using a timer with less accuracy than rdtsc (and you are), then the problem is that with a structure like

    endtime = time() + 1;
    start = rdtsc();
    while(time() < endtime);
    end = rdtsc();

if time() is just about to change (say 3/4 of a time unit has already passed), your test only measures the number of cycles in the remaining 1/4 of a unit.
Personally, I suggest just avoiding rdtsc entirely, because it doesn't work as a clock on computers that dynamically change their CPU frequency, such as laptops and newer machines. For Windows, use either timeBeginPeriod then timeGetTime, or GetTickCount, or QueryPerformanceCounter.
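For completeness, the timeGetTime route looks roughly like this (needs winmm.lib; Sleep stands in for whatever is being timed, so this is only a sketch):

    #include <windows.h>
    #include <mmsystem.h>   // timeBeginPeriod / timeGetTime, link with winmm.lib
    #include <cstdio>

    int main()
    {
        timeBeginPeriod(1);                 // request ~1 ms timer resolution

        DWORD start = timeGetTime();
        Sleep(100);                         // stand-in for the work being timed
        DWORD elapsed = timeGetTime() - start;
        printf("elapsed: %lu ms\n", elapsed);

        timeEndPeriod(1);                   // restore the previous resolution
        return 0;
    }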

If you insist on using rdtsc, though, you need to modify your 'get cycles per second' function to do something like:

    // wait for the low-res clock to change
    endtime = time() + 1;
    while(time() < endtime);
    // now a whole unit of time will pass before the clock changes again
    endtime = time() + 1;
    start = rdtsc();
    while(time() < endtime);
    end = rdtsc();
Also, you should probably do it several times and take an average of the middle values (that is, take 10 samples, drop the highest 2 and the lowest 2, then average the remaining 6).
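The 'drop the extremes and average' step could look like this; GetCyclesPerSecond is a placeholder for one calibration run like the loop above, so treat it as a sketch.

    #include <algorithm>
    #include <vector>

    // Placeholder for a single rdtsc/time() calibration run as shown above.
    double GetCyclesPerSecond();

    // Take 10 samples, drop the 2 highest and 2 lowest, average the remaining 6.
    double CalibrateCycles()
    {
        std::vector<double> samples;
        for (int i = 0; i < 10; ++i)
            samples.push_back(GetCyclesPerSecond());

        std::sort(samples.begin(), samples.end());

        double sum = 0.0;
        for (std::size_t i = 2; i < samples.size() - 2; ++i)
            sum += samples[i];
        return sum / double(samples.size() - 4);
    }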
"Walk not the trodden path, for it has borne it's burden." -John, Flying Monk
Quote: Original post by Extrarius
Quote: Original post by Trienco
[...]rdtsc for timing[...]
This is your problem.


That would still mean that Fraps also uses rdtsc and happens to be off in exactly the same way. Apart from that, my profiler measures clock cycles, not time, because that's more useful when comparing different approaches or algorithms than saying "this function takes x ns on a y GHz machine". So I get three more or less different values that all look consistent, yet don't make any sense.

Quote: For Windows, use either timeBeginPeriod then timeGetTime, or GetTickCount, or QueryPerformanceCounter.


Unfortunately, timeGetTime and GetTickCount are either expensive or have the precision and/or resolution of a drunk elephant (completely useless for profiling code), and afaik MS changed QueryPerformanceCounter to internally use rdtsc as well.

It also doesn't explain why it DOES run faster (see the console output), or why the folder I run the binary from makes a difference (and does so completely consistently across multiple runs). Also, the CPU speed estimate would have to be off by a factor of 3 to 4; even the crappiest method of estimating it couldn't be that wrong.


After peppering it with more profile counters I found the difference in SwapBuffers. The release build started from VC blocks for only a few hundred ticks; started from the usual folder it blocks for 6000, and from a different folder for 2000. While it's at least the only spot that makes sense, it means that either SwapBuffers isn't blocking until the rendering is finished, or rendering takes vastly different amounts of time (unlikely for exactly the same code). And why would the folder have an impact on this behaviour? Does that mean I could get better 3DMark results by installing it somewhere else? *g*
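For anyone trying to reproduce this, the measurement around SwapBuffers amounts to something like the sketch below, using the VC++ __rdtsc intrinsic. The glFinish call is an extra assumption, not something the thread mentions; it forces pending rendering to complete so the swap cost isn't hidden by driver buffering.

    #include <windows.h>
    #include <GL/gl.h>
    #include <intrin.h>     // __rdtsc (VC++)
    #include <cstdio>

    // Measure how long SwapBuffers blocks, in CPU cycles.
    void TimedSwap(HDC hdc)
    {
        glFinish();                          // wait for the GPU to finish first

        unsigned __int64 start = __rdtsc();
        SwapBuffers(hdc);
        unsigned __int64 end = __rdtsc();

        printf("SwapBuffers blocked for %I64u cycles\n", end - start);
    }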

edit: I guess that pretty much makes it a graphics or OpenGL topic.

