[DX9] IDirect3DDevice9::Present() takes way too long

Started by
14 comments, last by supmagc 14 years, 4 months ago
@Adam - No offense intended, but I'm afraid the profiler will not show the root of the problem. You are correct in one respect: it is due to an overload. Every world transform modifies multiple structures in DX, each and every time. MS warns not to do this too often (search MSDN for proof; that's where I found my original answer). Once the number of changes per Present() interval reaches a certain level, it becomes overloaded. After testing over a month (and tons of hair pulling), I found the delay starts spiking badly somewhere between 65 and 85 world transforms. On my single-core AMD 3000+ it would pause for up to 3 seconds with just 100 world transforms. AND it only happens in Present(), nowhere else. I wrote my own profiler class to time each and every DX command throughout the entire game loop using the processor clock. Below the limit the normal tick count was around 40-50; past it, it would sometimes peak at 2 million ticks (processor time ticks).
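The profiler class itself was never posted. Here is a minimal sketch of the start/stop-around-each-command idea, assuming std::chrono::steady_clock as a portable stand-in for the Windows performance counter; CommandProfiler, record() and time() are hypothetical names, not the poster's code:

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical per-command profiler: accumulates elapsed ticks per label.
// Ticks here are steady_clock units (often nanoseconds), not QPC ticks.
class CommandProfiler {
public:
    using Clock = std::chrono::steady_clock;

    struct Stats {
        std::int64_t calls = 0;
        std::int64_t totalTicks = 0;
        std::int64_t maxTicks = 0;  // worst single call seen so far
    };

    // Records one sample for `label`.
    void record(const std::string& label, std::int64_t ticks) {
        Stats& s = stats_[label];
        s.calls += 1;
        s.totalTicks += ticks;
        if (ticks > s.maxTicks) s.maxTicks = ticks;
    }

    // Starts the timer, runs one command (e.g. a lambda wrapping a D3D call),
    // stops the timer and records the elapsed ticks.
    template <typename Fn>
    void time(const std::string& label, Fn&& fn) {
        const auto start = Clock::now();
        fn();
        const auto stop = Clock::now();
        record(label, (stop - start).count());
    }

    const Stats& stats(const std::string& label) const { return stats_.at(label); }

private:
    std::map<std::string, Stats> stats_;
};
```

Wrapping a call, e.g. profiler.time("Present", [&] { device->Present(nullptr, nullptr, nullptr, nullptr); }); with device being your IDirect3DDevice9*, and comparing maxTicks against totalTicks / calls is what exposes a call that normally costs a few ticks occasionally jumping to millions.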

BTW, this is a per-frame problem and MS knows about it. Once the frame is presented, the transform count is effectively reset. Instancing may seem hard at first, but in essence it is very similar to rendering a single instance of a mesh, just much, much faster, and ;) it only requires 1 world transform. In this case, would you rather set it once or 400 times?
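A sketch of the data side of instancing, assuming each instance only needs a translation plus a uniform scale; InstanceData, SceneObject and buildInstanceBuffer are hypothetical. In D3D9 hardware instancing an array like this would be bound as a second vertex stream marked with SetStreamSourceFreq and D3DSTREAMSOURCE_INSTANCEDATA, so the shared transforms are set once per draw rather than once per object:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-instance record: translation + uniform scale,
// read by the instancing vertex shader as one float4 per instance.
struct InstanceData {
    float x, y, z, scale;
};

// Hypothetical game-side object that would otherwise get its own world transform.
struct SceneObject {
    float x, y, z;
    float scale;
};

// Packs every object's placement into one contiguous array that can be
// copied into an instance vertex buffer in a single Lock/Unlock.
std::vector<InstanceData> buildInstanceBuffer(const std::vector<SceneObject>& objects) {
    std::vector<InstanceData> out;
    out.reserve(objects.size());
    for (const SceneObject& o : objects) {
        out.push_back({o.x, o.y, o.z, o.scale});
    }
    return out;
}
```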

The only downside to instancing is the amount of data that can be rendered per pass (and yes, unfortunately you will have to use a shader). This limits the mesh vertices/indices to 65k per pass, but you can render more complex meshes simply by drawing fewer instances per pass. Look at the 'Instancing' demo provided in the SDK to see how to do this (plus it comes with a shader to build on). You can get pretty good speed with just 64 instances per pass.
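The per-pass limit can be worked out up front. This sketch assumes the shader-instancing approach, where the mesh is replicated in one vertex buffer addressed by 16-bit indices (so at most 65,536 vertices per pass), plus a per-pass instance cap such as the 64 mentioned above; the function names are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// 16-bit indices can address vertices 0..65535.
constexpr std::uint32_t kMax16BitVertices = 65536;

// Instances that fit in one pass: limited both by the 16-bit index range
// (mesh replicated N times) and by a per-pass cap (e.g. shader constants).
std::uint32_t instancesPerBatch(std::uint32_t verticesPerMesh, std::uint32_t maxInstancesPerPass) {
    if (verticesPerMesh == 0 || verticesPerMesh > kMax16BitVertices) return 0;  // mesh can't be batched
    return std::min(kMax16BitVertices / verticesPerMesh, maxInstancesPerPass);
}

// Passes needed to draw `totalInstances` (ceiling division); 0 if unbatchable.
std::uint32_t passesNeeded(std::uint32_t totalInstances, std::uint32_t perBatch) {
    if (perBatch == 0) return 0;
    return (totalInstances + perBatch - 1) / perBatch;
}
```

For a 1,000-vertex mesh this gives 64 instances per pass (7 passes for 400 instances); a 2,000-vertex mesh drops to 32 per pass.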

This problem is with WORLD transforms only; none of the other transforms cause it. Since world transforms are what put your objects in place, you will have to find workarounds, as I stated previously.

******************************************************************************************
Youtube Channel

Did you read http://msdn.microsoft.com/en-us/library/ee415127%28VS.85%29.aspx before doing that profiling?

Note the list right at the bottom - SetTransform() should average about 3500 clock cycles. You really shouldn't be getting three second pauses caused by just calling that 100 times - it should take significantly less than one millisecond of CPU time.
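To put the MSDN figure in perspective: 100 calls × 3500 cycles = 350,000 cycles. A quick conversion, assuming a 1.8 GHz clock (an assumed figure, not one given in the thread):

```cpp
#include <cassert>

// Converts a CPU cycle count to milliseconds at a given clock rate (Hz).
double cyclesToMs(double cycles, double clockHz) {
    return cycles / clockHz * 1000.0;
}
```

At 1.8 GHz, 350,000 cycles is roughly 0.19 ms of CPU time, nowhere near a 3-second pause.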
Hehehe, we are talking about two different kinds of clock cycles. The numbers from my profiler have nothing to do with actual processor clock cycles. What you posted is processor cycles; my profiler uses the performance counter, and its numbers have no real comparison to that. The numbers I posted are the count of ticks needed to process a specific command. I measured both total subroutine time and total command execution time, including regular CPU commands, starting and stopping each timer before and after each command. The numbers only mean something within that realm: counting the ticks each command takes.
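Performance-counter ticks only become comparable to cycles or milliseconds once divided by the counter frequency that QueryPerformanceFrequency reports. A small conversion helper, with 10 MHz used purely as an example frequency:

```cpp
#include <cassert>
#include <cstdint>

// Converts performance-counter ticks to milliseconds. `ticksPerSecond` is the
// value QueryPerformanceFrequency reports; it differs between machines, so raw
// tick counts are not CPU cycles and not directly comparable across PCs.
double ticksToMs(std::int64_t ticks, std::int64_t ticksPerSecond) {
    return static_cast<double>(ticks) * 1000.0 / static_cast<double>(ticksPerSecond);
}

// Whole ticks available per frame at a target frame rate.
std::int64_t frameBudgetTicks(std::int64_t ticksPerSecond, int targetFps) {
    return ticksPerSecond / targetFps;
}
```

At that example frequency, 2,000,000 ticks is 200 ms and a 30 fps frame budget is 333,333 ticks; with a different counter frequency the same tick counts mean different times.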

Look at the first post: he says Present() takes 200 ms. That means nothing to the profiler; it just returns how many ticks it took to get through the routine or command. MS commonly says in many places on MSDN that this doesn't matter, but it does. If a command normally runs at 40-50 ticks and then jumps to over 2 million, that is a problem. It didn't matter to me that the GPU wasn't processing the data fast enough; what mattered was that it locked up my system for over 2 million ticks at a time when it should only take 40-50, which resulted in a huge pause in the frame rate. I only wanted to know where it was happening. It was Present() causing the pause, nothing else.
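Flagging a command that normally costs 40-50 ticks but suddenly costs millions can be automated. A sketch, assuming an exponential moving average as the baseline; SpikeDetector and its smoothing and spike factors are hypothetical choices, not from the thread:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical spike detector: tracks an exponential moving average of a
// command's tick cost and flags samples far above it.
class SpikeDetector {
public:
    SpikeDetector(double smoothing, double spikeFactor)
        : smoothing_(smoothing), spikeFactor_(spikeFactor) {}

    // Returns true if `ticks` exceeds spikeFactor times the running average.
    bool addSample(std::int64_t ticks) {
        const double t = static_cast<double>(ticks);
        if (!hasBaseline_) {  // first sample just seeds the baseline
            average_ = t;
            hasBaseline_ = true;
            return false;
        }
        const bool spike = t > spikeFactor_ * average_;
        // Spikes are kept out of the baseline so one stall doesn't hide the next.
        if (!spike) average_ += smoothing_ * (t - average_);
        return spike;
    }

    double average() const { return average_; }

private:
    double smoothing_;
    double spikeFactor_;
    double average_ = 0.0;
    bool hasBaseline_ = false;
};
```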

What DX does in its code was of no concern to me; I wanted to know how it affected my CPU usage, since I was optimizing my code for the highest possible speed and a steady frame rate. I did a perf check using the DX program you posted, and it was no help whatsoever since it wasn't picking up the WORLD transform problem. I got a hint from the MSDN documentation and ran a few tests, reducing the 100-model test by 5 models at a time. The number of models that caused the problem (long since solved) varied between 65 and 85, so I removed DX commands one at a time. That's one of the reasons it took so long to determine where the problem was. Once I commented out the WORLD transform, I could render 1000 meshes at approx. 50 ticks per Present(), which is quite normal. When I added it back in, the spiking reappeared. One thing I really have to watch is how long a scene takes to render, since I require a minimum of 30 fps if at all possible, even on slower single-core systems. I have 4 computers and the results were the same on all of them: WORLD transforms cause the delay in Present().
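The step-down search described above (start at 100 models, drop 5 at a time) looks roughly like this; spikes is a hypothetical callback that renders n models and reports whether Present() stalled:

```cpp
#include <cassert>
#include <functional>

// Starts at `start` models and removes `step` at a time until `spikes`
// reports a clean frame. Returns the first clean count, or 0 if none was found.
int findCleanModelCount(int start, int step, const std::function<bool(int)>& spikes) {
    for (int n = start; n > 0; n -= step) {
        if (!spikes(n)) return n;
    }
    return 0;
}
```

Since the threshold moved between 65 and 85 from run to run, each count probably needs several frames before it is called clean.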

BTW, MS never really admits this is the cause; they hint at it (MSDN can be vague at times), so I had to hunt it down. As it stands, my entire game loop now runs in less than 8k ticks per loop. I can provide the profiler class if you want to use it, just let me know. The only problem is that it uses my file I/O class to save the log, which I won't release (I've been building it over the past 12 years and I'm unwilling to put it in the public domain since it contains my encryption routines). This means you would have to build your own logger and saving code...
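Since that file I/O class isn't available, a minimal replacement logger could just stream samples as CSV to any std::ostream (a std::ofstream for a log file). Sample and writeSamplesCsv are hypothetical, and labels containing commas are not escaped:

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// One timed command: its label and the ticks it took.
struct Sample {
    std::string label;
    std::int64_t ticks;
};

// Writes a header line followed by one "label,ticks" line per sample.
void writeSamplesCsv(std::ostream& out, const std::vector<Sample>& samples) {
    out << "label,ticks\n";
    for (const Sample& s : samples) {
        out << s.label << ',' << s.ticks << '\n';
    }
}
```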

Anyway, I'm just going by what he posted about where the delay was happening. After 10 years of writing DX apps, this is the only thing I have found that causes the delay in Present().

All right then... it seems I've finally determined the exact cause :D

As my first profiling indicated, an NVIDIA DLL was taking a lot of time, and when Adam said something about fill rate and overdraw, I started thinking.

Something I didn't mention, as I was really tired when typing my previous posts, was that the planes I was so eager to render all had an alpha channel, which I used for alpha testing.
Nothing wrong with that, except that I'm using a deferred renderer, and the way I implemented alpha testing was by overwriting the depth value (something I did while debugging earlier the same evening).

So, after surfing the internet a bit, I found an article about Tabula Rasa (GPU Gems 3), which mentioned a mysterious 'clip' command. Well, the mystery was quickly solved.
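For reference, HLSL's clip(x) discards the current pixel if any component of x is less than zero, so nothing (color or depth) is written for it; the usual alpha-test idiom is clip(alpha - threshold). A CPU-side model of that rule, purely for illustration (clipDiscards and alphaClipDiscards are hypothetical names):

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Models HLSL clip(): the pixel is discarded if any component is < 0.
template <std::size_t N>
bool clipDiscards(const std::array<float, N>& x) {
    for (float v : x) {
        if (v < 0.0f) return true;
    }
    return false;
}

// Alpha-test idiom clip(alpha - threshold): alpha == threshold is kept.
bool alphaClipDiscards(float alpha, float threshold) {
    return clipDiscards(std::array<float, 1>{alpha - threshold});
}
```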
My framerate still drops when I'm rendering 200 alpha-tested planes, but it's a lot less than before.
So now I'm wondering... what exactly does Present(0, 0, 0, 0) do? (I always thought it simply copied the back buffer to the front buffer, which I assumed would take linear time.)


P.S.:
I'm just not sure whether the clip command is the way to go here (the final game will have a lot of parallax-placed, alpha-tested background sprites).

I know this question should go into another topic, so maybe I'll make one later on today :-)
You might find using D3DRS_ALPHATESTENABLE (with the appropriate D3DRS_ALPHAREF and D3DRS_ALPHAFUNC) works faster than clip(), although it's not as flexible.
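The fixed-function alpha test keeps a pixel when comparing its alpha against D3DRS_ALPHAREF (0x00-0xFF) with the D3DRS_ALPHAFUNC comparison yields true; a common setup is D3DCMP_GREATEREQUAL with a reference of 0x80 (that threshold is only an example). A CPU-side model, where CmpFunc mirrors the D3DCMPFUNC names but not their numeric values:

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the D3DCMPFUNC comparisons by name only.
enum class CmpFunc { Never, Less, Equal, LessEqual, Greater, NotEqual, GreaterEqual, Always };

// Fixed-function alpha test: the pixel is kept if (alpha FUNC ref) is true.
// `ref` plays the role of D3DRS_ALPHAREF.
bool alphaTestPasses(std::uint8_t alpha, std::uint8_t ref, CmpFunc func) {
    switch (func) {
        case CmpFunc::Never:        return false;
        case CmpFunc::Less:         return alpha < ref;
        case CmpFunc::Equal:        return alpha == ref;
        case CmpFunc::LessEqual:    return alpha <= ref;
        case CmpFunc::Greater:      return alpha > ref;
        case CmpFunc::NotEqual:     return alpha != ref;
        case CmpFunc::GreaterEqual: return alpha >= ref;
        case CmpFunc::Always:       return true;
    }
    return true;
}
```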

You can also get big savings by making your geometry not just simple quads - you can completely avoid drawing things that are transparent that way. For example a circular transparent texture on a quad will be slower than one on an octagon.
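A rough estimate of the saving, assuming fill cost is proportional to covered area and that the polygon circumscribes the circle (its apothem equals the circle's radius):

```cpp
#include <cassert>
#include <cmath>

// Fraction of a regular n-gon's area lying outside its inscribed circle.
// With apothem 1, the n-gon's area is n * tan(pi / n) and the circle's is pi.
// sides == 4 is the plain quad.
double wastedFraction(int sides) {
    const double pi = std::acos(-1.0);
    const double polygonArea = sides * std::tan(pi / sides);
    return 1.0 - pi / polygonArea;
}
```

That gives about 21% of a quad's pixels wasted versus about 5% for an octagon, at the cost of a few extra vertices.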

What Present() does is:

1. Wait for the GPU to finish processing all the commands you've sent it (if the GPU is what's limiting the frame rate).
2. Wait for the vertical blank (if you enabled it in the present parameters).
3. Swap the front and back buffers.

The first two steps can take some time, the third one is almost instant.
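A toy model of steps 2 and 3, assuming a flip-style double buffer (depending on the swap effect, e.g. D3DSWAPEFFECT_COPY, the runtime may copy instead of flipping). ToySwapChain and presentTimeWithVsync are illustrative only, not how the driver works:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <utility>
#include <vector>

// Toy double buffer: "presenting" exchanges which buffer is front and which
// is back, so no pixels are copied.
struct ToySwapChain {
    std::vector<std::uint32_t> a, b;
    std::vector<std::uint32_t>* front = &a;
    std::vector<std::uint32_t>* back = &b;

    ToySwapChain() = default;
    ToySwapChain(const ToySwapChain&) = delete;             // pointers refer to own members
    ToySwapChain& operator=(const ToySwapChain&) = delete;

    void swapBuffers() { std::swap(front, back); }
};

// With vsync on, the swap waits for the next vertical blank after the GPU
// finishes. Returns the time (ms, from frame start at a vblank) it is shown.
double presentTimeWithVsync(double gpuDoneMs, double refreshHz) {
    const double periodMs = 1000.0 / refreshHz;
    return std::ceil(gpuDoneMs / periodMs) * periodMs;
}
```

At 60 Hz, a frame whose GPU work finishes at 20 ms isn't shown until about 33.3 ms, and a blocking Present() absorbs that wait.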
All right, thanks for explaining.
Alpha testing does seem to work better indeed. I didn't know if it would work with MRTs, but all's fine.

Thanks again
