I've exhausted every other way to solve the problem, so here I am. Hopefully somebody can point me in the right direction.
I'm building a deferred lighting engine on D3D11, C++. Rendering is done as follows:
- MRT (albedo, normal, depth)
- Light geometry
These phases are fast and working rather well. The problem is the swap chain Present call (no vsync): it takes forever. What's strange is that no matter how complex the scene is, Present takes several times as long as all the rendering phases combined. To put things into perspective, here are some example timings:
MRT: 2.35445 ms
Lights: 3.19276 ms
G-buffer: 0.0114392 ms
Post-processing: 0.00946387 ms
Full loop without swap chain present: 5.87718 ms
Swap chain present call: 25.6921 ms
Full: 32.1059 ms
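For context, those numbers come from CPU timers wrapped around the API calls, which I suspect mostly measure command submission rather than actual GPU execution. A sketch of what I think I should be doing instead, using D3D11 timestamp queries to get real GPU time per phase (the `g_`-style naming and the synchronous readback are mine; real code would buffer the queries a few frames deep to avoid the stall):

```cpp
#include <d3d11.h>
#include <cstdio>

ID3D11Query* g_disjoint = nullptr;
ID3D11Query* g_tsBegin  = nullptr;
ID3D11Query* g_tsEnd    = nullptr;

void CreateTimingQueries(ID3D11Device* device)
{
    D3D11_QUERY_DESC qd = {};
    qd.Query = D3D11_QUERY_TIMESTAMP_DISJOINT;
    device->CreateQuery(&qd, &g_disjoint);
    qd.Query = D3D11_QUERY_TIMESTAMP;
    device->CreateQuery(&qd, &g_tsBegin);
    device->CreateQuery(&qd, &g_tsEnd);
}

void TimePhase(ID3D11DeviceContext* ctx)
{
    ctx->Begin(g_disjoint);
    ctx->End(g_tsBegin);         // timestamp before the phase

    // ... issue draw calls for the phase being measured ...

    ctx->End(g_tsEnd);           // timestamp after the phase
    ctx->End(g_disjoint);

    // Synchronous readback (stalls the CPU until results are ready).
    D3D11_QUERY_DATA_TIMESTAMP_DISJOINT dj;
    while (ctx->GetData(g_disjoint, &dj, sizeof(dj), 0) != S_OK) {}
    UINT64 t0 = 0, t1 = 0;
    ctx->GetData(g_tsBegin, &t0, sizeof(t0), 0);
    ctx->GetData(g_tsEnd,   &t1, sizeof(t1), 0);
    if (!dj.Disjoint)
        printf("GPU time: %.3f ms\n",
               double(t1 - t0) / double(dj.Frequency) * 1000.0);
}
```

If that's the right approach, it would at least tell me whether the work is actually happening during the "fast" phases or piling up until Present.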
As you can see, that's far beyond unacceptable.

My first attempt was to simplify all the pixel shaders to the point where they returned a fixed color value. Although the rendering phase timings dropped dramatically, even below 0.01 ms combined, the Present call still took up to 5 ms. Having ruled out a pixel shader bottleneck, my next attempt was to change the render target formats to something simple like DXGI_FORMAT_R8G8B8A8_UNORM. To no avail.

So now I'm just shooting in the dark, trying this and that, without the knowledge required to approach a problem like this. I have never worked in depth with GPU profilers (other than PIX, which is sadly deprecated), so I have no idea how to crack open a Present call and look at what's going on inside it. Frankly, I'm puzzled that this scenario is possible at all. Any insight on the swap chain and its Present method is appreciated. I'm working with both AMD and Nvidia graphics cards, so pointers to either vendor's official profiler are most welcome. I have the most recent drivers installed.
I doubt the above info is sufficient, so I'll post whatever you need as we go.
Thank you in advance,