
DX11 SwapChain present slow


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

5 replies to this topic

#1 brinsky   Members   -  Reputation: 145


Posted 13 January 2014 - 07:06 AM

Hi!

I've exhausted every other way to solve the problem, so here I am. Hopefully somebody can point me in the right direction.

I'm building a deferred lighting engine on D3D11, C++. Rendering is done as follows:

- MRT (albedo, normal, depth)
- Light geometry
- G-Buffer
- Post-processing

 

These phases are fast and work rather well. The problem is the swap chain Present call (no vsync): it takes forever. What's weird is that however complex the scene is, Present takes several times as long as all the rendering phases combined. To put things into perspective, here are some example timings:

MRT: 2.35445 ms
Lights: 3.19276 ms
G-buffer: 0.0114392 ms
Post-processing: 0.00946387 ms
Full loop without swap chain present: 5.87718 ms
Swap chain present call: 25.6921 ms

-----

Full: 32.1059 ms
 

As you can see, that's far beyond acceptable. My first attempt was to simplify all the pixel shaders to the point where they returned a fixed color value. Although the timings of the rendering phases dropped dramatically, even below 0.01 ms combined, the Present call still took up to 5 ms. Having ruled out a pixel shader bottleneck, my next attempt was to change the render target formats to something simple like DXGI_FORMAT_R8G8B8A8_UNORM. To no avail. So now I'm just shooting in the dark, trying this and that without the knowledge needed to approach a problem like this. I have never worked in depth with GPU profilers (other than PIX, sadly deprecated), so I have no idea how to splice open the Present process and look at what's going on in there. Actually, I'm puzzled that this scenario is possible at all. Any insight into the swap chain and its Present method is appreciated. I'm working with both AMD and Nvidia graphics cards, so any help with the official profilers is most welcome. Also, I have the most recent drivers installed.

I doubt the above info is sufficient; I'll post whatever you need as we go.

 

Thank you in advance,

b

 




#2 kauna   Crossbones+   -  Reputation: 2338


Posted 13 January 2014 - 11:17 AM

As far as I know, calling Present is the same as flushing the GPU's command buffers. So the profiling you do before the Present call only tells you how long the CPU spent issuing the commands. The CPU doesn't wait for the GPU to finish those commands until you call Present (or do something else that forces the GPU to sync with the CPU).

 

Cheers!



#3 brinsky   Members   -  Reputation: 145


Posted 13 January 2014 - 04:25 PM

Thank you, kauna. That makes a lot of sense. It means I do have a bottleneck, either at the shader stage or in some other GPU command execution, which leaves me with GPU profiling. I'll report back on the culprit once I find it.

Thanks again!



#4 mhagain   Crossbones+   -  Reputation: 7821


Posted 13 January 2014 - 05:17 PM

A possible cause here is a mis-sized backbuffer, i.e. where your backbuffer size isn't the same as your window's client rect.  Calling GetClientRect on your window, then comparing with the backbuffer size will very quickly tell you if that's what you've got.
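The comparison suggested above could look roughly like the sketch below. It is deliberately kept free of Windows headers so it compiles anywhere: `ClientRect` and `BackBufferSize` are hypothetical stand-ins for the Win32 `RECT` filled in by `GetClientRect` and for the `BufferDesc.Width` / `BufferDesc.Height` values read back through `IDXGISwapChain::GetDesc`, and `backBufferMatchesClient` is a made-up helper name.

```cpp
#include <cassert>

// Hypothetical stand-in for the four fields of a Win32 RECT
// (in a real application, filled in by GetClientRect).
struct ClientRect {
    long left, top, right, bottom;
};

// Hypothetical stand-in for the back-buffer width/height
// (in a real application, read from the swap chain description).
struct BackBufferSize {
    unsigned width, height;
};

// Returns true when the back buffer has exactly the client-area size.
bool backBufferMatchesClient(const ClientRect& rc, const BackBufferSize& bb)
{
    // A client rect should never be inverted.
    assert(rc.right >= rc.left && rc.bottom >= rc.top);

    const unsigned clientW = static_cast<unsigned>(rc.right - rc.left);
    const unsigned clientH = static_cast<unsigned>(rc.bottom - rc.top);
    return clientW == bb.width && clientH == bb.height;
}
```

If the check fails, the sizes differ and the mismatch described above is worth investigating; if it passes, the cause lies elsewhere.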


It appears that the gentleman thought C++ was extremely difficult and he was overjoyed that the machine was absorbing it; he understood that good C++ is difficult but the best C++ is well-nigh unintelligible.


#5 MJP   Moderators   -  Reputation: 10909


Posted 14 January 2014 - 01:53 AM

kauna said: "As far as I know, calling Present is the same as flushing the GPU's command buffers. So the profiling you do before the Present call only tells you how long the CPU spent issuing the commands. The CPU doesn't wait for the GPU to finish those commands until you call Present (or do something else that forces the GPU to sync with the CPU). Cheers!"

 

To clarify a bit further: essentially, the driver won't let the CPU get more than N frames ahead of the GPU. So if the CPU is just spitting out lots of commands and the GPU is taking a long time to execute them, the driver will start to block the CPU in Present once N frames have been buffered. This magic "N" number can be queried and set with IDXGIDevice1::GetMaximumFrameLatency and IDXGIDevice1::SetMaximumFrameLatency (it defaults to 3 frames).
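The blocking behaviour described above can be illustrated with a toy model. The sketch below is not D3D11 code and does not claim to reflect how any particular driver schedules work. It assumes a fixed CPU cost and a fixed GPU cost per frame, and it assumes that the Present for frame i cannot return until frame i - N has finished on the GPU. The names `FrameTiming` and `simulateFrames` are made up for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct FrameTiming {
    double cpuWorkMs;       // time the CPU spent recording commands
    double presentBlockMs;  // time the CPU spent blocked in the simulated Present
};

// Toy model: the CPU records a frame in cpuMs, the GPU executes it in gpuMs,
// and at most maxLatency frames may be queued ahead of the GPU.
std::vector<FrameTiming> simulateFrames(double cpuMs, double gpuMs,
                                        std::size_t maxLatency,
                                        std::size_t frameCount)
{
    assert(maxLatency >= 1);

    std::vector<double> gpuFinish(frameCount, 0.0);  // GPU completion time per frame
    std::vector<FrameTiming> timings;
    timings.reserve(frameCount);

    double cpuClock = 0.0;
    for (std::size_t i = 0; i < frameCount; ++i) {
        cpuClock += cpuMs;  // CPU records the commands for frame i

        // Simulated Present: wait until frame i - maxLatency has left the queue.
        double blocked = 0.0;
        if (i >= maxLatency) {
            blocked = std::max(0.0, gpuFinish[i - maxLatency] - cpuClock);
        }
        cpuClock += blocked;

        // The GPU starts frame i once it is submitted and the previous frame is done.
        const double previousDone = (i > 0) ? gpuFinish[i - 1] : 0.0;
        gpuFinish[i] = std::max(cpuClock, previousDone) + gpuMs;

        timings.push_back({cpuMs, blocked});
    }
    return timings;
}
```

With `cpuMs = 6`, `gpuMs = 32` and `maxLatency = 3` (roughly the numbers from the first post), the first three frames return from the simulated Present immediately, and from the fifth frame on every frame blocks for `gpuMs - cpuMs = 26` ms. In this model, a long Present simply reflects a GPU-bound frame.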

So yeah, you definitely want to dig into GPU profiling. You can perform some limited GPU profiling yourself using timestamp queries, but you have to be careful with them because they're often not accurate: they only measure how long it takes for the GPU to actually reach the timestamp, not how long it takes for the GPU to completely finish executing the Draw or Dispatch calls bracketed by the timestamps. Ideally you'll want to use a vendor-specific tool like Nsight or GPU PerfStudio.
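For the timestamp-query route, the arithmetic on the results might be sketched as follows. In D3D11, a `D3D11_QUERY_TIMESTAMP` query yields a 64-bit tick count, and the enclosing `D3D11_QUERY_TIMESTAMP_DISJOINT` query yields a `Frequency` (ticks per second) plus a `Disjoint` flag that marks the interval as unreliable. The sketch covers only the conversion step and leaves out query creation and `GetData` polling; `DisjointResult` and `gpuElapsedMs` are hypothetical names, with `DisjointResult` standing in for `D3D11_QUERY_DATA_TIMESTAMP_DISJOINT`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for D3D11_QUERY_DATA_TIMESTAMP_DISJOINT.
struct DisjointResult {
    std::uint64_t frequency;  // GPU timestamp ticks per second
    bool disjoint;            // true if the timestamps in this interval are unreliable
};

// Converts two GPU timestamps to milliseconds.
// Returns false (and leaves *outMs untouched) when the sample must be discarded.
bool gpuElapsedMs(std::uint64_t beginTicks, std::uint64_t endTicks,
                  const DisjointResult& dj, double* outMs)
{
    assert(outMs != nullptr);

    if (dj.disjoint || dj.frequency == 0 || endTicks < beginTicks) {
        return false;  // unreliable interval: skip this frame's measurement
    }

    const double ticks = static_cast<double>(endTicks - beginTicks);
    *outMs = ticks * 1000.0 / static_cast<double>(dj.frequency);
    return true;
}
```

As noted above, a number obtained this way is only as meaningful as the placement of the two timestamps, so it is best treated as a rough guide and not as a replacement for a vendor profiler.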



#6 brinsky   Members   -  Reputation: 145


Posted 14 January 2014 - 08:52 AM

 

mhagain said: "A possible cause here is a mis-sized backbuffer, i.e. where your backbuffer size isn't the same as your window's client rect. Calling GetClientRect on your window, then comparing with the backbuffer size will very quickly tell you if that's what you've got."

Thank you for your input, mhagain. Just to be sure about that: before creating the window I call the WinAPI function AdjustWindowRect, and I do not permit resizing the window at this stage, so the backbuffer size is safe.

 

 

MJP said: "essentially the driver won't let the CPU get more than N frames ahead of the GPU. So if the CPU is just spitting out lots of commands and the GPU is taking a long time to execute them, the driver will start to block the CPU in Present once N frames have been buffered."

MJP, thank you. That makes even more sense. There has to be some synchronization between the CPU and the GPU at some level. I'll dig into profilers.

 

Thank you everybody for your insight, I appreciate it. I have a good lead now.





