Windows 10: DX12 low-latency, tearing-free rendering
The waitable swapchain used to be windowed-only, but it works in fullscreen too as of build 14943 (I don't have other builds to check when support started). Be sure to call the SetMaximumFrameLatency function too.
As for measuring latency, GPUView can help too.
I'd expect to be able to get <16ms of latency using the waitable object, with a maximum frame latency of 1. If you ensure that your window covers the screen (or use SetFullscreenState) and call ResizeBuffers, you should engage independent flip, and your frames should make it to the screen on the next VSync. In practice, it looks like we may have an off-by-one here, as I'm only able to get ~32ms, but it seems like that should be sufficient for you guys.
Are you sure you're waiting before every frame (including the first one)? If not, you could end up with an extra frame of latency getting added into the waitable object.
Like Galop1n said, GPUView and PresentMon tools are helpful for determining why the latency is there.
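For reference, the frame counts quoted above convert to milliseconds as sketched below. This is only a rough illustration: the 60 Hz refresh rate is an assumption (the thread never states it, but the 16 ms and 32 ms figures imply it), and the function names are made up for this example.

```cpp
#include <cassert>
#include <cmath>

// Length of one refresh interval in milliseconds.
// refreshHz is an assumption: the thread implies 60 Hz but never states it.
inline double vsyncIntervalMs(double refreshHz) {
    assert(refreshHz > 0.0);
    return 1000.0 / refreshHz;
}

// Latency in milliseconds for a given number of whole frames of latency.
inline double latencyMs(int frames, double refreshHz) {
    return frames * vsyncIntervalMs(refreshHz);
}

// Nearest whole number of frames that a measured latency corresponds to.
inline int framesOfLatency(double measuredMs, double refreshHz) {
    return static_cast<int>(std::lround(measuredMs / vsyncIntervalMs(refreshHz)));
}
```

With these helpers, `framesOfLatency(32.0, 60.0)` gives 2 and `framesOfLatency(16.0, 60.0)` gives 1, which is the "off-by-one" being discussed.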
Hi,
Thanks for the quick response. About the SetMaximumFrameLatency API, MSDN states:
"Sets the number of frames that the system is allowed to queue for rendering. ....
.......
The maximum number of back buffer frames that a driver can queue."
Suppose I have two buffers (a front and a back buffer in a fullscreen swap chain), i.e. the buffer count on the swap chain is set to 2.
How does this SetMaximumFrameLatency queue relate to these buffers?
I am trying to understand where in the present chain we have queues, as queues add latency :-)
regards,
TF
Hi Jesse,
I have some output from PresentMon; however, my own measurements are done using a light sensor (taped to the screen).
The last column (MsUntilDisplayed) states 32 ms. Is that what to expect, or should it be on the order of 16 ms?
I have not (yet) worked with GPUView. Is PresentMon up to the job, or should I invest in GPUView?
Application,ProcessID,SwapChainAddress,Runtime,SyncInterval,AllowsTearing,PresentFlags,PresentMode,Dropped,TimeInSeconds,MsBetweenPresents,MsBetweenDisplayChange,MsInPresentAPI,MsUntilRenderComplete,MsUntilDisplayed
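To compare many frames rather than eyeballing single rows, a column such as MsUntilDisplayed can be averaged from the CSV. The following is a minimal sketch, assuming the file has the header shown above as its first line and that no field contains a quoted comma. The function names are hypothetical and are not part of PresentMon.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Split one CSV line on commas. Assumes no quoted fields containing commas.
inline std::vector<std::string> splitCsvLine(std::string line) {
    if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF files
    std::vector<std::string> fields;
    std::istringstream ss(line);
    std::string field;
    while (std::getline(ss, field, ',')) fields.push_back(field);
    return fields;
}

// Index of the named column in the header line, or -1 if it is absent.
inline int columnIndex(const std::string& headerLine, const std::string& name) {
    const std::vector<std::string> columns = splitCsvLine(headerLine);
    for (std::size_t i = 0; i < columns.size(); ++i) {
        if (columns[i] == name) return static_cast<int>(i);
    }
    return -1;
}

// Mean of a numeric column over all data rows of the CSV text.
// Returns -1.0 if the column is missing or no row could be parsed.
inline double meanColumn(const std::string& csvText, const std::string& name) {
    assert(!name.empty());
    std::istringstream csv(csvText);
    std::string line;
    if (!std::getline(csv, line)) return -1.0;
    const int col = columnIndex(line, name);
    if (col < 0) return -1.0;

    double sum = 0.0;
    int count = 0;
    while (std::getline(csv, line)) {
        const std::vector<std::string> fields = splitCsvLine(line);
        if (static_cast<int>(fields.size()) <= col) continue;  // short or empty row
        try {
            sum += std::stod(fields[col]);
            ++count;
        } catch (const std::logic_error&) {
            // Non-numeric cell: skip this row.
        }
    }
    return count > 0 ? sum / count : -1.0;
}
```

Reading the PresentMon output file into a string and calling `meanColumn(text, "MsUntilDisplayed")` would then give the average present-to-display time in milliseconds.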
So it seems like your monitor has ~18ms of latency built in, if you're measuring 50 but PresentMon is saying 32. Unfortunately it looks like the waitable object may not be working properly - are you able to use the same present stats technique that you used in D3D9Ex? That should give you similar results.
I tried to use GetFrameStatistics; however, the struct returned contains only zeroes.
So the trick with the present stats we use on DX9 does not work on DX12.
I tried both DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL and DXGI_SWAP_EFFECT_FLIP_DISCARD.
I am using the CreateSwapChain API; maybe I should use CreateSwapChainForHwnd?
I created a version of the program which has v-sync disabled. It gives tearing, but I would like to see what the latency will be when measuring with the light sensor. I will pick this up tomorrow when I am back at the office.
PS: what do you mean by "Unfortunately it looks like the waitable object may not be working properly"?
A bug in my program or in the OS/Driver??
"In practice, it looks like we may have an off-by-one here, as I'm only able to get ~32ms, but it seems like that should be sufficient for you guys"
The 32 ms I was referring to on Windows 7 (full screen / DWM disabled) includes the latency added by the display. So it seems we get one additional frame of latency on Windows 10 compared to Windows 7.
Has anybody succeeded in getting down to one frame of latency using a waitable swap chain on Windows 10?
Circling back to this, it does look like there's an off-by-one in the frame latency waitable object. A requested frame latency value of 1 means "give me the minimum frame latency possible from any present mode, not necessarily the current one." So a composed swapchain will get you 2 frames of latency, just like a fullscreen / independent flip swapchain will.
Regarding the present stats workaround, I've confirmed we've got a bug there which is causing the zeroes. The workaround is to avoid using the SetFullscreenState API and just adjust your windows to cover the screens manually. If you do this (and call ResizeBuffers afterwards), then present stats should work correctly and you should be able to use that to get down to 1 frame of latency when your swapchain qualifies for independent flip.
Note that there are scenarios where composition will still be used (e.g. the volume indicator pops up), and the minimum latency then becomes 2. If you go the route of using frame statistics, and wait for a frame to be on-screen before rendering another, this will cause your application's framerate to drop to 30 Hz or worse. I have, however, confirmed that this approach does allow a 16ms latency measured by PresentMon. This approach will work in D3D11 or D3D12 as long as you use one of the FLIP swap effects (mandatory in D3D12).
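The frame-rate cost of waiting for each frame to reach the screen can be estimated with a simple model. This is a sketch under a stated assumption, not a description of how DXGI schedules flips: it assumes each frame occupies a whole number of refresh intervals between the start of rendering and the on-screen confirmation, and that nothing overlaps. The function name and the `intervalsPerFrame` parameter are hypothetical.

```cpp
#include <cassert>

// Frame rate when the application waits for each frame to be confirmed on
// screen before starting the next one. intervalsPerFrame is the number of
// whole refresh intervals between starting a frame and seeing it confirmed
// on screen (a hypothetical parameter of this simplified model).
inline double serializedFrameRateHz(double refreshHz, int intervalsPerFrame) {
    assert(refreshHz > 0.0 && intervalsPerFrame >= 1);
    return refreshHz / intervalsPerFrame;
}
```

On a 60 Hz display, two intervals per frame gives 30 Hz and three gives 20 Hz, which matches the "30 Hz or worse" estimate above.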
Bottom line: we had good support from Microsoft over the last weeks but eventually we gave up on the DX12 waitable swap chain approach because it gives one additional frame of latency.
See the comment in Jesse's previous post:
"it does look like there's an off-by-one in the frame latency waitable object"
For our multi-GPU / multi head application we have started testing on DX11.
The first results look good.