maxest

DX11 Query Timestamp inconsistency


Hey guys.

 

I've implemented a post-process effect and measured its performance with DX queries using double buffering; I got 2 ms.
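For reference, one such GPU timing with D3D11 timestamp queries looks roughly like this. This is a minimal sketch, assuming device and context already exist; RenderPostProcess is a placeholder for the measured pass, not the actual code.

#include <d3d11.h>

// Create the queries once: one disjoint query plus a begin/end timestamp pair.
D3D11_QUERY_DESC desc = {};
ID3D11Query *disjointQuery, *beginQuery, *endQuery;
desc.Query = D3D11_QUERY_TIMESTAMP_DISJOINT;
device->CreateQuery(&desc, &disjointQuery);
desc.Query = D3D11_QUERY_TIMESTAMP;
device->CreateQuery(&desc, &beginQuery);
device->CreateQuery(&desc, &endQuery);

// Around the measured pass:
context->Begin(disjointQuery);
context->End(beginQuery);            // timestamp before the pass
RenderPostProcess();                 // placeholder for the measured work
context->End(endQuery);              // timestamp after the pass
context->End(disjointQuery);

// A frame (or more) later, read the results back:
D3D11_QUERY_DATA_TIMESTAMP_DISJOINT disjoint;
UINT64 t0, t1;
if (context->GetData(disjointQuery, &disjoint, sizeof(disjoint), 0) == S_OK && !disjoint.Disjoint &&
    context->GetData(beginQuery, &t0, sizeof(t0), 0) == S_OK &&
    context->GetData(endQuery, &t1, sizeof(t1), 0) == S_OK)
{
    double ms = double(t1 - t0) / double(disjoint.Frequency) * 1000.0;
}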

 

Next, I thought about making my scene more complex, so I loaded a few more meshes and a whole lot of textures (previously 1 draw call, now 400) and measured the post-process time again. It should be the same, right? It's independent of the geometry rendering phase. But to my surprise the DX queries gave me 3.2 ms.

 

I thought that maybe the problem was that I used too few queries; that reading frame n-1's queries in frame n was not good enough. So I switched to quadruple buffering, and now I get a 1.42 ms timing.

 

Is there a way to get consistent timings from DX queries? I remember that when I was working on similar things at work, where I had access to a PS4, I only profiled there, because the PS4 *always* gave me consistent measurements. But now I'm stuck with DX11 on PC and want to get correct timings.

 

[EDIT]

Also, depending on whether I run my app in Debug or Release mode, I get different timings: 2x worse in Debug. I'm not sure they should differ that much, since I'm measuring GPU time.

Edited by maxest


Are you running your application in such a way that the GPU is guaranteed to be running flat-out 100% of the time? (Not CPU bound, Vsync off)

 

If not, your GPU is liable to throttle down to lower clock speeds when it's nowhere near being taxed, and this can result in higher measured times for each part of the frame. The more you load up the GPU with extra work before post-processing, the harder it has to work to get the frame rendered inside 16.6/33.3 ms, and therefore it won't have the luxury of underclocking itself. If it can't underclock, then post-processing may well run faster with the extra work present than it did without it.
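As a side note, vsync is off when the swap chain is presented with a sync interval of zero (assuming a standard IDXGISwapChain):

swapChain->Present(0, 0);   // first argument 0 = do not wait for vertical sync; second = no DXGI present flags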

There are more factors in post-processing than just resolution. Cache plays a huge role (some effects slow down significantly when sampling a wider area) and on any non-UMM platform you have to deal with resource residency. During rendering of your 400 objects you could have easily ejected the textures you need for post-processing either from cache or from the graphics-card RAM entirely.


L. Spiro


If you are on Windows 7, make sure to disable Aero.

Also for best results perform these queries in exclusive fullscreen.

 

The compositor's presentation can seriously skew your measurements.
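For reference, exclusive fullscreen can be requested through the swap chain; a sketch, assuming swapChain is the application's IDXGISwapChain:

swapChain->SetFullscreenState(TRUE, nullptr);    // enter exclusive fullscreen on the default output
// ... render and profile ...
swapChain->SetFullscreenState(FALSE, nullptr);   // drop back to windowed before shutting down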



There are more factors in post-processing than just resolution. Cache plays a huge role (some effects slow down significantly when sampling a wider area) and on any non-UMM platform you have to deal with resource residency. During rendering of your 400 objects you could have easily ejected the textures you need for post-processing either from cache or from the graphics-card RAM entirely.

That should not be the case. I render the scene first to an off-screen buffer, and only from then on do I start my measurements and do the post-processing. None of the scene's textures are bound during post-processing.

 


Are you running your application in such a way that the GPU is guaranteed to be running flat-out 100% of the time? (Not CPU bound, Vsync off)

Yes.

 


If you are on Windows 7, make sure to disable Aero.

Also for best results perform these queries in exclusive fullscreen.

 

The compositor's presentation can seriously skew your measurements.

I'm on Windows 8.1; it seems I can't turn off Aero here. I also run in fullscreen.

 

Basically, now that I've changed my profiler's code to use quadruple buffering, my results are more stable. Changing the amount of geometry doesn't influence my post-processing time. It does differ when I switch to Debug mode in Visual Studio: then I go from 1.5 ms to 3.5 ms.

 

If you would like to check out what my profiler's code looks like, it's here:

http://maxest.gct-game.net/stuff/profiler.h
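For anyone not following the link, the multi-buffered idea is roughly the following: a ring of query sets, so that a result is only read back several frames after it was issued. This is a sketch of that approach, not the code behind the link.

#include <d3d11.h>

// Multi-buffered GPU timer sketch: BufferCount query sets used as a ring,
// so a result is read back only after BufferCount-1 further frames have been issued.
struct GpuTimer
{
    static const int BufferCount = 4;

    ID3D11Query* disjointQ[BufferCount] = {};
    ID3D11Query* beginQ[BufferCount] = {};
    ID3D11Query* endQ[BufferCount] = {};
    unsigned frameIndex = 0;

    void Init(ID3D11Device* device)
    {
        D3D11_QUERY_DESC desc = {};
        for (int i = 0; i < BufferCount; ++i)
        {
            desc.Query = D3D11_QUERY_TIMESTAMP_DISJOINT;
            device->CreateQuery(&desc, &disjointQ[i]);
            desc.Query = D3D11_QUERY_TIMESTAMP;
            device->CreateQuery(&desc, &beginQ[i]);
            device->CreateQuery(&desc, &endQ[i]);
        }
    }

    // Call before the measured pass. Also collects the result issued BufferCount
    // frames ago from the slot about to be reused. Returns the time in
    // milliseconds, or a negative value if the result isn't available/valid yet.
    double Begin(ID3D11DeviceContext* context)
    {
        const int slot = frameIndex % BufferCount;
        double ms = -1.0;
        if (frameIndex >= BufferCount)
        {
            D3D11_QUERY_DATA_TIMESTAMP_DISJOINT disjoint;
            UINT64 t0, t1;
            if (context->GetData(disjointQ[slot], &disjoint, sizeof(disjoint), 0) == S_OK && !disjoint.Disjoint &&
                context->GetData(beginQ[slot], &t0, sizeof(t0), 0) == S_OK &&
                context->GetData(endQ[slot], &t1, sizeof(t1), 0) == S_OK)
            {
                ms = double(t1 - t0) / double(disjoint.Frequency) * 1000.0;
            }
        }
        context->Begin(disjointQ[slot]);
        context->End(beginQ[slot]);
        return ms;
    }

    // Call after the measured pass.
    void End(ID3D11DeviceContext* context)
    {
        const int slot = frameIndex % BufferCount;
        context->End(endQ[slot]);
        context->End(disjointQ[slot]);
        ++frameIndex;
    }
};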


There are more factors in post-processing than just resolution. Cache plays a huge role (some effects slow down significantly when sampling a wider area) and on any non-UMM platform you have to deal with resource residency. During rendering of your 400 objects you could have easily ejected the textures you need for post-processing either from cache or from the graphics-card RAM entirely.

That should not be the case. I render the scene first to an off-screen buffer, and only from then on do I start my measurements and do the post-processing. None of the scene's textures are bound during post-processing.

Post effects like DOF and SSAO can have a variable cost depending on the scene, as they'll be using different sized filters based on the depth of objects, etc...
 

Basically, now that I changed my profiler's code to use quadruple buffering my results are more stable. Changing the amount of geometry doesn't influence my post-processing time. It differs when I changed to Debug mode in Visual Studio. Then from 1.5ms I go to 3.5ms.

Sounds like your driver really wanted to have 3 frames in flight, and you were stalling *something* by requesting results faster than that.
In your debug builds, do you also create a debug D3D device, or is that an independent setting?
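As an aside, one way to avoid that kind of stall is to poll the query without forcing a flush, and only consume the result once it is actually ready. A sketch, where query and OnResult are placeholders:

UINT64 timestamp;
HRESULT hr = context->GetData(query, &timestamp, sizeof(timestamp),
                              D3D11_ASYNC_GETDATA_DONOTFLUSH);  // don't flush the command buffer just to poll
if (hr == S_OK)
    OnResult(timestamp);   // result is ready this frame
// hr == S_FALSE: not ready yet; try again next frame instead of spinning on it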



Post effects like DOF and SSAO can have a variable cost depending on the scene, as they'll be using different sized filters based on the depth of objects, etc...

That one is true, of course. My objection was only about the rendering of the geometry that goes into the g-buffer/lighting buffers.

 


Sounds like your driver really wanted to have 3 frames in flight, and you were stalling *something* by requesting results faster than that.

Actually, my new profiler's code (the one I posted a link to) works almost equally well with quadruple and double buffering. When I mentioned double buffering in my first post, I actually had only one set of queries, which I was trying to collect *at the beginning of each frame*. Since I fired off the queries later in my code, then at the beginning of the next frame I should be able to collect them, right? They were fired off in the previous frame, after all.

 


In your debug builds, do you also create a debug D3D device, or is that an independent setting?

That was not an independent setting, so thanks for pointing it out. Nevertheless, Debug mode with a non-debug DX device gives me ~2 ms. That's progress from 3.5 ms, but still not 1.5 ms. So it still looks like "CPU settings" affect my GPU measurements.
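For reference, making it an independent setting usually just means tying the debug layer explicitly to the build configuration, something like this sketch (error handling omitted):

#include <d3d11.h>

UINT flags = 0;
#ifdef _DEBUG
flags |= D3D11_CREATE_DEVICE_DEBUG;   // D3D debug layer: extra validation, extra CPU cost per API call
#endif

ID3D11Device* device = nullptr;
ID3D11DeviceContext* context = nullptr;
D3D11CreateDevice(nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr, flags,
                  nullptr, 0, D3D11_SDK_VERSION, &device, nullptr, &context);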


Does your app compile your shaders differently based on the configuration? If you're using Visual Studio, it defaults to turning off shader optimizations in Debug mode.
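For comparison, the equivalent flags when compiling shaders in code with D3DCompileFromFile look roughly like this; the file name, entry point, and target below are placeholders:

#include <d3dcompiler.h>

UINT flags = D3DCOMPILE_ENABLE_STRICTNESS;
#ifdef _DEBUG
// What a Debug configuration typically does: debug info, no optimization.
flags |= D3DCOMPILE_DEBUG | D3DCOMPILE_SKIP_OPTIMIZATION;
#else
flags |= D3DCOMPILE_OPTIMIZATION_LEVEL3;
#endif

ID3DBlob* bytecode = nullptr;
ID3DBlob* errors = nullptr;
D3DCompileFromFile(L"PostProcess.hlsl", nullptr, nullptr, "PSMain", "ps_5_0",
                   flags, 0, &bytecode, &errors);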


Right now my DX settings are independent of the build configuration. I always create the device without the D3D11_CREATE_DEVICE_DEBUG flag.
