
DX11 Gpu profiling with DX11 Queries


I'm trying to add realtime gpu profiling as explained here:

I am aware that the results are not going to be as good as those provided by a 'real' profiler,
but I'm looking for something that gives me rough performance values in real-time,
and this seems like a good way to do this.

Unfortunately I've run into a bit of a problem.
Here is what I do:

When I begin a performance block:
- Context->Begin(Disjoint Query)
- Context->End(Timestamp Start Query)

When I end a performance block:
- Context->End(Timestamp End Query)
- Context->End(Disjoint Query)

Then I wait for the queries to be ready (in my case, I just wait 5 frames, during which no further queries are started)
Then I try to get the query data:
-Context->GetData(Timestamp Start Query) -> StartTime
-Context->GetData(Timestamp End Query) -> EndTime
-Context->GetData(Disjoint Query) -> DisjointData

I calculate the delta time and divide by the Disjoint frequency.
Then I release all queries and restart the process.
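The waiting step above can be sketched independently of Direct3D. This is a minimal sketch, assuming frame numbers are plain increasing counters; `IsQueryOldEnough` and its parameter names are hypothetical and not part of my profiler class.

```cpp
#include <cstdint>

// Hypothetical helper: returns true once at least `latency` frames have
// passed since the queries were issued in frame `startFrame`.
bool IsQueryOldEnough(std::uint64_t startFrame,
                      std::uint64_t currentFrame,
                      std::uint64_t latency)
{
    // Written as a subtraction so that startFrame + latency cannot overflow.
    return currentFrame >= startFrame && (currentFrame - startFrame) >= latency;
}
```

Even when this returns true, GetData can still return S_FALSE, so the result of each GetData call has to be checked as well.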

Here's what I get for the first frame:
StartTime: 1331644297633886016
EndTime: 1331644297634561696
Frequency: 1000000000

Which results in 0.67567998ms
This seems reasonable.

But when I do all of this for any frame after this (for example frame 5) I get:

StartTime: 1331644456765925600
EndTime: 1331644456765910848
Frequency: 1000000000

Which results in -0.014752ms (actually 1.8446744e+013, because the unsigned 64-bit subtraction wraps around)
Unless my graphics card is breaking the laws of physics, this seems like a rather unreasonable result.
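To make the two numbers above concrete, here is a minimal sketch of the arithmetic, assuming the timestamps are plain 64-bit tick counts. `ElapsedMs` and `WrappedMs` are hypothetical helper names for illustration only: the first keeps the sign of the difference, the second reproduces the unsigned subtraction.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: elapsed milliseconds between two GPU timestamps.
// The result is negative when `end` is earlier than `start`.
double ElapsedMs(std::uint64_t start, std::uint64_t end, std::uint64_t frequency)
{
    assert(frequency != 0);
    const double ticks = (end >= start)
        ? static_cast<double>(end - start)
        : -static_cast<double>(start - end);
    return ticks * 1000.0 / static_cast<double>(frequency);
}

// Hypothetical helper: the same calculation with a plain unsigned
// subtraction, which wraps modulo 2^64 when `end` is earlier than `start`.
double WrappedMs(std::uint64_t start, std::uint64_t end, std::uint64_t frequency)
{
    assert(frequency != 0);
    const std::uint64_t delta = end - start;
    return static_cast<double>(delta) * 1000.0 / static_cast<double>(frequency);
}
```

With the frame 5 values, `ElapsedMs` gives the small negative time and `WrappedMs` gives the huge positive one, so the two figures are the same result seen through signed and unsigned arithmetic.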

I cannot quite understand why this is happening, after all I'm not re-using old queries or anything.

Here's some code:
void CPerfCounterGPU::Begin( const std::string & Name )
{
    auto & ProfileData = Profiles[Name];

    // Create the disjoint query and the two timestamp queries
    D3D11_QUERY_DESC desc;
    ZeroMemory(&desc, sizeof(desc));
    desc.Query = D3D11_QUERY_TIMESTAMP_DISJOINT;
    Graphics->Device->CreateQuery(&desc, &ProfileData.DisjointQuery);
    desc.Query = D3D11_QUERY_TIMESTAMP;
    Graphics->Device->CreateQuery(&desc, &ProfileData.TimestampStartQuery);
    Graphics->Device->CreateQuery(&desc, &ProfileData.TimestampEndQuery);

    // Start a disjoint query first
    Graphics->Context->Begin(ProfileData.DisjointQuery);

    // Insert the start timestamp
    Graphics->Context->End(ProfileData.TimestampStartQuery);

    ProfileData.QueryStarted = true;
    ProfileData.StartFrame = CurrentFrame;
}

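void CPerfCounterGPU::End( const std::string & Name )
{
    auto & ProfileData = Profiles[Name];
    if(ProfileData.QueryStarted && ProfileData.StartFrame == CurrentFrame)
    {
        Graphics->Context->End(ProfileData.TimestampEndQuery); // Insert the end timestamp
        Graphics->Context->End(ProfileData.DisjointQuery);     // End the disjoint query
    }
}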

Then this gets called at the end of every frame:
void CPerfCounterGPU::EndFrame()
{
    // Iterate over all of the profiles
    std::map<std::string, GpuProfileData>::iterator it;
    for(it = Profiles.begin(); it != Profiles.end(); it++)
    {
        auto & ProfileData = (*it).second;

        // Wait N frames for the query to be ready
        if(ProfileData.StartFrame + QueryLatency >= CurrentFrame)
            continue; // not old enough yet, try again next frame

        // Get query data
        UINT64 StartTime = 0;
        while(Graphics->Context->GetData(ProfileData.TimestampStartQuery, &StartTime, sizeof(StartTime), 0) != S_OK);
        UINT64 EndTime = 0;
        while(Graphics->Context->GetData(ProfileData.TimestampEndQuery, &EndTime, sizeof(EndTime), 0) != S_OK);
        D3D11_QUERY_DATA_TIMESTAMP_DISJOINT DisjointData;
        while(Graphics->Context->GetData(ProfileData.DisjointQuery, &DisjointData, sizeof(DisjointData), 0) != S_OK);

        // Convert the tick delta to milliseconds
        float Time = 0.0f;
        UINT64 Delta = EndTime - StartTime;
        float Frequency = static_cast<float>(DisjointData.Frequency);
        Time = (Delta / Frequency) * 1000.0f;
        PerfTimes[(*it).first] = Time;

        // Release Queries and start over
    }
}
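One thing the code above does not do is look at the Disjoint flag that comes back with the frequency. The D3D11 documentation states that timestamps are only reliable when that flag is FALSE. The following is a minimal sketch of such a validity check, not a fix I have confirmed for my problem. `DisjointResult` is a hypothetical stand-in that mirrors the two fields of D3D11_QUERY_DATA_TIMESTAMP_DISJOINT so the example compiles without d3d11.h, and `TryComputeMs` is a hypothetical name.

```cpp
#include <cstdint>

// Hypothetical stand-in for D3D11_QUERY_DATA_TIMESTAMP_DISJOINT.
struct DisjointResult
{
    std::uint64_t Frequency; // timestamp ticks per second
    int Disjoint;            // nonzero: timestamps in this range are unreliable
};

// Hypothetical helper: writes the elapsed time in milliseconds to `outMs`
// and returns true, or returns false when the sample should be discarded.
bool TryComputeMs(const DisjointResult & disjoint,
                  std::uint64_t start,
                  std::uint64_t end,
                  double & outMs)
{
    // Discard the sample if the counter was disjoint, the frequency is
    // unusable, or the end timestamp precedes the start timestamp.
    if (disjoint.Disjoint != 0 || disjoint.Frequency == 0 || end < start)
        return false;

    outMs = static_cast<double>(end - start) * 1000.0
          / static_cast<double>(disjoint.Frequency);
    return true;
}
```

Dropping a bad sample and keeping the previous value in PerfTimes is one possible design choice here; it hides glitches in the display but does not explain why the timestamps come back out of order.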

Any ideas what could be going wrong here?


After some testing I found out that this is actually working.
That is, for every piece of code except for the one I am trying to profile...

I'm trying to profile the dispatch call that generates my procedural planet.
This makes me think that these queries do not work properly when using dispatch calls in between timestamp queries.

I have tested it on another dispatch call, which performs collision detection with the generated planet.
This time it returns positive values, but they are highly unreasonable.
The calculated value jumps between 0.2 and 0.6ms, whereas Nsight indicates a gpu time of 29 microseconds, which is much more reasonable.

With draw calls on the other hand, the calculated values are pretty close to the Nsight ones.

So... I guess I can't do timestamp queries when using compute shaders?
