DX11 Gpu profiling with DX11 Queries


I'm trying to add real-time GPU profiling as explained here:

I am aware that the results are not going to be as good as those provided by a 'real' profiler,
but I'm looking for something that gives me rough performance values in real-time,
and this seems like a good way to do this.

Unfortunately I've run into a bit of a problem.
Here is what I do:

When I begin a performance block:
- Context->Begin(Disjoint Query)
- Context->End(Timestamp Start Query)

When I end a performance block:
- Context->End(Timestamp End Query)
- Context->End(Disjoint Query)

Then I wait for the queries to be ready (in my case, I just wait 5 frames, during which no further queries are started).
Then I try to get the query data:
- Context->GetData(Timestamp Start Query) -> StartTime
- Context->GetData(Timestamp End Query) -> EndTime
- Context->GetData(Disjoint Query) -> DisjointData

I calculate the delta time and divide by the Disjoint frequency.
Then I release all queries and restart the process.

Here's what I get for the first frame:
StartTime: 1331644297633886016
EndTime: 1331644297634561696
Frequency: 1000000000

Which results in 0.67567998ms
This seems reasonable.

But when I do all of this for any frame after this (for example frame 5) I get:

StartTime: 1331644456765925600
EndTime: 1331644456765910848
Frequency: 1000000000

Which results in -0.014752ms (actually 1.8446744e+013, because the unsigned 64-bit subtraction wraps around).
Unless my graphics card is breaking the laws of physics, this seems like a rather unreasonable result.

I cannot quite understand why this is happening, after all I'm not re-using old queries or anything.

Here's some code:
void CPerfCounterGPU::Begin( const std::string & Name )
{
    auto & ProfileData = Profiles[Name];

    // Create the three queries for this block
    D3D11_QUERY_DESC desc;
    ZeroMemory(&desc, sizeof(desc));
    desc.Query = D3D11_QUERY_TIMESTAMP_DISJOINT;
    Graphics->Device->CreateQuery(&desc, &ProfileData.DisjointQuery);
    desc.Query = D3D11_QUERY_TIMESTAMP;
    Graphics->Device->CreateQuery(&desc, &ProfileData.TimestampStartQuery);
    Graphics->Device->CreateQuery(&desc, &ProfileData.TimestampEndQuery);

    // Start a disjoint query first
    Graphics->Context->Begin(ProfileData.DisjointQuery);
    // Insert the start timestamp
    Graphics->Context->End(ProfileData.TimestampStartQuery);

    ProfileData.QueryStarted = true;
    ProfileData.StartFrame = CurrentFrame;
}

void CPerfCounterGPU::End( const std::string & Name )
{
    auto & ProfileData = Profiles[Name];
    if(ProfileData.QueryStarted && ProfileData.StartFrame == CurrentFrame)
    {
        Graphics->Context->End(ProfileData.TimestampEndQuery); // Insert the end timestamp
        Graphics->Context->End(ProfileData.DisjointQuery);     // End the disjoint query
    }
}

Then this gets called at the end of every frame:
void CPerfCounterGPU::EndFrame()
{
    // Iterate over all of the profiles
    std::map<std::string, GpuProfileData>::iterator it;
    for(it = Profiles.begin(); it != Profiles.end(); ++it)
    {
        auto & ProfileData = (*it).second;

        // Wait N frames for the query to be ready
        if(ProfileData.StartFrame + QueryLatency >= CurrentFrame)
            continue;

        // Get query data
        UINT64 StartTime = 0;
        while(Graphics->Context->GetData(ProfileData.TimestampStartQuery, &StartTime, sizeof(StartTime), 0) != S_OK);
        UINT64 EndTime = 0;
        while(Graphics->Context->GetData(ProfileData.TimestampEndQuery, &EndTime, sizeof(EndTime), 0) != S_OK);
        D3D11_QUERY_DATA_TIMESTAMP_DISJOINT DisjointData;
        while(Graphics->Context->GetData(ProfileData.DisjointQuery, &DisjointData, sizeof(DisjointData), 0) != S_OK);

        float Time = 0.0f;
        UINT64 Delta = EndTime - StartTime;
        float Frequency = static_cast<float>(DisjointData.Frequency);
        Time = (Delta / Frequency) * 1000.0f;
        PerfTimes[(*it).first] = Time;

        // Release Queries and start over
    }
}

Any ideas what could be going wrong here?


After some testing I found out that this is actually working.
That is, for every piece of code except for the one I am trying to profile...

I'm trying to profile the dispatch call that generates my procedural planet.
This makes me think that these queries do not work properly when there are dispatch calls between the timestamp queries.

I have tested it on another dispatch call, which performs collision detection with the generated planet.
This time it returns positive values, but they are highly unreasonable.
The calculated value jumps between 0.2 and 0.6 ms, whereas Nsight reports a GPU time of 29 microseconds, which seems much more plausible.

With draw calls, on the other hand, the calculated values are pretty close to the Nsight ones.

So... I guess I can't do timestamp queries when using compute shaders?
