Happy SDE

DX11 Xperf and Immediate context.


Today I decided to measure the performance of my application the way I always did with Win32 applications: via xperf and custom markers.

I placed ID3D11DeviceContext2::BeginEventInt()/EndEvent() calls in all significant places, ran xperf, collected data, and here are the timings for one frame:

Line #  Label (Field 3)        Time (s)
2       Cleanup                46.07790765
3       Deferred GeometryPass  46.077912
4       SSAO pass              46.0780287
5       SSAO Blur              46.0780344
6       SSAO Blur1::Hor        46.07803541
7       SSAO Blur1::Vert       46.07803742
8       SSAO Blur2::Hor        46.0780391
9       SSAO Blur2::Vert       46.0780401
10      RtVisualizer           46.07804178
11      Wnd3d::Present         46.07804513
12      Cleanup                46.10182305  (the next frame)

The biggest gap, about 23.8 ms, is between Present() and the start of the next frame.
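To make the gaps in the table concrete, here is a minimal sketch that subtracts consecutive marker timestamps. The `Marker` and `Interval` types and the `markerIntervals` and `frameFromPost` functions are hypothetical names made up for this illustration; only the labels and timestamps come from the table. These appear to be CPU-side timestamps of when each marker was issued, so the differences are not a measure of GPU execution time.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One row of the xperf table: marker label and the time it was logged.
struct Marker {
    std::string label;
    double seconds;
};

// Gap between a marker and the one logged right after it.
struct Interval {
    std::string label;     // label of the earlier marker
    double milliseconds;   // time until the next marker was logged
};

// Subtracts consecutive timestamps; N markers give N - 1 gaps.
std::vector<Interval> markerIntervals(const std::vector<Marker>& markers)
{
    std::vector<Interval> out;
    for (std::size_t i = 0; i + 1 < markers.size(); ++i) {
        const double ms = (markers[i + 1].seconds - markers[i].seconds) * 1000.0;
        out.push_back(Interval{ markers[i].label, ms });
    }
    return out;
}

// The frame from the post, copied from the table.
std::vector<Marker> frameFromPost()
{
    return {
        { "Cleanup",               46.07790765 },
        { "Deferred GeometryPass", 46.077912   },
        { "SSAO pass",             46.0780287  },
        { "SSAO Blur",             46.0780344  },
        { "SSAO Blur1::Hor",       46.07803541 },
        { "SSAO Blur1::Vert",      46.07803742 },
        { "SSAO Blur2::Hor",       46.0780391  },
        { "SSAO Blur2::Vert",      46.0780401  },
        { "RtVisualizer",          46.07804178 },
        { "Wnd3d::Present",        46.07804513 },
        { "Cleanup",               46.10182305 },  // the next frame
    };
}
```

With the numbers from the post, the last gap (Wnd3d::Present to the next Cleanup) comes out at about 23.8 ms, while everything before Present adds up to roughly 0.14 ms.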

In Process Explorer I can see 50% GPU utilization.

I assume procexp’s 50% corresponds to 100% real GPU utilization (the GPU's fan is very noisy).

So the GPU is not idle after Present().

 

With these two pieces of information in mind, I’ve drawn two conclusions:

  1. The immediate context is not immediate: it collects draw commands, and the Present() call starts executing them.
  2. With my current knowledge I can’t measure GPU performance. =(

This means I can’t see the difference between 16 samples and 64 samples in the SSAO pass.

Right now the only tools I can use are the noise of my video card's cooler, procexp’s GPU graph, and the FPS counter in my application.

That’s not very precise.

 

I would like to see, with high precision, what changes when I tune the parameters of my renderer.

Are there any suggestions on how to measure GPU (shader) performance in DX11?

 

Thanks in advance.

Hodgman

Quote (Happy SDE): "I placed ID3D11DeviceContext2::BeginEventInt()/EndEvent() in all significant places, ran xperf, collected data, and here is one frame timings:"

I have no experience with xperf, but I would assume that you should have two completely different sets of timings for those markers -- one for the amount of CPU time that's elapsed between calling begin/end, and one for how much time the GPU spent executing the buffered commands that were created between begin/end.
 
I built my own D3D11 timing system using ID3D11DeviceContext::End and D3D11_QUERY_TIMESTAMP type query objects -- one event at the beginning/end of a "pass" and you can measure the amount of time that pass took on the GPU side. Note that you have to wait a frame (or several) before retrieving the timestamp data from these queries, as the GPU is async/delayed.

I get data on GPU time that looks like: 8d5RWX8.png
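As a rough illustration of the readback step described above, here is a minimal sketch, not Hodgman's actual system. `DisjointData`, `gpuPassMilliseconds` and `readGpuPassMilliseconds` are hypothetical names. The sketch assumes two D3D11_QUERY_TIMESTAMP queries were closed with ID3D11DeviceContext::End() around the pass, and that a D3D11_QUERY_TIMESTAMP_DISJOINT query was bracketed with Begin()/End() around the frame to supply the tick frequency. The conversion is plain C++; the part that touches D3D11 compiles only on Windows.

```cpp
#include <cstdint>

// Mirrors the two fields of D3D11_QUERY_DATA_TIMESTAMP_DISJOINT.
struct DisjointData {
    std::uint64_t frequency;  // GPU timestamp ticks per second
    bool disjoint;            // true if timestamps in this interval are unreliable
};

// Converts a begin/end timestamp pair into milliseconds of GPU time.
// Returns false when the sample has to be thrown away.
bool gpuPassMilliseconds(std::uint64_t beginTicks, std::uint64_t endTicks,
                         const DisjointData& dj, double& outMs)
{
    if (dj.disjoint || dj.frequency == 0 || endTicks < beginTicks) {
        return false;
    }
    outMs = 1000.0 * static_cast<double>(endTicks - beginTicks)
                   / static_cast<double>(dj.frequency);
    return true;
}

#ifdef _WIN32
#include <d3d11.h>

// Reads back queries that were issued one or more frames ago.
// Returns false if the GPU has not produced the results yet
// or if the interval was marked disjoint.
bool readGpuPassMilliseconds(ID3D11DeviceContext* ctx, ID3D11Query* disjoint,
                             ID3D11Query* tsBegin, ID3D11Query* tsEnd,
                             double& outMs)
{
    D3D11_QUERY_DATA_TIMESTAMP_DISJOINT dj = {};
    UINT64 t0 = 0;
    UINT64 t1 = 0;
    // GetData returns S_FALSE while the result is still pending.
    if (ctx->GetData(disjoint, &dj, sizeof(dj), 0) != S_OK) return false;
    if (ctx->GetData(tsBegin, &t0, sizeof(t0), 0) != S_OK) return false;
    if (ctx->GetData(tsEnd, &t1, sizeof(t1), 0) != S_OK) return false;
    const DisjointData data{ dj.Frequency, dj.Disjoint != FALSE };
    return gpuPassMilliseconds(t0, t1, data, outMs);
}
#endif
```

Returning false instead of spinning on GetData keeps the CPU from stalling. The caller would simply try again on a later frame, which is why such systems typically keep several sets of queries in flight.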
 

Quote (Happy SDE): "Immediate context is not immediate: it collects commands to draw. Present() call starts to execute them."

Kinda. The immediate context builds command buffers behind the scenes, and submits them into the GPU's work queue at arbitrary points in time -- either when it feels its internal command buffer is "full enough", or when it's forced to by an implicit "flush" (which Present can be).
 
The real reason that you see high CPU times in Present though is that it acts as the latency-limiting sync point between the two processors. If the CPU frametime is lower than the GPU frametime, then Present will typically block the CPU until the GPU has finished executing all of the commands that were submitted on the previous frame, to ensure that the GPU is only 1-frame behind. This is driver dependent -- e.g. perhaps it will block the CPU until the GPU has finished executing all commands from 3 frames ago... Either way, it serves as the point where the CPU will wait for the GPU to catch up sufficiently, to avoid creating an ever-growing amount of latency.
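A toy model can show why the wait lands in Present. The sketch below is a simulation built on stated assumptions, not a description of any driver: `simulatePresentStalls` is a hypothetical function, the per-frame CPU and GPU costs are constant, and Present is modeled as blocking until the frame issued `maxQueuedFrames` earlier has finished on the GPU.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Returns, for each frame, how long the modeled Present() call blocks the CPU.
// cpuMs / gpuMs: constant per-frame cost on each processor (assumption).
// maxQueuedFrames: how many frames the CPU may run ahead of the GPU (assumption).
std::vector<double> simulatePresentStalls(double cpuMs, double gpuMs,
                                          int maxQueuedFrames, int frameCount)
{
    if (frameCount <= 0) {
        return {};
    }
    const std::size_t n = static_cast<std::size_t>(frameCount);
    std::vector<double> gpuDone(n, 0.0);  // time at which the GPU finishes frame i
    std::vector<double> stallMs(n, 0.0);  // time the CPU spends blocked in Present
    double cpuTime = 0.0;

    for (int i = 0; i < frameCount; ++i) {
        // The CPU records the commands for frame i.
        cpuTime += cpuMs;

        // Present: wait until the GPU has finished the frame issued
        // maxQueuedFrames earlier, so latency cannot keep growing.
        const int waitFor = i - maxQueuedFrames;
        if (waitFor >= 0 && gpuDone[waitFor] > cpuTime) {
            stallMs[i] = gpuDone[waitFor] - cpuTime;
            cpuTime = gpuDone[waitFor];
        }

        // The GPU starts frame i once it is submitted and frame i - 1 is done.
        const double gpuStart = std::max(cpuTime, i > 0 ? gpuDone[i - 1] : 0.0);
        gpuDone[i] = gpuStart + gpuMs;
    }
    return stallMs;
}
```

With 1 ms of CPU work and 24 ms of GPU work per frame, the modeled stall settles at 23 ms per Present; a deeper queue only delays the point at which the stall first appears. When the CPU is the slower processor, the modeled stall stays at zero.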

Happy SDE

Quote (Hodgman): "I built my own D3D11 timing system using ID3D11DeviceContext::End and D3D11_QUERY_TIMESTAMP type query objects -- one event at the beginning/end of a "pass" and you can measure the amount of time that pass took on the GPU side. Note that you have to wait a frame (or several) before retrieving the timestamp data from these queries, as the GPU is async/delayed. I get data on GPU time that looks like: 8d5RWX8.png"

 

Hodgman, what software do you use to visualize this graph?

Is it free like xperf, your own creation, or some third-party tool?

Happy SDE

Thanks to Microsoft’s ETW, I just found another way to do the same job.

  1. It works with the NVIDIA Nsight debugger
  2. It works with the Visual Studio graphics debugger

Here are the GPU timings of one frame in NVIDIA Nsight:

Graph.png

 

Here is a tree with CPU and GPU timings:

Table.png

 

Here are the graphics events in VS 2015:

VS_draws.png

 

The main idea:

  1. Windows has the wonderful ETW performance system.
  2. NVIDIA Nsight uses it with its own markers, such as nvtxRangePushW/nvtxRangePop and others from the nvToolsExt library.
  3. Nsight does not support ID3D11DeviceContext2, so BeginEventInt is not available under it =(

 

So to make the markers work in both environments, you need an RAII wrapper like this:

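#include <d3d11_2.h>     // ID3D11DeviceContext2
#include <wrl/client.h>  // Microsoft::WRL::ComPtr
#include <nvToolsExt.h>  // nvtxRangePushW / nvtxRangePop

class EventMarker
{
public:
    EventMarker(const wchar_t* str, Microsoft::WRL::ComPtr<ID3D11DeviceContext> context)
    {
        // NVIDIA Nsight does not support the ID3D11DeviceContext2 interface;
        // in that case the query fails and m_context stays null.
        context.As(&m_context);
        if (m_context)
        {
            m_context->BeginEventInt(str, 0); // Under the VS graphics debugger
        }

        nvtxRangePushW(str); // Under the Nsight debugger
    }

    ~EventMarker()
    {
        if (m_context)
        {
            m_context->EndEvent();
        }

        nvtxRangePop();
    }

    // A marker owns one begin/end pair, so a copy would end the range twice.
    EventMarker(const EventMarker&) = delete;
    EventMarker& operator=(const EventMarker&) = delete;

private:
    Microsoft::WRL::ComPtr<ID3D11DeviceContext2> m_context;
};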

Usage (markers may be nested many levels deep):

void SsaoPass::blurSsaoTexture()
{
    EventMarker em{ L"SSAO Blur", m_context };
    Blur1(); // May create nested markers of its own
    Blur2();
}

Update:

Changing the constructor from

       EventMarker(const std::wstring& str, Microsoft::WRL::ComPtr<ID3D11DeviceContext> context)

to

       EventMarker(const std::wstring& str, const Microsoft::WRL::ComPtr<ID3D11DeviceContext>& context)

removes the AddRef()/Release() pair that copying the ComPtr argument caused =)

Table2.png
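The effect of that signature change can be reproduced without D3D11. The sketch below uses toy stand-ins, all of them hypothetical: `CountedObject` plays the COM object, `Handle` plays a ComPtr-like smart pointer that calls AddRef() when copied and Release() when destroyed, and `markByValue`/`markByConstRef` mirror the two constructor signatures. It is not WRL code and only illustrates the reference-count traffic.

```cpp
// Toy stand-in for a COM object: it only counts AddRef/Release calls.
struct CountedObject {
    int addRefCalls = 0;
    int releaseCalls = 0;
    void AddRef()  { ++addRefCalls; }
    void Release() { ++releaseCalls; }
};

// Toy stand-in for a ComPtr-like smart pointer.
class Handle {
public:
    explicit Handle(CountedObject* p) : p_(p) { if (p_) p_->AddRef(); }
    Handle(const Handle& other) : p_(other.p_) { if (p_) p_->AddRef(); }  // a copy takes a reference
    Handle& operator=(const Handle&) = delete;
    ~Handle() { if (p_) p_->Release(); }                                   // destruction drops it
    CountedObject* get() const { return p_; }
private:
    CountedObject* p_;
};

// Mirrors the original signature: the argument is copied on every call.
void markByValue(Handle context) { (void)context.get(); }

// Mirrors the updated signature: no copy, so no reference-count traffic.
void markByConstRef(const Handle& context) { (void)context.get(); }
```

Each call to markByValue costs one AddRef() and one Release(); markByConstRef costs neither. In the real wrapper, context.As(&m_context) still performs a QueryInterface, which takes its own reference on success, so the const reference saves only the pair caused by copying the argument.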
