Xperf and Immediate context.

3 comments, last by Happy SDE 8 years, 1 month ago

Today I decided to measure the performance of my application the way I always have with Win32 applications: via xperf and custom markers.

I placed ID3D11DeviceContext2::BeginEventInt()/EndEvent() in all significant places, ran xperf, collected data, and here are the timings for one frame:


Line #  Label (Field 3)         Time (s)
2       Cleanup                 46.07790765
3       Deferred GeometryPass   46.077912
4       SSAO pass               46.0780287
5       SSAO Blur               46.0780344
6       SSAO Blur1::Hor         46.07803541
7       SSAO Blur1::Vert        46.07803742
8       SSAO Blur2::Hor         46.0780391
9       SSAO Blur2::Vert        46.0780401
10      RtVisualizer            46.07804178
11      Wnd3d::Present          46.07804513
12      Cleanup                 46.10182305  (the next frame)

The largest gap is between Present() and the start of the next frame.

In Process Explorer I can see 50% GPU utilization.

I assume procexp’s 50% corresponds to 100% real GPU utilization (the GPU fan gets very loud).

So the GPU is not idle after Present().

With these two pieces of information in mind, I came to two conclusions:

  1. The immediate context is not immediate: it collects draw commands, and the Present() call starts executing them.
  2. With my current knowledge I can’t measure GPU performance. =(

This means I can’t see the difference between 16 samples and 64 samples in the SSAO pass.

Right now the only tools I can use are the noise of my video card cooler, procexp’s GPU graph, and the FPS counter in my application.

That is not very precise.

I would like to see, with high precision, what changes when I tune parameters of my renderer.

Are there any suggestions on how to measure GPU (shader) performance in DX11?

Thanks in advance.


I placed ID3D11DeviceContext2::BeginEventInt()/EndEvent() in all significant places, ran xperf, collected data, and here are the timings for one frame:

I have no experience with xperf, but I would assume that you should have two completely different sets of timings for those markers -- one for the amount of CPU time that's elapsed between calling begin/end, and one for how much time the GPU spent executing the buffered commands that were created between begin/end.

I built my own D3D11 timing system using ID3D11DeviceContext::End and D3D11_QUERY_TIMESTAMP type query objects -- one event at the beginning/end of a "pass" and you can measure the amount of time that pass took on the GPU side. Note that you have to wait a frame (or several) before retrieving the timestamp data from these queries, as the GPU is async/delayed.
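The arithmetic behind this kind of timestamp timing can be sketched roughly as follows. In D3D11 the tick values would come from ID3D11DeviceContext::GetData on the two D3D11_QUERY_TIMESTAMP queries, and the frequency and disjoint flag from a D3D11_QUERY_TIMESTAMP_DISJOINT query wrapped around the frame. The names DisjointData, PassTicks and passMilliseconds are hypothetical stand-ins, used here only so the sketch compiles without the Windows SDK; this is not the timing system described above.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the two fields of D3D11_QUERY_DATA_TIMESTAMP_DISJOINT.
struct DisjointData {
    std::uint64_t frequency; // GPU timestamp ticks per second
    bool disjoint;           // true if timestamps from this frame are unreliable
};

// Hypothetical pair of timestamp results read back for one pass.
struct PassTicks {
    std::uint64_t begin;
    std::uint64_t end;
};

// Converts a begin/end tick pair into milliseconds of GPU time.
// Returns no value when the frame was disjoint or the inputs make no sense.
std::optional<double> passMilliseconds(const PassTicks& pass, const DisjointData& frame)
{
    if (frame.disjoint || frame.frequency == 0 || pass.end < pass.begin) {
        return std::nullopt;
    }
    const double ticks = static_cast<double>(pass.end - pass.begin);
    return 1000.0 * ticks / static_cast<double>(frame.frequency);
}
```

With a frequency of 1,000,000 ticks per second, a pass spanning 2,500 ticks comes out as 2.5 ms. Frames flagged as disjoint are simply skipped rather than reported.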

I get data on GPU time that looks like: 8d5RWX8.png

The immediate context is not immediate: it collects draw commands, and the Present() call starts executing them.

Kinda. The immediate context builds command buffers behind the scenes, and submits them into the GPU's work queue at arbitrary points in time -- either when it feels its internal command buffer is "full enough", or when it's forced to by an implicit "flush" (which Present can be).

The real reason that you see high CPU times in Present though is that it acts as the latency-limiting sync point between the two processors. If the CPU frametime is lower than the GPU frametime, then Present will typically block the CPU until the GPU has finished executing all of the commands that were submitted on the previous frame, to ensure that the GPU is only 1-frame behind. This is driver dependent -- e.g. perhaps it will block the CPU until the GPU has finished executing all commands from 3 frames ago... Either way, it serves as the point where the CPU will wait for the GPU to catch up sufficiently, to avoid creating an ever-growing amount of latency.
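A toy model can make the blocking behaviour concrete. The sketch below is a deliberate simplification and an assumption, not a description of any real driver: the CPU needs cpuMs to record a frame, the GPU needs gpuMs to execute it, and Present for frame i blocks until the GPU has finished frame i - maxQueuedFrames. The function name simulatePresentBlocking and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Returns, per frame, how long the CPU sat blocked in "Present" (in ms).
std::vector<double> simulatePresentBlocking(double cpuMs, double gpuMs,
                                            int maxQueuedFrames, int frameCount)
{
    std::vector<double> gpuDone(frameCount, 0.0);   // time at which the GPU finishes each frame
    std::vector<double> blockedMs(frameCount, 0.0);
    double cpuClock = 0.0; // current time on the CPU timeline
    double gpuFree = 0.0;  // time at which the GPU becomes idle

    for (int i = 0; i < frameCount; ++i) {
        cpuClock += cpuMs; // CPU records and submits frame i

        // The GPU starts frame i once the work has arrived and it is idle.
        const double start = std::max(cpuClock, gpuFree);
        gpuDone[i] = start + gpuMs;
        gpuFree = gpuDone[i];

        // "Present": wait until the frame maxQueuedFrames back is finished.
        const int waitFor = i - maxQueuedFrames;
        if (waitFor >= 0 && gpuDone[waitFor] > cpuClock) {
            blockedMs[i] = gpuDone[waitFor] - cpuClock;
            cpuClock = gpuDone[waitFor];
        }
    }
    return blockedMs;
}
```

With cpuMs = 0.14, gpuMs = 23.8 and maxQueuedFrames = 1, every frame after the first spends about 23.66 ms blocked, which resembles the gap after Wnd3d::Present in the xperf listing. When the CPU is the slower side, the blocked time in this model drops to zero.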

Hodgman, what software do you use to visualize this graph?

Is it free like xperf, is your own creation, or some third-party?

In chrome, open a new tab and navigate to chrome://tracing/
I just dump my timing data out in their JSON format and use their tool to visualize it.

It supports multiple processes and threads, so I represent the GPU timings as a separate process in the same file, alongside the CPU timing events.
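As a rough illustration of such a dump, the sketch below writes "complete" events (ph set to "X", with ts and dur in microseconds) in the Trace Event JSON format that chrome://tracing loads. The TraceEvent struct, the toChromeTraceJson function and the convention of pid 1 for CPU and pid 2 for GPU are assumptions made for this example, not the code used above.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one timed range.
struct TraceEvent {
    std::string name; // assumed to contain no characters that need JSON escaping
    int pid;          // in this sketch: 1 = CPU timings, 2 = GPU timings
    int tid;
    long long tsUs;   // start time in microseconds
    long long durUs;  // duration in microseconds
};

// Serializes the events as {"traceEvents":[ ... ]} using complete ("X") events.
std::string toChromeTraceJson(const std::vector<TraceEvent>& events)
{
    std::ostringstream out;
    out << "{\"traceEvents\":[";
    for (std::size_t i = 0; i < events.size(); ++i) {
        const TraceEvent& e = events[i];
        if (i != 0) {
            out << ',';
        }
        out << "{\"name\":\"" << e.name << "\",\"ph\":\"X\""
            << ",\"pid\":" << e.pid << ",\"tid\":" << e.tid
            << ",\"ts\":" << e.tsUs << ",\"dur\":" << e.durUs << '}';
    }
    out << "]}";
    return out.str();
}
```

Giving the GPU events their own pid makes the viewer show them as a separate process row next to the CPU events.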

Thanks to Microsoft’s ETW, I just found another way to do the same job.

  1. It works with the NVIDIA Nsight debugger.
  2. It works with the VS graphics debugger.

Here are the GPU timings of one frame in NVIDIA Nsight:

Graph.png

Here is a tree with CPU and GPU timings:

Table.png

Here are the graphics events in VS 2015:

VS_draws.png

The main idea:

  1. Windows has the excellent ETW performance-tracing system.
  2. NVIDIA Nsight uses it with its own markers, such as nvtxRangePushW/nvtxRangePop and others from the nvToolsExt library.
  3. Nsight does not support ID3D11DeviceContext2, so BeginEventInt is not available under it. =(

So, to make the markers work in both environments, I created an RAII wrapper like this:


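#include <d3d11_2.h>      // ID3D11DeviceContext2
#include <wrl/client.h>   // Microsoft::WRL::ComPtr
#include <nvToolsExt.h>   // nvtxRangePushW / nvtxRangePop

// RAII marker: opens a named range in the constructor and closes it in the destructor.
class EventMarker
{
public:
    EventMarker(const wchar_t* str, Microsoft::WRL::ComPtr<ID3D11DeviceContext> context)
    {
        // As() leaves m_context null when ID3D11DeviceContext2 is unavailable (e.g. under Nsight).
        context.As(&m_context);
        if (m_context)
        {
            m_context->BeginEventInt(str, 0); // Visible in the VS graphics debugger
        }

        nvtxRangePushW(str); // Visible in the Nsight debugger
    }

    ~EventMarker()
    {
        if (m_context)
        {
            m_context->EndEvent();
        }

        nvtxRangePop();
    }

    // Non-copyable: a copy would close the same range twice.
    EventMarker(const EventMarker&) = delete;
    EventMarker& operator=(const EventMarker&) = delete;

private:
    Microsoft::WRL::ComPtr<ID3D11DeviceContext2> m_context;
};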

Usage (markers can be nested freely):


void SsaoPass::blurSsaoTexture()
{
    EventMarker em{ L"SSAO Blur", m_context };
    Blur1(); //Maybe nested marker there
    Blur2();
}
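To show how nested scoped markers open and close, here is a portable sketch that needs neither D3D11 nor NVTX. ScopedLogMarker and blurPass are hypothetical names, and the string log merely stands in for the BeginEventInt/nvtxRangePushW and EndEvent/nvtxRangePop calls.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for EventMarker: logs instead of calling D3D11/NVTX.
class ScopedLogMarker {
public:
    ScopedLogMarker(std::string name, std::vector<std::string>& log)
        : m_name(std::move(name)), m_log(log)
    {
        m_log.push_back("begin " + m_name); // where the range would be pushed
    }

    ~ScopedLogMarker()
    {
        m_log.push_back("end " + m_name);   // where the range would be popped
    }

    ScopedLogMarker(const ScopedLogMarker&) = delete;
    ScopedLogMarker& operator=(const ScopedLogMarker&) = delete;

private:
    std::string m_name;
    std::vector<std::string>& m_log;
};

// The outer marker stays open while the two inner ones open and close in turn.
void blurPass(std::vector<std::string>& log)
{
    ScopedLogMarker outer{"SSAO Blur", log};
    { ScopedLogMarker inner{"Blur1::Hor", log}; }
    { ScopedLogMarker inner{"Blur1::Vert", log}; }
}
```

Because destructors run in reverse order of construction, each inner range closes before the outer one, which is the nesting the profilers expect.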

Update:

Changing the constructor from

EventMarker(const std::wstring& str, Microsoft::WRL::ComPtr<ID3D11DeviceContext> context)

to

EventMarker(const std::wstring& str, const Microsoft::WRL::ComPtr<ID3D11DeviceContext>& context)

removes an AddRef()/Release() pair per marker, because the ComPtr is no longer copied. =)
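The effect is easy to reproduce with any reference-counted smart pointer. The sketch below uses std::shared_ptr as a stand-in for ComPtr (an assumption made so the example is portable); FakeContext and both function names are hypothetical.

```cpp
#include <memory>

struct FakeContext {}; // hypothetical stand-in for ID3D11DeviceContext

// By value: the parameter is a copy, so it holds one extra reference
// for the duration of the call (the AddRef()/Release() pair).
long countWhenPassedByValue(std::shared_ptr<FakeContext> context)
{
    return context.use_count();
}

// By const reference: no copy is made, so the count is unchanged.
long countWhenPassedByConstRef(const std::shared_ptr<FakeContext>& context)
{
    return context.use_count();
}
```

With a single owner outside the call, the by-value version observes a count of 2 and the const-reference version a count of 1.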

Table2.png

