Swapchain performance

9 comments, last by gnmgrl 11 years, 8 months ago
Hey,
After a lot of time spent examining performance issues in my DirectX 11 application, I came to a conclusion (I could have reached it earlier, but anyway). After I reduced the app to just the "outer" DirectX calls, I found what is slowing everything down so much:
My SwapChain->Present(0,0)!
When calling it without drawing anything, I already drop to 600-700 fps. Then, when using
ClearRenderTargetView(renderTargetView, bgColor);
and
ClearDepthStencilView(depthStencilView, D3D11_CLEAR_DEPTH|D3D11_CLEAR_STENCIL, 1.0f, 0);
the FPS is already down to 300-350.

Now I would like to know: Is this right? I got the code for the swapchain and the creation of the device from a tutorial, so I can't really say what is wrong with it. Those 3 lines have to be called to render properly anyway, and my GPU is pretty good (it can run BF3 and similar games without problems).
So I'm suspicious about this code for the swapchain. Before I post all of it, can you confirm that 300 fps with these 3 lines is way too little?
(Without those 3 lines I get 250,000 fps.)

Thank you
This kind of behaviour is quite normal; it's better to measure in milliseconds per frame rather than frames per second, at which point you'll see that - even though it looks as if your clears wipe out half of your performance - they're actually taking quite a small amount of additional time. (Note that if you're overdrawing the entire color buffer per frame anyway then you should be able to get away without clearing it, which will shave back a bit of perf.)

However, for a GPU that can run BF3 your overall performance seems shockingly low and points at something else being a cause of trouble. You should be getting framerates in the thousands here (for reference, I just tested similar on a low/mid AMD mobile GPU and easily cleared 2000fps at 1024x768 windowed). Have a look over your timer code to make sure there's nothing odd there, and also make double-sure that you're not calling Sleep anywhere. And there may be something evil in your game loop; e.g. GetMessage instead of PeekMessage, or bad PeekMessage handling.
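
As a point of comparison for the timer check suggested above, here is a minimal sketch of a frame-time averager built on std::chrono. The names FrameTimeAverager and FrameClock are hypothetical helpers, not part of Direct3D or of the original poster's code, and the window size is an arbitrary choice.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not part of Direct3D): averages frame durations over a
// fixed window so that a single slow frame or timer jitter does not dominate.
class FrameTimeAverager {
public:
    using Millis = std::chrono::duration<double, std::milli>;

    explicit FrameTimeAverager(int windowSize) : windowSize_(windowSize) {
        assert(windowSize > 0);
    }

    // Record one frame's duration. Returns true when a full window has been
    // collected and averageMs() holds a fresh value.
    bool addFrame(Millis frameTime) {
        accumulated_ += frameTime;
        if (++frames_ < windowSize_) {
            return false;
        }
        averageMs_ = accumulated_.count() / frames_;
        accumulated_ = Millis::zero();
        frames_ = 0;
        return true;
    }

    double averageMs() const { return averageMs_; }

private:
    int windowSize_;
    int frames_ = 0;
    Millis accumulated_ = Millis::zero();
    double averageMs_ = 0.0;
};

// Hypothetical helper: returns the time elapsed since the previous lap() call,
// measured with a monotonic clock.
class FrameClock {
public:
    using Clock = std::chrono::steady_clock;

    FrameTimeAverager::Millis lap() {
        const Clock::time_point now = Clock::now();
        const FrameTimeAverager::Millis elapsed = now - last_;
        last_ = now;
        return elapsed;
    }

private:
    Clock::time_point last_ = Clock::now();
};
```

In a render loop this would be called once per frame, right after Present, for example `if (averager.addFrame(frameClock.lap())) { /* log averager.averageMs() */ }`. Reporting an average in milliseconds over many frames avoids both the resolution problems of per-frame timing and the misleading scale of FPS numbers.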

It's also worth enabling the debug runtimes and checking if you've got anything troublesome there, as well as capturing the calls for a frame in PIX and having a look at what's happening from that point of view (be aware that a PIX call capture may make the first D3D call each frame seem inordinately longer than it really is; that's just an artefact of the sampling/capture mechanism and not an indicator of anything bad). Finally, make absolutely certain that you're not creating/destroying any D3D resources each frame. Even if you don't otherwise use them, this is a very expensive operation, and creation/destruction should be moved to load time.

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

Thanks for your answer.
I could get rid of the clearing, but as you pointed out that is not the biggest problem here.
I assume that my timers are fine overall; when I'm calling only them, the FPS is at 300,000. I'm not creating any resources; all that is happening are those 3 lines and the timers.
I render in a 1920x1080 window, since I want the game to run at this resolution later anyway. (At 1024x768 I get 500 fps.)
When I draw 256*256*9 vertices in a trianglelist I am already down to 10-20 FPS.
Maybe my description of the backbuffer or the depthstencilview etc. is bad, causing me to lose so much performance when presenting the scene to it?
If you know a good tutorial on initialising DirectX 11, I would appreciate it.
Could you post the code to initialize your device, devcon and swap chain?
Perception is when one imagination clashes with another
Here you go:



//Describe our Buffer
DXGI_MODE_DESC bufferDesc;
ZeroMemory(&bufferDesc, sizeof(DXGI_MODE_DESC));
bufferDesc.Width = 1024;
bufferDesc.Height = 768;
bufferDesc.RefreshRate.Numerator = 60;
bufferDesc.RefreshRate.Denominator = 1;
bufferDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
bufferDesc.ScanlineOrdering = DXGI_MODE_SCANLINE_ORDER_UNSPECIFIED;
bufferDesc.Scaling = DXGI_MODE_SCALING_UNSPECIFIED;

//Describe our SwapChain
DXGI_SWAP_CHAIN_DESC swapChainDesc;

ZeroMemory(&swapChainDesc, sizeof(DXGI_SWAP_CHAIN_DESC));
swapChainDesc.BufferDesc = bufferDesc;
swapChainDesc.SampleDesc.Count = 1; //antialiasing
swapChainDesc.SampleDesc.Quality = 0;
swapChainDesc.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT;
swapChainDesc.BufferCount = 1;
swapChainDesc.OutputWindow = hWnd;
swapChainDesc.Windowed = TRUE;
swapChainDesc.SwapEffect = DXGI_SWAP_EFFECT_DISCARD;
//Create our SwapChain D3D11_CREATE_DEVICE_DEBUG
hr = D3D11CreateDeviceAndSwapChain(NULL, D3D_DRIVER_TYPE_HARDWARE, NULL, NULL, NULL, NULL,D3D11_SDK_VERSION, &swapChainDesc, &SwapChain, &d3d11Device, NULL, &d3d11DevCon);

//Create our BackBuffer
ID3D11Texture2D* BackBuffer;
hr = SwapChain->GetBuffer( 0, __uuidof( ID3D11Texture2D ), (void**)&BackBuffer );

//Create our Render Target
hr = d3d11Device->CreateRenderTargetView( BackBuffer, NULL, &renderTargetView );
BackBuffer->Release();

//And:

D3D11_TEXTURE2D_DESC depthStencilDesc;
depthStencilDesc.Width = 1024;
depthStencilDesc.Height = 768;
depthStencilDesc.MipLevels = 1;
depthStencilDesc.ArraySize = 1;
depthStencilDesc.Format = DXGI_FORMAT_D24_UNORM_S8_UINT;
depthStencilDesc.SampleDesc.Count = 1;
depthStencilDesc.SampleDesc.Quality = 0;
depthStencilDesc.Usage = D3D11_USAGE_DEFAULT;
depthStencilDesc.BindFlags = D3D11_BIND_DEPTH_STENCIL;
depthStencilDesc.CPUAccessFlags = 0;
depthStencilDesc.MiscFlags = 0;
//Create the Depth/Stencil View
d3d11Device->CreateTexture2D(&depthStencilDesc, NULL, &depthStencilBuffer);
d3d11Device->CreateDepthStencilView(depthStencilBuffer, NULL, &depthStencilView);

//Set our Render Target
d3d11DevCon->OMSetRenderTargets( 1, &renderTargetView, depthStencilView );


//Create the Viewport
D3D11_VIEWPORT viewport;
ZeroMemory(&viewport, sizeof(D3D11_VIEWPORT));
viewport.TopLeftX = 0;
viewport.TopLeftY = 0;
viewport.Width = 1024;
viewport.Height = 768;
viewport.MinDepth = 0.0f;
viewport.MaxDepth = 1.0f;
//Set the Viewport
d3d11DevCon->RSSetViewports(1, &viewport);

Is your bufferDesc coming from an enumerated mode or did you just put the values in yourself? Have a look at http://msdn.microsoft.com/en-us/library/windows/desktop/bb205075%28v=vs.85%29.aspx - with particular attention to the section with the "Full-Screen Performance Tip" header.


I got the code from a tutorial; all of it was like what you can see here (of course I changed the window size). I'll look into this, but I have to say I often find these MSDN pages hard to understand.

"it's better to measure in milliseconds per frame rather than frames per second"
Quote for emphasis.
Converting your figures:
Presenting without drawing = 600fps == 1.67ms
Presenting with clearing = 350fps == 2.86ms
Difference == 1.19ms

A 1920x1080 window at 4 bytes per pixel occupies around 8MiB of RAM -- taking about 1ms to write 8MiB of data is pretty good, that's about 8GiB/s bandwidth.
This does not seem like you're experiencing a performance problem here.

Don't measure performance with FPS.
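
The conversions above can be sketched in a few lines. This is only an illustration: the function names are made up for this example, and the 4 bytes per pixel assumes a 32-bit format such as the DXGI_FORMAT_R8G8B8A8_UNORM back buffer in the posted code.

```cpp
#include <cassert>
#include <cstdint>

// Frame time in milliseconds for a given frame rate.
inline double frameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Bytes touched when every pixel of a buffer is written once.
// bytesPerPixel = 4 is an assumption matching a 32-bit color or depth format.
inline std::uint64_t bufferBytes(std::uint32_t width, std::uint32_t height,
                                 std::uint32_t bytesPerPixel) {
    return static_cast<std::uint64_t>(width) * height * bytesPerPixel;
}

// Bandwidth in GiB/s implied by writing `bytes` in `milliseconds`.
inline double impliedBandwidthGiBps(std::uint64_t bytes, double milliseconds) {
    assert(milliseconds > 0.0);
    const double gib = static_cast<double>(bytes) / (1024.0 * 1024.0 * 1024.0);
    return gib / (milliseconds / 1000.0);
}
```

For example, `frameTimeMs(350.0) - frameTimeMs(600.0)` gives the extra time attributed to the clears, and `impliedBandwidthGiBps(bufferBytes(1920, 1080, 4), 1.0)` gives the bandwidth implied by filling one such buffer in a millisecond. Differences in milliseconds add up linearly, which differences in FPS do not.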

"When I draw 256*256*9 vertices in a trianglelist I am already down to 10-20 FPS."
There's not enough information to draw any conclusions from that statement. For all I know, that's two hundred thousand triangles which all overlap each other and cover the entire screen (which would generate about 1.5 terabytes of pixel output). If this triangle list is causing performance problems, it's a separate issue from the cost of clearing discussed above. You'll have to perform a series of experiments to see what the slow-down is (e.g. slow vertex processing, slow pixel shader, etc.)...
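
The worst-case figure in that reply can be reproduced with a rough estimate like the one below. It assumes a 1920x1080 target at 4 bytes per pixel and that every triangle covers the whole screen with no early rejection; the function names are hypothetical and the result is an upper bound, not a measurement.

```cpp
#include <cassert>
#include <cstdint>

// Number of triangles in a triangle list: three vertices per triangle.
inline std::uint64_t trianglesInList(std::uint64_t vertexCount) {
    return vertexCount / 3;
}

// Upper bound on bytes written if every triangle covers the full render
// target and nothing is rejected by depth testing (an assumption made only
// to illustrate the worst case).
inline std::uint64_t worstCaseOutputBytes(std::uint64_t triangles,
                                          std::uint32_t width,
                                          std::uint32_t height,
                                          std::uint32_t bytesPerPixel) {
    return triangles * width * height * bytesPerPixel;
}

// Convert a byte count to TiB for readability.
inline double toTiB(std::uint64_t bytes) {
    assert(bytes > 0);
    return static_cast<double>(bytes) / (1024.0 * 1024.0 * 1024.0 * 1024.0);
}
```

Calling `toTiB(worstCaseOutputBytes(trianglesInList(256ULL * 256 * 9), 1920, 1080, 4))` gives the per-frame worst case. Real scenes land far below this bound, but the calculation shows why the amount of overdraw matters more than the raw vertex count.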
Thanks for your answer. But like mhAgain mentioned, I should have much better performance on my GPU when just clearing and presenting the window.
What I can tell you is this: I draw the trianglelist 9 times instanced, but instancing costs me nearly the same performance as calling draw 9 times. Could that be a hint of a slow pixel shader? When the whole screen is covered with the triangles, I lose performance; when I move the camera so I just look at blank background, the performance rises a lot!
"I should have much better performance on my GPU when just clearing and presenting the window."
How do you know? What GPU is it? What's its theoretical memory bandwidth? How long in theory should it take to transfer 16MiB of data?
"I draw the trianglelist 9 times instanced, but instancing costs me nearly the same performance as calling draw 9 times."
Reducing draw calls via instancing is an optimisation to reduce CPU-side overhead. You are almost certainly GPU-bound, so this is no surprise.
"When the whole screen is covered with the triangles, I lose performance; when I move the camera so I just look at blank background, the performance rises a lot!"
This is a definite hint that the bottleneck in your program is pixel processing. Your pixel shader could be too complex (too many instructions, too many texture fetches), your model could have too much overdraw (triangles appearing over the top of each other, causing many triangles to calculate pixel values that are overwritten), or you could be generating too much ROP throughput (e.g. blending lots of pixels into a frame buffer, or using an expensive framebuffer format).
