Swapchain performance


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

10 replies to this topic

#1 gnomgrol   Members   -  Reputation: 568

Posted 14 August 2012 - 12:04 PM

Hey,
After a lot of digging into performance issues with my DirectX 11 application, I came to a conclusion (I could have reached it earlier, but anyway). After I reduced the app to just the "outer" DirectX calls, I found what is slowing everything down so much:
My SwapChain->Present(0,0)!
When calling it without drawing anything I already drop down to 600-700 fps. Then, when using
ClearRenderTargetView(renderTargetView, bgColor);
and
ClearDepthStencilView(depthStencilView, D3D11_CLEAR_DEPTH|D3D11_CLEAR_STENCIL, 1.0f, 0);
the FPS are already down to 300-350.

Now I would like to know: is this normal? I got the code for the swap chain and the device creation from a tutorial, so I can't really say what is wrong with it. Those 3 lines have to be called to render properly anyway, and my GPU is pretty good (it can run BF3 and the like without problems).
So I'm suspicious about this swap chain code. Before I post all of it, can you confirm that 300 FPS with just these 3 lines is way too low?
(Without those 3 lines I get 250,000.)

Thank you

Edited by gnomgrol, 14 August 2012 - 12:06 PM.


#2 mhagain   Crossbones+   -  Reputation: 7467

Posted 14 August 2012 - 03:35 PM

This kind of behaviour is quite normal; it's better to measure in milliseconds per frame rather than frames per second, at which point you'll see that - even though it looks as if your clears wipe out half of your performance - they're actually taking quite a small amount of additional time. (Note that if you're overdrawing the entire color buffer per frame anyway then you should be able to get away without clearing it, which will shave back a bit of perf.)
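To make the FPS versus milliseconds point concrete, here is a minimal sketch of the conversion. The helper names `frameTimeMs` and `addedCostMs` are made up for illustration and are not from the original poster's code.

```cpp
#include <cassert>

// Convert a frame rate (frames per second) into the time one frame takes,
// in milliseconds. Hypothetical helper, for illustration only.
double frameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Extra time per frame, in milliseconds, when the frame rate drops
// from fpsBefore to fpsAfter.
double addedCostMs(double fpsBefore, double fpsAfter) {
    return frameTimeMs(fpsAfter) - frameTimeMs(fpsBefore);
}
```

With the figures from the first post, `addedCostMs(600.0, 350.0)` comes out at roughly 1.19 ms. The same 1.19 ms added to a 60 FPS frame (about 16.67 ms) would only lower it to about 56 FPS, which is why a drop in FPS says little on its own.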

However, for a GPU that can run BF3 your overall performance seems shockingly low and points at something else being a cause of trouble. You should be getting framerates in the thousands here (for reference, I just tested similar on a low/mid AMD mobile GPU and easily cleared 2000fps at 1024x768 windowed). Have a look over your timer code to make sure there's nothing odd there, and also make double-sure that you're not calling Sleep anywhere. And there may be something evil in your game loop; e.g. GetMessage instead of PeekMessage, or bad PeekMessage handling.
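As a reference point for checking timer code, this is one possible shape for a frame timer that averages frame time over many frames instead of reporting a single noisy FPS value. It is only a sketch: the class name `FrameTimer` and its methods are hypothetical, and it uses the portable `std::chrono::steady_clock` rather than whatever timer the original application uses.

```cpp
#include <cassert>
#include <chrono>

// Accumulates frame durations and reports the average in milliseconds.
// Hypothetical helper, not taken from the original poster's code.
class FrameTimer {
public:
    using Clock = std::chrono::steady_clock;  // monotonic clock

    // Call once per frame, e.g. right after Present().
    // The first call only records a starting point.
    void tick() {
        const Clock::time_point now = Clock::now();
        if (hasLast_) {
            addSample(std::chrono::duration<double, std::milli>(now - last_).count());
        }
        last_ = now;
        hasLast_ = true;
    }

    // Record one frame duration directly, in milliseconds.
    void addSample(double ms) {
        assert(ms >= 0.0);
        totalMs_ += ms;
        ++frames_;
    }

    // Average frame time over all recorded frames; 0 if none were recorded.
    double averageMs() const {
        return frames_ == 0 ? 0.0 : totalMs_ / static_cast<double>(frames_);
    }

    long long frameCount() const { return frames_; }

private:
    Clock::time_point last_{};
    bool hasLast_ = false;
    double totalMs_ = 0.0;
    long long frames_ = 0;
};
```

Calling `tick()` once per frame and printing `averageMs()` every few hundred frames keeps the measurement itself from dominating a loop that runs thousands of times per second.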

Also worth enabling the debug runtimes and checking if you've got anything that's troublesome there, as well as capturing calls for a frame in PIX and having a look at what's happening from that point of view (be aware that a PIX call capture may make the first D3D call each frame seem inordinately longer than it really is - that's just an artefact of the sampling/capture mechanism and not an indicator of anything bad). Finally, make absolutely certain that you're not creating/destroying any D3D resources each frame - even if you don't otherwise use them, this is a very expensive operation and creation/destruction should be moved to startup/load time.

Edited by mhagain, 14 August 2012 - 03:39 PM.

It appears that the gentleman thought C++ was extremely difficult and he was overjoyed that the machine was absorbing it; he understood that good C++ is difficult but the best C++ is well-nigh unintelligible.


#3 gnomgrol   Members   -  Reputation: 568

Posted 15 August 2012 - 01:13 AM

Thanks for your answer.
I could get rid of the clearing, but as you pointed out that is not the biggest problem here.
I assume that my timers are fine overall; when I call only them, the FPS is at 300,000. I'm not creating any resources; all that is happening is those 3 lines and the timers.
I render in a 1920x1080 window, since I want the game to run at this resolution eventually anyway (at 1024x768 I get 500 FPS).
When I draw 256*256*9 vertices in a triangle list I am already down to 10-20 FPS.
Maybe my description of the back buffer or the depth stencil view etc. is bad, causing me to lose so much performance when presenting the scene?
If you know a good tutorial on initializing DirectX 11, I would appreciate it.

Edited by gnomgrol, 15 August 2012 - 04:13 AM.


#4 Seabolt   Members   -  Reputation: 632

Posted 15 August 2012 - 12:09 PM

Could you post the code to initialize your device, devcon and swap chain?
Perception is when one imagination clashes with another

#5 gnomgrol   Members   -  Reputation: 568

Posted 15 August 2012 - 01:20 PM

Here you go:


//Describe our Buffer
DXGI_MODE_DESC bufferDesc;
ZeroMemory(&bufferDesc, sizeof(DXGI_MODE_DESC));
bufferDesc.Width = 1024;
bufferDesc.Height = 768;
bufferDesc.RefreshRate.Numerator = 60;
bufferDesc.RefreshRate.Denominator = 1;
bufferDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
bufferDesc.ScanlineOrdering = DXGI_MODE_SCANLINE_ORDER_UNSPECIFIED;
bufferDesc.Scaling = DXGI_MODE_SCALING_UNSPECIFIED;

//Describe our SwapChain
DXGI_SWAP_CHAIN_DESC swapChainDesc;
 
ZeroMemory(&swapChainDesc, sizeof(DXGI_SWAP_CHAIN_DESC));
swapChainDesc.BufferDesc = bufferDesc;
swapChainDesc.SampleDesc.Count = 1; //antialiasing
swapChainDesc.SampleDesc.Quality = 0;
swapChainDesc.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT;
swapChainDesc.BufferCount = 1;
swapChainDesc.OutputWindow = hWnd;
swapChainDesc.Windowed = TRUE;
swapChainDesc.SwapEffect = DXGI_SWAP_EFFECT_DISCARD;
//Create our SwapChain  D3D11_CREATE_DEVICE_DEBUG
hr = D3D11CreateDeviceAndSwapChain(NULL, D3D_DRIVER_TYPE_HARDWARE, NULL, NULL, NULL, NULL,D3D11_SDK_VERSION, &swapChainDesc, &SwapChain, &d3d11Device, NULL, &d3d11DevCon);

//Create our BackBuffer
ID3D11Texture2D* BackBuffer;
hr = SwapChain->GetBuffer( 0, __uuidof( ID3D11Texture2D ), (void**)&BackBuffer );

//Create our Render Target
hr = d3d11Device->CreateRenderTargetView( BackBuffer, NULL, &renderTargetView );
BackBuffer->Release();

//And:

D3D11_TEXTURE2D_DESC depthStencilDesc;
depthStencilDesc.Width              = 1024;
depthStencilDesc.Height             = 768;
depthStencilDesc.MipLevels          = 1;
depthStencilDesc.ArraySize          = 1;
depthStencilDesc.Format             = DXGI_FORMAT_D24_UNORM_S8_UINT;
depthStencilDesc.SampleDesc.Count   = 1;
depthStencilDesc.SampleDesc.Quality = 0;
depthStencilDesc.Usage              = D3D11_USAGE_DEFAULT;
depthStencilDesc.BindFlags          = D3D11_BIND_DEPTH_STENCIL;
depthStencilDesc.CPUAccessFlags     = 0;
depthStencilDesc.MiscFlags          = 0;
//Create the Depth/Stencil View
d3d11Device->CreateTexture2D(&depthStencilDesc, NULL, &depthStencilBuffer);
d3d11Device->CreateDepthStencilView(depthStencilBuffer, NULL, &depthStencilView);

//Set our Render Target
d3d11DevCon->OMSetRenderTargets( 1, &renderTargetView, depthStencilView );


//Create the Viewport
D3D11_VIEWPORT viewport;
ZeroMemory(&viewport, sizeof(D3D11_VIEWPORT));
viewport.TopLeftX = 0;
viewport.TopLeftY = 0;
viewport.Width = 1024;
viewport.Height = 768;
viewport.MinDepth = 0.0f;
viewport.MaxDepth = 1.0f;
//Set the Viewport
d3d11DevCon->RSSetViewports(1, &viewport);



#6 mhagain   Crossbones+   -  Reputation: 7467

Posted 15 August 2012 - 02:57 PM

Is your bufferDesc coming from an enumerated mode or did you just put the values in yourself? Have a look at http://msdn.microsoft.com/en-us/library/windows/desktop/bb205075%28v=vs.85%29.aspx - with particular attention to the section with the "Full-Screen Performance Tip" header.

It appears that the gentleman thought C++ was extremely difficult and he was overjoyed that the machine was absorbing it; he understood that good C++ is difficult but the best C++ is well-nigh unintelligible.


#7 gnomgrol   Members   -  Reputation: 568

Posted 16 August 2012 - 01:45 AM

I got the code from a tutorial; all of it looked like what you can see here (of course I changed the window size). I'll look into this, but I have to say I often find these MSDN pages hard to understand.

Edited by gnomgrol, 16 August 2012 - 03:26 AM.


#8 Hodgman   Moderators   -  Reputation: 27904

Posted 16 August 2012 - 02:00 AM

it's better to measure in milliseconds per frame rather than frames per second

Quote for emphasis.
Converting your figures:
Presenting without drawing = 600fps == 1.67ms
Presenting with clearing = 350fps == 2.86ms
Difference == 1.19ms

A 1920x1080 window occupies at least about 8MiB of RAM -- taking about 1ms to write 8MiB of data is pretty good, that's about 8GiB/s bandwidth.
This does not seem like you're experiencing a performance problem here.
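The bandwidth estimate can be reproduced with a few lines of arithmetic. This is a sketch under the assumption of 4 bytes per pixel (matching the `DXGI_FORMAT_R8G8B8A8_UNORM` back buffer posted earlier); the function names are invented for the example.

```cpp
#include <cassert>
#include <cstdint>

// Size in bytes of one render target with a fixed number of bytes per pixel.
std::uint64_t targetBytes(std::uint64_t width, std::uint64_t height,
                          std::uint64_t bytesPerPixel) {
    return width * height * bytesPerPixel;
}

// Effective bandwidth in GiB/s when `bytes` are written in `ms` milliseconds.
double bandwidthGiBps(std::uint64_t bytes, double ms) {
    assert(ms > 0.0);
    const double gib = static_cast<double>(bytes) / (1024.0 * 1024.0 * 1024.0);
    return gib / (ms / 1000.0);
}
```

`targetBytes(1920, 1080, 4)` is 8,294,400 bytes (a little under 8 MiB), and writing that in 1 ms corresponds to roughly 7.7 GiB/s, in line with the "about 8GiB/s" figure.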

Don't measure performance with FPS.

When I draw 256*256*9 vertices in a trianglelist I am already down to 10-20 FPS.

There's not enough information to draw any conclusions from that statement. For all I know, that's two-hundred-thousand triangles which all overlap each other and cover the entire screen (which would generate 1.5 terabytes of pixel output). If this triangle-list is causing performance problems, it's a separate issue to the above cost of clearing. You'll have to perform a series of experiments to see what the slow-down is (e.g. slow vertex processing, slow pixel shader, etc)...
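The worst-case figure can be checked the same way. The sketch below assumes every triangle covers the whole 1920x1080 target at 4 bytes per pixel, which is the hypothetical extreme described here and not a measurement of the actual scene; the helper names are made up.

```cpp
#include <cassert>
#include <cstdint>

// Number of triangles in a triangle list (3 vertices per triangle).
std::uint64_t triangleCount(std::uint64_t vertexCount) {
    assert(vertexCount % 3 == 0);
    return vertexCount / 3;
}

// Worst-case bytes of pixel output if every triangle covered the
// entire render target.
std::uint64_t worstCasePixelBytes(std::uint64_t triangles, std::uint64_t width,
                                  std::uint64_t height, std::uint64_t bytesPerPixel) {
    return triangles * width * height * bytesPerPixel;
}

// Convert bytes to tebibytes (2^40 bytes).
double toTiB(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0 * 1024.0 * 1024.0);
}
```

256*256*9 vertices give 196,608 triangles, and `worstCasePixelBytes(196608, 1920, 1080, 4)` is about 1.48 TiB per frame, which rounds to the 1.5 terabytes quoted when the target is taken as a flat 8 MiB.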

#9 gnomgrol   Members   -  Reputation: 568

Posted 16 August 2012 - 02:45 AM

Thanks for your answer. But like mhagain mentioned, I should have much better performance on my GPU when just clearing and presenting the window.
What I can tell you is this: I draw the triangle list 9 times instanced, but instancing it costs me nearly the same performance as calling draw 9 times. Could that be a hint of a slow pixel shader? When the whole screen is covered with triangles, I lose the performance; when I move the camera so I just look at the blank background, the performance rises a lot!

Edited by gnomgrol, 16 August 2012 - 03:06 AM.


#10 Hodgman   Moderators   -  Reputation: 27904

Posted 16 August 2012 - 03:33 AM

I should have much better performance on my GPU when just clearing and presenting the window.

How do you know - what GPU is it? What's its theoretical memory bandwidth? How long in theory should it take to transfer 16MiB of data?
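The last question is a one-line calculation once the bandwidth is known. This sketch uses a hypothetical helper `transferTimeMs`; the bandwidth passed in is whatever the GPU's specification states, and the values mentioned afterwards are placeholders, not the poster's hardware.

```cpp
#include <cassert>

// Time in milliseconds to move `mib` mebibytes at `gibPerSecond` GiB/s.
double transferTimeMs(double mib, double gibPerSecond) {
    assert(gibPerSecond > 0.0);
    const double mibPerMs = gibPerSecond * 1024.0 / 1000.0;
    return mib / mibPerMs;
}
```

For example, at an assumed 100 GiB/s, 16 MiB would take about 0.16 ms in theory; at 8 GiB/s it would take about 1.95 ms.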

I draw the trianglelist 9 times instanced, but instancing it does cost me nearly the same performance as calling draw 9 times

Reducing draw-calls via instancing is an optimisation to reduce CPU-side overhead. You are almost certainly GPU-bound, so this is no surprise.

When the whole screen is covered with triangles, I lose the performance; when I move the camera so I just look at the blank background, the performance rises a lot!

This is a definite hint that the bottleneck in your program is pixel-processing. Your pixel shader could be too complex (too many instructions, too many texture fetches), your model could have too much over-draw (triangles appearing over the top of each other, causing many triangles to calculate pixel values that are overwritten), or you could be generating too much ROP throughput (e.g. blending lots of pixels into a frame-buffer, or using an expensive framebuffer format).

#11 gnomgrol   Members   -  Reputation: 568

Posted 16 August 2012 - 04:10 AM

As I mentioned above, my GPU can run BF3 and other performance-intensive games without any problems, so that really shouldn't be the problem.
I reduced my CPU-side work because I was certain that the problem lay there. My pixel shader is fairly simple, and there are not THAT many pixels being processed multiple times.
I can't say that I have a clue what the 3rd thing you mentioned is, but I will go ahead and research it.


I noticed something weird. When I draw 9x 256*256 instanced, without height, shadows and normals passed to the shader, the performance sinks to 10 FPS / 100ms,
when I draw 36x 256*256 with a normal draw call for each (and normals etc. sent to the shader), I still get 50 FPS / 20ms. Changing the pixel shader to return float4(1.0f...); gives me no performance increase.

I came up with another question. When the window that Direct3D is drawing into does not have focus, what influence does this have on performance? I noticed that in many games the FPS goes down to 10 if I focus another window.
I have a console attached to my application, so I can output things for debugging etc. Could this influence the performance?

Edited by gnomgrol, 16 August 2012 - 06:57 AM.




