
DX11 Swapchain->Present locks up


I am busy with a small C++ DX11 engine. I made a sprite batch and it can render ~8000 sprites in ~0.015 seconds, which is quite nice. But the actual "feel" is really sluggish. So I checked which line of code takes so much longer, and it's actually the swapChain->Present call.

swapChain->Present(0, 0);

As you can see, vsync is not enabled. Below is a copy of my console output.
"Time" is the time in seconds my sprite batch takes to render (4000 sprites in this example). Most of the time it's around 0.005 seconds, or 5 ms.
"Time Present" is the time it takes for swapChain->Present(0, 0); to return. This is usually < 0.001 seconds, but it sometimes jumps to 0.2 or even 0.4 seconds!
"timeInSeconds" is the total time a frame takes (WindowEvents & Update & Draw).

Note: I am only rendering one texture, in 4 different places. The code runs in an optimized release build without a debugger attached. There is no difference between 'windowed' and 'fullscreen' mode. The console output is from 'windowed' mode.

GPU Name: ATI Radeon HD 5700 Series Memory: 499
Loaded texture: Content/emmahawt.png [818 x 542]
Loaded texture: Content/test.png [2 x 2]
Time : 0.005598
Time Present : 0.000715
timeInSeconds : 0.007079
Time : 0.005138
Time Present : 0.000327
timeInSeconds : 0.005817
Time : 0.005006
Time Present : 0.000323
timeInSeconds : 0.005702
Time : 0.004893
Time Present : 0.218834
timeInSeconds : 0.224316
Time : 0.004375
Time Present : 0.436552
timeInSeconds : 0.441346
Time : 0.004092
Time Present : 0.000649
timeInSeconds : 0.442094
Time : 0.004252
Time Present : 0.000592
timeInSeconds : 0.007898
Time : 0.004191
Time Present : 0.000498
timeInSeconds : 0.005948
Time : 0.004361
Time Present : 0.199789
timeInSeconds : 0.207114
Time : 0.003872
Time Present : 0.879333
timeInSeconds : 0.884902
Time : 0.004544
Time Present : 0.001660
timeInSeconds : 0.034172
Time : 0.003994
Time Present : 0.000405
timeInSeconds : 0.006845
Time : 0.004393
Time Present : 0.000279
timeInSeconds : 0.005872
Time : 0.003892
Time Present : 0.643319
timeInSeconds : 0.649605
Time : 0.004066
Time Present : 0.000291
timeInSeconds : 0.230853
Time : 0.004023
Time Present : 0.000440
timeInSeconds : 0.007868
Time : 0.004655
Time Present : 0.000263
timeInSeconds : 0.006101
Time : 0.004221
Time Present : 0.203712
timeInSeconds : 0.208846
Time : 0.005497
Time Present : 0.655746
timeInSeconds : 0.663833
Time : 0.006069
Time Present : 0.000398
timeInSeconds : 0.236720
Time : 0.005969
Time Present : 0.000433
timeInSeconds : 0.010679
Average FPS was (rendering 4000 sprites): 68.938774
timeInSeconds : 0.227208
Time : 0.003952
Time Present : 0.000257
Press any key to continue . . .
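
The timings in the log above could be collected with a small helper along these lines. This is only a sketch: `TimeSeconds` and `TimeFrame` are hypothetical names, not functions from the original engine, and the two callables stand in for the sprite batch draw and for a wrapper around `swapChain->Present(0, 0)`.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>

// Runs `work` once and returns the elapsed wall-clock time in seconds.
template <typename Work>
double TimeSeconds(Work&& work)
{
    using Clock = std::chrono::steady_clock;
    const Clock::time_point start = Clock::now();
    work();
    const std::chrono::duration<double> elapsed = Clock::now() - start;
    return elapsed.count();
}

// Times the draw and present steps of one frame and prints them in the
// same format as the log. Both callables are placeholders for the engine's
// own code; the present one would wrap swapChain->Present(0, 0).
template <typename Draw, typename Present>
double TimeFrame(Draw&& draw, Present&& present)
{
    const double drawTime = TimeSeconds(draw);
    const double presentTime = TimeSeconds(present);
    assert(drawTime >= 0.0 && presentTime >= 0.0); // steady_clock is monotonic
    std::printf("Time : %f\n", drawTime);
    std::printf("Time Present : %f\n", presentTime);
    return drawTime + presentTime;
}
```

Keep in mind that a CPU-side timer like this only shows how long each call blocks the calling thread. It says nothing about how long the GPU spends on the submitted work.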

Code for creating the swapchain:

void GraphicsDevice::InitSwapChain()
HRESULT result;

// Initialize the swap chain description.
memset(&swapChainDesc,0, sizeof(swapChainDesc));

swapChainDesc.BufferCount = 1;
swapChainDesc.BufferDesc.Width = windowHandler->GetCurrentWidth();
swapChainDesc.BufferDesc.Height = windowHandler->GetCurrentHeight();
swapChainDesc.BufferDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
swapChainDesc.OutputWindow = windowHandler->GetHandler();
swapChainDesc.Windowed = !windowHandler->GetFullScreen();
swapChainDesc.SampleDesc.Count = 1;
swapChainDesc.SampleDesc.Quality = 0;

// Set the refresh rate of the back buffer.
swapChainDesc.BufferDesc.RefreshRate.Numerator = refreshRateNum;
swapChainDesc.BufferDesc.RefreshRate.Denominator = refreshRateDenom;
swapChainDesc.BufferDesc.RefreshRate.Numerator = 0;
swapChainDesc.BufferDesc.RefreshRate.Denominator = 1;

// Set the scan line ordering and scaling to unspecified.
swapChainDesc.BufferDesc.ScanlineOrdering = DXGI_MODE_SCANLINE_ORDER_UNSPECIFIED;
swapChainDesc.BufferDesc.Scaling = DXGI_MODE_SCALING_UNSPECIFIED;

// Discard the back buffer contents after presenting.
swapChainDesc.SwapEffect = DXGI_SWAP_EFFECT_DISCARD;

// Don't set the advanced flags.
swapChainDesc.Flags = 0;

// Set the feature level to DirectX 11/10.1/10
D3D_FEATURE_LEVEL featureLevel[] =

// Create the swap chain, Direct3D device, and Direct3D device context.
result = D3D11CreateDeviceAndSwapChain( NULL,
#ifdef _DEBUG

SwapChain::Present exhibits asynchronous behaviour and queues blit/flip operations rather than performing them immediately. Depending on your swap mode (you seem to be using blit rather than flip) the rate at which the queue gets processed with vsync disabled will vary, as will the maximum length of the queue.

You're submitting frames at roughly 200/s and they're being consumed by the device at about 70/s, so it's most likely that you're saturating the presentation queue and the Present function is then blocking until there's some free space. It looks like you might be able to use DXGI_PRESENT_DO_NOT_WAIT, though that may be specific to the DX11 threaded model rather than behaving like DX9's do-not-wait flag.
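
To make the queueing argument concrete, here is a toy simulation of it. It is a simplified model, not a description of what DXGI or the driver really does: `SimulatePresentBlocking` is a made-up name, the queue depth is a parameter (3 is used in the usage note because that is DXGI's documented default maximum frame latency), and the GPU is assumed to consume frames at a perfectly steady rate.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Returns, for each frame, how long a blocking "present" would wait when the
// CPU needs cpuSecondsPerFrame to build a frame, the GPU needs
// gpuSecondsPerFrame to consume one, and at most maxQueuedFrames may be
// queued at the same time.
std::vector<double> SimulatePresentBlocking(int frameCount,
                                            double cpuSecondsPerFrame,
                                            double gpuSecondsPerFrame,
                                            std::size_t maxQueuedFrames)
{
    assert(maxQueuedFrames > 0);
    std::vector<double> blocked;
    std::deque<double> inFlight; // completion time of every queued frame
    double now = 0.0;            // CPU clock
    double gpuFreeAt = 0.0;      // time at which the GPU becomes idle

    for (int i = 0; i < frameCount; ++i) {
        now += cpuSecondsPerFrame; // CPU builds the frame

        // Frames the GPU has already finished no longer occupy the queue.
        while (!inFlight.empty() && inFlight.front() <= now)
            inFlight.pop_front();

        // Queue full: wait until the oldest queued frame completes.
        double waited = 0.0;
        if (inFlight.size() == maxQueuedFrames) {
            waited = inFlight.front() - now;
            now = inFlight.front();
            inFlight.pop_front();
        }

        // Queue the new frame behind whatever the GPU is still working on.
        const double start = std::max(now, gpuFreeAt);
        gpuFreeAt = start + gpuSecondsPerFrame;
        inFlight.push_back(gpuFreeAt);
        blocked.push_back(waited);
    }
    return blocked;
}
```

Calling `SimulatePresentBlocking(10, 0.005, 1.0 / 70.0, 3)` gives zero wait for the first four frames and a wait on every frame after that, settling at the difference between the GPU and CPU frame times (about 9.3 ms). The model does not reproduce the long, irregular stalls in the log; it only shows why Present can return instantly for a while and then start blocking.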


Thanks for the insight, Jansic! What you're saying makes sense. I looked at the flag, but it has a small note:

"Direct3D 11: This enumeration value is supported starting with Windows Developer Preview." In other words, not possible on Windows 7 :(

If you are right, the problem should go away if I either render more sprites (let's say 12000), enable vsync (which currently isn't working correctly, I need to look into that), or put a ~10 ms Sleep in my code. I will test this a bit later tonight.

I also remember trying "DXGI_PRESENT_DO_NOT_SEQUENCE", but that caused the back buffer to never flip: I only see the very first frame I draw.
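
The ~10 ms Sleep idea can be sketched as a simple frame limiter. This is an illustration under assumptions: `RemainingBudget` and `LimitFrame` are hypothetical helpers, and the target frame time is something you would have to pick yourself (for example roughly 1/70 s, the rate at which the frames seem to be consumed).

```cpp
#include <cassert>
#include <chrono>
#include <thread>

using Seconds = std::chrono::duration<double>;

// Time left in the frame budget; zero when the frame already ran over.
Seconds RemainingBudget(Seconds elapsed, Seconds target)
{
    assert(target.count() >= 0.0);
    return elapsed < target ? target - elapsed : Seconds(0.0);
}

// Sleeps for whatever is left of the frame budget and returns the amount of
// sleep that was requested. Call it once per frame, before Present.
Seconds LimitFrame(std::chrono::steady_clock::time_point frameStart, Seconds target)
{
    const Seconds elapsed = std::chrono::steady_clock::now() - frameStart;
    const Seconds rest = RemainingBudget(elapsed, target);
    if (rest.count() > 0.0)
        std::this_thread::sleep_for(rest);
    return rest;
}
```

With 5 ms of work per frame and a 15 ms target this asks for a 10 ms sleep. The actual sleep can be longer than requested, since the OS scheduler decides when the thread wakes up, so treat the result as a rough cap rather than precise pacing.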

I am still working on this problem, and I am getting closer to the solution. I had GPU-Z running on my second monitor and noticed that my GPU (HD 5770) didn't switch to its 3D clock settings, which is of course pretty strange. I tried changing some values in the DirectX part of my code, but nothing changed (not that there was much to change there anyway).

So I went to my window handler and copied the settings over from the Hieroglyph 3.0 engine.

New settings:


// Get the instance of this application.
hinstance = GetModuleHandle(NULL);

// Give the application a name.
applicationName = L"Default Name";

// Setup the windows class with default settings.
wc.lpfnWndProc = WndProc;
wc.cbClsExtra = 0;
wc.cbWndExtra = 0;
wc.hInstance = hinstance;
wc.hIcon = LoadIcon(NULL, IDI_WINLOGO);
wc.hIconSm = wc.hIcon;
wc.hCursor = LoadCursor(NULL, IDC_ARROW);
wc.hbrBackground = (HBRUSH)GetStockObject(BLACK_BRUSH);
wc.lpszMenuName = NULL;
wc.lpszClassName = applicationName;
wc.cbSize = sizeof(WNDCLASSEX);

// Register the window class.
RegisterClassEx(&wc);

DWORD windowStyle;

// Setup the screen settings depending on whether it is running in full screen or in windowed mode.
windowStyle = WS_POPUP | WS_VISIBLE;

// Place the window in the middle of the screen.
int posX = (GetSystemMetrics(SM_CXSCREEN) - screenWidth) / 2;
int posY = (GetSystemMetrics(SM_CYSCREEN) - screenHeight) / 2;

// Create the window with the screen settings and get the handle to it.
hwnd = CreateWindowEx(WS_EX_APPWINDOW, applicationName, applicationName, windowStyle,
posX, posY, screenWidth, screenHeight, NULL, NULL, hinstance, NULL);

// Bring the window up on the screen and set it as main focus.
ShowWindow(hwnd, SW_SHOWNORMAL);

This fixed the GPU not switching to 3D mode, and it brought the worst-case time of the present call down to ~65 ms. But the pattern is still the same:

0.0003 seconds
0.0004 s
0.0650 s
0.0640 s
0.0003 s
etc etc etc

So sometimes it's super fast, or 'as expected', and sometimes it locks up for ~4 vsyncs?

I just remembered I also added this piece of code:

// Limit the number of frames that may be queued ahead of the GPU to one.
IDXGIDevice1* idxgiDevice = NULL;
device->QueryInterface(__uuidof(IDXGIDevice1), (void**)&idxgiDevice);
idxgiDevice->SetMaximumFrameLatency(1);

Here is a graph without SetMaximumFrameLatency(1): http://puu.sh/da2r
As you can see, at the start it's not much of a problem, but later on it locks up more often.

Here is a graph WITH SetMaximumFrameLatency(1): http://puu.sh/da38
It seems to me that it locks up for ~0.064 seconds for a duration of 15 frames.

Note: the "fps" label is incorrect; the value is actually the number of seconds PresentBuffer takes to complete.
Note 2: All these tests are without vsync.

The question is: is there anything I need to change in how I create the window, or maybe a different setting? I can't run the Hieroglyph engine myself, as I would have to install VS2010 C++ Express, which I can't do at the moment.

Interesting: I added a deviceContext->Flush() call after deviceContext->DrawIndexed(...), and how long it takes depends on how much I draw. So why am I getting this bad performance when rendering only 1000 vertices with 1500 indices?

Fixed. It was simply that the number of pixels I had to draw was too much. I was trying to draw 4000 x 440000 pixels per frame, which is too much for the GPU to handle. After moving back to much smaller (normal-sized) sprites, the performance was as expected. :)
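
The arithmetic behind that figure is easy to check. The sketch below assumes every one of the 4000 sprites is drawn at the full 818 x 542 size of the texture from the log and that none of it is clipped; `PixelsPerFrame` and `SecondsToFill` are hypothetical helpers, and the fill rate passed to `SecondsToFill` is whatever number you want to test against, not a measured value for this card.

```cpp
#include <cassert>
#include <cstdint>

// Total number of pixels written per frame when spriteCount sprites of
// width x height pixels are all drawn in full.
std::uint64_t PixelsPerFrame(std::uint64_t spriteCount,
                             std::uint64_t width,
                             std::uint64_t height)
{
    return spriteCount * width * height;
}

// Rough lower bound on the time needed to write that many pixels at a given
// fill rate (pixels per second). Ignores blending, texture fetches, etc.
double SecondsToFill(std::uint64_t pixels, double pixelsPerSecond)
{
    assert(pixelsPerSecond > 0.0);
    return static_cast<double>(pixels) / pixelsPerSecond;
}
```

`PixelsPerFrame(4000, 818, 542)` comes to 1,773,424,000 pixels, i.e. about 1.8 billion pixel writes for a single frame. Even at an assumed, purely illustrative fill rate of 10 billion pixels per second that would already cost roughly 0.18 seconds per frame.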
