# DX11 Nothing renders in windowed mode on Windows 10 with dedicated Nvidia card

Not sure if GDNet is the best place for this, but recently I "upgraded" my laptop to Windows 10, and ever since then, I've been seeing an issue with one of my projects where I'll get an empty window if I run the game in windowed mode with my dedicated graphics card (an Nvidia GTX 860M) - D3D won't even clear the screen to black, and my card makes a screeching noise. The problem doesn't manifest if I either run in fullscreen (no GPU noises) or force the game to run with my integrated card. This was NOT happening in Windows 8.1, so I'm inclined to suspect either a driver problem or possibly Win8.1 let me get away with doing something "naughty."

I haven't seen any similar problems with other DX11 games (though I haven't actually run any of them in windowed mode), so I'm wondering if I may be doing something wrong. Here is the code that sets up my device and swap chain:

// setup swap chain description
DXGI_SWAP_CHAIN_DESC mSwapChainDesc = { 0 };
mSwapChainDesc.OutputWindow = hwnd;
mSwapChainDesc.Windowed = !fullscreen;
mSwapChainDesc.BufferDesc.Width = windowWidth;
mSwapChainDesc.BufferDesc.Height = windowHeight;
mSwapChainDesc.BufferDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
mSwapChainDesc.BufferDesc.RefreshRate.Numerator = 60;
mSwapChainDesc.BufferDesc.RefreshRate.Denominator = 1;
mSwapChainDesc.BufferCount = 1;
mSwapChainDesc.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT;
mSwapChainDesc.SampleDesc.Count = 1;
mSwapChainDesc.Flags = DXGI_SWAP_CHAIN_FLAG_ALLOW_MODE_SWITCH;

// create the mDevice
HRESULT result = S_OK;
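if (FAILED(result = D3D11CreateDeviceAndSwapChain(nullptr,
    D3D_DRIVER_TYPE_HARDWARE,
    nullptr,
    0,
    nullptr,
    0,
    D3D11_SDK_VERSION,
    &mSwapChainDesc,
    &mSwapChain,        // the output arguments were cut off in the post; these names are assumed
    &mDevice,
    nullptr,
    &mDeviceContext)))
{
    ErrorMsg("Failed to create D3D11 mDevice and swap chain");
    return false;
}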

There don't appear to be any error messages; all D3D calls, including the above, appear to succeed.

edit: Updated thread title to better reflect the symptoms of the problem. Edited by Oberon_Command

##### Share on other sites

Hm, this seems rather odd. I have been running DX11 for my studies for the past few months and I haven't seen any such problem (running W10, same graphics card). I've played a bit with the initialization to get the same settings for the swap chain as you have, but I'm not running into any problems. Granted, the code for initialization is not mine (from Luna's DX11 book), but I did rewrite the initialization part at some point for my own goals and didn't run into any problems either.

The only real difference I have is that I call D3D11CreateDevice and CreateSwapChain (on the DXGIFactory) separately, rather than the unified call. That might be a clue as to why things are going haywire.

If it helps, here's my initialization of the swapchain, though I haven't been able to actually test it for a while unfortunately. You can also dig through Luna's samples to see if he does anything differently, though I can guarantee that it's pretty much the code below written somewhat differently (and wouldn't explain any such problem).

DXGI_SWAP_CHAIN_DESC swapChainDescription = {}; // zero-initialize so unset fields (e.g. SwapEffect) are not garbage
swapChainDescription.BufferDesc.Width = m_Width;
swapChainDescription.BufferDesc.Height = m_Height;
swapChainDescription.BufferDesc.RefreshRate.Numerator = 60;
swapChainDescription.BufferDesc.RefreshRate.Denominator = 1;
swapChainDescription.BufferDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
swapChainDescription.BufferDesc.ScanlineOrdering = DXGI_MODE_SCANLINE_ORDER_UNSPECIFIED;
swapChainDescription.BufferDesc.Scaling = DXGI_MODE_SCALING_UNSPECIFIED;
swapChainDescription.SampleDesc.Count = 1;
swapChainDescription.SampleDesc.Quality = 0;
swapChainDescription.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT;
swapChainDescription.BufferCount = 1;
swapChainDescription.OutputWindow = m_WindowHandle;
swapChainDescription.Windowed = true;
swapChainDescription.Flags = 0;

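// Walk from the device up to the DXGI factory that created it
IDXGIDevice* dxgiDevice = nullptr;
HR(m_Device->QueryInterface(__uuidof(IDXGIDevice), reinterpret_cast<void**>(&dxgiDevice)));
IDXGIAdapter* dxgiAdapter = nullptr;
HR(dxgiDevice->GetParent(__uuidof(IDXGIAdapter), reinterpret_cast<void**>(&dxgiAdapter)));
IDXGIFactory* dxgiFactory = nullptr;
HR(dxgiAdapter->GetParent(__uuidof(IDXGIFactory), reinterpret_cast<void**>(&dxgiFactory)));
HR(dxgiFactory->CreateSwapChain(m_Device, &swapChainDescription, &m_SwapChain));

dxgiDevice->Release();
dxgiAdapter->Release();
dxgiFactory->Release();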


(That's not to say I'm denying driver problems, but perhaps this can get you around it. I can imagine it being 'fairly' annoying to not be able to run in windowed mode.)

On a side note, the screeching noise makes me think of coil whine caused by super high FPS (which would make sense if you're not drawing anything). I have this problem on my PC quite a bit, but haven't heard about it occurring on a laptop GPU before.

Edited by AthosVG

##### Share on other sites

The "screeching noise" sounds like coil whine / squeaking. Does it sound like this or like this?

If so, this usually (but not always) means your card is drawing near maximum power.

Coil whine is considered harmless to your hardware, though some people believe that if you can hear coil noise, there are strong vibrations, and strong vibrations mean gradual wear and tear over time (i.e. a shortened lifespan). It's therefore often advised to reduce the amount of time your GPU spends whining, just in case this turns out to be more than a myth.

##### Share on other sites

I found it!

First of all, what didn't work: I tried the set of flags AthosVG posted; no difference in behaviour. I also tried pasting Luna's equivalent code to see if that would make a difference; no difference in behaviour. I looked at compiling the full samples, but they depend on D3DX and XNAMath which I apparently don't have since I'd heard these were deprecated. Is there a more up to date version somewhere?

I had a look at the samples here and those do appear to run correctly in windowed mode. Near as I can tell, I'm not doing anything differently, though I'm using the 11.0 code path instead of 11.1. I don't think that should make a difference - I broke into the samples in the debugger and forced them down the 11.0 path to see what would happen and they still worked correctly. In fact, I took my own initialization code and pasted it into the sample code (rearranging some of the names) and everything still worked correctly!

Then I looked at the initialization code and noticed that I'm deriving the back buffer width and height from a configuration file, whereas the tutorial is computing it directly from the window's client rectangle. Changing my code to use the client rectangle bounds for the back buffer got things rendering again! I also noticed that in windowed mode, the window's client rect appeared slightly smaller than it did when I ran the program on 8.1 - as though the window borders take up more room and infringe on some of the client rectangle where they didn't before. So maybe the OS has changed the way window sizes are calculated? Strange that this would only manifest when using my dedicated card, though.

So: moral of the story, ensure that your back buffer size matches that of your client rect!
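A minimal sketch of that moral, assuming a Win32 window handle: the helper below derives the back buffer dimensions from the window's client rectangle rather than from a configuration file. `BackBufferSize`, `SizeFromClientRect` and `QueryBackBufferSize` are hypothetical names, not part of the original code, and the non-Windows `RECT` stand-in exists only so the snippet compiles off Windows.

```cpp
#include <cassert>

#ifdef _WIN32
#include <windows.h>
#else
// Stand-in for the Win32 RECT so the sketch compiles on other platforms.
struct RECT { long left, top, right, bottom; };
#endif

// Hypothetical helper type: the size to request for the swap chain buffers.
struct BackBufferSize {
    unsigned width;
    unsigned height;
};

// Convert a client rectangle into back buffer dimensions.
// Clamps to at least 1x1 so a minimized or degenerate window
// never yields a zero or negative dimension.
BackBufferSize SizeFromClientRect(const RECT& rc)
{
    long w = rc.right - rc.left;
    long h = rc.bottom - rc.top;
    if (w < 1) w = 1;
    if (h < 1) h = 1;
    return BackBufferSize{ static_cast<unsigned>(w), static_cast<unsigned>(h) };
}

#ifdef _WIN32
// Query the client area of the output window (excludes borders and title bar).
// If GetClientRect fails, rc stays zeroed and the result is 1x1.
BackBufferSize QueryBackBufferSize(HWND hwnd)
{
    RECT rc = {};
    GetClientRect(hwnd, &rc);
    return SizeFromClientRect(rc);
}
#endif
```

The result would then feed `BufferDesc.Width` and `BufferDesc.Height` in the swap chain description, and be recomputed before calling `IDXGISwapChain::ResizeBuffers` whenever the window is resized.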

edit: And it did indeed sound like coil whine. My best guess (without being much of a graphics programmer) for what must have been happening was that the back buffer was created successfully, but with a 0 size, causing all my initialization and draw calls to succeed without actually doing anything meaningful.

Edited by Oberon_Command

##### Share on other sites

I suspect the problem would also go away if you used FLIP_DISCARD instead of DISCARD. On laptops like that, when using the discrete GPU, the OS asks the discrete to put the contents somewhere that the integrated can see it, so that composition is fast and efficient. Apparently this operation doesn't work so well when the source and dest are different sizes (back buffer size != window size). With FLIP_DISCARD, the image is in something that's back-buffer-sized all the way through the stack until composition samples from it.
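For anyone trying the flip-model suggestion, here is a rough sketch of what changes in the swap chain settings. It uses a hypothetical, platform-neutral mirror of the relevant `DXGI_SWAP_CHAIN_DESC` fields (`SwapChainSettings`, `SwapEffect` and `ToFlipDiscard` are illustrative names, not DXGI API) so that it compiles without the Windows SDK. It encodes two flip-model requirements: at least two buffers and no multisampling on the back buffer.

```cpp
#include <cassert>

// Hypothetical stand-in for the DXGI swap effect values discussed above.
enum class SwapEffect { Discard, FlipDiscard };

// Hypothetical mirror of the DXGI_SWAP_CHAIN_DESC fields that matter here:
// BufferCount, SampleDesc.Count and SwapEffect.
struct SwapChainSettings {
    unsigned   bufferCount;
    unsigned   sampleCount;
    SwapEffect swapEffect;
};

// Convert blt-model (DISCARD) settings into flip-model (FLIP_DISCARD) settings.
SwapChainSettings ToFlipDiscard(SwapChainSettings s)
{
    s.swapEffect = SwapEffect::FlipDiscard;
    if (s.bufferCount < 2) {
        s.bufferCount = 2;   // flip model needs at least two buffers
    }
    s.sampleCount = 1;       // flip-model back buffers cannot be multisampled
    return s;
}
```

In real code this corresponds to setting `SwapEffect = DXGI_SWAP_EFFECT_FLIP_DISCARD`, `BufferCount = 2` and `SampleDesc.Count = 1` on the description from the first post, which currently uses a single buffer.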

##### Share on other sites

Possible, but it's also a performance issue if the sizes mismatch, so you really should match the sizes rather than try to work around it.
