# DX11 Performance Issue DirectX Maximized Window

## Recommended Posts

 I develop a test application with directx 11 und fl 10.1. Everything is working as expected and fine, but when I maximize the window with my graphics in it, the time per frame increases drastically. like 1ms to 40ms. If it stays at a lower resolution range, it works perfectly. But i need to support it in the maximized resolution also. Destination hardware and software specs: NVS 300 graphics card Windows 7 32-bit resolution 1920x1080 Application that draws few sinuses with direct3d, c# via sharpdx Windows forms with a control and sharpdx initialized swapchain, programmed to change backbuffer on resize event (would occur without that too though) I used a System.Stopwatch to find the issue at the code line: mSwapChain.Present(1, PresentFlags.None); where the time it needs when maximized increases by a lot suddenly. If i drag and resize it manually at some resolution the frame time jumps, seems weird. If i comment out the drawing code, I get the same behavior. On my local development machine with a HD4400 i don't have this issue though, there it works and the frame time isn't affected by resizing at all. Any help is appreciated ! I am farily new to programming in c#, windows and with DirectX also, so please be kind

I can buy an NVS 300 for US$6 now and is an architecture from 2008 - not exactly a powerhouse First instinct is to simply blame that hardware. What resolution in windowed mode runs at 1ms, compared to the 40ms of 1080p? Is there any in-between performance levels -- e.g. a certain resolution where it takes 10ms per frame? #### Share this post ##### Link to post ##### Share on other sites 40 minutes ago, Hodgman said: I can buy an NVS 300 for US$6 now and is an architecture from 2008 - not exactly a powerhouse  First instinct is to simply blame that hardware.

What resolution in windowed mode runs at 1ms, compared to the 40ms of 1080p? Is there any in-between performance levels -- e.g. a certain resolution where it takes 10ms per frame?

Haha yes it's pretty old hardware now, it is mandatory to support it though sadly

Okay I had present sync interval set to 1 so it's more like a jump from ~16ms to 40ms, but you know.

If i set it to sync interval 0 and change the creation of the graph window to like 1600x900 i get all my expected 2-3ms if not lower, even on that old hardware. I mean i can completely comment out my drawing and still, if i drag it to a certain resolution (from 1600x900 close to the resolution of maximum) it jumps from that 2-3ms to ~40ms. Staying in that resolution range isn't an option either though. There isn't any level in that, certain resolution and immidiate increase. It is not fullscreen and there is also a control panel on one side, so even in maximized it wont reach 1920x1080.

It's confusing me a lot. I read about present() can block if frames are queued, i know to little about that though to say if that is the issue, it is called in an idle loop, so that might hint to that though. But again, can the gpu performance suffer so much at a tiny resolution increase, if i drag it higher.

thanks for help

##### Share on other sites

Yeah it's certainly odd. That card is supposed to support high resolution multi-monitor output and hundreds of megs of texture data, so a single ~8MB texture (1920*1080*4B) should be fine, so your "no drawing" situation should have no reason for this performance issue...

I have seen random ~40ms stalls on the GPU side when video memory is over-comitted (e.g. trying to use 1.1GB of texture data on a 1GB GPU), as this causes the driver to panic half way through a frame and shuffle lots of resources around... but... if you're not actually drawing anything, this shouldn't be happening to you.

Present is where the CPU will block if it's getting too far ahead of the GPU. e.g. if the CPU does 1ms of work per frame and the GPU is doing 16ms of work per frame, then you should expect to see the CPU block inside Present for 15ms.

It's a bit complicated, but you can use the GPUView tool to inspect exactly what the GPU is doing at every point in time here and try to identify the stall.

##### Share on other sites

Okay i'm trying to installing it now. Hopefully it can help me, thanks. I will report back if i have trouble or find something issue with it.

thanks so far

##### Share on other sites

Solved by disabling Aero Theme... Noticed the performance suffers a lot if the window gets over the start button. Switched to windows classic theme for test and worked. Still learned something with this tool you showed me at least, so thanks day saved

## Create an account

Register a new account

• ### Forum Statistics

• Total Topics
628395
• Total Posts
2982432
• ### Similar Content

• By KarimIO
Hey guys,
I'm trying to work on adding transparent objects to my deferred-rendered scene. The only issue is the z-buffer. As far as I know, the standard way to handle this is copying the buffer. In OpenGL, I can just blit it. What's the alternative for DirectX? And are there any alternatives to copying the buffer?
• By joeblack
Hi,
im reading about specular aliasing because of mip maps, as far as i understood it, you need to compute fetched normal lenght and detect now its changed from unit length. I’m currently using BC5 normal maps, so i reconstruct z in shader and therefore my normals are normalized. Can i still somehow use antialiasing or its not needed? Thanks.
• By 51mon
I want to change the sampling behaviour to SampleLevel(coord, ddx(coord.y).xx, ddy(coord.y).xx). I was just wondering if it's possible without explicit shader code, e.g. with some flags or so?

• Hello,
I want to improve the performance of my game (engine) and some of your helped me to make a GPU Profiler. After creating the GPU Profiler, I started to measure the time my GPU needs per frame. I refined my GPU time measurements to find my bottleneck.
Searching the bottleneck
Rendering a small scene in an Idle state takes around 15.38 ms per frame. 13.54 ms (88.04%) are spent while rendering the scene, 1.57 ms (10.22%) are spent during the SwapChain.Present call (no VSync!) and the rest is spent on other tasks like rendering the UI. I further investigated the scene rendering, since it takes über 88% of my GPU frame rendering time.
When rendering my scene, most of the time (80.97%) is spent rendering my models. The rest is spent to render the background/skybox, updating animation data, updating pixel shader constant buffer, etc. It wasn't really suprising that most of the time is spent for my models, so I further refined my measurements to find the actual bottleneck.
In my example scene, I have five animated NPCs. When rendering these NPCs, most actions are almost for free. Setting the proper shaders in the input layout (0.11%), updating vertex shader constant buffers (0.32%), setting textures (0.24%) and setting vertex and index buffers (0.28%). However, the rest of the GPU time (99.05% !!) is spent in two function calls: DrawIndexed and DrawIndexedInstance.
I searched this forum and the web for other articles and threads about these functions, but I haven't found a lot of useful information. I use SharpDX and .NET Framework 4.5 to develop my game (engine). The developer of SharpDX said, that "The method DrawIndexed in SharpDX is a direct call to DirectX" (Source). DirectX 11 is widely used and SharpDX is "only" a wrapper for DirectX functions, I assume the problem is in my code.
How I render my scene
When rendering my scene, I render one model after another. Each model has one or more parts and one or more positions. For example, a human model has parts like head, hands, legs, torso, etc. and may be placed in different locations (on the couch, on a street, ...). For static elements like furniture, houses, etc. I use instancing, because the positions never change at run-time. Dynamic models like humans and monster don't use instancing, because positions change over time.
When rendering a model, I use this work-flow:
Set vertex and pixel shaders, if they need to be updated (e.g. PBR shaders, simple shader, depth info shaders, ...) Set animation data as constant buffer in the vertex shader, if the model is animated Set generic vertex shader constant buffer (world matrix, etc.) Render all parts of the model. For each part: Set diffuse, normal, specular and emissive texture shader views Set vertex buffer Set index buffer Call DrawIndexedInstanced for instanced models and DrawIndexed models What's the problem
After my GPU profiling, I know that over 99% of the rendering time for a single model is spent in the DrawIndexedInstanced and DrawIndexed function calls. But why do they take so long? Do I have to try to optimize my vertex or pixel shaders? I do not use other types of shaders at the moment. "Le Comte du Merde-fou" suggested in this post to merge regions of vertices to larger vertex buffers to reduce the number of Draw calls. While this makes sense to me, it does not explain why rendering my five (!) animated models takes that much GPU time. To make sure I don't analyse something I wrong, I made sure to not use the D3D11_CREATE_DEVICE_DEBUG flag and to run as Release version in Visual Studio as suggested by Hodgman in this forum thread.
My engine does its job. Multi-texturing, animation, soft shadowing, instancing, etc. are all implemented, but I need to reduce the GPU load for performance reasons. Each frame takes less than 3ms CPU time by the way. So the problem is on the GPU side, I believe.

• I was wondering if someone could explain this to me
I'm working on using the windows WIC apis to load in textures for DirectX 11. I see that sometimes the WIC Pixel Formats do not directly match a DXGI Format that is used in DirectX. I see that in cases like this the original WIC Pixel Format is converted into a WIC Pixel Format that does directly match a DXGI Format. And doing this conversion is easy, but I do not understand the reason behind 2 of the WIC Pixel Formats that are converted based on Microsoft's guide
I was wondering if someone could tell me why Microsoft's guide on this topic says that GUID_WICPixelFormat40bppCMYKAlpha should be converted into GUID_WICPixelFormat64bppRGBA and why GUID_WICPixelFormat80bppCMYKAlpha should be converted into GUID_WICPixelFormat64bppRGBA
In one case I would think that:
GUID_WICPixelFormat40bppCMYKAlpha would convert to GUID_WICPixelFormat32bppRGBA and that GUID_WICPixelFormat80bppCMYKAlpha would convert to GUID_WICPixelFormat64bppRGBA, because the black channel (k) values would get readded / "swallowed" into into the CMY channels
In the second case I would think that:
GUID_WICPixelFormat40bppCMYKAlpha would convert to GUID_WICPixelFormat64bppRGBA and that GUID_WICPixelFormat80bppCMYKAlpha would convert to GUID_WICPixelFormat128bppRGBA, because the black channel (k) bits would get redistributed amongst the remaining 4 channels (CYMA) and those "new bits" added to those channels would fit in the GUID_WICPixelFormat64bppRGBA and GUID_WICPixelFormat128bppRGBA formats. But also seeing as there is no GUID_WICPixelFormat128bppRGBA format this case is kind of null and void
I basically do not understand why Microsoft says GUID_WICPixelFormat40bppCMYKAlpha and GUID_WICPixelFormat80bppCMYKAlpha should convert to GUID_WICPixelFormat64bppRGBA in the end

• 10
• 9
• 19
• 24
• 9