
DX11 slow on integrated graphics


Hi, I'm writing a simple DX11 app and noticed that it runs quite slowly on integrated graphics with hardware feature level DX10. More precisely, I'm only getting about 18 fps on my office Intel GMA X4500, while my desktop ATI HD6850 renders the same code at around 1900 fps. I expected the code to run slower, but not 100 times slower. Here's what I got from PIX:


I can't imagine that OMSetRenderTargets or constant buffer updates can take that much time, so is it the buffer clears? Or I guess the timings are just not accurate at all... What could help improve performance on this machine? Could the slowdown be caused by the 32-bit texture format?
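Comparing frame times rather than frame rates makes these numbers easier to reason about, since fps is the reciprocal of the time spent per frame. A minimal C++ sketch of that arithmetic (the helper names are illustrative, not from any API):

```cpp
#include <cassert>

// Milliseconds spent on one frame at a given frame rate.
double frame_time_ms(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// How many times longer one frame takes on the slower device.
double slowdown_factor(double slow_fps, double fast_fps) {
    return frame_time_ms(slow_fps) / frame_time_ms(fast_fps);
}

// Milliseconds saved per frame when the frame rate rises from one value to another.
double ms_saved(double fps_before, double fps_after) {
    return frame_time_ms(fps_before) - frame_time_ms(fps_after);
}
```

At 18 fps a frame takes about 55.6 ms versus about 0.53 ms at 1900 fps, a factor of roughly 105.6, so "100 times" is accurate. It also shows why small fps gains matter at the low end: going from 18 to 25 fps saves about 15.6 ms per frame, while the same +7 fps at 1900 fps saves only about 0.002 ms.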

[quote name='nept' timestamp='1335293258' post='4934510']
I believe that is a DX10 card, so you would be in software emulation if you call DX11.
[/quote]

Nope. If you use a feature level of D3D10 you get hardware acceleration of D3D10-level features - that's kind of the whole point of feature levels.


There are several methods for tuning shaders, so you may want to revisit yours.
For example, developers often remove if statements where possible, because branches can hurt performance when GPU threads running in parallel take different paths.
Another method suggested by Intel is rendering static and dynamic shadow maps into separate textures, because they change at different frequencies.
Maybe your Intel GPU simply can't run your code efficiently.
The solution would be to research these kinds of fast algorithms.
Intel GPUs are not very fast right now, but future ones should be better.
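The "remove if statements" advice usually means replacing a branch with arithmetic that has both candidate values available and blends between them. The sketch below mimics HLSL's step() and lerp() in C++ so it can be compiled and checked on the CPU; it illustrates the idea rather than being shader code, and the function names are hypothetical. In real HLSL the compiler may already flatten small branches (the [flatten] and [branch] attributes let you state a preference), and the branchless form always pays for both sides, so it only helps when both are cheap.

```cpp
#include <cassert>

// Same convention as HLSL step(y, x): 1 when x >= y, otherwise 0.
// The comparison result is converted to float instead of branching on it.
float step(float y, float x) { return static_cast<float>(x >= y); }

// Linear blend; when t is exactly 0 or 1 it returns a or b exactly.
float lerp(float a, float b, float t) { return a * (1.0f - t) + b * t; }

// Branching version: neighbouring GPU threads that disagree on the
// condition may end up executing both paths anyway.
float shade_branch(float n_dot_l, float lit, float unlit) {
    if (n_dot_l >= 0.0f) {
        return lit;
    }
    return unlit;
}

// Branchless version: both inputs are available, the mask picks one.
float shade_branchless(float n_dot_l, float lit, float unlit) {
    float mask = step(0.0f, n_dot_l); // 1 if n_dot_l >= 0, else 0
    assert(mask == 0.0f || mask == 1.0f);
    return lerp(unlit, lit, mask);
}
```

Both functions return the same value for any input; the difference only shows up in how a GPU schedules them.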

Well, what do you expect? It's an integrated card; it's not supposed to be fast. That said, 18 fps does seem a bit slow. The timings do make sense, though, because GPUs are asynchronous devices: when you call Draw() you are really saying "draw as soon as you can", which returns almost instantly. Later, when you update a constant buffer or change a render target, you are forced to wait for the GPU to finish rendering, since you can't update resources that are still in use. This is why those calls appear to take forever.

Try it with a very simple test case and see if you get the same timings/results. It could just be that your integrated graphics card isn't well optimized for DX11 at reduced feature levels/DX10 (that support may very well have been slapped on as an afterthought).
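To see how an asynchronous GPU can make cheap-looking calls absorb the cost of earlier draws, here is a toy C++ model of CPU-side timing. It is a deliberate simplification, not how any real driver or PIX works internally: each call has a CPU recording cost, some GPU work it enqueues, and optionally a sync point where the CPU waits for all previously queued GPU work. The struct and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of one API call in the toy model.
struct Call {
    std::string name;
    double cpu_ms;      // CPU time needed to record the call
    double gpu_ms;      // GPU work the call adds to the queue
    bool waits_for_gpu; // true if the CPU must wait for queued GPU work first
};

// Returns the duration a CPU timer wrapped around each call would report.
std::vector<double> observed_call_times(const std::vector<Call>& calls) {
    std::vector<double> observed;
    double cpu_now = 0.0;     // current CPU time
    double gpu_idle_at = 0.0; // time at which all queued GPU work is finished
    for (const Call& c : calls) {
        assert(c.cpu_ms >= 0.0 && c.gpu_ms >= 0.0);
        const double start = cpu_now;
        if (c.waits_for_gpu) {
            cpu_now = std::max(cpu_now, gpu_idle_at); // stall until the GPU catches up
        }
        cpu_now += c.cpu_ms;
        // The GPU starts this call's work once it is submitted and the GPU is free.
        gpu_idle_at = std::max(gpu_idle_at, cpu_now) + c.gpu_ms;
        observed.push_back(cpu_now - start);
    }
    return observed;
}
```

With a Draw() that queues 50 ms of GPU work followed by a constant buffer update that has to wait, the model attributes almost nothing to the draw and roughly 50 ms to the update, which is the kind of pattern described in the original post.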

I see that you're doing a lot of render target setting and clearing here as well. This is not going to play well with integrated graphics, which are quite weak in terms of fillrate.

Depending on what you're doing you may be able to get away without clearing the render targets. If you're drawing over the full extents of the target, for example, you really don't need to clear as everything is going to be covered anyway - that should get you back a few frames.

For your final draw, do you even need a depth/stencil view? All that you're doing is blasting the end results of your render to the screen, so you may be able to drop the depth/stencil, and disable depth test/depth write for this part of the draw.

Also very important to consider is that if you're clearing depth you should also clear stencil at the same time - even if you're not using it. This is because depth and stencil are often interleaved with 24 bits for depth and 8 for stencil (it's not clear from your shot if you have this format) so clearing both together can get you a MUCH faster clear.
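A small C++ model of why a depth-only clear can be slower on an interleaved format. It assumes, purely for illustration, one 32-bit word per pixel with depth in the low 24 bits and stencil in the high 8 bits; real hardware layouts vary and are often compressed or hidden from the application. In D3D11 terms, the fast path corresponds to passing D3D11_CLEAR_DEPTH | D3D11_CLEAR_STENCIL to ClearDepthStencilView on a DXGI_FORMAT_D24_UNORM_S8_UINT view. The helper names below are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative layout only: depth in bits 0-23, stencil in bits 24-31.
constexpr std::uint32_t kDepthMask = 0x00FFFFFFu;
constexpr unsigned kStencilShift = 24;

std::uint32_t pack_d24s8(std::uint32_t depth24, std::uint8_t stencil) {
    assert(depth24 <= kDepthMask);
    return (static_cast<std::uint32_t>(stencil) << kStencilShift) | depth24;
}

// Depth-only clear: every word has to be read so its stencil bits survive
// (a read-modify-write per pixel).
void clear_depth_only(std::vector<std::uint32_t>& buf, std::uint32_t depth24) {
    assert(depth24 <= kDepthMask);
    for (std::uint32_t& word : buf) {
        word = (word & ~kDepthMask) | depth24;
    }
}

// Depth + stencil clear: every word receives the same value, a plain fill.
void clear_depth_stencil(std::vector<std::uint32_t>& buf, std::uint32_t depth24,
                         std::uint8_t stencil) {
    std::fill(buf.begin(), buf.end(), pack_d24s8(depth24, stencil));
}
```

Clearing to 0xFFFFFF (depth 1.0 in 24-bit UNORM) with only the depth flag keeps whatever stencil was there, which forces the read; clearing both lets every pixel be written blindly.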

Finally, those R32G32 textures are not going to perform well at all on this kind of hardware. Consider a simpler format - do you really need all that precision?
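Rough arithmetic for the precision-versus-bandwidth trade-off: a two-channel 32-bit float format is twice the size of a two-channel 16-bit one, so every full write or read of the target moves twice the memory. The 1024x1024 size and the local enum below are assumptions for illustration; the enum mirrors DXGI format names but is not the real DXGI_FORMAT enum.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-in for the formats discussed (not the real DXGI_FORMAT enum).
enum class Format { R32G32_FLOAT, R16G16_FLOAT, R16G16_UNORM };

std::uint32_t bytes_per_texel(Format f) {
    switch (f) {
        case Format::R32G32_FLOAT: return 8; // 2 channels x 32 bits
        case Format::R16G16_FLOAT: return 4; // 2 channels x 16 bits
        case Format::R16G16_UNORM: return 4; // 2 channels x 16 bits
    }
    assert(false && "unknown format");
    return 0;
}

// Bytes touched by one full pass over a render target of the given size.
std::uint64_t surface_bytes(std::uint32_t width, std::uint32_t height, Format f) {
    return static_cast<std::uint64_t>(width) * height * bytes_per_texel(f);
}
```

For a 1024x1024 target that is 8 MiB versus 4 MiB per full pass, and a render-then-blur pipeline touches the target several times per frame. Whether this matters depends on bandwidth actually being the bottleneck; in this thread, switching to R16G16 later made no measurable difference.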

Thanks for the tips. I will try to optimize the shaders later; I was just too tired yesterday. Here's what I tried and the results:
changed texture format to R16G16: +0 fps
removed three useless render target clearings: +1 fps
added clear depth flag to last depth/stencil clear: +7 fps
I don't think I can remove the last depth test, because then I would need to implement some sort of geometry sorting by depth. My scene: render a variance shadow map to R16G16, perform a Gaussian blur along the X axis, then a Gaussian blur along the Y axis, then render the whole scene. I will try to get more accurate timings with the Flush() command; maybe then I can track down the culprit, or just confirm that this GPU can't render squat.
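Blurring along X and then along Y works because a 2D Gaussian is the product of two 1D Gaussians: two passes of 2r+1 taps replace one pass of (2r+1)^2 taps. Here is a CPU sketch in C++ on a single float channel (a VSM target would blur its two moment channels the same way); the function names, the clamp-to-edge addressing and the row-major layout are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Normalized 1D Gaussian kernel with 2*radius+1 taps (offsets -radius..radius).
std::vector<float> gaussian_kernel(int radius, float sigma) {
    assert(radius >= 0 && sigma > 0.0f);
    std::vector<float> k(2 * radius + 1);
    float sum = 0.0f;
    for (int i = -radius; i <= radius; ++i) {
        k[i + radius] = std::exp(-static_cast<float>(i * i) / (2.0f * sigma * sigma));
        sum += k[i + radius];
    }
    for (float& weight : k) weight /= sum; // weights sum to 1, preserving brightness
    return k;
}

// One blur pass over a row-major w*h image: (dx, dy) = (1, 0) blurs along X,
// (0, 1) blurs along Y. Out-of-range samples are clamped to the edge.
std::vector<float> blur_pass(const std::vector<float>& src, int w, int h,
                             const std::vector<float>& k, int dx, int dy) {
    assert(static_cast<int>(src.size()) == w * h && k.size() % 2 == 1);
    const int radius = static_cast<int>(k.size() / 2);
    std::vector<float> dst(src.size(), 0.0f);
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            float acc = 0.0f;
            for (int i = -radius; i <= radius; ++i) {
                const int sx = std::min(std::max(x + i * dx, 0), w - 1);
                const int sy = std::min(std::max(y + i * dy, 0), h - 1);
                acc += k[i + radius] * src[sy * w + sx];
            }
            dst[y * w + x] = acc;
        }
    }
    return dst;
}
```

Usage: `blur_pass(blur_pass(img, w, h, k, 1, 0), w, h, k, 0, 1)` gives the same result as a full 2D Gaussian of the same radius (up to edge handling), at a fraction of the texture fetches.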

I suspect you're shader bound somewhere if changing the texture format didn't help at all. Try switching each pixel shader in turn with one that returns a constant colour. Note which changes have the biggest effect on FPS.

Are you using the [url=""]bilinear filter to optimize the gaussian blur[/url]?
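For reference, the bilinear trick works like this: a linear-filtered fetch at offset i + f between texels i and i+1 returns (1 - f) * t[i] + f * t[i+1], so two neighbouring Gaussian taps with weights w[i] and w[i+1] can be replaced by one fetch with weight w[i] + w[i+1] at offset (i * w[i] + (i+1) * w[i+1]) / (w[i] + w[i+1]). A sketch of computing those merged taps in C++, assuming a symmetric kernel given by its one-sided weights (the Tap struct and the function name are hypothetical):

```cpp
#include <cassert>
#include <vector>

struct Tap {
    float offset; // in texels from the center
    float weight;
};

// Input: one-sided weights of a symmetric kernel, w[0] at the center and
// w[i] for offsets +i and -i. Output: taps for the positive side (mirror
// them for the negative side), with adjacent pairs merged into one fetch.
std::vector<Tap> linear_sampling_taps(const std::vector<float>& w) {
    assert(!w.empty());
    std::vector<Tap> taps;
    taps.push_back({0.0f, w[0]}); // center texel, fetched at its exact position
    for (std::size_t i = 1; i < w.size(); i += 2) {
        if (i + 1 < w.size()) {
            const float weight = w[i] + w[i + 1];
            const float offset = (static_cast<float>(i) * w[i] +
                                  static_cast<float>(i + 1) * w[i + 1]) / weight;
            taps.push_back({offset, weight});
        } else {
            taps.push_back({static_cast<float>(i), w[i]}); // unpaired last tap
        }
    }
    return taps;
}
```

For the binomial kernel 1 4 6 4 1 (divided by 16) the one-sided weights are {0.375, 0.25, 0.0625}, and the two outer taps on each side merge into a single fetch with weight 0.3125 at offset 1.2, so five fetches become three. In the shader the offsets still have to be divided by the texture size, and the sampler must use linear filtering.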


  • Similar Content

    • By evelyn4you
      I have read a lot about binding a constant buffer to a shader, but something is still unclear to me.
      E.g. when performing vertexshader.setConstantbuffer(buffer, slot), is the buffer bound
      a.  to the VertexShaderStage
      b. to the VertexShader that is currently set as the active VertexShader
      Is it possible to bind a constantBuffer to a VertexShader, e.g. VS_A, and keep this binding even after the active VertexShader has changed?
      I mean, I want to bind constantbuffer_A to VS_A and constantbuffer_B to VS_B, and only use updateSubresource without calling setConstantBuffer every time.

      Look at this example:
      SetVertexShader ( VS_A )
      vertexshader.setConstantbuffer ( buffer_A,  slot_A )
      perform drawcall       ( buffer_A is used )

      SetVertexShader ( VS_B )
      vertexshader.setConstantbuffer ( buffer_B,  slot_A )
      perform drawcall   ( buffer_B is used )
      SetVertexShader ( VS_A )
      perform drawcall   (now which buffer is used ??? )
      I ask this question because I have made a custom render engine and want to keep
      updateSubresource and setConstantbuffer calls to a minimum.
    • By noodleBowl
      I got a quick question about buffers when it comes to DirectX 11. If I bind a buffer using a command like:
      IASetVertexBuffers IASetIndexBuffer VSSetConstantBuffers PSSetConstantBuffers  and then later on I update that bound buffer's data using commands like Map/Unmap or any of the other update commands.
      Do I need to rebind the buffer again in order for my update to take effect? If I don't rebind, is that really bad, as in do I get a performance hit? My thought process behind this is that if the buffer is already bound, why do I need to rebind it? I'm using the same buffer, it's just different data.
    • By Rockmover
      I am really stuck with something that should be very simple in DirectX 11. 
      1. I can draw lines using a PC (position, colored) vertices and a simple shader just fine.
      2. I can draw 3D triangles using PCN (position, colored, normal) vertices just fine (even transparency and SpecularBlinnPhong shaders).
      However, if I'm using my 3D shader, and I want to draw my PC lines in the same scene how can I do that?
      If I change my lines to PCN and pass them to the 3D shader with my triangles, then the lighting screws them all up.  I only want the lighting for the 3D triangles, but no SpecularBlinnPhong/Lighting for the lines (just PC). 
      I am sure this is because if I change the lines to PCN there is not really a correct "normal" for the lines.
      I assume I somehow need to draw the 3D triangles using one shader, and then "switch" to another shader and draw the lines?  But I have no clue how to use two different shaders in the same scene.  And then are the lines just drawn on top of the triangles, or vice versa (maybe draw order dependent)?  
      I must be missing something really basic, so if anyone can just point me in the right direction (or link to an example showing the implementation of multiple shaders) that would be REALLY appreciated.
      I'm also more than happy to post my simple test code if that helps as well!
    • By Reitano
      I am writing a linear allocator of per-frame constants using the DirectX 11.1 API. My plan is to replace the traditional constant allocation strategy, where most of the work is done by the driver behind my back, with a manual one inspired by the DirectX 12 and Vulkan APIs.
      In brief, the allocator maintains a list of 64K pages, each page owns a constant buffer managed as a ring buffer. Each page has a history of the N previous frames. At the beginning of a new frame, the allocator retires the frames that have been processed by the GPU and frees up the corresponding space in each page. I use DirectX 11 queries for detecting when a frame is complete and the ID3D11DeviceContext1::VS/PSSetConstantBuffers1 methods for binding constant buffers with an offset.
      The new allocator appears to be working but I am not 100% confident it is actually correct. In particular:
      1) It relies on queries, which I am not too familiar with. Are they 100% reliable?
      2) it maps/unmaps the constant buffer of each page at the beginning of a new frame and then writes the mapped memory as the frame is built. In pseudo code:
 = device.Map(page.buffer)
          Alloc(size, initData)
              memcpy( + page.start, initData, size)
          Alloc(size, initData)
              memcpy( + page.start, initData, size)
      (Note: calling Unmap at the end of a frame prevents binding the mapped constant buffers and triggers an error in the debug layer)
      Is this valid?
      3) I don't fully understand how many frames I should keep in the history. My intuition says it should be equal to the maximum latency reported by IDXGIDevice1::GetMaximumFrameLatency, which is 3 on my machine. But this value works fine in a unit test, while in a more complex demo I need to manually set it to 5; otherwise the allocator starts overwriting previous frames that have not completed yet. Shouldn't the swap chain Present method block the CPU in this case?
      4) Should I expect this approach to be more efficient than the one managed by the driver ? I don't have meaningful profile data yet.
      Is anybody familiar with the approach described above who can answer my questions and discuss the pros and cons of this technique based on their experience?
      For reference, I've uploaded the (WIP) allocator code at  Feel free to adapt it in your engine and please let me know if you spot any mistakes
      Stefano Lanza
    • By Matt Barr
      Hey all. I've been working with compute shaders lately, and was hoping to build out some libraries to reuse code. As a prerequisite for my current project, I needed to sort a big array of data in my compute shader, so I was going to implement quicksort as a library function. My implementation was going to use an inout array to apply the changes to the referenced array.

      I spent half the day yesterday debugging in visual studio before I realized that the solution, while it worked INSIDE the function, reverted to the original state after returning from the function.

      My hack fix was just to inline the code, but this is not a great solution for the future.  Any ideas? I've considered just returning an array of ints that represents the sorted indices.