DX11 Performance

Hey guys, I've recently converted my DirectX10 Asteroids clone to DirectX11. At first everything was fine, but as I progressed through the game I found severe lag once a few more asteroids were on screen. I did some rough benchmarking and found that the DX11 version runs at about 25% of the speed of the DX10 version.

Setup: I create a hardware DX10 device. I'm using the Aug09 SDK, Visual Studio 2008 (native C++) and an ATI Radeon HD4850 with the latest drivers.

I was just wondering if any of you have seen the same issue or have any idea why it would be so slow... Cheers

Pretty much that ^
There is no reason why the DX11 stuff should be slower than DX10 (the only issue I've found with the drivers in the past was that the tessellator was spitting out far too many triangles, but that has been fixed).

Keep in mind that in order to use PerfStudio sanely, your app needs to support a return of 0 from timeGetTime() or the performance timer, because PS basically takes over those functions so that you can freeze the game. It can also be set to return very small values instead, which helps in cases where your logic doesn't allow for a 0.
(I've found that stuff built on top of DXUT doesn't play well with it at all as it doesn't seem to use any timer calls to manage framerate etc)

Thanks for the posts guys, this app is cool.
I found that the GPU is actually performing better with DX11, and that the CPU was the issue.
Since the game logic hasn't changed between projects, I guessed that the C++ optimization flag was responsible. I had to disable it so the ID3D11Device creation wouldn't fail. Just to humour myself I tried turning it back on... and surprisingly the device creation did not fail (after failing for a long, painful week), and now it runs flawlessly.
Thanks again.

Actually, my engine performs MUCH better with the Direct3D 11 renderer using the D3D9 and D3D10 feature sets than it does with the original APIs.

Out of curiosity, did you get those numbers by launching the app (even in release mode) from Visual Studio? I noticed something similar way back when, but after some experimentation I found that if I ran my program *outside* the IDE it jumped back up to the normal 1200fps level for just the clear. Could be useful to know.

I had issues like that ages ago with my DirectX10 project. That turned out to be corruption of the project settings or something; I copied my code into a new project and it was fixed.

But this issue was definitely the maximise speed property being turned off.
I'd like to know why it didn't work with the property a few days ago, but does now... I can't remember any changes other than a driver update and I doubt that would be involved...

I just ported my DX9 renderer straight to DX11 (bypassed the DX10 API entirely). I am using D3D10 feature level right now.

I found that for very simple scenes (i.e. drawing a single object, or even just a single clear) my DX11 framerate is "much" slower than the DX9 counterpart.

After doing some poking around, I found this is also the case for the DX sample apps. The "EmptyProject" sample in the SDK samples runs at > 9000 FPS for both DX9 and DX10 on my cards, but only around 1500 FPS for DX11 (with vsync disabled of course).

Anyone else notice this?

*EDIT* - I just updated my nvidia drivers (they were a month old) and "EmptyProject11" is way up to ~4500 fps now. Still a large gap between this and the DX 10 version (EmptyProject10 runs at about 7000 fps), but at least an improvement.

[Edited by - Krighton on February 22, 2010 11:35:50 PM]

When the framerate is that high any measurements you take are going to be essentially meaningless. You'll need to apply at least some sort of load to start getting measurements.

Also, in D3D10 and D3D11 the device defaults to multithreaded usage, which causes you to enter a coarse lock every time you call an API function and adds a bit of overhead. This is off by default in D3D9.
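
As a rough, standalone illustration of that kind of per-call cost (this is plain C++, not Direct3D code), the sketch below times the same trivial call with and without taking a mutex first. `FakeDevice` and `TimeCalls` are made-up names, and an uncontended `std::mutex` is only a stand-in for whatever locking the runtime actually does. For reference, the creation flags that opt out of the thread-safety layer are `D3D10_CREATE_DEVICE_SINGLETHREADED` and `D3D11_CREATE_DEVICE_SINGLETHREADED`.

```cpp
#include <chrono>
#include <cstdint>
#include <mutex>

// Stand-in for a device with cheap API calls; not a real Direct3D type.
struct FakeDevice
{
    std::mutex lock;          // plays the role of the device-wide coarse lock
    std::uint64_t calls = 0;  // number of "API calls" made so far

    // Call path without any locking.
    void CallUnlocked() { ++calls; }

    // Call path that enters the lock on every call.
    void CallLocked()
    {
        std::lock_guard<std::mutex> guard(lock);
        ++calls;
    }
};

// Runs the given member function `count` times and returns the elapsed
// time in nanoseconds.
template <typename Fn>
std::int64_t TimeCalls(FakeDevice& device, Fn fn, int count)
{
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < count; ++i)
        (device.*fn)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Comparing `TimeCalls(device, &FakeDevice::CallLocked, 1000000)` against the same count of `CallUnlocked` gives a feel for the extra cost per call. The absolute numbers depend heavily on the compiler and machine, so treat them as an indication only.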

Original post by MJP
When the framerate is that high any measurements you take are going to be essentially meaningless. You'll need to apply at least some sort of load to start getting measurements.

Yeah, I figured I'd get this reply.

I agree that measurements this high are *largely* meaningless. But when the measurements I am getting are very consistent across a number of different apps (and given that updated drivers significantly increased my measurements), there has to be some merit to them. I find the speed of the raw pipeline (before I start pushing any real load) to be something worth considering. When my measurements are 1500 fps vs 9000 fps (DX11 vs DX10), consistently... there is something going on worth investigating.

I also noticed a fairly significant performance drop with the renderer under load (this was one of the first things I tested after noticing the slower "simple" case). However, it's entirely possible that the performance difference I noticed under load was partly due to naive render state management or poor use of the Effects11 lib.
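
For anyone wondering what less naive state management might look like, one common approach is to filter out redundant state changes before they reach the API. The sketch below is only an illustration under assumptions: `FakeContext` stands in for a real device context and just counts calls, and the integer IDs stand in for real shader and blend state objects. It is not the code from my renderer.

```cpp
#include <cstdint>

// Hypothetical stand-in for a device context. It only counts how many
// state changes actually reach the "API"; it is not a Direct3D interface.
struct FakeContext
{
    int shaderBinds = 0;
    int blendStateBinds = 0;

    void SetShader(std::uint32_t) { ++shaderBinds; }
    void SetBlendState(std::uint32_t) { ++blendStateBinds; }
};

// Remembers the last value sent for each piece of state and skips the
// call when the requested value is already bound.
class StateCache
{
public:
    explicit StateCache(FakeContext& context) : context_(context) {}

    void SetShader(std::uint32_t id)
    {
        if (shaderValid_ && shader_ == id)
            return;  // already bound, nothing to do
        context_.SetShader(id);
        shader_ = id;
        shaderValid_ = true;
    }

    void SetBlendState(std::uint32_t id)
    {
        if (blendValid_ && blendState_ == id)
            return;  // already bound, nothing to do
        context_.SetBlendState(id);
        blendState_ = id;
        blendValid_ = true;
    }

private:
    FakeContext& context_;
    std::uint32_t shader_ = 0;
    std::uint32_t blendState_ = 0;
    bool shaderValid_ = false;  // false until the first bind
    bool blendValid_ = false;   // false until the first bind
};
```

Setting shader 1 twice and then shader 2 through the cache results in two calls on the context instead of three. Sorting draws by state makes the filter hit more often.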
