# DX11 drawIndexed cost overhead time

The scene is a single box drawn with a simple shader.

In a demo, the drawIndexed call costs 0.000160 ms, but in my engine it costs 0.1 ms.

Both use the same mesh, the same input layout, the same shader, and the same render state.

Why does drawIndexed cost so much? I'm not sure whether drawIndexed returns immediately or not.

I suspect that the quicker one uses a deferred context and the slower one uses an async context. Both are C++ projects.

However, they use the same code to create the context:

```cpp
UINT createDeviceFlags = 0;
#if defined(DEBUG) || defined(_DEBUG)
createDeviceFlags |= D3D11_CREATE_DEVICE_DEBUG;
#endif

D3D_FEATURE_LEVEL featureLevel;
HRESULT hr = D3D11CreateDevice(
    0,                 // default adapter (this argument was missing from the pasted code)
    _driverType,
    0,                 // no software device
    createDeviceFlags,
    0, 0,              // default feature level array
    D3D11_SDK_VERSION,
    &_device,
    &featureLevel,
    &_context);
```

Have you ever run into this situation, where one draw call costs a large amount of time?
Would a wrong state cause drawIndexed to be slower?
Edited by poigwym

##### Share on other sites
Yes, all D3D drawing functions are async, in that they just push commands into a queue, and the GPU consumes that queue at a (much) later point in the future.
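
As a rough illustration of that queueing behaviour, here is a toy model. It is not D3D code: `ToyContext`, `DrawCommand` and `flush` are names made up for this sketch, and a real driver is far more involved. It only shows that the call itself records a command and returns, while the work happens when the queue is consumed later.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical recorded command: just the argument of the draw call.
struct DrawCommand {
    unsigned indexCount;
};

// Toy stand-in for a device context (an assumption, not the D3D11 interface).
struct ToyContext {
    std::vector<DrawCommand> queue;     // commands recorded but not yet executed
    std::size_t executedDraws = 0;      // draws the pretend GPU has finished
    std::size_t indicesProcessed = 0;   // total indices the pretend GPU has consumed

    // Returns right away: it only appends a command to the queue.
    void drawIndexed(unsigned indexCount) {
        queue.push_back({indexCount});
    }

    // Stand-in for the GPU consuming the queue at some later point.
    void flush() {
        for (const DrawCommand& cmd : queue) {
            indicesProcessed += cmd.indexCount;
            ++executedDraws;
        }
        queue.clear();
    }
};
```

In this model, wrapping a CPU timer around `drawIndexed` would only measure the cost of recording the command (plus whatever validation runs on the CPU side), not the time the GPU spends drawing.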

When you run your game with D3D11_CREATE_DEVICE_DEBUG, do any warnings or errors get printed to the Visual Studio Output window?
BTW, instead of using an #ifdef for this, I find that it's very useful to allow a command line argument to determine whether this flag will be used, so that it's possible to enable D3D debugging in a release build when required.
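
A minimal sketch of the command-line approach, under a few assumptions: the switch name `--d3d-debug` and the helper `deviceFlagsFromArgs` are invented for illustration, and `kCreateDeviceDebug` is a stand-in so the snippet compiles without the Windows SDK. In a real project you would include `<d3d11.h>` and OR in `D3D11_CREATE_DEVICE_DEBUG` instead.

```cpp
#include <cassert>
#include <cstring>

// Stand-in constant so this sketch builds without <d3d11.h>.
// Real code would use D3D11_CREATE_DEVICE_DEBUG here.
constexpr unsigned kCreateDeviceDebug = 0x2;

// Hypothetical helper: builds the device creation flags from the command line.
// "--d3d-debug" is an invented switch name, not a D3D or Visual Studio option.
unsigned deviceFlagsFromArgs(int argc, const char* const* argv) {
    assert(argc == 0 || argv != nullptr);
    unsigned flags = 0;
    for (int i = 1; i < argc; ++i) {  // argv[0] is the program name, skip it
        if (std::strcmp(argv[i], "--d3d-debug") == 0) {
            flags |= kCreateDeviceDebug;
        }
    }
    return flags;
}
```

The returned value would then be passed as the flags argument when creating the device, so the same release executable can be started with or without D3D debugging.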

To make sure that you're not generating any warnings or errors, run this code after you create the device, which tells Visual Studio to trigger a breakpoint when any D3D warnings/errors occur:
```cpp
ID3D11InfoQueue* m_debugInfoQueue = 0;
m_device->QueryInterface(IID_ID3D11InfoQueue, (void**)&m_debugInfoQueue);
if (m_debugInfoQueue) // stays null if the query failed, so check before using it
{
    // Break into the debugger on corruption, errors and warnings.
    m_debugInfoQueue->SetBreakOnSeverity( D3D11_MESSAGE_SEVERITY_CORRUPTION, TRUE );
    m_debugInfoQueue->SetBreakOnSeverity( D3D11_MESSAGE_SEVERITY_ERROR, TRUE );
    m_debugInfoQueue->SetBreakOnSeverity( D3D11_MESSAGE_SEVERITY_WARNING, TRUE );
}
```
Lastly, are you testing a debug build or a release build? :)

Edited by Hodgman

##### Share on other sites
> Lastly, are you testing a debug build or a release build? :)

I haven't shipped any product, so all my programs are DEBUG builds.

How do I paste code like yours? And how do I make this thing -> :) ?

I disabled all warnings. I ran your code and it crashed.

Edited by poigwym

##### Share on other sites

The biggest difference, I think, is that my engine opens a C++ console for printing messages.

Edited by poigwym

##### Share on other sites

> I haven't shipped any product, so all my programs are DEBUG builds.

Debug builds will always be extremely slow - never use them for testing performance.
Specifying the D3D11_CREATE_DEVICE_DEBUG flag when creating a D3D device will ruin performance too.

> How do I paste code like yours?

[code]blah[/code]

> I ran your code and it crashed.

Where and what kind?

##### Share on other sites

> > I haven't shipped any product, so all my programs are DEBUG builds.
>
> Debug builds will always be extremely slow - never use them for testing performance.
> Specifying the D3D11_CREATE_DEVICE_DEBUG flag when creating a D3D device will ruin performance too.

You are right!! I turned off D3D11_CREATE_DEVICE_DEBUG and can now easily draw thousands of boxes at over 100 fps.

But if I switch to a RELEASE build, how can I debug? The debug info will be discarded. I'm coding in Visual Studio 2013.

##### Share on other sites

You should use Visual Studio's debug builds when you want to use breakpoints to step through your code line by line, and otherwise use release.

You should use the D3D11_CREATE_DEVICE_DEBUG flag to check for errors in your usage of the D3D API, and otherwise disable that flag.

i.e. switch back and forth between these 4 different modes depending on your current task.
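
The four modes are just the two switches combined. A small sketch of that idea, where `RunMode`, `suitableForProfiling` and `describe` are hypothetical names used only for illustration:

```cpp
#include <cassert>
#include <string>

// The two independent switches discussed above.
struct RunMode {
    bool debugBuild;     // Visual Studio debug build: breakpoints and stepping
    bool d3dDebugLayer;  // device created with D3D11_CREATE_DEVICE_DEBUG
};

// Timings are only worth trusting when both slow paths are switched off.
bool suitableForProfiling(const RunMode& m) {
    return !m.debugBuild && !m.d3dDebugLayer;
}

// Human-readable label for the current combination, e.g. for a log line.
std::string describe(const RunMode& m) {
    std::string s = m.debugBuild ? "debug build" : "release build";
    s += m.d3dDebugLayer ? " + D3D debug layer" : ", no D3D debug layer";
    return s;
}
```

Logging the label at startup is one cheap way to avoid the situation from the first post, where a timing was taken without noticing which mode the program was running in.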
