performance problem with my renderer

Started by
18 comments, last by 21st Century Moose 11 years, 11 months ago
A current-gen engine with Direct3D10+ should be able to handle at least 10k+ raw draw calls on a modern computer
By raw draw calls I mean pure draw calls, without any optimisations such as instancing or culling.
EDIT: and I mean draw calls with a simple effect and vertex buffer, like a textured rectangle.

My engine can only handle about 1k draw calls while sustaining a smooth framerate. I have been profiling the hell out of my engine with Intel VTune, AMD CodeAnalyst, PIX, and the default VS profiler, but I just can't seem to find the problem!

Another peculiar thing is that ID3D10EffectPass::Apply() seems to take longer than most Draw() calls.
After some tests, ID3D10EffectPass::Apply() doesn't seem to do what MSDN says ("Set the state contained in a pass to the device.").

If I Apply() before I commit my shader variables, my variables won't be updated. This implies that when a technique contains only one pass, we cannot Apply() per material but are forced to do it per mesh.

If anyone made a pretty performance-concerned rendering engine for PC with Direct3D10+, can you please check how many raw draw calls it can handle, and if the CPU spends more time doing Apply() than Draw()?

Can you apply per material instead of per mesh, when a technique only contains one pass? (not according to my tests, while a lot of people say that this would be an optimization I could make)

And does anyone have any idea why my engine would only do 1k draw calls? My algorithms are all tested for computational complexity etc, so it's probably not that!

Thanks x1000!

A current-gen engine with Direct3D10+ should be able to handle at least 10k+ raw draw calls on a modern computer

There was an NVIDIA(?) presentation a few years back which talked about the number of draw calls per second. Pure draw calls are CPU limited, and they gave a formula depending on the GHz of a single core. The limits were more or less 1k-1.5k for a 2.5GHz core. Considering that the GHz of single CPU cores hasn't increased terribly in the last 5 years, I would suggest that 1k is more realistic than 10k.
Draw call overhead in D3D10+ is much lower than in previous versions, and can be considered more-or-less on a par with OpenGL, but it's still not free. However, a quick and dirty check shows that I can sustain ~12500 draw calls per-frame at ~250fps, which in turn shows that your performance woes are most likely coming from elsewhere.

I have no idea what's going on with EffectState::Apply - I personally don't use the effects framework at this level - but I'm guessing this is the most probable candidate. Solutions might include not using the effects framework (which is far easier than you may think at first) or moving your state handling from the framework to your program's code.

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.



Not using the effects framework? Then how do you handle texturing/lighting? In fact how do you render anything at all?
And moving state handling from the framework to my program, how would I do that?
Also, what Direct3D version are you using?

You don't need the FX framework to do any of that.
You can set textures, render states and constants, and trigger draws yourself, on the device in D3D10 or on the device context in D3D11.

So if you read in the texture ids / state ids / constants from your material files and build up your own renderable blocks with the desired D3D resources, you can then manage it all yourself. This allows you to batch in possibly more efficient ways, remove redundant API calls, etc., which you might not be able to do through the FX framework (the last time I used the FX framework was 2007 or so, so my memory is a bit fuzzy).
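A minimal sketch of what "remove redundant API calls" could look like, assuming a small cache that remembers what is bound to each slot. MockDevice, TextureBindCache and the integer texture ids are made up for illustration; in a real renderer the forwarded call would be something like PSSetShaderResources on the device or device context.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kSlots = 8;   // assumed number of texture slots
constexpr int kUnbound = -1;        // assumed sentinel: real ids are non-negative

// Hypothetical stand-in for the device: it only counts how often a bind is issued.
struct MockDevice {
    int bindCalls = 0;
    void bindTexture(std::size_t /*slot*/, int /*textureId*/) { ++bindCalls; }
};

// Remembers the texture bound to each slot and forwards only real changes.
class TextureBindCache {
public:
    explicit TextureBindCache(MockDevice& device) : device_(device) {
        bound_.fill(kUnbound);
    }

    void bind(std::size_t slot, int textureId) {
        assert(slot < kSlots);
        if (bound_[slot] == textureId) {
            return;  // already bound: skip the redundant API call
        }
        bound_[slot] = textureId;
        device_.bindTexture(slot, textureId);
    }

private:
    MockDevice& device_;
    std::array<int, kSlots> bound_;
};
```

Binding texture 7 to slot 0 twice in a row reaches the mock device only once. The same idea would apply to samplers, shaders and state objects.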
In D3D11 you can also make use of multiple cores by building up your draw lists on different threads using deferred device contexts, which can help reduce the CPU load quite a bit (especially now that driver support for it seems to be pretty good, at least from NVIDIA's side).

As you mentioned in your original post, instancing can also give pretty good speedups.


SamplerState sampler3 : register(s3);
Texture2D tex0 : register(t0);
Texture2D tex1 : register(t1);


Context->PSSetSamplers (3, ...);
Context->PSSetShaderResources (0, .....);
Context->PSSetShaderResources (1, .....);


This is D3D11 but this kind of thing worked even back in D3D9 HLSL. Just specify explicit registers and set resources to those registers - the effects framework is partially just a wrapper around all of this, but you definitely don't need that wrapper.

The main motivations for doing it this way are so that I can mix and match different vertex/geometry/pixel shaders without having to specify new passes in a .FX file, so that I can dynamically switch certain states in and out in program code, because I'm a mite uneasy with the way the framework handles constant buffers (may be unwarranted but it just feels wrong to me), and so that I can avoid other overheads associated with using the framework.

This way does need a little bit more work, but like I said, it's not that much, and the added flexibility and performance potential more than justifies it.





I could understand that this approach would update shader variables without having to call Apply().
This may give a certain (certainly worthwhile) performance boost in techniques that require only one pass.

However, I don't see how you would avoid using passes with it?
Also, maybe Direct3D11 has something to do with you getting such a high number of draw calls per frame? (I'm using Direct3D10.)
I don't believe that this approach would get me a 15x draw call performance boost (needed to match your draws per frame at 250fps). I'm guessing 3x at most.

Also, since someone else replied that instancing might give a performance boost: indeed it would. And multithreading too!
But please note that this is not a thread about the general performance of a renderer: the focus is on draw call performance, not on actually lowering the number of draw calls per frame.

Thanks btw, really helpful information.

It's actually useless for updating shader variables - you use constant buffers for that.

It's important to realise that the whole concept of techniques and passes is just an artefact of the effects framework. Remember that the effects framework is not in any way an API that talks directly to the hardware or driver - it's just a wrapper around the real D3D API. Everything in the effects framework is implemented using the real API, and you can study the source code for it (available in "C:\Program Files (x86)\Microsoft DirectX SDK (June 2010)\Samples\C++\Effects11" if you have a reasonably up-to-date SDK installed) if you need to confirm that. Techniques and passes don't actually exist in HLSL - they're just concepts that are confined to effects, but are actually implemented using the real API.

So, in the case of updating shader variables, you can look at the code for CheckAndUpdateCB_FX and see what it does. It keeps a backing store for the entire buffer in system memory, sets a dirty flag when a variable needs updating, and then when you call Apply, it updates the entire buffer and clears the dirty flag. All just using standard D3D calls like those I gave examples of above.
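The backing-store-plus-dirty-flag pattern described above can be sketched in plain C++. This is not the Effects11 code, just an illustration of the pattern: MockGpuBuffer, ShadowedConstantBuffer and their members are hypothetical names, and the upload is a stand-in for whatever D3D call copies the data to the real constant buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a GPU constant buffer; counts whole-buffer uploads.
struct MockGpuBuffer {
    std::vector<unsigned char> data;
    int uploads = 0;
    void upload(const std::vector<unsigned char>& src) {
        data = src;
        ++uploads;
    }
};

class ShadowedConstantBuffer {
public:
    explicit ShadowedConstantBuffer(std::size_t sizeInBytes)
        : shadow_(sizeInBytes, 0) {}

    // Writes into the system-memory copy only and marks the buffer dirty.
    void setFloat(std::size_t byteOffset, float value) {
        assert(byteOffset + sizeof(float) <= shadow_.size());
        std::memcpy(shadow_.data() + byteOffset, &value, sizeof(float));
        dirty_ = true;
    }

    // Called at the point where a pass would be applied:
    // uploads the entire buffer if anything changed, then clears the flag.
    void apply(MockGpuBuffer& gpu) {
        if (!dirty_) {
            return;
        }
        gpu.upload(shadow_);
        dirty_ = false;
    }

private:
    std::vector<unsigned char> shadow_;
    bool dirty_ = false;
};
```

With this pattern, a value set after apply() stays in system memory until the next apply(), which would be consistent with the earlier observation that variables committed after Apply() are not picked up.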

What do you class as part of a 'draw call'? How much work do you include?

Because chances are if you want performance you are going to have to ditch the FX framework and start dealing with constants and other elements of a draw call yourself, properly batching/constraining updates.
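One possible reading of "properly batching" is to sort the draw list so that items sharing a material are submitted together, which means per-material state needs setting once per group rather than once per mesh. A rough sketch, where DrawItem and both helper functions are hypothetical names:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical draw list entry: which material and which mesh to draw.
struct DrawItem {
    int materialId;
    int meshId;
};

// Counts how often the material would have to be (re)bound if the items
// were submitted in the given order. Assumes material ids are non-negative.
inline int countMaterialSwitches(const std::vector<DrawItem>& items) {
    int switches = 0;
    int current = -1;
    for (const DrawItem& item : items) {
        if (item.materialId != current) {
            current = item.materialId;
            ++switches;
        }
    }
    return switches;
}

// Groups items by material; stable_sort keeps the original order within a group.
inline void sortByMaterial(std::vector<DrawItem>& items) {
    std::stable_sort(items.begin(), items.end(),
                     [](const DrawItem& a, const DrawItem& b) {
                         return a.materialId < b.materialId;
                     });
}
```

For a list that alternates between two materials over four meshes, this takes the material switches from four down to two. Per-mesh constants such as the world matrix would still need updating for every draw.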

Also how are you timing things?
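For reference, a minimal way to time a block of CPU-side calls, assuming std::chrono::steady_clock is precise enough on the target machine. The helper name is made up, and fn stands for whatever is being measured, for example one Apply() plus one Draw().

```cpp
#include <cassert>
#include <chrono>

// Runs fn() 'iterations' times and returns the average wall-clock time
// per call in microseconds.
template <typename Fn>
double averageMicrosecondsPerCall(Fn&& fn, int iterations) {
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        fn();
    }
    const auto end = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::micro> elapsed = end - start;
    return elapsed.count() / iterations;
}
```

Note that this only measures time spent on the calling thread. The runtime and driver queue work, so a call that returns quickly may still cost time later, and the number says nothing about how long the GPU takes.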

However, a quick and dirty check shows that I can sustain ~12500 draw calls per-frame at ~250fps, which in turn shows that your performance woes are most likely coming from elsewhere.


Wow. What CPU is that running on? I just tried ~10000 draw calls and it runs at only ~12 fps, and that is in an optimal setting with only a small constant buffer update (map/unmap of a D3DXMATRIX) and the actual draw call. The best I can do is 3000 draw calls at ~37 fps.

Are you sure it is not using instancing or multithreading?

P.S: My experiment was performed on a laptop with an i7 at 2.80 GHz (Turbo).
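To put the figures quoted in this thread side by side, they can be converted to draw calls per second and an average time per draw call. Taking the approximate numbers at face value, ~12500 draws at ~250 fps is about 3.1 million draws per second (roughly 0.32 microseconds each), ~10000 draws at ~12 fps is about 120,000 per second (roughly 8.3 microseconds each), and 3000 draws at ~37 fps is about 111,000 per second (roughly 9 microseconds each). The helper names below are made up for illustration.

```cpp
#include <cassert>

// Draw calls issued per second for a given per-frame count and frame rate.
constexpr double drawsPerSecond(double drawsPerFrame, double framesPerSecond) {
    return drawsPerFrame * framesPerSecond;
}

// Average time available per draw call, in microseconds.
// This is whole-frame time divided by the draw count, so it includes
// everything else the frame does, not just the cost of the API call.
constexpr double microsecondsPerDraw(double drawsPerFrame, double framesPerSecond) {
    return 1.0e6 / drawsPerSecond(drawsPerFrame, framesPerSecond);
}
```

The two sets of results differ by a factor of roughly 26 to 28 in draws per second, which is more than a difference in CPU clock speed alone would be expected to explain.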
