performance problem with my renderer

Started by
18 comments, last by 21st Century Moose 11 years, 11 months ago

[quote name='mhagain' timestamp='1337276641' post='4940981']
However, a quick and dirty check shows that I can sustain ~12500 draw calls per frame at ~250 fps, which in turn shows that your performance woes are most likely coming from elsewhere.


Wow. What CPU is that running on? I just tried ~10000 draw calls and it runs at only ~12 fps, in an optimal setting with only a small constant buffer update (Map/Unmap of a D3DXMATRIX) and the actual draw call. The best I can do is 3000 draw calls at ~37 fps.

Are you sure it is not using instancing or multithreading?

P.S.: My experiment was performed on a laptop with an i7 at 2.80 GHz (Turbo).
[/quote]
It's also a laptop with an i7. I'm not doing cbuffer updates for each individual call (they're scattered throughout, though), but I am doing texture changes. The shaders and textures are quite simple, so the measurement is a good reflection of draw call cost, without too much other work skewing the figures. I basically took a nicely batched renderer (~200 calls when batching) and unbatched it, converting a DrawIndexed (...) call into multiple Draw (...) calls. Definitely no instancing or multithreading.
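For readers wondering what "batched" means here: a minimal sketch, assuming a hypothetical DrawItem record that holds a texture id and a range in one shared index buffer. Sorting by texture and merging items whose index ranges are contiguous turns many small draws into fewer, larger ones, each of which could be issued as a single DrawIndexed call. The names are made up for illustration; this is not the renderer described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical per-object draw record: the texture it uses and the slice
// of a shared index buffer that holds its geometry.
struct DrawItem {
    std::uint32_t textureId;
    std::uint32_t startIndex;
    std::uint32_t indexCount;
};

// Sorts items so equal textures are adjacent (ordered by index position
// within a texture), then merges runs whose index ranges are contiguous.
// Each returned item stands for one DrawIndexed-style call.
std::vector<DrawItem> buildBatches(std::vector<DrawItem> items) {
    std::sort(items.begin(), items.end(), [](const DrawItem& a, const DrawItem& b) {
        return std::tie(a.textureId, a.startIndex) < std::tie(b.textureId, b.startIndex);
    });

    std::vector<DrawItem> batches;
    for (const DrawItem& item : items) {
        assert(item.indexCount > 0 && "empty draws should be culled earlier");
        if (!batches.empty()) {
            DrawItem& last = batches.back();
            const bool sameTexture = last.textureId == item.textureId;
            const bool contiguous = last.startIndex + last.indexCount == item.startIndex;
            if (sameTexture && contiguous) {
                last.indexCount += item.indexCount;  // extend the current batch
                continue;
            }
        }
        batches.push_back(item);  // texture change or gap: start a new batch
    }
    return batches;
}
```

Sorting by texture first is what keeps the number of texture changes, and therefore the number of batches, low; the merge step only pays off if geometry sharing a texture was laid out contiguously in the index buffer.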

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.


What do you class as part of a 'draw call'? How much work do you include?

Chances are that if you want performance, you are going to have to ditch the FX framework and start dealing with constants and the other elements of a draw call yourself, batching and constraining updates properly.

Also how are you timing things?

What do I consider part of a draw call? Basically, setting the effect variables for the object if needed, and then calling ID3D10Device::Draw().
I'm afraid you'll need to be a bit more specific about what you mean by how I am timing things.
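One way to answer the timing question is to time the CPU-side submission loop directly instead of reading an FPS counter. A minimal sketch using std::chrono; submitDraws is a hypothetical stand-in for whatever loop issues the draw calls. It only measures CPU-side submission cost, since the GPU may still be working after the loop returns.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Result of timing one run of a draw-submission loop.
struct SubmitTiming {
    double totalMs;    // wall-clock time spent inside the loop
    double perCallUs;  // average cost per call, in microseconds
};

// Times submitDraws(drawCount) on the CPU with a monotonic clock.
// This captures API/driver overhead on the calling thread, not GPU time.
SubmitTiming timeSubmission(const std::function<void(int)>& submitDraws, int drawCount) {
    assert(drawCount > 0);
    using Clock = std::chrono::steady_clock;

    const auto start = Clock::now();
    submitDraws(drawCount);
    const auto end = Clock::now();

    const double totalMs = std::chrono::duration<double, std::milli>(end - start).count();
    return {totalMs, totalMs * 1000.0 / drawCount};
}
```

Averaging over many frames, and always measuring a release build, gives far more stable numbers than a single run.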


[quote]
I just tried ~10000 draw calls and it runs at only ~12 fps, in an optimal setting with only a small constant buffer update (Map/Unmap of a D3DXMATRIX) and the actual draw call. The best I can do is 3000 draw calls at ~37 fps.

Are you sure it is not using instancing or multithreading?

P.S.: My experiment was performed on a laptop with an i7 at 2.80 GHz (Turbo).
[/quote]

Please include the Direct3D version under which your engine is running too!



[quote]
It's also a laptop with an i7. I'm not doing cbuffer updates for each individual call (they're scattered throughout, though), but I am doing texture changes. The shaders and textures are quite simple, so the measurement is a good reflection of draw call cost, without too much other work skewing the figures. I basically took a nicely batched renderer (~200 calls when batching) and unbatched it, converting a DrawIndexed (...) call into multiple Draw (...) calls. Definitely no instancing or multithreading.
[/quote]

What do you mean by batched up?
You ARE testing using a release version of the program?

o3o

Yes, of course :)

[quote]
I just tried ~10000 draw calls and it runs at only ~12 fps, in an optimal setting with only a small constant buffer update (Map/Unmap of a D3DXMATRIX) and the actual draw call. The best I can do is 3000 draw calls at ~37 fps.
[/quote]


D3D11.1 should actually help with the constant-setting part. It allows you to build up one massive constant buffer with all your scene constants and then give D3D a window into it for each draw, so you only need to push the data to the GPU once.
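A sketch of the bookkeeping behind that "window" idea, under stated assumptions: as I understand the D3D11.1 documentation for ID3D11DeviceContext1::VSSetConstantBuffers1, offsets and sizes are given in 16-byte constants and must be multiples of 16 constants (256 bytes), with at most 4096 constants per binding. The code only computes the per-draw offsets; layoutWindows and ConstantWindow are hypothetical names, and no D3D calls are made.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One shader constant is a 16-byte register (four 32-bit floats).
constexpr std::uint32_t kBytesPerConstant = 16;
// Assumption from the D3D11.1 docs: window offsets and sizes must be
// multiples of 16 constants (256 bytes).
constexpr std::uint32_t kConstantAlignment = 16;
// Largest window a single binding allows, in constants.
constexpr std::uint32_t kMaxConstantsPerBinding = 4096;

// Window into the big shared constant buffer for one draw, in constants.
struct ConstantWindow {
    std::uint32_t firstConstant;
    std::uint32_t numConstants;
};

// Rounds a byte size up to a whole number of 16-constant (256-byte) blocks.
std::uint32_t alignedConstants(std::uint32_t bytes) {
    const std::uint32_t constants = (bytes + kBytesPerConstant - 1) / kBytesPerConstant;
    return (constants + kConstantAlignment - 1) / kConstantAlignment * kConstantAlignment;
}

// Lays out each draw's constants back to back in one big buffer and returns
// the window for every draw. The buffer would be filled once per frame.
std::vector<ConstantWindow> layoutWindows(const std::vector<std::uint32_t>& perDrawBytes) {
    std::vector<ConstantWindow> windows;
    std::uint32_t next = 0;
    for (std::uint32_t bytes : perDrawBytes) {
        const std::uint32_t count = alignedConstants(bytes);
        assert(count <= kMaxConstantsPerBinding);
        windows.push_back({next, count});
        next += count;
    }
    return windows;
}
```

Each window's firstConstant and numConstants would then go into the pFirstConstant and pNumConstants arrays when binding the buffer for that draw. Note the cost of the alignment: a 64-byte matrix still occupies a 256-byte slot.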

This seems unrelated to your case if you were not setting textures / render state blocks, etc. per draw, but I got noticeable speedups by filtering out all redundant API calls with a simple state cache. Maybe something the OP should look into?
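A minimal sketch of such a state cache, for texture bindings only. The bind callback is a hypothetical stand-in for the real API call (for example PSSetShaderResources in D3D11 or glBindTexture in OpenGL); the cache remembers what each slot holds and skips the call when nothing would change. Slots start out as "unknown", so the first bind to each slot is always issued.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <utility>

// Opaque handle standing in for a texture/SRV pointer or a GL texture name.
using TextureHandle = unsigned int;

// Remembers what is bound to each texture slot and only forwards a bind to
// the underlying API when the requested texture differs.
template <std::size_t SlotCount>
class TextureStateCache {
public:
    using BindFn = std::function<void(std::size_t slot, TextureHandle texture)>;

    explicit TextureStateCache(BindFn bind) : bind_(std::move(bind)) {}

    void setTexture(std::size_t slot, TextureHandle texture) {
        assert(slot < SlotCount);
        if (bound_[slot] == texture) {
            return;  // redundant: already bound, skip the API call
        }
        bind_(slot, texture);
        bound_[slot] = texture;
    }

    // Forget everything, e.g. after other code changed bindings directly.
    void invalidate() { bound_.fill(std::nullopt); }

private:
    BindFn bind_;
    std::array<std::optional<TextureHandle>, SlotCount> bound_{};  // empty = unknown
};
```

The invalidate() call matters in practice: if anything else touches the device behind the cache's back, a stale cache will skip binds that are actually needed.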

[quote name='Xcrypt' timestamp='1337339328' post='4941153']
I just tried ~10000 draw calls and it runs at only ~12 fps, in an optimal setting with only a small constant buffer update (Map/Unmap of a D3DXMATRIX) and the actual draw call. The best I can do is 3000 draw calls at ~37 fps.

D3D11.1 should actually help with the constant-setting part. It allows you to build up one massive constant buffer with all your scene constants and then give D3D a window into it for each draw, so you only need to push the data to the GPU once.

This seems unrelated to your case if you were not setting textures / render state blocks, etc. per draw, but I got noticeable speedups by filtering out all redundant API calls with a simple state cache. Maybe something the OP should look into?
[/quote]

1) I didn't say that! Wrong quote :P
2) I already avoid redundant state changes by comparing the currently active state with the target state for everything. And yes, this is definitely worth it.

[quote name='ATEFred' timestamp='1337361103' post='4941220']

1) I didn't say that! Wrong quote :P
2) I already avoid redundant state changes by comparing the currently active state with the target state for everything. And yes, this is definitely worth it.
[/quote]

Argh, my bad. I did a selection quote while selecting within a quote and must have got confused ;)

Draw call overhead seems to vary a lot from one graphics card to another.
My GUI library doesn't batch some things yet, so in a test that displays many buttons I currently get two draw calls per button. I just tried rendering about two hundred of them (with OpenGL 4.2). I also avoid issuing redundant state changes by using a simple state cache.

I have a frame rate limiter set at 60 fps. On my GeForce GTX 580 it takes about 30% CPU (as seen in top, since I'm working on Linux).

I just tried it again with my old Radeon HD 5830, and there it takes about 50% CPU. I remember this type of test using even more CPU on the Radeon before, but that was a while ago; I just reinstalled the ATI drivers for this test, so I probably have more recent ones in which they may have improved things.

I couldn't compare the FPS obtained by letting it run without the frame rate limiter, as it sadly turns out that my text renderer (and therefore my FPS counter) doesn't work on the Radeon for some reason. I'm not sure measuring FPS is a good way to compare draw call overhead anyway, unless you render only 1-pixel polygons or something.
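Since FPS is not linear in cost, converting it to milliseconds per frame and then dividing by the number of calls makes the figures quoted in this thread easier to compare. A small sketch; it assumes the draw calls account for the entire frame, so the result is only an upper bound on per-call cost.

```cpp
#include <cassert>

// Milliseconds spent on one frame at a given frame rate.
double frameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Upper bound on the average cost of one draw call, in microseconds,
// assuming the whole frame time is spent submitting drawCalls calls.
double maxMicrosecondsPerCall(double fps, double drawCalls) {
    assert(drawCalls > 0.0);
    return frameTimeMs(fps) * 1000.0 / drawCalls;
}
```

With the earlier figures, 12500 calls at ~250 fps gives a 4 ms frame and at most about 0.32 µs per call, while 10000 calls at ~12 fps gives ~83 ms and at most about 8.3 µs per call. A gap of roughly 26x is consistent with the suggestion that something other than the draw calls themselves dominates.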

[quote]
You ARE testing using a release version of the program?
[/quote]


Thanks for the reminder...
In release mode I can get ~20000 draw calls at ~100 fps (DirectX 11).

So I think the OP should definitely consider writing a custom effects framework and optimizing it for how his engine works, to reduce redundant state changes, CB updates, etc.

At this stage it seems clear that your matrix updates, not the number of draw calls, are your primary bottleneck. If all you're updating is a matrix, and that matrix lives in the same cbuffer as other shader constants, then you really should consider splitting it out into a separate cbuffer of its own. That will enable D3D to update it more efficiently and transfer less data to the GPU for each such update.
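To put numbers on the cbuffer-splitting suggestion, a sketch with made-up layouts: a hypothetical combined buffer that mixes a per-draw world matrix with material, lighting and fog constants, versus a buffer holding only the matrix. It assumes each update rewrites the whole buffer, as a Map with D3D11_MAP_WRITE_DISCARD implies, and that the C++ structs mirror the HLSL cbuffer layout (all members are float4-sized, so no extra packing padding).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical combined layout: the per-draw matrix shares a buffer with
// constants that change far less often.
struct CombinedConstants {
    float world[16];         // changes every draw
    float materialColor[4];  // changes per material
    float lightDirs[8][4];   // eight float4 light directions, per frame at most
    float fogParams[4];      // rarely changes
};

// Split layout: the per-draw matrix lives alone in its own buffer.
struct PerObjectConstants {
    float world[16];
};

// Constant buffer sizes must be multiples of 16 bytes.
static_assert(sizeof(CombinedConstants) % 16 == 0, "cbuffer size must be a multiple of 16");
static_assert(sizeof(PerObjectConstants) % 16 == 0, "cbuffer size must be a multiple of 16");

// Bytes written per frame if every draw rewrites the whole buffer it updates.
std::size_t bytesPerFrame(std::size_t bufferBytes, std::size_t drawCount) {
    assert(bufferBytes > 0);
    return bufferBytes * drawCount;
}
```

With 10000 draws per frame, that is about 2.24 MB of constant data per frame for the combined layout versus 0.64 MB for the matrix-only buffer, while the rarely changing constants are uploaded only when they actually change.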

It's really difficult to say much more without a better description of what exactly you're doing, and without seeing some code. Without those, everything is just guesswork.


This topic is closed to new replies.
