[SharpDX] Speed issues when drawing

12 comments, last by Tape_Worm 12 years, 3 months ago

Here's some more ideas:

Is the shader in the D3D11 version more complicated than just doing a texture read?

Try turning alpha testing off in D3D9 (or on in D3D11 by using clip() in the shader). That can have a significant performance impact as it cuts down on the need to blend pixels with the frame buffer. The performance difference will depend on how transparent the texture is, and the card you're testing on.
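To make the "depends on how transparent the texture is" part concrete, here is a rough CPU-side model in C++, not actual Direct3D code. It assumes a greater-or-equal alpha comparison against a reference value; the function name `texelsReachingBlend` is made up for illustration. In a D3D11 pixel shader the equivalent rejection would be something like `clip(color.a - threshold)`, which discards the pixel when the argument is negative.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: counts how many texels would still reach the blend
// stage if an alpha test (assumed here: pass when alpha >= alphaRef) ran first.
// With alphaRef == 0 every texel passes, which models "alpha test off".
std::size_t texelsReachingBlend(const std::vector<std::uint8_t>& alpha,
                                std::uint8_t alphaRef) {
    std::size_t kept = 0;
    for (std::uint8_t a : alpha) {
        if (a >= alphaRef) {
            ++kept;  // passes the test, so it still has to be blended
        }
    }
    return kept;
}
```

If every texel is opaque, the count is the same with or without the test, so a texture with no transparent areas gets no benefit from it.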

It's also possible that the D3D9 version is rejecting pixels due to the depth testing or stencil testing. Make sure those are disabled in both cases.


The D3D9 code does not use a shader at all, just fixed function.

I removed alpha testing/blending, and I'm still getting roughly the same results: the D3D9 version dropped from 22 FPS to 19 FPS because there's nothing to reject, and the D3D11 version is still horrifically slow. I'm starting to think this is something related to Direct3D 11 in general and/or my ATI drivers (using the latest version, 11.12).


Hi,

I assume that you enabled the Direct3D debug libraries and studied the output (if any) from the program.

Best regards


Yes, I have, and no, there's nothing of consequence being reported.
You're all brilliant. No one thing was the cause of the problem; however, a couple of things combined made a huge difference.

So, here's why I'm a dumbass:

  • Forgot anti-aliasing was still enabled.
  • The texture itself had no alpha to reject, so that was a part of it.
  • The fill size of the quad was another major part of it.
  • And because the fill size was rather large, -and- alpha blending was enabled, it was really slowing down.
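For a rough feel of how those factors multiply together, here is a back-of-the-envelope estimate in C++. It is a deliberately crude model, not a measurement: it assumes every quad is fully on screen, that MSAA writes every sample of each covered pixel, and that blending turns each write into a read plus a write. It ignores caches, render target compression and early rejection. The names `FillEstimate` and `estimateFill` are made up for illustration.

```cpp
#include <cstdint>

// Rough per-frame fill cost under the assumptions stated above.
struct FillEstimate {
    std::uint64_t samplesTouched;  // color samples written per frame
    std::uint64_t memoryAccesses;  // render target reads + writes per frame
};

FillEstimate estimateFill(std::uint64_t quadCount,
                          std::uint64_t quadWidth,
                          std::uint64_t quadHeight,
                          std::uint64_t msaaSamples,
                          bool blending) {
    // Every covered pixel costs one write per MSAA sample.
    const std::uint64_t samples =
        quadCount * quadWidth * quadHeight * msaaSamples;
    // Blending has to read the destination before writing it back.
    const std::uint64_t accessesPerSample = blending ? 2 : 1;
    return {samples, samples * accessesPerSample};
}
```

Under this model, 4x anti-aliasing plus blending costs eight times the render target traffic of the same quads drawn opaque without anti-aliasing, which is why large quads make both settings hurt so much more.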


It's still not performing at the same speed, but I expect that's due to how much of a mess this test app is.

Thanks guys, I appreciate the help. Hopefully one day I'll be able to return the favour.
Didn't they used to recommend changing the data first (buffering it in CPU memory) and THEN locking the buffer, copying the entire block, unlocking, and making the draw call, so that the GPU could be processing in parallel with the app's data creation?

Also, shouldn't "Matrix proj_view_world = proj_view * world;" be outside of the 8K loop? Or is that part of the 'stress'...
--------------------------------------------
Ratings are Opinion, not Fact
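The "stage on the CPU, then lock once and copy the whole block" pattern suggested above could be sketched like this in C++. `MockDynamicBuffer` is a hypothetical stand-in so the example runs without a device; with a real API the lock/unlock pair would be something like `IDirect3DVertexBuffer9::Lock`/`Unlock` or `ID3D11DeviceContext::Map`/`Unmap`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

struct Vertex { float x, y, z; };

// Hypothetical stand-in for a dynamic vertex buffer (not a real D3D type).
class MockDynamicBuffer {
public:
    explicit MockDynamicBuffer(std::size_t capacity) : storage_(capacity) {}
    Vertex* lock() { ++lockCount_; return storage_.data(); }
    void unlock() {}
    std::size_t lockCount() const { return lockCount_; }
    std::size_t capacity() const { return storage_.size(); }
    const Vertex& at(std::size_t i) const { return storage_[i]; }
private:
    std::vector<Vertex> storage_;
    std::size_t lockCount_ = 0;
};

// All vertex data is prepared in `staged` beforehand; the buffer is held
// locked only for the duration of one block copy.
void uploadStaged(MockDynamicBuffer& buffer, const std::vector<Vertex>& staged) {
    assert(staged.size() <= buffer.capacity());
    if (staged.empty()) {
        return;  // nothing to copy, so do not lock at all
    }
    Vertex* dst = buffer.lock();
    std::memcpy(dst, staged.data(), staged.size() * sizeof(Vertex));
    buffer.unlock();
    // A real renderer would issue its draw call here.
}
```

The idea is that the lock is held only for the copy, rather than for the whole time the application spends computing vertices.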

1. That's a good point. However, if you read through the thread you'll notice that I called this loop only once after my initial problem and I still had speed issues even though it was only calling DrawIndexed in the draw method. So this really had nothing to do with the issue.

2. I'm doing transforms on vertices and processing those vertices on the CPU so I can send them to the buffer pre-transformed. Putting that math outside of the loop would have broken that. I'm aware I can put the transform in the shader, however if I did that I'd end up having to make a single Draw call for each primitive that I want transformed. Unless there's another way to go about it (that'll work on D3D9 downlevel hardware)?
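For anyone following along, the CPU pre-transform approach described in point 2 looks roughly like the C++ sketch below. It is not the original test app's code: the `Mat4` layout (row-major, column-vector convention, so `projView * world` applies `world` first) and all function names are assumptions for illustration. Each primitive has its own world matrix, which is why the multiply sits inside the loop, and the result is a single vertex batch that can go out in one draw call.

```cpp
#include <array>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
struct Vec4 { float x, y, z, w; };
// Assumed layout: row-major storage, column-vector convention (M * v).
using Mat4 = std::array<std::array<float, 4>, 4>;

Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};  // zero-initialized
    for (std::size_t i = 0; i < 4; ++i)
        for (std::size_t j = 0; j < 4; ++j)
            for (std::size_t k = 0; k < 4; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

Mat4 translation(float tx, float ty, float tz) {
    Mat4 m{};
    for (std::size_t i = 0; i < 4; ++i) m[i][i] = 1.0f;
    m[0][3] = tx; m[1][3] = ty; m[2][3] = tz;
    return m;
}

// Transforms a point (implicit w = 1) and keeps w for the GPU to divide by.
Vec4 transformPoint(const Mat4& m, const Vec3& p) {
    return {m[0][0] * p.x + m[0][1] * p.y + m[0][2] * p.z + m[0][3],
            m[1][0] * p.x + m[1][1] * p.y + m[1][2] * p.z + m[1][3],
            m[2][0] * p.x + m[2][1] * p.y + m[2][2] * p.z + m[2][3],
            m[3][0] * p.x + m[3][1] * p.y + m[3][2] * p.z + m[3][3]};
}

// Builds one batch of pre-transformed vertices for many primitives.
std::vector<Vec4> buildBatch(const Mat4& projView,
                             const std::vector<Mat4>& worlds,
                             const std::vector<Vec3>& corners) {
    std::vector<Vec4> batch;
    batch.reserve(worlds.size() * corners.size());
    for (const Mat4& world : worlds) {
        // Differs per primitive, so it cannot be hoisted out of the loop.
        const Mat4 projViewWorld = multiply(projView, world);
        for (const Vec3& corner : corners) {
            batch.push_back(transformPoint(projViewWorld, corner));
        }
    }
    return batch;
}
```

The trade-off is CPU time per vertex in exchange for far fewer draw calls.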

