mind in a box

Switching the pixel shader gains performance without any pixels on screen


Hi everyone!

 

I've been profiling my application lately, and I got something weird, which I hope one of you can explain to me.

 

In my scene, I have a large world mesh and about 5000 decoration objects, all using the same pixel shader, which simply does one texture lookup. All in all, I get about 1000 draw calls from this and about 105 fps on my laptop. There are no post-process effects or similar techniques.

 

The weird thing is, when I switch to a very simple pixel shader (using Intel GPA), my FPS counter ramps up to ~150, even though next to no pixels are drawn on screen at all (a small health bar in the corner, but that's it).

Applying a 1x1 scissor-rect doesn't give the same effect.

 

How can switching the pixel shader improve performance so much when there aren't even any rendered pixels on screen?

 

Thanks in advance!


When profiling a renderer, you've got to measure the CPU frametime and the GPU frametime. The actual frametime is approximately max(CPU_frametime, GPU_frametime), and FPS is approximately 1/actual_frametime.
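For reference, here's a rough sketch (not from the thread) of measuring both numbers in D3D11: a std::chrono timer for the CPU side and timestamp queries for the GPU side. The blocking readback at the end is a simplification; a real profiler would buffer queries a few frames deep.

```cpp
#include <d3d11.h>
#include <chrono>
#include <cstdio>

struct FrameTimer
{
    ID3D11Query* disjoint = nullptr;
    ID3D11Query* tsBegin  = nullptr;
    ID3D11Query* tsEnd    = nullptr;

    void Create(ID3D11Device* device)
    {
        D3D11_QUERY_DESC qd = {};
        qd.Query = D3D11_QUERY_TIMESTAMP_DISJOINT;
        device->CreateQuery(&qd, &disjoint);
        qd.Query = D3D11_QUERY_TIMESTAMP;
        device->CreateQuery(&qd, &tsBegin);
        device->CreateQuery(&qd, &tsEnd);
    }
};

void RenderFrame(ID3D11DeviceContext* ctx, FrameTimer& timer)
{
    auto cpuStart = std::chrono::high_resolution_clock::now();

    ctx->Begin(timer.disjoint);
    ctx->End(timer.tsBegin);                 // GPU timestamp at frame start

    // ... issue all draw calls and Present() here ...

    ctx->End(timer.tsEnd);                   // GPU timestamp at frame end
    ctx->End(timer.disjoint);

    auto cpuEnd = std::chrono::high_resolution_clock::now();
    double cpuMs = std::chrono::duration<double, std::milli>(cpuEnd - cpuStart).count();

    // Blocking readback of the GPU timestamps (simplified for the sketch).
    D3D11_QUERY_DATA_TIMESTAMP_DISJOINT dj = {};
    while (ctx->GetData(timer.disjoint, &dj, sizeof(dj), 0) != S_OK) {}
    UINT64 t0 = 0, t1 = 0;
    while (ctx->GetData(timer.tsBegin, &t0, sizeof(t0), 0) != S_OK) {}
    while (ctx->GetData(timer.tsEnd,   &t1, sizeof(t1), 0) != S_OK) {}

    if (!dj.Disjoint)
    {
        double gpuMs = double(t1 - t0) / double(dj.Frequency) * 1000.0;
        // Approximate frametime = max(CPU, GPU); FPS ~= 1000 / frametime.
        printf("CPU %.2f ms, GPU %.2f ms\n", cpuMs, gpuMs);
    }
}
```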

 

If you're not actually drawing any pixels, then your GPU frametime is probably pretty low... so your framerate is probably being dictated by the CPU frametime.

105 fps is ~9.5 ms frametime, and 150 fps is ~6.7 ms frametime. With 1000 draw calls per frame, that's (even more approximately) about 9.5 µs per draw with the complex shader, or 6.7 µs per draw with the simple shader, in CPU time.
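Put as a tiny throwaway calculation (numbers taken from the post above, nothing else assumed):

```cpp
// Redoing the arithmetic with 1000 draws per frame at 105 fps vs. 150 fps.
#include <cstdio>

int main()
{
    const double drawCalls = 1000.0;
    const double complexMs = 1000.0 / 105.0;   // ~9.5 ms per frame
    const double simpleMs  = 1000.0 / 150.0;   // ~6.7 ms per frame

    // If the frame is CPU-bound, most of that time is spent issuing draws.
    printf("complex shader: ~%.1f us per draw\n", complexMs * 1000.0 / drawCalls);
    printf("simple shader:  ~%.1f us per draw\n", simpleMs  * 1000.0 / drawCalls);
    printf("difference:     ~%.1f us per draw\n",
           (complexMs - simpleMs) * 1000.0 / drawCalls);
    return 0;
}
```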

 

When you call a Draw function, both D3D itself and your graphics driver have to run a lot of code. A handful of microseconds of CPU time per draw call is completely normal.

In your case, it seems that if you've got a simple shader, there's less validation work for the driver to perform, so you save ~2-3 µs of driver validation overhead per draw.
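One way to sanity-check that this really is CPU-side overhead would be to time nothing but the submission loop. The Drawable struct and vertex layout below are made-up placeholders, and since drivers can defer work until Present, treat the result as a rough indication only:

```cpp
#include <d3d11.h>
#include <chrono>
#include <cstdio>
#include <vector>

struct Drawable { ID3D11Buffer* vb; UINT vertexCount; };  // assumed per-object data

void SubmitAndTime(ID3D11DeviceContext* ctx, const std::vector<Drawable>& objects)
{
    auto start = std::chrono::high_resolution_clock::now();

    for (const Drawable& d : objects)
    {
        UINT stride = sizeof(float) * 8, offset = 0;      // assumed vertex layout
        ctx->IASetVertexBuffers(0, 1, &d.vb, &stride, &offset);
        ctx->Draw(d.vertexCount, 0);
    }

    auto end = std::chrono::high_resolution_clock::now();
    double ms = std::chrono::duration<double, std::milli>(end - start).count();
    printf("draw submission: %.2f ms for %zu draws (%.1f us each)\n",
           ms, objects.size(), ms * 1000.0 / objects.size());
}
```

If that per-draw number moves when the pixel shader is swapped, the difference is coming from the CPU side rather than from GPU work.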



> If you're not actually drawing any pixels, then your GPU frametime is probably pretty low... so your framerate is probably being dictated by the CPU frametime.

> 105 fps is ~9.5 ms frametime, and 150 fps is ~6.7 ms frametime. With 1000 draw calls per frame, that's (even more approximately) about 9.5 µs per draw with the complex shader, or 6.7 µs per draw with the simple shader, in CPU time.

 

Yeah, I should be CPU-limited here.

 

However, I did some more tests. I can't replicate this behavior using my own simple shaders. There seems to be something Intel GPA does when switching to "Simple Pixel Shaders" other than just replacing all pixel shaders with the simple version. Maybe they block setting the shader resources or something; I'll have to dig into that some more.
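For comparison, here is roughly how one could approximate that experiment by hand (purely a guess at what the tool might be doing, not a description of GPA internals): force-bind a trivial pixel shader for every draw and, in a second variant, also unbind the pixel shader resources.

```cpp
#include <d3d11.h>
#include <d3dcompiler.h>
#include <cstring>
#pragma comment(lib, "d3dcompiler.lib")

// Trivial pixel shader: no inputs, no texture lookup, constant output.
static const char* kTrivialPS =
    "float4 main() : SV_Target { return float4(1, 0, 1, 1); }";

ID3D11PixelShader* CreateTrivialPS(ID3D11Device* device)
{
    ID3DBlob* code = nullptr;
    ID3DBlob* errors = nullptr;
    D3DCompile(kTrivialPS, strlen(kTrivialPS), nullptr, nullptr, nullptr,
               "main", "ps_5_0", 0, 0, &code, &errors);   // error checks omitted

    ID3D11PixelShader* ps = nullptr;
    device->CreatePixelShader(code->GetBufferPointer(), code->GetBufferSize(),
                              nullptr, &ps);
    code->Release();
    return ps;
}

// Apply before drawing (once per frame is enough if every object shares a shader).
void OverrideForExperiment(ID3D11DeviceContext* ctx, ID3D11PixelShader* trivialPS)
{
    ctx->PSSetShader(trivialPS, nullptr, 0);

    // Variant of the experiment: also clear the pixel shader's SRVs, in case
    // the texture binding itself contributes to the per-draw cost.
    ID3D11ShaderResourceView* nullSRV[1] = { nullptr };
    ctx->PSSetShaderResources(0, 1, nullSRV);
}
```

Comparing the framerate of both variants against GPA's override might narrow down what the tool is actually changing.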


> How can switching the pixel shader improve performance so much when there aren't even any rendered pixels on screen?


Graphics drivers and hardware are crazy complex. Drivers for the more implicit APIs (like D3D11) will surmise a lot of things about the necessary hardware state from the instructions in a pixel shader, and turn on or off various hardware features accordingly. The drivers also recompile shaders upon creation, and possibly even upon draw, and do all kinds of additional code folding and dead code elimination based on the combination of shader stages and other pipeline state.

If your "simple" shader is mostly a no-op, then a number of features of the interpolators in the rasterization stage might be turned off, or some of the more advanced blending functions may be implicitly turned off, or more of the vertex shader may be implicitly elided as dead code since the pixel shader isn't using all of the outputs, or depth-buffer writing may be optimized differently, and so on.

It may also just be that your "regular" shader is really bad and penalizes the optimizer or hardware. Things like writing depth in the pixel shader disable all kinds of advanced hardware optimizations. The classic easy example is early depth testing, where the hardware can skip pixel shader invocations based on the rasterizer's computed depth; if you write to the depth buffer in the pixel shader non-conservatively, that optimization has to be turned off, since the pixel shader must be run completely and the rasterizer's depth may not be accurate.
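To make the depth-write example concrete, here are two small pixel shaders as HLSL source strings (the texture and sampler names are made up for the sketch). The second one outputs SV_Depth, which is exactly the kind of thing that generally forces the hardware to give up early depth testing.

```cpp
#include <d3dcompiler.h>   // for D3DCompile, see comment at the bottom

// Normal shader: depth comes from the rasterizer, so the hardware is free to
// run the depth test *before* invoking the pixel shader and skip occluded pixels.
static const char* kEarlyZFriendlyPS = R"(
Texture2D    gAlbedo  : register(t0);
SamplerState gSampler : register(s0);

float4 main(float4 pos : SV_Position, float2 uv : TEXCOORD0) : SV_Target
{
    return gAlbedo.Sample(gSampler, uv);
})";

// Writing SV_Depth means the rasterizer's depth is no longer authoritative,
// so early-Z generally has to be disabled and every covered pixel runs the
// shader before the depth test.
static const char* kDepthWritingPS = R"(
Texture2D    gAlbedo  : register(t0);
SamplerState gSampler : register(s0);

struct PSOut { float4 color : SV_Target; float depth : SV_Depth; };

PSOut main(float4 pos : SV_Position, float2 uv : TEXCOORD0)
{
    PSOut o;
    o.color = gAlbedo.Sample(gSampler, uv);
    o.depth = pos.z * 1.0001f;   // arbitrary manual depth tweak for the example
    return o;
})";

// Either variant compiles the usual way, e.g.:
//   ID3DBlob *code = nullptr, *errors = nullptr;
//   D3DCompile(kDepthWritingPS, strlen(kDepthWritingPS), nullptr, nullptr,
//              nullptr, "main", "ps_5_0", 0, 0, &code, &errors);
```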

