Setting custom Pixel Shader causes massive Performance Drop

Started by
6 comments, last by bonus.2113 12 years, 1 month ago
Hey all,
I ran into a problem with my 2D Deferred Lighting implementation and I hope you can help me.

For the 2D game I'm currently programming, I decided to go with Deferred Lighting, because it is supposed to be easy to implement and fast.
I used this tutorial as a starting point and adapted it to my needs.

It's all working fine and looks amazing, except that I get a severe drop in framerate as soon as I enable the lighting.
From about 2500 FPS (CPU-bound, 80% GPU load) it falls to 1000 FPS (GPU-bound, 70% CPU load). Of course 1000 FPS is still quite high,
but considering my system (and the 2500 FPS without lighting) it's actually not that much.

I did some profiling and testing, and it showed that about 50% of the processor time is spent in EndDraw(). This usually means that the graphics card is overloaded and needs some time to catch up. I can't explain why that is, because after all I'm only rendering 9 lights to a 1280x720 texture, and I've seen deferred rendering engines with over 100 active lights.

This is what my code currently looks like:

Rendering:

private int RenderLights(List<Light> _lights, ref RenderTarget2D _lightingRT)
{
    int lightsRendered = 0;
    Device.BlendState = BlendState.Additive;
    Device.SetRenderTarget(_lightingRT);
    Device.Clear(Color.Black);
    // For every light inside the current scene
    foreach (Light light in _lights)
    {
        // Simple early-out condition (roughly: not in sight, or not active)
        if ((light.Position - (Graphics.Instance.MainCamera.Position + Graphics.Instance.Resolution / 2)).Length() >
            (Graphics.Instance.Resolution / 2).Length() + light.Distance || !light.IsActive)
            continue;

        lightEffect.CurrentTechnique = lightEffect.Techniques["DeferredLight"];

        // Set light parameters
        lightEffect.Parameters["lightStrength"].SetValue(light.Intensity);
        lightEffect.Parameters["lightColor"].SetValue(light.Color.ToVector3());
        lightEffect.Parameters["lightRadius"].SetValue(light.Distance);
        if (light is Spotlight)
        {
            lightEffect.Parameters["isSpotlight"].SetValue(true);
            lightEffect.Parameters["angleCos"].SetValue((float)Math.Cos((light as Spotlight).Angle));
            lightEffect.Parameters["lightNormal"].SetValue(Vector2.Transform(new Vector2(1, 0),
                Matrix.CreateRotationZ(light.gameObject.transform.Rotation)));
        }
        else
            lightEffect.Parameters["isSpotlight"].SetValue(false);
        lightEffect.Parameters["screenWidth"].SetValue(Graphics.Instance.Resolution.X);
        lightEffect.Parameters["screenHeight"].SetValue(Graphics.Instance.Resolution.Y);
        lightEffect.Parameters["lightPosition"].SetValue(Graphics.Instance.MainCamera.WorldToScreen(light.Position));
        // Apply pass
        lightEffect.CurrentTechnique.Passes[0].Apply();
        // Draw the full-screen quad. DrawUserPrimitives reads from the vertices array,
        // so the vertex buffer set here is not used by this draw call.
        Device.SetVertexBuffer(vertexBuffer);
        Device.DrawUserPrimitives(PrimitiveType.TriangleStrip, vertices, 0, 2);
        lightsRendered++;
    }
    // Deactivate alpha blending
    Device.BlendState = BlendState.Opaque;
    // Deactivate the render targets to resolve them
    //Device.SetRenderTarget(null);
    return lightsRendered;
}
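The early-out in RenderLights compares the light's distance from the screen centre with half the screen diagonal plus the light radius, which still keeps some lights lying just outside the left, right, top or bottom edges. A tighter alternative tests the light's circle against the view rectangle directly. This is a minimal sketch; LightCulling and IsLightVisible are hypothetical names, and it assumes, as the original condition does, that the camera position is the top-left corner of a view the size of the screen resolution.

```csharp
using System;

// Hypothetical helper, not part of the original code: circle-vs-rectangle
// visibility test for a 2D point light.
// Assumption (mirrors the original early-out): (camX, camY) is the top-left
// corner of the view, which covers viewWidth x viewHeight world units.
public static class LightCulling
{
    public static bool IsLightVisible(
        float lightX, float lightY, float lightRadius,
        float camX, float camY, float viewWidth, float viewHeight)
    {
        // Closest point of the view rectangle to the light's centre.
        float closestX = Math.Clamp(lightX, camX, camX + viewWidth);
        float closestY = Math.Clamp(lightY, camY, camY + viewHeight);

        // The light touches the view if that point lies within its radius.
        // Comparing squared distances avoids a square root per light.
        float dx = lightX - closestX;
        float dy = lightY - closestY;
        return dx * dx + dy * dy <= lightRadius * lightRadius;
    }
}
```

For a 1280x720 view, a light of radius 50 centred 100 units beyond the right edge is rejected by this test, while the original condition (740 < 734.3 + 50) still draws it. Checking `light.IsActive` first also skips the maths for inactive lights. `Math.Clamp` is not available on the .NET Framework that XNA 4 runs on; `MathHelper.Clamp` does the same job there.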


And the shader:

float screenWidth;
float screenHeight;
float lightStrength;
float lightRadius;
float2 lightPosition;
float3 lightColor;
bool isSpotlight;
float angleCos;
float2 lightNormal;

// Pass-through vertex shader: position and texture coordinate are used unchanged
void VertexToPixelShader(inout float2 texCoord : TEXCOORD0, inout float4 Position : POSITION)
{
}

float4 LightShader(float2 TexCoord : TEXCOORD0) : COLOR0
{
    // Reconstruct the pixel's screen position from the quad's texture coordinate
    float2 pixelPosition;
    pixelPosition.x = screenWidth * TexCoord.x;
    pixelPosition.y = screenHeight * TexCoord.y;
    float3 shading;
    float2 lightDirection = pixelPosition - lightPosition;

    float distance = length(lightPosition - pixelPosition);
    // Early out if the pixel is out of range.
    // (ps_2_0 has no flow-control instructions, so the compiler flattens this
    // branch and the spotlight branch below instead of skipping work.)
    if (distance > lightRadius)
    {
        return float4(0, 0, 0, 0);
    }
    float coneAttenuation = saturate(1.0f - distance / lightRadius);
    if (isSpotlight)
    {
        float dotP = dot(lightNormal, normalize(lightDirection));
        coneAttenuation *= saturate(dotP - ((1 - dotP) / (1 - angleCos) * angleCos));
    }
    shading = pow(coneAttenuation, 2) * lightColor * lightStrength;
    return float4(shading.r, shading.g, shading.b, 1.0f);
}

technique DeferredLight
{
    pass Pass1
    {
        VertexShader = compile vs_2_0 VertexToPixelShader();
        PixelShader = compile ps_2_0 LightShader();
    }
}
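To see what falloff LightShader actually produces, its per-pixel maths can be mirrored on the CPU and checked numerically. This C# sketch follows the shader line by line; the LightFalloff class and its Factor method are hypothetical, and the parameter names follow the shader's uniforms. Because of pow(coneAttenuation, 2), a point light contributes only 25% of lightStrength at half its radius.

```csharp
using System;

// Hypothetical CPU-side mirror of LightShader's maths (not part of the original code).
public static class LightFalloff
{
    // HLSL saturate(): clamp to [0, 1].
    static float Saturate(float x) => Math.Clamp(x, 0f, 1f);

    // Returns the factor the shader multiplies lightColor by at one pixel.
    public static float Factor(
        float pixelX, float pixelY, float lightX, float lightY,
        float lightRadius, float lightStrength,
        bool isSpotlight, float normalX, float normalY, float angleCos)
    {
        float dirX = pixelX - lightX;
        float dirY = pixelY - lightY;
        float distance = MathF.Sqrt(dirX * dirX + dirY * dirY);
        if (distance > lightRadius)
            return 0f; // outside the light's range

        float coneAttenuation = Saturate(1f - distance / lightRadius);
        if (isSpotlight)
        {
            // dot(lightNormal, normalize(lightDirection)); like the shader,
            // this is undefined exactly at the light's centre (distance == 0).
            float dotP = (normalX * dirX + normalY * dirY) / distance;
            coneAttenuation *= Saturate(dotP - ((1f - dotP) / (1f - angleCos) * angleCos));
        }
        // pow(coneAttenuation, 2) * lightStrength
        return coneAttenuation * coneAttenuation * lightStrength;
    }
}
```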


My first guess was that the lighting shader is just too slow, so I made it instantly return solid black. But that didn't change anything. The framerate stayed exactly the same.

I discovered that as soon as I don't apply the shader's pass (i.e. comment out "lightEffect.CurrentTechnique.Passes[0].Apply();") it runs fine, but of course without the lighting.

I even did some profiling with PIX and looked at the time one light draw call takes. With my own shader, which just returns black, it was about
200,000 ns (0.2 ms); with the default one, 130,000 ns. That difference could explain why the framerate drops, but I don't understand why the default shader is so much faster than just returning black.

Does anyone have an idea of what might be causing the heavy load on the graphics card?

Thanks, bonus.2113
Are you sure it's not because there's branching in your shader?

o3o

Hi,

First of all, a drop from 2500 to 1000 FPS isn't that big. FPS is a bad measure of performance. Instead, calculate the frame time, i.e. 1.0f / fps, and use that as a reference for evaluating performance.
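To make the frame-time comparison concrete, here is a small C# sketch that converts between FPS and milliseconds per frame (1000 / fps, i.e. 1.0 / fps scaled to milliseconds). The FrameTiming class and its method names are made up for illustration. With the numbers from the first post, 2500 FPS is 0.4 ms per frame and 1000 FPS is 1.0 ms, so the nine lights add roughly 0.6 ms per frame.

```csharp
using System;

// Hypothetical helper for converting between frame rate and frame time.
public static class FrameTiming
{
    // Milliseconds per frame at a given frame rate.
    public static double FpsToMs(double fps) => 1000.0 / fps;

    // Frame rate for a given frame time in milliseconds.
    public static double MsToFps(double ms) => 1000.0 / ms;

    // Extra time per frame when going from fpsBefore to fpsAfter.
    public static double AddedMs(double fpsBefore, double fpsAfter)
        => FpsToMs(fpsAfter) - FpsToMs(fpsBefore);
}
```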

Secondly, if you are really worried about your performance, there are optimizing tricks you may use:

- Do all your lights affect the whole screen? If not, don't draw a full-screen quad, just a quad covering all the pixels of a light.
- Is it possible to draw all your lights in one draw call? I.e. loop through all the affecting lights for each pixel. This way you can draw quite a lot of lights with a single full-screen quad.
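A rough sketch of the first suggestion: compute the screen-space square that a light of a given radius can touch, clip it to the viewport, and derive the quad's corners from it. Everything here is hypothetical (PixelRect and LightQuad are not XNA types). It assumes the light position is already in screen pixels (as WorldToScreen returns in the original code), that the quad's vertices are in clip space because the original vertex shader passes positions through unchanged, and that the texture coordinates must keep mapping to screen position because LightShader computes the pixel position as TexCoord times the screen size.

```csharp
using System;

// Hypothetical pixel-space rectangle.
public readonly struct PixelRect
{
    public readonly int X, Y, Width, Height;

    public PixelRect(int x, int y, int width, int height)
    {
        X = x; Y = y; Width = width; Height = height;
    }

    // True when the light lies entirely off-screen.
    public bool IsEmpty => Width <= 0 || Height <= 0;
}

public static class LightQuad
{
    // Square around a light at screen position (cx, cy), clipped to the screen.
    public static PixelRect Bounds(float cx, float cy, float radius, int screenWidth, int screenHeight)
    {
        int left = Math.Max(0, (int)MathF.Floor(cx - radius));
        int top = Math.Max(0, (int)MathF.Floor(cy - radius));
        int right = Math.Min(screenWidth, (int)MathF.Ceiling(cx + radius));
        int bottom = Math.Min(screenHeight, (int)MathF.Ceiling(cy + radius));
        return new PixelRect(left, top, right - left, bottom - top);
    }

    // Texture coordinates for the quad corners, so that
    // TexCoord * (screenWidth, screenHeight) in LightShader still gives the pixel position.
    public static (float U0, float V0, float U1, float V1) TexCoords(PixelRect r, int screenWidth, int screenHeight)
        => ((float)r.X / screenWidth, (float)r.Y / screenHeight,
            (float)(r.X + r.Width) / screenWidth, (float)(r.Y + r.Height) / screenHeight);

    // Clip-space corners for a pass-through vertex shader (x right, y up, range -1..1).
    public static (float X0, float Y0, float X1, float Y1) ClipCorners(PixelRect r, int screenWidth, int screenHeight)
        => (2f * r.X / screenWidth - 1f, 1f - 2f * r.Y / screenHeight,
            2f * (r.X + r.Width) / screenWidth - 1f, 1f - 2f * (r.Y + r.Height) / screenHeight);
}
```

For a light of radius 100 in the middle of a 1280x720 screen, this covers 200x200 = 40,000 pixels instead of 921,600. The sketch ignores Direct3D 9's half-pixel offset between pixel and texel centres.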

> First of all, a drop from 2500 to 1000 FPS isn't that big. FPS is a bad measure of performance. Instead, calculate the frame time, i.e. 1.0f / fps, and use that as a reference for evaluating performance.


I tested it on my laptop, and there the frame time goes from ~7 ms (150 FPS) up to ~20 ms (50 FPS), which is quite a lot. Also, shouldn't saying "the framerate is 2.5 times lower" be the same as "the frame time is 2.5 times higher"? Aren't frame time and framerate the same measurement, just in different units? (I understand that framerate values can be misleading when you think in pure numbers, as the difference between 2000 and 2100 FPS is much smaller than the difference between 100 and 200 FPS.)


> - Do all your lights affect the whole screen? If not, don't draw a full-screen quad, just a quad covering all the pixels of a light.


I thought about that, but I figured it was better to find out whether there is a core problem in the way I'm currently doing it before I start optimizing anything.


> - Is it possible to draw all your lights in one draw call? I.e. loop through all the affecting lights for each pixel. This way you can draw quite a lot of lights with a single full-screen quad.


I'm currently using shader model 2.0, so as far as I know the number of lights in one pass would be really limited. I could reduce the number of passes, though.
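One way to reduce the number of passes under shader model 2.0 is to group the visible lights into fixed-size batches and evaluate one batch per full-screen pass, uploading each batch's parameters as shader arrays. The sketch below only does the CPU-side grouping; LightBatching is a hypothetical name, and the batch size of 4 used with it is a placeholder, since the real limit depends on how many instructions and constant registers the per-light maths needs under ps_2_0.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: split the visible lights into groups of at most maxPerPass.
public static class LightBatching
{
    public static List<List<T>> Batch<T>(IReadOnlyList<T> lights, int maxPerPass)
    {
        if (maxPerPass <= 0) throw new ArgumentOutOfRangeException(nameof(maxPerPass));

        var batches = new List<List<T>>();
        for (int start = 0; start < lights.Count; start += maxPerPass)
        {
            // Each batch would be drawn with one full-screen quad.
            int end = Math.Min(start + maxPerPass, lights.Count);
            var batch = new List<T>(end - start);
            for (int i = start; i < end; i++)
                batch.Add(lights[i]);
            batches.Add(batch);
        }
        return batches;
    }
}
```

Nine lights with a batch size of 4 give three passes (4, 4 and 1 lights) instead of nine. Because ps_2_0 has no flow-control instructions, a loop over the lights in the shader has to be unrolled at compile time, so the shader would process a fixed number of lights per pass, and a partly filled last batch could be padded with zero-strength lights.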



> Are you sure it's not because there's branching in your shader?


Yeah, I'm pretty sure, because it was just as slow when I returned black straight away.
Well, I showed the relation between frame time (time per frame) and frame rate (FPS), so it isn't just a question of different units. Both measurements are useful: FPS for general performance, and frame time for seeing things in relation. 1000 FPS vs 2000 FPS is a difference of just 0.0005 seconds. Adding another 0.0005 seconds to 0.001 (1000 FPS) won't halve your FPS; it gives about 667 FPS. Even if you used 0.01 seconds for each frame, that would give you 100 FPS, which is more than needed in most cases.
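The point about fixed costs can be sketched the same way: a pass that takes a fixed amount of time adds to the frame time, so its effect on FPS depends on where you start. The helper below is hypothetical and assumes the cost is simply additive (no overlap between CPU and GPU work).

```csharp
using System;

// Hypothetical helper: frame rate after adding a fixed per-frame cost.
public static class FixedCost
{
    // fps: frame rate before; addedSeconds: extra time spent every frame.
    public static double FpsAfterAddingCost(double fps, double addedSeconds)
        => 1.0 / (1.0 / fps + addedSeconds);
}
```

The same 0.5 ms per frame takes 2000 FPS down to 1000, 1000 FPS down to about 667, and 100 FPS only down to about 95.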

Can you describe your hardware a bit? Laptops may not have a strong GPU. I am not totally convinced that comparing the default shader and a shader returning black is a good test. You'll need to compare shaders that actually do something.

From the parts of the code you have shown, there shouldn't be any major bottlenecks, and you have already concluded that the GPU is the bottleneck.
If you are able to calculate multiple lights in one shader pass, you'll also have fewer pixels to blend additively, which should be beneficial. Have you tried disabling the additive blending to see what kind of effect it has on performance?
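The blending argument can be put in numbers. A rough sketch, assuming every pass is a full-screen quad over the 1280x720 target from the first post (FillCost is a hypothetical helper): nine lights drawn one per pass shade and blend 9 x 921,600 = 8,294,400 pixels per frame, evaluating them four at a time needs three passes (2,764,800 pixels), and all nine in one pass needs 921,600.

```csharp
using System;

// Hypothetical helper: pixels shaded and additively blended per frame
// when lights are drawn as full-screen quads, lightsPerPass at a time.
public static class FillCost
{
    public static long BlendedPixels(int width, int height, int lightCount, int lightsPerPass)
    {
        if (lightsPerPass <= 0) throw new ArgumentOutOfRangeException(nameof(lightsPerPass));
        int passes = (lightCount + lightsPerPass - 1) / lightsPerPass; // ceiling division
        return (long)width * height * passes;
    }
}
```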

Best regards!
By "different units" I meant that seconds per frame and frames per second both measure how fast a program runs.
But you're right that frametime is way more useful for seeing changes in performance.
I didn't think about that before, thanks for the clarification!


> Can you describe your hardware a bit?


My desktop is fairly powerful (i7-2700K, GTX 560 Ti). But I know that the graphics card is a bottleneck in my setup.
My laptop is way worse, but at least it does have a dedicated graphics card.


> I am not totally convinced that comparing the default shader and a shader returning black is a good test. You'll need to compare shaders that actually do something.


I was just confused that there was no difference between a shader that does some calculations and has branching and one that returns a color without doing anything else.


> Have you tried disabling the additive blending to see what kind of effect it has on performance?

It has no impact at all.

Anyway, I'll try some optimizing (especially drawing quads at the smallest size possible).
I was just really disappointed by how large an impact lighting had on the overall performance of the game, after reading all the praise about how the light count doesn't really matter with Deferred Lighting.
Thanks for your help!
Deferred rendering is a rather heavy technique for the hardware, mostly because of the fill rate and texture reads it needs.

For many simple (or even somewhat more complex) scenarios, a basic forward renderer can outperform a deferred renderer.

There are a few things that let you get the most out of a deferred renderer:

- Superior batching: since lighting and geometry are separated in deferred rendering, in the best case you may spend just a single draw call per object type. I'm not sure if geometry instancing is well supported under XNA. Minimizing the draw calls has always been the key with Direct3D.

- Even if filling the render targets is expensive, you won't waste GPU cycles on lighting pixels that won't be visible in the final image. This may be a huge saving, especially with shadow mapping.

- Lighting is still expensive under deferred rendering, mostly because you'll probably be reading lots of data from different buffers, so it is important to minimize your texture reads. However, it is entirely possible to draw hundreds of lights with a deferred renderer (though probably not full-screen ones) at interactive frame rates.


Your desktop setup would really shine with Direct3D 11 :)

Cheers!
I also noticed that using a smaller render target for the lighting really improves performance. At least I don't see any visual difference when using one that is 1/16th of the original size (both sides divided by four). That may be different for other games, but it works fine for me.
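The saving from a smaller lighting target is easy to estimate: dividing both sides by four leaves 1/16 of the pixels, so 1280x720 (921,600 pixels) becomes 320x180 (57,600 pixels). A small sketch with a hypothetical helper, LightingResolution, whose output would be the size used when creating the lighting render target:

```csharp
using System;

// Hypothetical helper for sizing a downscaled lighting render target.
public static class LightingResolution
{
    // Size when both sides are divided by `divisor` (integer division, at least 1x1).
    public static (int Width, int Height) Downscaled(int width, int height, int divisor)
    {
        if (divisor <= 0) throw new ArgumentOutOfRangeException(nameof(divisor));
        return (Math.Max(1, width / divisor), Math.Max(1, height / divisor));
    }

    // Fraction of the full-resolution pixel count that the smaller target shades.
    public static double PixelFraction(int width, int height, int divisor)
    {
        var (w, h) = Downscaled(width, height, divisor);
        return (double)w * h / ((double)width * height);
    }
}
```

With the shader from the first post, screenWidth, screenHeight and lightPosition can presumably stay in full-resolution units, because the pixel position is reconstructed from the 0 to 1 texture coordinate times screenWidth/screenHeight rather than from the render target's own size. When compositing, sampling the light map with linear filtering (for example SamplerState.LinearClamp in XNA 4) helps hide the lower resolution.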

This topic is closed to new replies.
