Simple point light fragment shader very slow

7 replies to this topic

#1 gearifysoftware   Members   


Posted 28 February 2017 - 08:57 PM

I have a deferred rendering system, and I am testing at full 1920x1080 and noticing that a particular fragment shader is causing significant slowdown (they all seem oddly slow, but this one seems the most significant). My shader is based on the OGL-Dev tutorials for deferred rendering.

The purpose of this shader is to calculate the added light within a light's area of effect. There are 3 large point lights in my scene that cover everything in view, so essentially this shader is executing 3 times for every pixel on the screen.

I measured the difference in frame time between running this shader and replacing the shader's main function with a simple "FragColor = vec4(1,0,0,1); //red". The total difference in time (for my engine to render an entire frame) is 4-5 milliseconds. If I'm shooting for 60 fps, that's already 25% or more of my roughly 16.7 ms frame budget, which seems kind of crazy (I removed shadow calculation and blur before taking these measurements).

Here is my shader:

#version 420

layout (location = 0) out vec4 FragColor;

struct BaseLight
{
    vec3  Color;
    float AmbientIntensity;
    float DiffuseIntensity;
};

struct Attenuation
{
    float Constant;
    float Linear;
    float Exp;
};

struct PointLight
{
    BaseLight Base;
    vec3 Position;
    Attenuation Atten;
};

uniform sampler2D gPositionMap;
uniform sampler2D gColorMap;
uniform sampler2D gNormalMap;
uniform sampler2D gShadowMap;
uniform PointLight gPointLight;
uniform vec3 gEyeWorldPos;
uniform float gMatSpecularIntensity;
uniform float gSpecularPower;
uniform int gLightType;
uniform vec2 gScreenSize;

uniform mat4 gWVP;
uniform mat4 gVP;
uniform mat4 gView;

vec4 CalcLightInternal(BaseLight Light,
                       vec3 LightDirection,
                       vec3 WorldPos,
                       vec3 Normal, float shadowFactor)
{
    vec4 AmbientColor = vec4(Light.Color, 1.0f) * Light.AmbientIntensity;
    float DiffuseFactor = dot(Normal, -LightDirection);

    float shadowFactorOrig = shadowFactor;

    vec4 DiffuseColor  = vec4(0, 0, 0, 0);
    vec4 SpecularColor = vec4(0, 0, 0, 0);

    if (DiffuseFactor > 0) {

        DiffuseColor = vec4(0,1,0,1);
        DiffuseColor = vec4(Light.Color, 1.0f) * DiffuseFactor;

        vec3 VertexToEye = normalize(gEyeWorldPos - WorldPos);
        vec3 LightReflect = normalize(reflect(LightDirection, Normal));
        float SpecularFactor = dot(VertexToEye, LightReflect);
        SpecularFactor = pow(SpecularFactor, gSpecularPower);
        if (SpecularFactor > 0) {
            SpecularColor = vec4(Light.Color, 1.0f) * gMatSpecularIntensity * SpecularFactor;
        }
    }

    return (AmbientColor + shadowFactor*(DiffuseColor+SpecularColor));
}


vec4 CalcPointLight(vec3 WorldPos, vec3 Normal)
{
    vec3 LightDirection = WorldPos - gPointLight.Position;
    float Distance = length(LightDirection);
    LightDirection = normalize(LightDirection);

    vec4 Color = CalcLightInternal(gPointLight.Base, LightDirection, WorldPos, Normal, 1.0f);

    float Attenuation =  gPointLight.Atten.Constant +
                         gPointLight.Atten.Linear * Distance +
                         gPointLight.Atten.Exp * Distance * Distance;

    Attenuation = min(1.0, Attenuation);

    return Color / Attenuation;
}

vec2 CalcTexCoord()
{
    return gl_FragCoord.xy / gScreenSize;
}

void main()
{
    vec2 TexCoord = CalcTexCoord();
    vec3 WorldPos = texture(gPositionMap, TexCoord).xyz;
    vec3 Color = texture(gColorMap, TexCoord).xyz;
    vec3 Normal = texture(gNormalMap, TexCoord).xyz;
    Normal = normalize(Normal);
    FragColor = CalcPointLight(WorldPos, Normal);
}


I am on a fairly fast HP machine with plenty of memory and an NVIDIA Quadro K1100M graphics card. Also, I already checked that VSync is not forcing my frame times to be multiples of 16 ms.

This shader does not contain an inordinate amount of looping or branching. It is executing about 3 times per pixel. Should this really be adding 4-5 milliseconds to my render times? 

Any ideas for what could be causing this would be much appreciated.
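
For scale, the numbers in this post work out roughly as follows. This is a back-of-the-envelope sketch that assumes each of the 3 lights really does touch every pixel of a 1920x1080 target and that the whole 4-5 ms is spent in this shader; both are simplifications.

```latex
\begin{align*}
N_{\text{frag}} &= 1920 \times 1080 \times 3 = 6{,}220{,}800 \ \text{invocations per frame} \\
t_{\text{frame}} &= \frac{1000\ \text{ms}}{60} \approx 16.67\ \text{ms} \\
\frac{4\ \text{ms}}{16.67\ \text{ms}} &\approx 24\%, \qquad
\frac{5\ \text{ms}}{16.67\ \text{ms}} \approx 30\% \\
\frac{4\ \text{ms}}{N_{\text{frag}}} &\approx 0.64\ \text{ns}, \qquad
\frac{5\ \text{ms}}{N_{\text{frag}}} \approx 0.80\ \text{ns per invocation}
\end{align*}
```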


#2 Hodgman   Moderators   


Posted 28 February 2017 - 09:50 PM

Laptops generally have an Intel GPU as well as that NVidia "M" GPU. Your game could be using the wrong graphics adaptor?

#3 Matias Goldberg   Members   


Posted 28 February 2017 - 10:12 PM

Aside from what Hodgman said, your branches are doing more harm than good:

if( DiffuseFactor > 0 )
   if( SpecularFactor > 0 )

Just do DiffuseFactor = max( 0, DiffuseFactor ); and same for the SpecularFactor.


Also note those 4ms do not necessarily have to scale linearly with the number of objects. The number of covered pixels matters a lot, and Early Z testing can also amortize the cost a lot.
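
The max() replacement suggested above can be sketched as a drop-in variant of the posted CalcLightInternal. This is a minimal sketch, not the poster's final code: the struct and uniform names come from the original post, while the sign() term is an addition of this sketch that keeps specular at zero wherever diffuse is zero (as the nested if did). It also assumes gSpecularPower is greater than zero.

```glsl
#version 420

struct BaseLight
{
    vec3  Color;
    float AmbientIntensity;
    float DiffuseIntensity;
};

uniform vec3  gEyeWorldPos;
uniform float gMatSpecularIntensity;
uniform float gSpecularPower;   // assumed > 0; pow(0, y) is undefined for y <= 0

vec4 CalcLightInternal(BaseLight Light, vec3 LightDirection,
                       vec3 WorldPos, vec3 Normal, float shadowFactor)
{
    vec4 LightColor   = vec4(Light.Color, 1.0);
    vec4 AmbientColor = LightColor * Light.AmbientIntensity;

    // max() clamps surfaces facing away from the light to zero, no branch needed.
    float DiffuseFactor = max(0.0, dot(Normal, -LightDirection));
    vec4  DiffuseColor  = LightColor * DiffuseFactor;

    vec3 VertexToEye  = normalize(gEyeWorldPos - WorldPos);
    vec3 LightReflect = normalize(reflect(LightDirection, Normal));

    // Clamp before pow(): pow(x, y) is undefined in GLSL when x < 0.
    float SpecularFactor = max(0.0, dot(VertexToEye, LightReflect));
    // sign() is 0 when DiffuseFactor is 0 and 1 when it is positive, so
    // specular is only kept where the original outer branch would have run.
    SpecularFactor = pow(SpecularFactor, gSpecularPower) * sign(DiffuseFactor);
    vec4 SpecularColor = LightColor * gMatSpecularIntensity * SpecularFactor;

    return AmbientColor + shadowFactor * (DiffuseColor + SpecularColor);
}
```

Whether this beats the branching version depends on the driver and GPU, so the only reliable check is to measure frame time before and after on the target hardware.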

#4 gearifysoftware   Members   


Posted 28 February 2017 - 11:43 PM

I tried replacing the conditionals with the "max()" function. Marginal improvement, maybe 1 ms less frame time. That's something, but I don't think it gives the full story.

I double checked, but no, no Intel graphics hardware on this machine. 

I did notice this  https://www.techpowerup.com/gpudb/2430/quadro-k1100m which states:

"We recommend the NVIDIA Quadro K1100M for gaming with highest details at resolutions up to, and including, 1024x768."

I am wondering if this card just can't handle those resolutions... it still seems unacceptably slow.

#5 Hodgman   Moderators   


Posted 01 March 2017 - 12:16 AM


"I tried replacing the conditionals with the 'max()' function. Marginal improvement, maybe 1 ms less frame time."

At 60Hz, 1ms isn't marginal, it's a massive amount!

I assumed this was a newish GPU, but that link shows that it's similar to the GeForce 9600 GT, which is nearly a decade old. Pulling off 1080p 60Hz gaming with full dynamic lighting was a hard target in that era... Your perf numbers sound plausible in that case :P

You'll have to be very careful with shader instruction optimization, minimizing overdraw, minimizing bandwidth, drawing as much stuff at lower resolutions as possible (e.g. half resolution post processing), etc... and also minimising the number of lights that are being calculated. I shipped more than one game in that era that only supported two lights per object :wink:
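
Since the three lights in the original post each cover the whole screen, one way to act on the bandwidth and overdraw advice is to shade all of them in a single full-screen pass, so the G-buffer is sampled once per pixel rather than once per light. The following is a minimal sketch under that assumption, not something proposed in the thread: gPointLights, gNumPointLights, MAX_POINT_LIGHTS and ShadePointLight are hypothetical names, and the sketch uses max(1.0, ...) for the attenuation divisor so it never drops below 1, which differs from the min(1.0, ...) in the posted shader.

```glsl
#version 420

layout (location = 0) out vec4 FragColor;

// Hypothetical limit; the scene in the original post has 3 point lights.
const int MAX_POINT_LIGHTS = 3;

struct BaseLight   { vec3 Color; float AmbientIntensity; float DiffuseIntensity; };
struct Attenuation { float Constant; float Linear; float Exp; };
struct PointLight  { BaseLight Base; vec3 Position; Attenuation Atten; };

uniform sampler2D  gPositionMap;
uniform sampler2D  gNormalMap;
uniform PointLight gPointLights[MAX_POINT_LIGHTS];  // hypothetical uniform array
uniform int        gNumPointLights;                 // hypothetical, <= MAX_POINT_LIGHTS
uniform vec3       gEyeWorldPos;
uniform float      gMatSpecularIntensity;
uniform float      gSpecularPower;                  // assumed > 0
uniform vec2       gScreenSize;

// Contribution of one point light, branch-free.
vec3 ShadePointLight(PointLight Light, vec3 WorldPos, vec3 Normal, vec3 VertexToEye)
{
    vec3  ToFragment     = WorldPos - Light.Position;
    float Distance       = length(ToFragment);
    vec3  LightDirection = ToFragment / Distance;

    float Diffuse  = max(0.0, dot(Normal, -LightDirection));
    float Specular = max(0.0, dot(VertexToEye, reflect(LightDirection, Normal)));
    Specular = pow(Specular, gSpecularPower) * sign(Diffuse);

    float Falloff = Light.Atten.Constant +
                    Light.Atten.Linear * Distance +
                    Light.Atten.Exp * Distance * Distance;
    Falloff = max(1.0, Falloff);  // sketch's choice; the posted shader uses min()

    return Light.Base.Color *
           (Light.Base.AmbientIntensity + Diffuse + gMatSpecularIntensity * Specular) /
           Falloff;
}

void main()
{
    // G-buffer reads and the eye vector are shared by every light.
    vec2 TexCoord    = gl_FragCoord.xy / gScreenSize;
    vec3 WorldPos    = texture(gPositionMap, TexCoord).xyz;
    vec3 Normal      = normalize(texture(gNormalMap, TexCoord).xyz);
    vec3 VertexToEye = normalize(gEyeWorldPos - WorldPos);

    vec3 Sum = vec3(0.0);
    for (int i = 0; i < gNumPointLights; ++i) {
        Sum += ShadePointLight(gPointLights[i], WorldPos, Normal, VertexToEye);
    }
    FragColor = vec4(Sum, 1.0);
}
```

This trades per-light light volumes for one full-screen draw, so it only makes sense while every light covers most of the view; with many small lights the per-light approach from the tutorial is likely the better fit.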

#6 gearifysoftware   Members   


Posted 01 March 2017 - 07:52 AM

Thanks for the analysis. That's a relief that I can (at least partially) blame my hardware!

I guess this calls into question whether or not I want to optimize for this machine or aim for a stronger graphics card. I am shooting for a game that doesn't have to be on a souped up gaming rig, but also doesn't need to run on a dinosaur... something mid range is what I'd like.

I could potentially switch over to my other laptop, which has this card: https://www.techpowerup.com/gpudb/1490/geforce-gt-525m

Could I expect much more out of this one? I see its production status is "End of Life".

Maybe it's just time for a hardware upgrade...

#7 Hodgman   Moderators   


Posted 01 March 2017 - 08:01 AM

You can also just aim for 30Hz 720p or similarly low resolutions on those older cards, and leave the 60Hz 1080p goal to newer hardware.
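
As a back-of-the-envelope comparison of the two targets, assuming 720p means 1280x720 and that lighting cost scales roughly with the number of pixels shaded per second (an approximation, since fixed per-frame costs do not shrink):

```latex
\begin{align*}
\text{1080p at 60 Hz:}\quad & 1920 \times 1080 \times 60 = 124{,}416{,}000\ \text{pixels/s} \\
\text{720p at 30 Hz:}\quad & 1280 \times 720 \times 30 = 27{,}648{,}000\ \text{pixels/s} \\
\text{ratio:}\quad & \frac{124{,}416{,}000}{27{,}648{,}000} = 4.5
\end{align*}
```

Under those assumptions the lower target needs about 4.5 times less per-pixel work per second, and each frame gets about 33.3 ms instead of 16.7 ms.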

#8 gearifysoftware   Members   


Posted 01 March 2017 - 09:51 AM

"You can also just aim for 30Hz 720p or similarly low resolutions on those older cards, and leave the 60Hz 1080p goal to newer hardware."


I like that! I get way, way better performance at 720p and I think 30Hz should be very achievable.

I consider my issue solved. Very much appreciate the help!

Edited by gearifysoftware, 01 March 2017 - 09:53 AM.