markypooch

Too much computation in my shader?


Hello everyone,

 

I'm running into some difficulty implementing a multiple point light algorithm in my shaders.

It works; however, it's lagging like the devil!

 

I'm still pretty new to graphics programming.

 

Below is my .fx file:

 

 
Texture2D                colorMap[2]            : register(t0);
SamplerState            colorSampler        : register(s0);

cbuffer world : register(b0)
{
    matrix worldMat;
}

cbuffer view  : register(b1)
{
    matrix viewMat;
}

cbuffer proj  : register(b2)
{
    matrix projMat;
}

struct Light
{
    float3 dir;
    float  range;
    float3 att;
    float4 diffuse;
    float4 ambient;
};

cbuffer LightCB  : register(b0)
{
    Light light[107];
}

struct VS_Input
{
    float4 pos     : POSITION;
    float2 tex     : TEXCOORD;
    float3 norm    : NORMAL;
    float3 binorm  : BINORMAL;
    float3 tang    : TANGENT;
};

struct PS_Input
{
    float4 pos     : SV_POSITION;
    float4 wPos    : POSITION;
    float2 tex     : TEXCOORD;
    float3 norm    : NORMAL;
    float3 binorm  : BINORMAL;
    float3 tang    : TANGENT;
};

PS_Input VS(VS_Input vertex)
{
    PS_Input vsOut = (PS_Input)0;

    vsOut.pos = mul(vertex.pos, worldMat);
    vsOut.pos = mul(vsOut.pos , viewMat);
    vsOut.pos = mul(vsOut.pos , projMat);

    // Directions use only the rotation/scale part of the world matrix.
    vsOut.norm = mul(vertex.norm, (float3x3)worldMat);
    vsOut.norm = normalize(vsOut.norm);

    vsOut.binorm = mul(vertex.binorm, (float3x3)worldMat);
    vsOut.binorm = normalize(vsOut.binorm);

    vsOut.tang   = mul(vertex.tang, (float3x3)worldMat);
    vsOut.tang   = normalize(vsOut.tang);

    vsOut.wPos   = mul(vertex.pos, worldMat);
    vsOut.tex = vertex.tex;

    return vsOut;
}

float4 PS(PS_Input texel) : SV_TARGET
{
    float3 finalColor = float3(0.0f, 0.0f, 0.0f);
    float4 textureMap = colorMap[0].Sample(colorSampler, texel.tex);
    float3 lightToPixelVec[107];
    float  d[107];
    float3 finalAmbient;
    float4 color;
    bool   isCloseEnough = false;

    int SubscriptedInteger;
    
    for (int i = 0; i < 107; i++)
    {
        // light[i].dir is used as the light's position here; .xyz keeps both operands float3.
        lightToPixelVec[i] = light[i].dir - texel.wPos.xyz;
        
        d[i] = length(lightToPixelVec[i]);
    
        if( d[i] <= 2.0f )
        {
            isCloseEnough = true;
            SubscriptedInteger = i;
        }
    }


    if (isCloseEnough == true)
    {
            finalAmbient = textureMap * light[SubscriptedInteger].ambient;

            lightToPixelVec[SubscriptedInteger] /= d[SubscriptedInteger];
    
            float howMuchLight = dot(lightToPixelVec[SubscriptedInteger], texel.norm);

            if( howMuchLight > 0.0f )
            {    
                finalColor += howMuchLight * textureMap * light[SubscriptedInteger].diffuse;
        
                finalColor /= light[SubscriptedInteger].att[0] + (light[SubscriptedInteger].att[1] * d[SubscriptedInteger]) + (light[SubscriptedInteger].att[2] * (d[SubscriptedInteger]*d[SubscriptedInteger]));
            }    
        
            finalColor = saturate(finalColor + finalAmbient);
    
            textureMap *= float4(finalColor, textureMap.a);

            float4 bumpMap    = colorMap[1].Sample(colorSampler, texel.tex);

            bumpMap = (bumpMap * 2.0f) - 1.0f;

            float bumpNormal = texel.norm + bumpMap.x * texel.tang + bumpMap.y * texel.binorm;

            float lightIntensity = saturate(dot(bumpNormal, 5.0f));

            color = saturate(lightIntensity * float4(finalColor, textureMap.a));
    
            color *= textureMap;
    }
    else
    {

        finalAmbient = textureMap * light[0].ambient;

        textureMap *= float4(finalAmbient, textureMap.a);

        float4 bumpMap    = colorMap[1].Sample(colorSampler, texel.tex);

        bumpMap = (bumpMap * 2.0f) - 1.0f;

        float bumpNormal = texel.norm + bumpMap.x * texel.tang + bumpMap.y * texel.binorm;

        float lightIntensity = saturate(dot(bumpNormal, 5.0f));

        color = saturate(lightIntensity * light[0].ambient);
    
        color *= textureMap;
    }

    return color;
}
 

 

Am I trying to compute too much in my shaders?

Should this be something I do in D3D and not HLSL?

 

Or is it time for me to implement frustum culling?

Any answer or response will be appreciated.

 

-Marcus

 


Your problem is the for loop that goes round 107 times. I presume that's for a scene with 107 lights in it.

 

What you need to do is some CPU work to limit the lights you pass to the shader to the relevant ones. There are also other techniques, like baking the lights into a light map or switching to deferred lighting.

 

To handle the lights on the CPU the algorithm is approximately:

 

1. Work out which lights intersect the frustum. The rest need no further testing.

 

2. For each model you render, further reduce the light list to the ones that intersect the bounds of the object, and pass that reduced list to the shader.

 

3. Optionally reduce that per model list of lights down to a smaller number by ignoring the ones which you estimate will have the least influence on the object.
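
Steps 2 and 3 above could be sketched on the CPU roughly as follows. This is only an illustration in C++, not code from the thread: `PointLight`, `Aabb`, and `selectLightsForObject` are made-up names, each light is assumed to be a sphere (position plus range), each model is assumed to have a world-space axis-aligned bounding box, and "least influence" is approximated by distance alone. Step 1 would be the same kind of test against the view frustum.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical CPU-side light record: position and radius of influence.
struct PointLight {
    Vec3  position;
    float range;
};

// Axis-aligned bounding box of one model, in world space (assumed available).
struct Aabb {
    Vec3 min, max;
};

// Squared distance from a point to the nearest point of the box (0 if inside).
inline float distanceSqToAabb(const Vec3& p, const Aabb& box) {
    auto axis = [](float v, float lo, float hi) {
        float d = 0.0f;
        if (v < lo)      d = lo - v;
        else if (v > hi) d = v - hi;
        return d * d;
    };
    return axis(p.x, box.min.x, box.max.x)
         + axis(p.y, box.min.y, box.max.y)
         + axis(p.z, box.min.z, box.max.z);
}

// Step 2: keep lights whose sphere touches the box.
// Step 3: keep only the maxLights nearest of those.
// Returns indices into `lights`, nearest first.
inline std::vector<std::size_t> selectLightsForObject(
    const std::vector<PointLight>& lights, const Aabb& bounds, std::size_t maxLights)
{
    struct Candidate { std::size_t index; float distSq; };
    std::vector<Candidate> candidates;
    for (std::size_t i = 0; i < lights.size(); ++i) {
        const float dSq = distanceSqToAabb(lights[i].position, bounds);
        if (dSq <= lights[i].range * lights[i].range)
            candidates.push_back({i, dSq});
    }
    std::stable_sort(candidates.begin(), candidates.end(),
        [](const Candidate& a, const Candidate& b) { return a.distSq < b.distSq; });
    if (candidates.size() > maxLights)
        candidates.resize(maxLights);

    std::vector<std::size_t> result;
    for (const Candidate& c : candidates)
        result.push_back(c.index);
    return result;
}
```

The returned indices can then be used to fill the light constant buffer for that one draw call.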

 

 

To pass a variable number of lights to a shader, there are also a few options:

 

a. Create one shader for each different number of lights. This will run fastest, but can create too many shaders.

b. Always pass the same number of lights (e.g. 4) and pad the list with small dark lights.

c. Use a dynamic loop in the shader. This can be faster than b when the maximum number of lights is big, and the average is low.
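
Options b and c can share one CPU-side layout: a fixed-size array plus a count. A rough C++ sketch, assuming a maximum of 4 lights and hypothetical names (`GpuLight`, `LightBlock`, `packLights`) that do not come from the original code. Unused slots are left as black lights with zero range, which only works if the shader treats such a light as contributing nothing.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

struct Float4 { float x, y, z, w; };

// Hypothetical GPU-facing light, laid out as whole 16-byte rows.
struct GpuLight {
    Float4 positionRange;  // xyz = position, w = range
    Float4 diffuse;
};

constexpr std::size_t kMaxLights = 4;  // assumed fixed count (option b)

struct LightBlock {
    std::array<GpuLight, kMaxLights> lights;
    unsigned int count;   // number of real lights, for a dynamic loop (option c)
    unsigned int pad[3];  // keeps the block size a multiple of 16 bytes
};
static_assert(sizeof(LightBlock) % 16 == 0, "constant buffer data must be 16-byte sized");

// Copies up to kMaxLights lights into the block; the remaining slots stay
// zeroed, i.e. black lights with zero range.
inline LightBlock packLights(const std::vector<GpuLight>& selected) {
    LightBlock block{};  // value-initialised: all members are zero
    const std::size_t n = std::min(selected.size(), kMaxLights);
    for (std::size_t i = 0; i < n; ++i)
        block.lights[i] = selected[i];
    block.count = static_cast<unsigned int>(n);
    return block;
}
```

With option b the shader loops over all 4 slots every time; with option c it loops only up to `count`.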

Edited by Adam_42


107 lights in a forward renderer with multiple dynamic branches, and you wonder where your performance is?

Deferred shading, that's what you need. Check this from SIGGRAPH 2010, with sample code here.

 

And frustum culling is always good if you have a lot of objects that are off-screen.

 

BTW, just tried going from 3 to 100 lights in a sample of mine, no shadows, no branching, single object. Performance went down from 170 to 2 FPS.


1.

The code doesn't make much sense. What are you trying to do? It seems that for each pixel fragment in your pixel shader you check its distance from each of the 107 lights, and if it's <= 2.0f, you remember the index of that light and then use this one light to actually light the pixel fragment. BUT you always check all 107 lights, and even if all of them are in range, you remember just the last index, because you always overwrite it.

 

2.

Let's say your idea was to check all 107 lights and use the one that is closest and also closer than 2.0f, which could sound reasonable (but your code isn't doing this). Then I have to ask: do you need per-pixel precision? I mean, do you need to find the closest light for each pixel separately? Even doing it per vertex (in the vertex shader) would save a lot of performance. But that would still be overkill. Are you sure you cannot do it per object? In your application code, not in a shader?

 

3.

Do you need 107 lights? Maybe yes, but then do you need to check all your objects (see point 2) against all of them? Couldn't you partition your scene somehow?
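
For points 1 and 2, a per-object search on the CPU might look like this C++ sketch. `findClosestLight` is a hypothetical helper, not something from the original code. Unlike the shader loop, which keeps the last light found in range, it keeps the nearest one and returns -1 when no light is within `maxDistance`.

```cpp
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Squared distance between two points (avoids a square root per light).
inline float distanceSq(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Returns the index of the nearest light within maxDistance of `point`,
// or -1 if none is in range. On an exact tie the later light wins.
inline int findClosestLight(const std::vector<Vec3>& lightPositions,
                            const Vec3& point, float maxDistance) {
    int   best       = -1;
    float bestDistSq = maxDistance * maxDistance;
    for (std::size_t i = 0; i < lightPositions.size(); ++i) {
        const float dSq = distanceSq(lightPositions[i], point);
        if (dSq <= bestDistSq) {  // in range and no farther than the best so far
            bestDistSq = dSq;
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

Called once per object (for example with the object's centre as `point`), this replaces 107 distance checks per pixel with 107 per object.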

Edited by Tom KQT


Just as a mention, there is also a way to handle this many lights with forward shading: it is called Forward+, and it uses techniques similar to tiled deferred shading without the disadvantages of a deferred renderer (think transparencies).

 

In either case, I wouldn't recommend deferred or Forward+ to a beginning graphics coder. Stick to 8 lights (the fixed-function pipeline handles this as the maximum for hardware forward shading) in a forward renderer and learn about frustum culling and related techniques. After that, look into deferred or Forward+, or their tiled or clustered shading variants.

Edited by NightCreature83


Thanks for the replies.

 

I managed to save my frame rate by implementing a bounding box around the viewMatrix.

 

However, a new problem has emerged, and I think Tom KQT shed some light on it.

 

Only one point light shows at a time now. Moving my ViewMatrix changes the position of the point light, but only one point light is ever lit.

 

Also, I would like to know how to submit pictures to this post. It asks for a URL; I'm guessing it means the local file path on my system?

 

-Marcus

Edited by markypooch


Hello,

 

I have taken partial advice from N.I.B, Tom KQT (who helped me find a logic error in my code; I have implemented an array of SubscriptedIntegers so as not to overwrite the indexes), NightCreature83, and just about everyone else who posted.

 

I implemented a bounding box around the ViewMatrix and only submitted four of the 107 lights that intersected the box to the shader at once. Also, regarding the earlier problem: I took a closer look at my structures and found out they were straddling 16-byte boundaries. After I corrected that, all is well! :D
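
For anyone hitting the same packing problem: HLSL constant buffers are packed into 16-byte registers, and a member cannot straddle a register boundary. Below is a minimal C++ sketch of a CPU-side struct that mirrors the shader's `Light`, assuming the mismatch was on the application side. `LightCpu`, `Float3`, `Float4`, and `pad` are illustrative names, not the actual code from this project.

```cpp
#include <cstddef>

struct Float3 { float x, y, z; };
struct Float4 { float x, y, z, w; };

// CPU mirror of the shader's Light struct, one 16-byte register per row.
struct LightCpu {
    Float3 dir;      // bytes  0..11
    float  range;    // bytes 12..15, shares the first register with dir
    Float3 att;      // bytes 16..27
    float  pad;      // bytes 28..31, explicit padding so diffuse starts a new register
    Float4 diffuse;  // bytes 32..47
    Float4 ambient;  // bytes 48..63
};

// Compile-time checks that the layout matches the 16-byte packing described above.
static_assert(offsetof(LightCpu, att) == 16,     "att must start the second register");
static_assert(offsetof(LightCpu, diffuse) == 32, "diffuse must start the third register");
static_assert(offsetof(LightCpu, ambient) == 48, "ambient must start the fourth register");
static_assert(sizeof(LightCpu) == 64,            "each array element must be 4 registers");
```

Without the `pad` member, `diffuse` would begin at byte 28 on the CPU side while the shader reads it from byte 32, so every value after `att` would be shifted.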

 

Thanks everyone!

 

-Marcus

Edited by markypooch
