CrashyCartman

Sharpen shader performance


Hi,

After running my app through Nsight, I've seen that the image-sharpening shader I use takes around 3 ms to process a 1600x1200 texture.

The GPU is an NVIDIA GTX 560 Ti.

 

This seems quite high to me, but I do not know why it takes so much time.

The shader code, nothing complicated:

#version 330

uniform sampler2D sceneSampler;
uniform vec4 screenSize; // zw = size of one texel in UV units

in vec2 Texcoord;
out vec4 oColor;

// 3x3 sharpen kernel
float kernel[9] = float[9]
(
     0.0, -1.0,  0.0,
    -1.0,  5.0, -1.0,
     0.0, -1.0,  0.0
);

// Texel offsets for the 3x3 neighbourhood
vec2 offset[9] = vec2[9]
(
    vec2(-1.0, -1.0), vec2( 0.0, -1.0), vec2( 1.0, -1.0),
    vec2(-1.0,  0.0), vec2( 0.0,  0.0), vec2( 1.0,  0.0),
    vec2(-1.0,  1.0), vec2( 0.0,  1.0), vec2( 1.0,  1.0)
);

void main()
{
    vec4 result = vec4(0.0);
    for (int i = 0; i < 9; i++)
    {
        vec4 color = textureLod(sceneSampler, Texcoord + offset[i] * screenSize.zw, 0.0);
        result += color * kernel[i];
    }
    oColor = result;
}

One strange thing: if I change the arrays to const, the shader execution time increases to 12 ms! I've read about this kind of thing with GLSL shaders, but does anybody have an explanation?
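To be clear, the const version is literally the same arrays with the qualifier added, roughly this:

// Same declarations as above, just with const (this is the slow variant)
const float kernel[9] = float[9]
(
     0.0, -1.0,  0.0,
    -1.0,  5.0, -1.0,
     0.0, -1.0,  0.0
);
const vec2 offset[9] = vec2[9]
(
    vec2(-1.0, -1.0), vec2( 0.0, -1.0), vec2( 1.0, -1.0),
    vec2(-1.0,  0.0), vec2( 0.0,  0.0), vec2( 1.0,  0.0),
    vec2(-1.0,  1.0), vec2( 0.0,  1.0), vec2( 1.0,  1.0)
);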

 

Thanks for any help or hints.



OK, good news!

 

I've manually unrolled the loop to:

vec4 result = vec4(0.0);

    vec4 color = textureLod(resultSampler, Texcoord + offset[0]*screenSize.zw,0);
    result += color * kernel[0];
    color = textureLod(resultSampler, Texcoord + offset[1]*screenSize.zw,0);
    result += color * kernel[1];
    color = textureLod(resultSampler, Texcoord + offset[2]*screenSize.zw,0);
    result += color * kernel[2];
    color = textureLod(resultSampler, Texcoord + offset[3]*screenSize.zw,0);
    result += color * kernel[3];
    color = textureLod(resultSampler, Texcoord + offset[4]*screenSize.zw,0);
    result += color * kernel[4];
    color = textureLod(resultSampler, Texcoord + offset[5]*screenSize.zw,0);
    result += color * kernel[5];
    color = textureLod(resultSampler, Texcoord + offset[6]*screenSize.zw,0);
    result += color * kernel[6];
    color = textureLod(resultSampler, Texcoord + offset[7]*screenSize.zw,0);
    result += color * kernel[7];
    color = textureLod(resultSampler, Texcoord + offset[8]*screenSize.zw,0);
    result += color * kernel[8];

And now the shader execution time is 0.325 ms!

Isn't the NVIDIA GLSL compiler able to unroll this by itself? In HLSL there is the [unroll] hint; is there anything similar in GLSL?


A quick Google turned up the fact that there are vendor-specific hints in GLSL, such as #pragma optionNV (unroll all)... :(
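If you want to try it, the hint just goes at the top of the shader, something like this (untested here; it only helps on NVIDIA, and per the GLSL spec other drivers should simply ignore a pragma they don't recognise):

#version 330
// NVIDIA-specific compiler hint: ask it to unroll every loop.
#pragma optionNV (unroll all)

// ...rest of the original shader unchanged, with the for loop left intact...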

Personally I'd recommend treating GLSL like JavaScript on the modern Web -- never directly write the files that will ship; always have them generated from some build system and potentially have the real source code in a better language altogether. That way you can have clean source code and ship ugly optimized GLSL files, such as with pre-unrolled loops.



A quick Google turned up the fact that there are vendor-specific hints in GLSL, such as #pragma optionNV (unroll all)...

Yeah, I saw that just after my reply. Too bad it's vendor-specific, though.

 


Personally I'd recommend treating GLSL like JavaScript on the modern Web -- never directly write the files that will ship; always have them generated from some build system and potentially have the real source code in a better language altogether. That way you can have clean source code and ship ugly optimized GLSL files, such as with pre-unrolled loops.

Well, that might be the reason why a lot of people tend to use GLSLOptimizer.

Thanks.


If you were targeting mobile GPUs, I'd say you could expect another huge performance gain by calculating the UV coordinates in the vertex shader and passing them through as vec2 varyings (well, 7 vec2 and 1 vec4, because you only get 8 varyings on some mobile GPUs). There'd be two big gains: firstly, you'd skip a bunch of per-fragment calculations, and secondly, you'd minimize dependent texture reads.

 

As you're targeting desktop, I'm not so confident it'll have any measurable effect, but it's a simple enough experiment, so if I were in your shoes I'd give it a go.
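Roughly what I mean, sketched against your shader (the fullscreen-quad vertex shader and the Position/TexcoordIn attribute names are just assumptions; adapt them to your own vertex layout):

// Vertex shader: precompute the nine sample coordinates once per vertex.
#version 330

in vec2 Position;
in vec2 TexcoordIn;

uniform vec4 screenSize; // zw = one texel in UV units, as in your shader

// One UV per kernel tap. Fine on desktop; on mobile you'd have to pack
// these into the 8 available varyings instead.
out vec2 sampleCoord[9];

const vec2 offset[9] = vec2[9]
(
    vec2(-1.0, -1.0), vec2( 0.0, -1.0), vec2( 1.0, -1.0),
    vec2(-1.0,  0.0), vec2( 0.0,  0.0), vec2( 1.0,  0.0),
    vec2(-1.0,  1.0), vec2( 0.0,  1.0), vec2( 1.0,  1.0)
);

void main()
{
    gl_Position = vec4(Position, 0.0, 1.0);
    for (int i = 0; i < 9; i++)
        sampleCoord[i] = TexcoordIn + offset[i] * screenSize.zw;
}

// Fragment shader: all reads are now non-dependent.
// (The same unrolling caveat as before applies to this loop.)
#version 330

uniform sampler2D sceneSampler;

in vec2 sampleCoord[9];
out vec4 oColor;

const float kernel[9] = float[9]
(
     0.0, -1.0,  0.0,
    -1.0,  5.0, -1.0,
     0.0, -1.0,  0.0
);

void main()
{
    vec4 result = vec4(0.0);
    for (int i = 0; i < 9; i++)
        result += texture(sceneSampler, sampleCoord[i]) * kernel[i];
    oColor = result;
}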


If you were targeting mobile GPUs, I'd say you could expect another huge performance gain by calculating the UV coordinates in the vertex shader and passing them through as vec2 varyings (well, 7 vec2 and 1 vec4, because you only get 8 varyings on some mobile GPUs). There'd be two big gains: firstly, you'd skip a bunch of per-fragment calculations, and secondly, you'd minimize dependent texture reads.

Thanks for the hint. I'm not targeting mobile platforms for now, but it's always good to know such things.

I wouldn't think so. When Cg was still active, you could actually compile from Cg -> GLSL :wink:
Silence must be thinking of the fact that GLSL gets compiled into an internal format that's specific to NVIDIA/AMD/Intel/Qualcomm/etc., and God knows what happens in that step.
At least with HLSL -> D3D bytecode -> vendor ASM (and GLSL -> SPIR-V bytecode -> vendor ASM) there's a middle step where you can see what optimisations the offline compiler has performed :D


I guess the speed-up comes not from the manual unroll itself, but because only after unrolling is the compiler clever enough to replace the slow array lookups with constants.

 

However, I wonder if a compute shader implementation would be faster, e.g. processing an 8x8 block of pixels per invocation.

There would be far fewer texture accesses, and maybe it beats the texture cache.
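Something like this is what I have in mind (just an unprofiled sketch: a 16x16 workgroup with one pixel per invocation and a shared-memory tile, rather than 8x8 pixels per invocation, but the idea is the same. It needs GL 4.3 compute shaders, and oImage and the rgba8 format are placeholders for whatever output you bind):

#version 430

// One 16x16 tile per workgroup, loaded with a 1-pixel border into shared
// memory so each texel is fetched once instead of up to 9 times.
layout(local_size_x = 16, local_size_y = 16) in;

uniform sampler2D sceneSampler;
layout(rgba8) writeonly uniform image2D oImage; // match your output texture format

const float kernel[9] = float[9]
(
     0.0, -1.0,  0.0,
    -1.0,  5.0, -1.0,
     0.0, -1.0,  0.0
);

shared vec4 tile[18][18]; // 16x16 pixels + 1-pixel border

void main()
{
    ivec2 groupOrigin = ivec2(gl_WorkGroupID.xy) * 16 - 1; // top-left of bordered tile
    ivec2 local       = ivec2(gl_LocalInvocationID.xy);
    ivec2 texSize     = textureSize(sceneSampler, 0);

    // Cooperatively load the 18x18 bordered tile (each invocation loads 1-2 texels).
    for (int idx = int(gl_LocalInvocationIndex); idx < 18 * 18; idx += 16 * 16)
    {
        ivec2 t = ivec2(idx % 18, idx / 18);
        ivec2 p = clamp(groupOrigin + t, ivec2(0), texSize - 1);
        tile[t.y][t.x] = texelFetch(sceneSampler, p, 0);
    }
    barrier();

    // Apply the 3x3 sharpen kernel from shared memory.
    vec4 result = vec4(0.0);
    for (int y = -1; y <= 1; y++)
        for (int x = -1; x <= 1; x++)
            result += tile[local.y + 1 + y][local.x + 1 + x] * kernel[(y + 1) * 3 + (x + 1)];

    ivec2 pixel = ivec2(gl_GlobalInvocationID.xy);
    if (all(lessThan(pixel, texSize)))
        imageStore(oImage, pixel, result);
}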
