Blur shader


I'm working on exponential shadow map filtering, and after outputting exp(depth) to my depth texture rather than the normal depth, I need to run another pass to blur it before I do the lighting pass. Since it's just blurring a depth texture, the fastest, least accurate algorithm is fine, I think. All I get from Google is Gaussian blur, but AFAIK that is unnecessarily heavy?

What's the cheapest blur algorithm out there?

Box blur. It's the same as Gaussian, but instead of each sample using a unique weight value, you sum all the samples together unweighted and then multiply the result by 1/numSamples.

Gaussian isn't that expensive BTW, as long as you precompute the weights and hard-code them into the shader.
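
For comparison, a sketch of what hard-coded Gaussian weights might look like (the 7-tap weights below are illustrative: roughly sigma = 2, normalized to sum to 1):


// Illustrative 7-tap Gaussian weights (sigma ~= 2), normalized to sum to 1
static const float GAUSSIAN_WEIGHTS[7] =
{
    0.071, 0.131, 0.189, 0.218, 0.189, 0.131, 0.071
};

float4 GaussianBlur(Texture2D tex, SamplerState samp, float2 uv, float2 texelSize)
{
    float4 ret = 0.0;

    for (int i = -3; i <= 3; i++)
    {
        // Each tap has its own precomputed weight instead of a flat 1 / numSamples
        ret += GAUSSIAN_WEIGHTS[i + 3] * tex.Sample(samp, uv + i * texelSize);
    }

    return ret;
}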

Something like this?


cbuffer BoxBlurConstants : register(CBUFFER_REGISTER_PIXEL)
{
    float2 gTexelSize;      // (1 / TEXTURE_WIDTH, 0) for horizontal pass, (0, 1 / TEXTURE_HEIGHT) for vertical pass
};

SamplerState gSampler : register(SAMPLER_REGISTER_POINT_CLAMP);

Texture2D gDepthTexture : register(TEXTURE_REGISTER_DEPTH);

static const uint NUM_SAMPLES = 7;


float4 ps_main(float4 position : SV_Position) : SV_Target0
{
    float2 textureSize;
    gDepthTexture.GetDimensions(textureSize.x, textureSize.y);

    float4 ret = 0.0;

    for (int i = -3; i <= 3; i++)
    {
        // position.xy is in pixels; convert to normalized UVs, then step
        // one whole texel per iteration along the blur direction
        const float2 texCoord = position.xy / textureSize + i * gTexelSize;
        ret += gDepthTexture.Sample(gSampler, texCoord);
    }

    ret /= NUM_SAMPLES;

    return ret;
}

And run it twice, once for the horizontal and once for the vertical pass.

I've got a few follow-up questions though:

  1. I use CLAMP for my sampler, but which sampler filter should be used? Bilinear, trilinear, or anisotropic?
  2. How many samples should optimally be used?
  3. Isn't it very expensive? I was toying with MSAA before, and as I understood it, the texture fetches are what is expensive; and here I do 7 fetches per texel... how is this ever going to have a small performance footprint?

  1. Since you are sampling at exact pixel values, point sampling will be fine
  2. Up to you, try experimenting with different values and see what works best
  3. Worry about performance problems after they happen, not before
My current game project Platform RPG

As for 3), I am genuinely curious; since it is my understanding that texture fetching is expensive, any additional insight is welcome.

If fetching is the expense you are worried about, then it stands to reason that the cost relates directly to your blur kernel size: a smaller kernel requires fewer fetches. Also, some filters are separable, saving more on the cost of texel fetching; a full 7x7 kernel takes 49 fetches per pixel, while the same blur done as separate horizontal and vertical 1x7 passes takes only 14.

Memory operations, including texture operations, tend to be expensive. Caches are designed to help speed up access to frequently read memory locations: a cache is a small amount of very fast memory that mirrors values from RAM, which is slower. If you read the same value over and over again, it is read from the cache, so repeated reads of the same pixel are faster after the first read. I am no expert on GPU hardware, but I would imagine the cache significantly speeds up texture reads when blurring, since nearby pixels read from the same spots in memory. If your blur kernel gets too large, then you run out of cache memory and start to incur more cache misses, which will dramatically reduce performance. The performance characteristics of your blur shader depend on many factors; the best thing to do is to experiment with parameters to see what performs well and where the limits of the hardware are.

My current game project Platform RPG
If you use point filtering, sample at offsets 0, +/-1, +/-2...
If you use bilinear filtering, you can instead use offsets 0, +/-1.5, +/-3.5..., which gives you data from the pixels at offsets 0, 1, 2, 3, 4 with fewer fetch instructions :)
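
A sketch of that bilinear trick for a 9-pixel-wide box blur; it assumes the sampler uses linear filtering, so each off-center tap, sitting exactly halfway between two pixels, averages that pair for free:


// 9-pixel box blur in 5 fetches: taps at +/-1.5 and +/-3.5 each cover two
// pixels via the bilinear filter, so they carry twice the center's weight
static const float OFFSETS[5] = { -3.5, -1.5, 0.0, 1.5, 3.5 };
static const float WEIGHTS[5] = { 2.0 / 9.0, 2.0 / 9.0, 1.0 / 9.0, 2.0 / 9.0, 2.0 / 9.0 };

float4 BilinearBoxBlur(Texture2D tex, SamplerState linearSampler, float2 uv, float2 texelSize)
{
    float4 ret = 0.0;

    for (int i = 0; i < 5; i++)
    {
        ret += WEIGHTS[i] * tex.Sample(linearSampler, uv + OFFSETS[i] * texelSize);
    }

    return ret;
}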

Yes, it's expensive; memory access is the slowest operation! But expense is relative: you'd have to compare it against, say, a forward rendering shader that computes 100 specular highlights to put the cost in relative terms.

Typically you might see sample counts ranging from 5 to 15, but what's acceptable all depends on the GPU.
Also, if you are bottlenecked by these memory access times, then a certain amount of ALU processing may become "free", as the processors may have been idle waiting for data anyway. You might find that this box shader performs the same as your Gaussian one :)

If you want to perform larger and larger box filters, then instead of taking 100 samples per pixel, you can switch algorithms. Auto-generated mipmaps are typically box-filtered, so you can copy your source into a mipped texture, call GenerateMips, etc., and then do a single SampleLevel per pixel.
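
On the HLSL side that might look like the sketch below; it assumes the mip chain was generated beforehand (e.g. via GenerateMips), so sampling mip level n approximates a box filter roughly 2^n pixels wide:


// A single fetch from a prefiltered mip chain approximates a wide box blur;
// lod picks the filter width (level n covers roughly 2^n source pixels)
float4 MipBoxBlur(Texture2D tex, SamplerState linearSampler, float2 uv, float lod)
{
    return tex.SampleLevel(linearSampler, uv, lod);
}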

Another alternative is summed-area tables, which allow any size of box blur to be completed in constant time.
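
A sketch of the constant-time lookup, assuming the summed-area table has already been built (each SAT texel stores the sum of all source texels above and to its left):


// Any axis-aligned box sum is four fetches, regardless of the blur radius.
// Out-of-bounds loads return zero in D3D10+, which is exactly the right
// boundary value for a SAT.
float SatBoxBlur(Texture2D<float> sat, int2 center, int radius)
{
    const int2 lo = center - radius - 1;   // exclusive lower corner
    const int2 hi = center + radius;       // inclusive upper corner

    const float sum = sat[hi]
                    - sat[int2(lo.x, hi.y)]
                    - sat[int2(hi.x, lo.y)]
                    + sat[lo];

    const float width = 2 * radius + 1;
    return sum / (width * width);
}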

Think outside the box (blur) and make it diagonal. Ooooh, Aaaah.

1. Sorry for sounding like a broken record... but is there any reason to even bother with samplers? What is there to gain? I can do the blur fine with just the following:


cbuffer BoxBlurConstants : register(CBUFFER_REGISTER_PIXEL)
{
    // (1, 0) for horizontal pass, (0, 1) for vertical pass, in whole pixels,
    // since operator[] takes unnormalized integer coordinates
    int2 gBlurDirection;
    float gTextureIndex;
};

Texture2DArray gDepthTexture : register(TEXTURE_REGISTER_DEPTH);

static const uint NUM_SAMPLES = 7;


float ps_main(float4 position : SV_Position) : SV_Depth
{
    float ret = 0.0;

    for (int i = -3; i <= 3; i++)
    {
        // Offset along the blur direction only; adding a scalar i to both
        // components would blur diagonally
        const int2 texCoord = int2(position.xy) + i * gBlurDirection;
        ret += gDepthTexture[uint3(texCoord, gTextureIndex)].r;
    }

    ret /= NUM_SAMPLES;

    return ret;
}

2. What I don't understand is how my above code works for the pixels close to the edges. For example, for the top-most pixel it will try to sample at offsets [-3, 3], with the negatives being outside of the texture, as the texture coordinates here are [0, 1024]. Using a sampler I can clamp, but what will I fetch without using a sampler? Will it clamp by default?
