Speeding Up Blur
Hello, I am doing a Gaussian blur on the GPU several times to smooth the image for SSAO. The more blur, the better the image looks. However, all these blur passes are getting expensive: my frame rate increases significantly if I halve the number of blurs. So I am wondering if there is a way to get more bang for my buck per blur pass. Could choosing better blur weights help? I am already separating the blur into horizontal and vertical components.
In terms of 'getting bang for your buck', collapsing all of your convolutions into one large convolution might help. For example, m successive convolutions with a length-n kernel are equivalent to a single convolution with a kernel of length m*(n-1)+1. That doesn't seem like much, but it simplifies the code and could be a big savings if m is large.
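A quick way to check the kernel-length arithmetic is to compose the kernels explicitly. The Python sketch below is only an illustration: the helper names `convolve` and `compose_passes` are hypothetical, and a 3-tap binomial kernel [1, 2, 1]/4 is assumed to stand in for one blur pass. It also shows a related fact: the variances of the passes add, so m passes of a Gaussian with standard deviation sigma behave like one Gaussian with standard deviation sigma*sqrt(m).

```python
def convolve(a, b):
    """Full 1D discrete convolution; the result has len(a) + len(b) - 1 taps."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def compose_passes(kernel, m):
    """Return the single kernel equivalent to applying `kernel` m times in a row."""
    result = [1.0]  # identity kernel: convolving with it changes nothing
    for _ in range(m):
        result = convolve(result, kernel)
    return result


def variance(kernel):
    """Variance of a normalized, non-negative kernel treated as a distribution over tap index."""
    mean = sum(i * w for i, w in enumerate(kernel))
    return sum((i - mean) ** 2 * w for i, w in enumerate(kernel))


if __name__ == "__main__":
    one_pass = [0.25, 0.5, 0.25]  # n = 3, variance 0.5
    combined = compose_passes(one_pass, 4)
    print(len(combined))  # 4 * (3 - 1) + 1 = 9
    print(variance(combined))
```

With the binomial kernel, four passes give the 9-tap kernel [1, 8, 28, 56, 70, 56, 28, 8, 1]/256, whose variance is 4 * 0.5 = 2.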
In addition, simplifying it to a single convolution will really help you in the long term, because if the convolution is truly large, you might find that it is more efficient to use CUFFT and perform the convolution with an FFT. If the image dimension is q and your long convolution has length p, direct convolution is an O(q p) operation. By exploiting the convolution theorem, however, it is possible to perform the convolution in O(q log q) time, which could be significantly faster. In my informal benchmarks, convolution in the frequency domain became faster when p > 13, and it sounds like your p is much larger.
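The convolution theorem says that convolution in the signal domain is pointwise multiplication in the frequency domain. The sketch below demonstrates it on the CPU with a small recursive radix-2 FFT in pure Python. It is not CUFFT and is not tuned for speed; it only shows that zero-padding to at least len(signal) + len(kernel) - 1 samples, multiplying the transforms, and inverting reproduces direct convolution. The function names are hypothetical.

```python
import cmath


def fft(x, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two.
    With invert=True it returns the unnormalized inverse transform."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], invert)
    odd = fft(x[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft_convolve(signal, kernel):
    """Linear convolution via the convolution theorem: pad, transform, multiply, invert."""
    out_len = len(signal) + len(kernel) - 1
    size = 1
    while size < out_len:  # pad to a power of two so the circular result does not wrap
        size *= 2
    a = fft([complex(v) for v in signal] + [0j] * (size - len(signal)))
    b = fft([complex(v) for v in kernel] + [0j] * (size - len(kernel)))
    product = [x * y for x, y in zip(a, b)]
    result = fft(product, invert=True)
    return [v.real / size for v in result[:out_len]]  # normalize the inverse transform


def direct_convolve(signal, kernel):
    """Reference O(q p) convolution for comparison."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out
```

On the GPU you would transform the image (rows and columns, or the whole 2D image) once, multiply by the kernel's transform, and invert; the padding rule is the same.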
Well, you can always try to use a SAT (summed-area table), but that will end in box filters, which aren't nice :(
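For reference, a summed-area table lets you compute a box-filter sum of any radius with four lookups per pixel, which is why it is attractive despite the boxy result. A minimal pure-Python sketch, assuming the image is a 2D list of floats and the box is clamped at the borders (the function names are hypothetical):

```python
def build_sat(img):
    """Summed-area table with an extra zero row and column:
    sat[y][x] is the sum of img[0:y][0:x]."""
    h, w = len(img), len(img[0])
    sat = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def box_blur_sat(img, radius):
    """Box blur at constant cost per pixel regardless of radius (four table lookups).
    The box is clamped at the image borders and normalized by its clamped area."""
    h, w = len(img), len(img[0])
    sat = build_sat(img)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        for x in range(w):
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            total = sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]
            out[y][x] = total / ((y1 - y0) * (x1 - x0))
    return out
```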
My advice is to avoid Gaussian-blurring lots of times and instead apply something called bilateral filtering to the SSAO.
Here is the original paper: http://www.cs.duke.edu/~tomasi/papers/tomasi/tomasiIccv98.pdf
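The core idea of the bilateral filter from that paper is that each neighbour's weight is the product of a spatial Gaussian (distance in pixels) and a range Gaussian (difference in value), so strong edges are not blurred across. Below is a 1D CPU sketch in Python. The optional `guide` argument goes beyond the paper and is an assumption: for SSAO it is common to pass depth as the guide (a joint or cross bilateral filter) so the blur stops at geometric edges. Function and parameter names are hypothetical.

```python
import math


def bilateral_1d(values, radius, sigma_s, sigma_r, guide=None):
    """1D bilateral filter: weight = spatial Gaussian(distance) * range Gaussian(guide difference).
    If guide is None, the filtered values themselves are used as the range guide."""
    if guide is None:
        guide = values
    n = len(values)
    out = []
    for i in range(n):
        total, weight_sum = 0.0, 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            ws = math.exp(-((j - i) ** 2) / (2 * sigma_s ** 2))
            wr = math.exp(-((guide[j] - guide[i]) ** 2) / (2 * sigma_r ** 2))
            total += values[j] * ws * wr
            weight_sum += ws * wr  # never zero: the centre tap contributes weight 1
        out.append(total / weight_sum)
    return out
```

With a small sigma_r a step edge survives almost untouched; with a very large sigma_r the range term is close to 1 everywhere and the filter degrades to an ordinary Gaussian blur.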
Also, you can optimize your blur passes a bit. First, make sure the render target is as small as it can be; half the screen resolution is usually fine for blurring.
Also, you may want to turn off all color writes besides red, and use the red channel for all the data, because usually SSAO uses a single channel anyway.
Take advantage of the bilinear filter: by placing your samples on the border between two pixels, you'll sample two pixels with one fetch.
You can also adjust these offsets to reduce the amount of weighting math you have to do (i.e. have the bilinear filter do the weighting for you).
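Here is how the merged offsets and weights can be derived. For two adjacent taps at integer offsets i and i+1 with weights w1 and w2, one linear fetch at offset (i*w1 + (i+1)*w2) / (w1 + w2), scaled by w1 + w2, returns exactly w1*p[i] + w2*p[i+1]. The Python sketch below computes this for a Gaussian and checks it against the plain discrete blur, using a CPU stand-in for the hardware's linear filter. It assumes integer positions are texel centres; on the GPU the offsets would be multiplied by the texel size and added to the texel-centre coordinate. All names are hypothetical.

```python
import math


def gaussian_weights(radius, sigma):
    """Normalized discrete Gaussian weights for offsets 0..radius of a symmetric kernel."""
    raw = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(radius + 1)]
    total = raw[0] + 2 * sum(raw[1:])
    return [w / total for w in raw]


def linear_taps(weights):
    """Merge taps (1, 2), (3, 4), ... into single linear fetches.
    Returns one side's (offsets, weights); index 0 is the unmerged centre tap."""
    offsets, merged = [0.0], [weights[0]]
    for i in range(1, len(weights), 2):
        w1 = weights[i]
        w2 = weights[i + 1] if i + 1 < len(weights) else 0.0
        offsets.append((i * w1 + (i + 1) * w2) / (w1 + w2))
        merged.append(w1 + w2)
    return offsets, merged


def sample_linear(signal, pos):
    """CPU stand-in for a hardware linear fetch (clamp addressing, texel centres at integers)."""
    pos = min(max(pos, 0.0), len(signal) - 1.0)
    i = min(int(pos), len(signal) - 2)
    t = pos - i
    return signal[i] * (1.0 - t) + signal[i + 1] * t


def blur_at(signal, x, offsets, weights):
    """Symmetric blur at x: the centre fetch plus one fetch per merged pair on each side."""
    total = weights[0] * sample_linear(signal, x)
    for o, w in zip(offsets[1:], weights[1:]):
        total += w * (sample_linear(signal, x - o) + sample_linear(signal, x + o))
    return total
```

With radius 4, the nine discrete taps collapse to five fetches: the centre plus two merged fetches on each side.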
You can also do a strided blur, where you space your samples out and skip a few pixels, which gives you a wider radius for less cost.
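A minimal sketch of the strided offsets: the number of fetches stays at 2*taps_per_side + 1, but the reach grows to taps_per_side * stride texels. The skipped texels never contribute, so with large strides high-frequency noise can show up as a regular pattern. Written in Python on the CPU for clarity; the function names and the border clamping are assumptions.

```python
def strided_taps(taps_per_side, stride):
    """Sample offsets for a strided blur: same fetch count, spaced `stride` texels apart."""
    return [k * stride for k in range(-taps_per_side, taps_per_side + 1)]


def strided_blur(signal, x, taps_per_side, stride, weights):
    """Weighted sum of signal at x + offset; indices are clamped at the borders.
    `weights` must have 2 * taps_per_side + 1 entries."""
    n = len(signal)
    total = 0.0
    for offset, w in zip(strided_taps(taps_per_side, stride), weights):
        total += w * signal[min(max(x + offset, 0), n - 1)]
    return total
```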
Calculate all the texture coordinates in the vertex shader, so the GPU can start sampling before the pixel shader even runs.
That avoids dependent texture reads; I don't know if it makes any difference.
// Vertex shader: full-screen quad, precomputing all seven vertical blur coordinates
Output.Position = float4(Input.Position.xy, 0.5, 1);
// Map clip space [-1, 1] to texture space [0, 1], flipping y
Output.TexCoord = Input.Position.xy * float2(0.5, -0.5) + 0.5;
// PixSize: one texel in texture space (presumably 1 / render-target size)
Output.Blur0 = Output.TexCoord + PixSize * float2(0, -3);
Output.Blur1 = Output.TexCoord + PixSize * float2(0, -2);
Output.Blur2 = Output.TexCoord + PixSize * float2(0, -1);
Output.Blur3 = Output.TexCoord + PixSize * float2(0, 0);
Output.Blur4 = Output.TexCoord + PixSize * float2(0, +1);
Output.Blur5 = Output.TexCoord + PixSize * float2(0, +2);
Output.Blur6 = Output.TexCoord + PixSize * float2(0, +3);
The pixel shader then samples directly with the interpolated coordinates:
// Pixel shader: weighted sum of the seven precomputed taps
float4 color = 0;
color += tex2D(smpBlurH, Blur0) * Weight[0];
color += tex2D(smpBlurH, Blur1) * Weight[1];
color += tex2D(smpBlurH, Blur2) * Weight[2];
color += tex2D(smpBlurH, Blur3) * Weight[3];
color += tex2D(smpBlurH, Blur4) * Weight[4];
color += tex2D(smpBlurH, Blur5) * Weight[5];
color += tex2D(smpBlurH, Blur6) * Weight[6];
return color;
Split your 2D blur into two 1D blurs. That way you don't need to sample any diagonal pixels at all.
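To see why the separable version loses nothing for a Gaussian: the 2D Gaussian kernel is the outer product of two 1D Gaussians, so a horizontal pass followed by a vertical pass gives the same result as the full 2D kernel, while a (2r+1)-tap kernel costs 2(2r+1) taps per pixel instead of (2r+1)^2. The Python sketch below (hypothetical names, borders clamped) checks this numerically.

```python
import math


def gaussian_1d(radius, sigma):
    """Normalized 1D Gaussian kernel with 2 * radius + 1 taps."""
    raw = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    s = sum(raw)
    return [w / s for w in raw]


def clamp(v, lo, hi):
    return max(lo, min(v, hi))


def blur_2d_direct(img, k):
    """Full 2D blur with the outer-product kernel k[i] * k[j]: (2r+1)^2 taps per pixel."""
    r = len(k) // 2
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(-r, r + 1):
                for j in range(-r, r + 1):
                    acc += k[i + r] * k[j + r] * img[clamp(y + i, 0, h - 1)][clamp(x + j, 0, w - 1)]
            out[y][x] = acc
    return out


def blur_separable(img, k):
    """Horizontal pass, then vertical pass: 2 * (2r+1) taps per pixel."""
    r = len(k) // 2
    h, w = len(img), len(img[0])
    horiz = [[sum(k[j + r] * img[y][clamp(x + j, 0, w - 1)] for j in range(-r, r + 1))
              for x in range(w)] for y in range(h)]
    return [[sum(k[i + r] * horiz[clamp(y + i, 0, h - 1)][x] for i in range(-r, r + 1))
             for x in range(w)] for y in range(h)]
```

For radius 3 that is 14 taps per pixel instead of 49.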