fastest wide-width box filter

This topic is 3144 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

What is currently the fastest box filter technique? I'm looking to implement blurs that work well (fast) for radii of *50 pixels and beyond*.

I'm familiar with the separable 2-pass methods, but the number of texture samples the shader needs to take (e.g. >50) is not feasible here. I've also tried multi-pass mip-mapping: using LOD bias and auto mip-map generation, repeated several times, but the result doesn't look good enough.

On the CPU, it is possible to have a linear-time box filter (independent of width) using a 2-pass method, where a scanline 'accumulator' scans along the X axis and then the Y axis, adding a new sample from the right and subtracting the old sample from the left. Is it possible to implement this on the GPU? A naive implementation would use 2 FBOs: one w x 1 and one 1 x h. That would require rendering w + h quads, which I think is going to be too slow.

For an example of the linear-time scanline accumulator method: http://incubator.quasimondo.com/processing/superfast_blur.php

[Edited by - rexguo on May 7, 2009 5:10:22 AM]
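For reference, the CPU scanline accumulator described above can be sketched roughly like this. It is a minimal illustration in plain Python, not the code from the linked page; the function names and the clamp-to-edge border handling are assumptions made for the example.

```python
def box_blur_1d(row, radius):
    """Box-filter one scanline with a running sum.

    Cost is proportional to len(row) and does not depend on radius.
    """
    n = len(row)
    width = 2 * radius + 1

    def at(i):
        # Clamp-to-edge addressing (an assumption for this sketch).
        return row[min(max(i, 0), n - 1)]

    # Prime the accumulator with the window centred on pixel 0.
    acc = sum(at(i) for i in range(-radius, radius + 1))
    out = []
    for x in range(n):
        out.append(acc / width)
        # Slide the window: add the sample entering on the right,
        # subtract the sample leaving on the left.
        acc += at(x + radius + 1) - at(x - radius)
    return out


def box_blur_2d(image, radius):
    """Two passes: blur every row (X), then every column (Y)."""
    rows = [box_blur_1d(r, radius) for r in image]
    cols = [box_blur_1d(list(c), radius) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]
```

Each output pixel depends on the accumulator value left by its neighbour, which is exactly the ordering dependency that makes a direct fragment-shader port awkward.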

Pretty cool idea on the link you posted :)

I don't think this is possible on GPUs though, simply because each pixel gets processed individually and you don't have any control over how. The method on the link you posted requires processing the pixels in a specific order.

But a 50-wide blur isn't going to have much detail anyway, so why not do it by scaling down the source texture a couple of times, the same way most bloom implementations are done?

Then, do a wide, separable blur on the smaller texture (for example a 16 by 16 wide blur). That way, you only spend 16+16 samples per pixel for the blurring process, plus the additional samples for the scaling.

For example, if you have a 1280x720 texture, scale it down to a 640x360, then again to 320x180. Each texel in the smaller texture is now the average of 4x4 texels in the source texture. Btw, if this texture isn't extremely smooth and looks like awesome, next-gen AA, you're doing the downscaling wrong ;)

It's probably better to do this downscaling yourself instead of using automatic mipmap generation. The driver is probably optimized for speed and not quality, so that may be why you don't think the downsampling is good enough. Do it manually instead, it will give you more control and the look you want.

A 16-wide separable blur on this small texture is now equivalent to doing a 64-wide blur on the source texture. Should be fast enough and be able to run even on old PS2.0 GPUs.
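As a rough 1D check of that equivalence, the sketch below halves a signal twice and compares a 16-tap box on the small signal with a 64-tap box on the source. The helper names and the test signal are made up for the illustration; a real implementation would do the same thing with render-to-texture passes.

```python
def halve(signal):
    """Average adjacent pairs: one 2x downscale step in 1D."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]


def box_window(signal, start, taps):
    """Mean of `taps` consecutive samples beginning at index `start`."""
    return sum(signal[start:start + taps]) / taps


# Arbitrary test signal (hypothetical data, just for the comparison).
source = [float((i * 37) % 11) for i in range(256)]

small = halve(halve(source))       # each sample now averages 4 source samples
coarse = box_window(small, 3, 16)  # 16 taps on the quarter-size signal
full = box_window(source, 12, 64)  # 64 taps on the source, same footprint

print(coarse, full)                # the two means agree
```

The match only holds when the window is aligned to the 4-sample blocks and its width is a multiple of 4; other widths and offsets are approximated rather than reproduced exactly.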

I implemented a bloom filter that did just this, with an 11x11 separable Gaussian blur kernel, that ran very smoothly even on a GeForce 6200 LE.

Best of luck!

/Simon

Quote:
Original post by PolyVox
I've never implemented what you describe, but perhaps Summed Area Tables will be useful for you?


Thanks for that link! I spent 2 days implementing it
and it looks pretty decent. The only problem with it
is it requires 32-bit float FBOs to work correctly for
image sizes beyond 256 pixels. 32F pixels are still
very slow for most GPUs. The SAT approach's final
pass to get from the summed table to the final image
requires bilinear filtering of 32F textures for odd
blur widths. For a 640x480 image, it takes about
40 ms on an 8600M GT, which has about 12.8 GB/s of
memory bandwidth.
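
For anyone following along, the core of the SAT idea can be sketched on the CPU as below. This is only an illustration in plain Python with integer sums, not the FBO implementation described above; the function names and the extra zero row and column are choices made for the example.

```python
def summed_area_table(image):
    """Build a table where sat[y][x] is the sum of image rows < y and columns < x.

    The table has one extra leading row and column of zeros so that
    lookups at the image border need no special cases.
    """
    h, w = len(image), len(image[0])
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += image[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def box_mean(sat, x0, y0, x1, y1):
    """Mean over the rectangle [x0, x1) x [y0, y1) from four table lookups."""
    total = sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]
    return total / ((x1 - x0) * (y1 - y0))


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
sat = summed_area_table(image)
print(box_mean(sat, 1, 1, 3, 3))  # mean of 5, 6, 8, 9
```

The four lookups cost the same for any box size, which is the appeal. The price is range: with 8-bit input the largest table entry can reach 255 * w * h (16,711,680 for a 256x256 image, just under 2^24), so the table needs far more precision than the source image.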

.rex

Quote:
Original post by simonjacoby
Pretty cool idea on the link you posted :)

I don't think this is possible on GPUs though, simply because each pixel gets processed individually and you don't have any control over how. The method on the link you posted requires processing the pixels in a specific order.


I think it might be possible to have a 'sliding
quad' in 2 passes: X and Y. Each pass in X or Y
will require N sub-passes where N is the blur width.
The sliding quad is basically an accumulator using
additive blend mode. Then to get the final image,
render one more time and divide the pixel value by
N^2.
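
A CPU simulation of that sliding-quad idea might look like the following. It is only a sketch: the additive blend is modelled as a plain `+=` into an accumulator, and the function names, the centring of the shifts, and the clamp-to-edge addressing are all assumptions made for the example.

```python
def accumulate_shifted(row, n):
    """Simulate n additive-blend sub-passes along one axis.

    Each sub-pass draws the whole row shifted by one more pixel
    and adds it into the accumulator.
    """
    w = len(row)
    acc = [0.0] * w
    for shift in range(n):
        offset = shift - n // 2  # centre the n shifts around zero
        for x in range(w):
            src = min(max(x + offset, 0), w - 1)  # clamp-to-edge (assumption)
            acc[x] += row[src]
    return acc


def sliding_quad_blur(image, n):
    """X pass, then Y pass, then one final pass dividing by n * n."""
    x_pass = [accumulate_shifted(r, n) for r in image]
    y_pass = [accumulate_shifted(list(c), n) for c in zip(*x_pass)]
    # y_pass holds columns; transpose back to rows while normalising.
    return [[v / (n * n) for v in r] for r in zip(*y_pass)]
```

Note that this needs 2N sub-passes plus the normalising pass, so the cost still grows linearly with the blur width, and the accumulator has to hold values up to N^2 times the input range.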

I think I might give this method a try after the
SAT method I implemented (as replied above).


Quote:

But a 50-wide blur isn't going to have much detail anyway, so why not do it by scaling down the source texture a couple of times, the same way most bloom implementations are done?


For what I'm doing, I need an accurate box filter
without sacrificing image quality.


Quote:

For example, if you have a 1280x720 texture, scale it down to a 640x360, then again to 320x180. Each texel in the smaller texture is now the average of 4x4 texels in the source texture. Btw, if this texture isn't extremely smooth and looks like awesome, next-gen AA, you're doing the downscaling wrong ;)


The problem with that approach is that you can't get
blur widths that are not a power of two, for example,
50 pixels.
It's certainly good for eye candy effects like blooms
but too inaccurate for my use case.


Quote:

It's probably better to do this downscaling yourself instead of using automatic mipmap generation. The driver is probably optimized for speed and not quality, so that may be why you don't think the downsampling is good enough. Do it manually instead, it will give you more control and the look you want.


Are you saying there're drivers that are unable to
produce perfect 2x2 downsampled mipmaps? I'd be quite
surprised to know that.


.rex
