There are many situations where I need to process a grid (array/matrix) of pixels in a loop in a post-process shader, mostly to calculate a mean value, a maximum value, or a gradient over a set of adjacent pixels. Examples:
- Gaussian blur filter
- Calculating mean luminance of a specific area on the screen
- Getting pixel nearest to the viewer within a certain range
- Checking if at least one pixel of a 2-dimensional array (rectangle) of pixels meets a certain condition
- etc.
All of these are typically implemented as a 2-dimensional for-loop iterating over the x- and y-axes (rows and columns) of the specified grid, where each iteration makes at least one texture lookup. The number of lookups per output pixel therefore grows with the area of the grid (width times height), so the GPU load becomes heavy once the grid exceeds a certain size, and the FPS drops dramatically in many situations.
So, are there strategies that speed up execution substantially and overcome these limitations? I've already tried these:
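To make the pattern concrete, here is a rough CPU-side C++ sketch of the mean-luminance case. It is not actual shader code: `Texture`, `fetch`, `meanLuminance` and the `g_fetchCount` counter are hypothetical names used only for illustration, with `fetch` standing in for a single point-sampled texture lookup such as `tex2Dlod`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a texture: a row-major grid of luminance values.
struct Texture {
    int width;
    int height;
    std::vector<float> texels;
};

// Counts every lookup so the cost of the loop is visible.
static std::size_t g_fetchCount = 0;

// Stand-in for one point-sampled texture lookup with clamped coordinates.
float fetch(const Texture& tex, int x, int y) {
    ++g_fetchCount;
    if (x < 0) x = 0;
    if (y < 0) y = 0;
    if (x >= tex.width) x = tex.width - 1;
    if (y >= tex.height) y = tex.height - 1;
    return tex.texels[static_cast<std::size_t>(y) * tex.width + x];
}

// Mean luminance of the w-by-h rectangle whose top-left corner is (x0, y0).
// This is the naive 2-dimensional loop: exactly w * h lookups per call.
float meanLuminance(const Texture& tex, int x0, int y0, int w, int h) {
    assert(w > 0 && h > 0);
    float sum = 0.0f;
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            sum += fetch(tex, x0 + x, y0 + y);
        }
    }
    return sum / static_cast<float>(w * h);
}
```

In a post process this body runs once per output pixel, so the total number of lookups is the pixel count times `w * h`. As a purely illustrative figure, a 32x32 window evaluated for every pixel of a 1920x1080 target comes to roughly 2.1 billion lookups per frame.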
- Use either the [loop] or the [unroll] attribute on the loops
- Use tex2Dlod instead of tex2D in the pixel shader
- Wherever possible, make use of these hints: http://msdn.microsoft.com/en-gb/library/windows/desktop/cc627119%28v=vs.85%29.aspx
But that's nowhere near enough to make my shaders run well. Are there any better approaches, or perhaps some special DX10/11 instructions that would help here?