Speed up loop, texture lookup, and array processing in pixel shader / post process (HLSL only)

5 comments, last by Meltac 11 years, 5 months ago
Hi everybody

There are many situations where I need to process an array / grid / matrix of pixels in a loop in a post process, mostly to calculate a mean or maximum value, or a gradient, over a set of adjacent pixels. Examples:


  • Gaussian blur filter
  • Calculating mean luminance of a specific area on the screen
  • Getting pixel nearest to the viewer within a certain range
  • Checking if at least one pixel of a 2-dimensional array (rectangle) of pixels meets a certain condition
  • etc.


All of these are typically implemented as a 2-dimensional for-loop iterating over the x- and y-axis (rows and columns) of the specified grid, where each iteration makes at least one texture lookup. Therefore they cause heavy GPU load as soon as the grid exceeds a certain size, making FPS drop dramatically in many situations.

So, are there strategies to speed up execution substantially and overcome these limitations? I've already tried these:


  1. Either use [loop] or [unroll] attributes
  2. Use tex2Dlod instead of tex2D in pixel shader
  3. Wherever possible, make use of these hints: http://msdn.microsoft.com/en-gb/library/windows/desktop/cc627119%28v=vs.85%29.aspx


But that's not nearly enough to make my shaders run well. Are there any better approaches, or maybe some special DX10/11 instructions that would help here?
You need to change the filter logic, e.g. use filter separation to split up your filter into two passes (instead of n*n you have only n+n texture lookups).
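
To make the n*n versus n+n claim concrete, here is a minimal CPU-side sketch in Python (not HLSL, so the arithmetic can be checked without a GPU). It assumes a Gaussian kernel, clamp-to-edge addressing, and a tiny grayscale image stored as a list of rows; the helper names (`fetch`, `blur_2d`, `blur_1d`, `blur_separable`) are made up for this illustration.

```python
import math

def gaussian_kernel_1d(radius, sigma):
    """Normalized 1D Gaussian weights for offsets -radius..radius."""
    raw = [math.exp(-(i * i) / (2.0 * sigma * sigma))
           for i in range(-radius, radius + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def fetch(image, x, y):
    """Clamp-to-edge texel fetch; stands in for one texture lookup."""
    h, w = len(image), len(image[0])
    return image[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

def blur_2d(image, kernel):
    """Naive blur: n*n lookups per pixel (nested loop over the kernel)."""
    r = len(kernel) // 2
    h, w = len(image), len(image[0])
    return [[sum(kernel[j + r] * kernel[i + r] * fetch(image, x + i, y + j)
                 for j in range(-r, r + 1) for i in range(-r, r + 1))
             for x in range(w)] for y in range(h)]

def blur_1d(image, kernel, dx, dy):
    """One pass along direction (dx, dy): n lookups per pixel."""
    r = len(kernel) // 2
    h, w = len(image), len(image[0])
    return [[sum(kernel[i + r] * fetch(image, x + i * dx, y + i * dy)
                 for i in range(-r, r + 1))
             for x in range(w)] for y in range(h)]

def blur_separable(image, kernel):
    """Horizontal pass, then vertical pass on its result: n+n lookups per pixel."""
    return blur_1d(blur_1d(image, kernel, 1, 0), kernel, 0, 1)

kernel = gaussian_kernel_1d(radius=4, sigma=2.0)   # 9 taps
print(len(kernel) ** 2, "lookups per pixel (single pass)")   # 81
print(2 * len(kernel), "lookups per pixel (two passes)")     # 18
```

Both variants produce the same image up to rounding; the price of the cheaper one is that the vertical pass has to read the output of the horizontal pass, which is why it needs an intermediate render target.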

Then try to utilize the hardware linear filter: with a single texture lookup you can fetch a weighted sum of 4 texels, or at least 2 texels. This is very useful for blur filters.

With these simple tricks you can reduce a naive blur filter implementation (9x9 = 81 texture lookups) to just 10 lookups (9+9 = 18 after separation, then 5+5 = 10 by fetching texel pairs through the linear filter).
Thanks for the hints. Unfortunately more than one pass is not an option in my case (engine limitation).

How do I use the hardware linear filter? Do you have an example in HLSL for me?
Here's a simple example.

You want to add a weighted sum of texels (gaussian blur):
w1*t1 + w2*t2 + ...
Now you can rewrite the first two terms as
((w1/(w1+w2))*t1 + (w2/(w1+w2))*t2) * (w1+w2)
Now you can rewrite
w2/(w1+w2)
= (w1+w2-w1)/(w1+w2)
= (w1+w2)/(w1+w2) - w1/(w1+w2)
= 1 - w1/(w1+w2)
Now substitute w1/(w1+w2) with alpha and you get
(alpha*t1 + (1-alpha)*t2) * (w1+w2)
Now, alpha*t1 + (1-alpha)*t2 is a linear combination of two texels depending on alpha, which is exactly what the standard hardware linear filter returns when you sample a position between two texels: sample at the fraction (1-alpha) = w2/(w1+w2) of the way from t1 to t2. You just need to multiply the resulting texel by (w1+w2).
Voila, two texels with 1 access. As homework you can try to do the same with accessing 4 pixels (x and y). :D
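
The derivation above can be checked numerically. The following is a rough Python sketch, not shader code: `sample_linear` emulates a linear filter on a 1D row of texels, with texel centres assumed to sit at integer coordinates (on real hardware they sit at half-texel offsets in UV space) and clamp-to-edge addressing. `merge_taps` is a hypothetical helper that pairs neighbouring taps as described.

```python
import math

def gaussian_kernel_1d(radius, sigma):
    """Normalized 1D Gaussian weights for offsets -radius..radius."""
    raw = [math.exp(-(i * i) / (2.0 * sigma * sigma))
           for i in range(-radius, radius + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def fetch(row, x):
    """Clamp-to-edge point fetch of one texel."""
    return row[min(max(x, 0), len(row) - 1)]

def sample_linear(row, x):
    """Emulated linear filter: blends the two texels around position x."""
    x0 = math.floor(x)
    f = x - x0                      # fraction of the way from texel x0 to x0+1
    return (1.0 - f) * fetch(row, x0) + f * fetch(row, x0 + 1)

def merge_taps(kernel):
    """Pair neighbouring taps into (offset, weight) taps for linear sampling.

    Assumes w1 + w2 > 0 for every pair (true for Gaussian weights).
    """
    r = len(kernel) // 2
    taps, i = [], 0
    while i < len(kernel):
        o = i - r                   # integer offset of the first texel in the pair
        if i + 1 < len(kernel):
            w1, w2 = kernel[i], kernel[i + 1]
            # sample w2/(w1+w2) = 1 - alpha of the way from t1 to t2
            taps.append((o + w2 / (w1 + w2), w1 + w2))
            i += 2
        else:
            taps.append((float(o), kernel[i]))   # odd tap left over, fetched alone
            i += 1
    return taps

def blur_direct(row, x, kernel):
    """Reference: one point fetch per kernel weight."""
    r = len(kernel) // 2
    return sum(kernel[i + r] * fetch(row, x + i) for i in range(-r, r + 1))

def blur_merged(row, x, taps):
    """Same sum, using one linear fetch per merged tap."""
    return sum(w * sample_linear(row, x + o) for o, w in taps)

kernel = gaussian_kernel_1d(radius=4, sigma=2.0)
taps = merge_taps(kernel)
print(len(kernel), "point fetches ->", len(taps), "linear fetches")   # 9 -> 5
```

In a shader the (offset, weight) pairs would normally be precomputed and stored as constants, so the loop itself only does the 5 fetches and multiply-adds.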
Urgh, I was hoping for something as simple as a function call like

float3 mean_color = tex2Dlinear(myTextureSampler, uv).rgb

or

sampler2D gSampler =
sampler_state
{
Texture = <gTexture>;
MinFilter = Linear;
MagFilter = Linear;
MipFilter = Linear;
};

So do I have to do this by hand?
With newer hardware you can gather 4 texels at once (textureGather in GLSL; the HLSL equivalent is Texture2D.Gather, available from Shader Model 4.1), but... to really optimize a shader or filter you need to do it by hand.
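
As a rough illustration of what gathering buys for the "does any pixel in this rectangle meet a condition" case, here is a Python sketch that only counts lookups. `gather4` is a hypothetical stand-in that returns the 2x2 block whose top-left texel is (x, y); a real gather returns one channel of the four texels in a hardware-defined component order and uses a different coordinate convention, so treat this as a sketch of the counting only.

```python
def fetch(tex, x, y):
    """Clamp-to-edge point fetch of one texel."""
    h, w = len(tex), len(tex[0])
    return tex[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

def gather4(tex, x, y):
    """Hypothetical gather: ((tx, ty), value) for the 2x2 block at (x, y)."""
    return [((x + dx, y + dy), fetch(tex, x + dx, y + dy))
            for dy in (0, 1) for dx in (0, 1)]

def any_match_naive(tex, x0, y0, w, h, pred):
    """One lookup per texel; returns (found, lookups)."""
    lookups = 0
    for y in range(y0, y0 + h):
        for x in range(x0, x0 + w):
            lookups += 1
            if pred(fetch(tex, x, y)):
                return True, lookups
    return False, lookups

def any_match_gather(tex, x0, y0, w, h, pred):
    """One gather per 2x2 block; returns (found, lookups)."""
    lookups = 0
    for y in range(y0, y0 + h, 2):
        for x in range(x0, x0 + w, 2):
            lookups += 1
            for (tx, ty), value in gather4(tex, x, y):
                # Ignore texels of the block that fall outside the rectangle.
                if tx < x0 + w and ty < y0 + h and pred(value):
                    return True, lookups
    return False, lookups

tex = [[0.1] * 12 for _ in range(12)]
is_bright = lambda v: v > 0.5
print(any_match_naive(tex, 0, 0, 9, 9, is_bright))    # (False, 81)
print(any_match_gather(tex, 0, 0, 9, 9, is_bright))   # (False, 25)
```

Unlike the linear-filter trick, a gather hands back the four texels unblended, so it also works for maximum, nearest-depth, and threshold tests where a weighted average would give the wrong answer.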


So do I have to do this by hand?

All the professional engines like UDK, CE, and Unity are doing crazy stuff while keeping performance... well, this is for sure not because of using command X instead of Y.
Ok, thank you so much!

