Efficient way to handle multiple texture fetches in shader?

4 comments, last by RDragon1 11 years, 6 months ago
Hi,

I have a fragment shader which has to sample a grid of texels (about 16x16) around the center texel of a texture. This happens for every texel in the texture.
As expected, this many texture fetches per fragment hurts performance. I was wondering whether there are any ways to optimize these fetches.

I understand that if the fetched texels feed a simple linear weighting operation (such as a Gaussian filter), one can reduce the number of fetches by using GL_LINEAR sampling and sampling between two texels rather than at the actual texel centers. But are there other methods for operations more complicated than weighted sums?
For separable filters such as Gaussian or box filters you can get away with n+m fetches instead of n*m, but that also requires two passes. Summed-area tables might also be of interest, as might explicitly fetching from lower mip-map levels.
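To illustrate the summed-area table idea numerically, here is a minimal NumPy sketch (not shader code; the array stands in for a single-channel texture). Once the table is built, any axis-aligned box sum costs exactly four lookups, regardless of kernel size:

```python
import numpy as np

# Hypothetical 1-channel "texture".
tex = np.random.rand(64, 64)

# Build the summed-area table (integral image) with a zero row/column
# of padding so lookups at the image border stay simple.
sat = np.zeros((65, 65))
sat[1:, 1:] = tex.cumsum(axis=0).cumsum(axis=1)

def box_sum(y0, x0, y1, x1):
    """Sum of tex[y0:y1, x0:x1] from just 4 SAT lookups,
    regardless of how large the box is."""
    return sat[y1, x1] - sat[y0, x1] - sat[y1, x0] + sat[y0, x0]

# A 16x16 box around texel (30, 30): 4 fetches instead of 256.
s = box_sum(22, 22, 38, 38)
assert np.isclose(s, tex[22:38, 22:38].sum())
```

In a shader you would precompute the SAT into a texture in an earlier pass and do the four fetches per fragment; watch out for precision, since the running sums grow large.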
Since you make specific mention of a Gaussian filter, I'll throw in my $0.02 and add that Gaussian operations (specifically Gaussian blur) in two dimensions can be optimized by rendering only one dimension per pass, and performing two passes. The number of texture lookups is significantly reduced, and the total effect is identical. It certainly helps with large kernel sizes. I only use this for Gaussian blur, but there are several effects (I believe) that benefit from this method.
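A quick NumPy sanity check of the separability claim (again just a numeric sketch, not shader code): convolving with a 2-D Gaussian kernel gives the same result as a horizontal pass followed by a vertical pass with the 1-D kernel, at n+m instead of n*m fetches per pixel.

```python
import numpy as np

# A small 1-D Gaussian-like kernel (binomial weights, normalized).
k = np.array([1., 4., 6., 4., 1.])
k /= k.sum()

img = np.random.rand(32, 32)

def conv_rows(a, k):
    # Horizontal 1-D pass with zero padding ('same' output size).
    pad = len(k) // 2
    p = np.pad(a, ((0, 0), (pad, pad)))
    return np.stack([np.convolve(row, k, mode='valid') for row in p])

def conv2d(a, K):
    # Brute-force 2-D pass with the full n*m kernel, for comparison.
    pad = K.shape[0] // 2
    p = np.pad(a, pad)
    out = np.zeros_like(a)
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            out[i, j] = (p[i:i+K.shape[0], j:j+K.shape[1]] * K).sum()
    return out

# Horizontal pass, then vertical pass (via transpose).
two_pass = conv_rows(conv_rows(img, k).T, k).T
full = conv2d(img, np.outer(k, k))
assert np.allclose(two_pass, full)   # 10 taps/pixel vs 25
```

For your 16x16 neighborhood that's 32 fetches per texel instead of 256, provided the operation really is separable.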
Thanks!
I'd already looked into separable filters. But the other two methods seem promising, especially summed-area tables, which I think are the same as integral images. Also I was wondering: since the texture fetches are offset and the same fetches will be repeated for every texel, is there any caching technique we can make use of? (I agree it's tough because the operations happen in parallel.)
The same question was posted by somebody on Stack Overflow recently. The top answer also seems to be an interesting suggestion: basically, avoid dependent texture reads by calculating the texture coordinates in the vertex shader rather than the fragment shader. This allows the GPU to optimize texture fetches in the fragment shader by prefetching and caching.

http://stackoverflow...rociously-slow/
If possible, you can use a compute shader's shared memory to reduce the number of fetches you have to do per fragment. If adjacent fragments fetch a large number of the same texels, you can store those texels in shared memory; when moving from one fragment to the next, fetch only the additional texels the new fragment needs and discard the ones the previous fragment needed but the current one doesn't. That way you minimize the amount of re-fetching. There are examples of doing blurs with this technique out there.

This topic is closed to new replies.
