I have a fragment shader that has to sample a grid of texels (about 16x16) centered on a given texel of a texture. This happens for every texel in the texture.
As expected, this many texture fetches per fragment (roughly 256) hurts the performance of the program. I was wondering if there is any way to optimize these fetches for better performance.
I understand that if the fetched texels feed a simple linear weighting operation (such as a Gaussian filter), one can reduce the number of fetches by using GL_LINEAR filtering and sampling between two texels rather than at the exact texel positions. But are there any other methods for operations more complicated than weighted sums?
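To make the GL_LINEAR trick I am referring to concrete, here is a minimal 1D sketch in Python (not my actual shader). The helper names `merge_taps`, `sample_linear`, `full_sum`, and `merged_sum` are hypothetical; `sample_linear` only emulates linear filtering on the CPU, assuming texel centers sit at integer coordinates and that the two weights in each merged pair have the same sign and a non-zero sum.

```python
import math


def merge_taps(offsets, weights):
    """Merge adjacent tap pairs into single linearly filtered taps.

    Taps (o1, w1) and (o2, w2) on neighbouring texels become one tap with
    weight w1 + w2 at offset (o1*w1 + o2*w2) / (w1 + w2).
    Assumes each pair's weights share a sign and do not sum to zero.
    """
    merged = []
    for i in range(0, len(weights) - 1, 2):
        w = weights[i] + weights[i + 1]
        o = (offsets[i] * weights[i] + offsets[i + 1] * weights[i + 1]) / w
        merged.append((o, w))
    if len(weights) % 2:  # odd tap count: the last tap stays as it is
        merged.append((offsets[-1], weights[-1]))
    return merged


def sample_linear(texels, x):
    """Emulate a 1D linearly filtered fetch at texel-space coordinate x."""
    i0 = math.floor(x)
    f = x - i0
    i1 = min(i0 + 1, len(texels) - 1)  # clamp at the right edge
    return texels[i0] * (1.0 - f) + texels[i1] * f


def full_sum(texels, center, offsets, weights):
    """Reference weighted sum: one fetch per tap."""
    return sum(w * texels[center + o] for o, w in zip(offsets, weights))


def merged_sum(texels, center, merged):
    """Same weighted sum using the merged taps (about half the fetches)."""
    return sum(w * sample_linear(texels, center + o) for o, w in merged)


# Illustrative data: a 5-tap binomial kernel on an arbitrary row of texels.
texels = [0.5, 2.0, -1.0, 3.0, 7.0, 4.0, 1.5, 9.0, 0.25]
offsets = [-2, -1, 0, 1, 2]
weights = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]

merged = merge_taps(offsets, weights)
print(len(weights), "fetches reduced to", len(merged))
print(full_sum(texels, 4, offsets, weights), merged_sum(texels, 4, merged))
```

This works only because the result is linear in the fetched values, which is exactly why it does not carry over to the more complicated operations I am asking about.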