GLSL shader SLOW when having too big arrays

Started by
2 comments, last by cgrant 6 years, 12 months ago

Hello, I have written a shader that picks a texture from an atlas and applies repeat or mirrored repeat. Nothing else. The texture index is passed as an attribute. There is also a const array of size 4*(number of textures) that holds the starting xy and the size xy of each texture. Technically it works fine, and on an Nvidia 750 it runs fast.

But on an Nvidia 820 (a weaker card) the slowdown is so brutal that I got a TDR! :D So I tried a reduced version of the shader with just a few textures. An array of 128 ran at, let's say, normal speed. I added the next 4 parts of the array as separate arrays (I have 128 textures, so it was 512 originally). Speed was still OK. But when I started to actually use them and switch between them (the first for 0-31, the second for 32-63, and so on), it became SLOW again! So my guess is that the compiler had simply optimized out the unused variables before.

My second guess is that while the Nvidia 750 allocates const variables in shaders only once, the Nvidia 820 does it on every run. But this is just a guess. Also, I know that ifs in shaders are slow, but certainly not that slow!

The code for the original shader, which works on the 750 but is slow on the 820:


attribute float param;   // texture index into tex_data
attribute vec4 barva;    // color

varying vec4 myUniform;
uniform sampler2D texture;

varying vec2 uv_coords;

varying vec3 vertex_light_position;

// Four floats per texture: start x, start y, size x, size y.
const float tex_data[508] = float[508](
    1.,1.,256.,256.,
    ...
    0.,0.,1.,1.
);

void main()
{
    float nasobek = 2.0;        // multiplier
    float druhy_nasobek = 1.0;  // second multiplier
    float prevrat = -1.0;       // flip

    float odsunuti = 2.0;       // offset added to the start
    float odsunuti2 = 4.0;      // offset subtracted from the size

    int texnum = round(param);
    int num1 = (texnum*4);
    int num2 = (texnum*4)+1;
    int num3 = (texnum*4)+2;
    int num4 = (texnum*4)+3;

    // Start (xy) and size (zw) of the selected texture, normalized by 4096.
    myUniform.x = (tex_data[num1]+odsunuti)/4096.;
    myUniform.y = (tex_data[num2]+odsunuti)/4096.;
    myUniform.z = (tex_data[num3]-odsunuti2)/4096.;
    myUniform.w = (tex_data[num4]-odsunuti2)/4096.;

    uv_coords.xy = fract(uv_coords.xy);
    if(texnum<67)
    {
        // Mirrored repeat: fold the upper half of the range back down.
        if(mod(uv_coords.x,1.0)>0.5)
        {
            uv_coords.x = druhy_nasobek*(1.0-uv_coords.x);
        }
        if(mod(uv_coords.y,1.0)>0.5)
        {
            uv_coords.y = druhy_nasobek*(1.0-uv_coords.y);
        }
    }

    // Scale into the texture's sub-rectangle, wrap, then move to its start.
    uv_coords.xy *= (myUniform.zw)*nasobek;
    uv_coords.xy = mod(uv_coords.xy, myUniform.zw);
    uv_coords.xy += myUniform.xy;

    float diffuse_value = 20.0*(gl_DepthRange.far-gl_FragCoord.z);

    vec4 color = texture2D(texture, uv_coords);
    gl_FragColor = color*barva;//* diffuse_value;
}

Specs of both GPUs:

https://www.notebookcheck.net/NVIDIA-GeForce-820M.108477.0.html

https://www.notebookcheck.net/NVIDIA-GeForce-GT-750M.90245.0.html

Which spec does this have to do with? I guess there must be a better way to load those constants. Please help.

SOLVED! I still split the big array into arrays of 128, but the difference is that I now have one array for X, one for Y, one for Xsize, and one for Ysize. I no longer use 32-texture pages like before, so no ifs are needed.

My previous statement that ifs cannot slow things down that much was wrong. One else-if pair cut the speed from 73fps to 36fps. With 3 else-ifs I had just 3fps! Now that they are not needed, the speed is OK. I am still not satisfied, because this means I am limited to 128 textures. I guess the RIGHT solution is somewhat different...

Instead of using a "const float" array, try storing your data in a UBO or TBO.

Actual constant data will be implemented by instructions like:
set register0 to 1.0f
set register1 to 1.0f
set register2 to 255.0f
...
Which is a lot of instructions to add onto the front of your shader!
Also, you've got around 2KiB of constants here, which probably doesn't fit in the registers, so the compiler would have to do something even worse than this!

OpenGL is abstract though, so there are many ways the driver can compile your code. If you're lucky, it will internally create a hidden UBO containing your constants and bind it for you. Perhaps one of your GPU/driver combos was doing this, which would explain why it was faster.

Instead of relying on driver magic, try storing the data in some different ways yourself and see if it affects the performance.
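
As a rough illustration of the UBO suggestion, here is a minimal fragment shader sketch. It assumes GLSL 3.30 and that the application has already uploaded one vec4 per texture (start xy, size zw, already divided by the atlas size) into a buffer bound to the block. All names here (AtlasData, tex_rect, atlas, v_texnum, v_uv, v_color) are made up for the example and do not come from the original shader, and only the plain repeat case is shown.

```glsl
#version 330 core

// Hypothetical uniform block replacing the const float array.
// With std140 layout a vec4 array has a 16-byte stride, so 128 entries
// occupy 2 KiB of buffer memory instead of shader constants.
layout(std140) uniform AtlasData
{
    vec4 tex_rect[128]; // xy = start, zw = size, normalized to 0..1
};

uniform sampler2D atlas;

flat in int v_texnum;   // texture index, forwarded by the vertex shader
in vec2 v_uv;           // unwrapped texture coordinate
in vec4 v_color;

out vec4 frag_color;

void main()
{
    // One dynamically indexed load; no paging and no else-if chain.
    vec4 rect = tex_rect[v_texnum];

    // Plain repeat inside the texture's sub-rectangle of the atlas.
    vec2 uv = rect.xy + fract(v_uv) * rect.zw;

    frag_color = texture(atlas, uv) * v_color;
}
```

On the application side the block would be connected with glGetUniformBlockIndex, glUniformBlockBinding and glBindBufferBase. The index has to reach the fragment shader as a flat integer, for example by converting the float attribute with int(round(param)) in the vertex shader. Whether this is actually faster on the 820M is something only a measurement can tell.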

Just to add to what Hodgman said, you cannot treat the GPU like a general-purpose CPU, as you will quickly run into limits that are imposed for very good reasons. If your data is not in buffers, then chances are it has to be stored in registers, which are a limited commodity on any hardware. Long story short, take a really deep look at what you are trying to accomplish and map that to a GPU paradigm instead of a CPU one.
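
For the buffer route, the TBO variant mentioned earlier in the thread could look roughly like the sketch below. It assumes GLSL 3.30 and a buffer texture created with the GL_RGBA32F format holding one texel per atlas entry. The names (tex_rects, atlas, v_texnum, v_uv, v_color) are hypothetical, and the layout of each texel (start in xy, size in zw, normalized) is an assumption, not something taken from the original code.

```glsl
#version 330 core

// Hypothetical buffer texture: one RGBA32F texel per atlas entry.
uniform samplerBuffer tex_rects;
uniform sampler2D atlas;

flat in int v_texnum;   // texture index, forwarded by the vertex shader
in vec2 v_uv;
in vec4 v_color;

out vec4 frag_color;

void main()
{
    // Fetch the rectangle from buffer memory rather than from registers.
    vec4 rect = texelFetch(tex_rects, v_texnum);

    // Plain repeat inside the texture's sub-rectangle of the atlas.
    vec2 uv = rect.xy + fract(v_uv) * rect.zw;

    frag_color = texture(atlas, uv) * v_color;
}
```

The table size is then bounded by GL_MAX_TEXTURE_BUFFER_SIZE rather than by the shader's constant storage, so the 128-texture ceiling from the split-array workaround should no longer apply.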
