Macin2

GLSL shader SLOW when having too big arrays


Hello, I have written a shader that picks a texture from an atlas and applies repeat or mirrored repeat, nothing else. The texture index is passed as an attribute. There is also a const array of 4*(number of textures) floats, holding the starting xy and size xy of each texture. Technically it works fine, and on an Nvidia 750 it runs fast.

But on an Nvidia 820 (a weaker GPU) the slowdown is so brutal that I got a TDR! :D So I tried a reduced version of the shader with just a few textures. An array of 128 ran at, let's say, normal speed. I then added the next 4 parts of the array as separate arrays (I have 128 textures, so it was 512 originally). Speed was still OK. But when I actually started using them and switching between them (for 0-31 use the first, for 32-63 the second, and so on), it was SLOW again! So my guess is that the compiler had simply optimized out the unused variables before.

My second guess is that while the Nvidia 750 allocates const variables in shaders only once, the Nvidia 820 does it on every run. But this is just a guess. Also, I know that ifs in shaders are slow, but certainly not that slow!

The code of the original shader, which works on the 750 but is slow on the 820:

attribute float param;
attribute vec4 barva;

varying vec4 myUniform;
uniform sampler2D texture;

varying vec2 uv_coords;

varying vec3 vertex_light_position;

   const float tex_data[508] = float[508](
      1.,1.,256.,256.,
	...
      0.,0.,1.,1.
);

void main()
{
	float nasobek=2.0;
	float druhy_nasobek = 1.0;
	float prevrat = -1.0;
	
	float odsunuti =2.0;
	float odsunuti2 =4.0;
        
        int texnum = int(round(param)); // round() returns a float; convert explicitly
        int num1 = (texnum*4);
        int num2 = (texnum*4)+1;
        int num3 = (texnum*4)+2;
        int num4 = (texnum*4)+3;

        myUniform.x=(tex_data[num1]+odsunuti)/4096.;myUniform.y=(tex_data[num2]+odsunuti)/4096.;
	myUniform.z=(tex_data[num3]-odsunuti2)/4096.;myUniform.w=(tex_data[num4]-odsunuti2)/4096.;

	uv_coords.xy=fract(uv_coords.xy);
	if(texnum<67)
	{
         if(mod(uv_coords.x,1.0)>0.5)
         {
           uv_coords.x=druhy_nasobek*(1.0-uv_coords.x);
         }
         if(mod(uv_coords.y,1.0)>0.5)
         {
           uv_coords.y=druhy_nasobek*(1.0-uv_coords.y);
         }
	}
	
	uv_coords.xy*=(myUniform.zw)*nasobek;
	uv_coords.xy = mod(uv_coords.xy, myUniform.zw);
        uv_coords.xy += myUniform.xy;

        float diffuse_value = 20.0*(gl_DepthRange.far-gl_FragCoord.z);

	vec4 color = texture2D(texture, uv_coords);
	gl_FragColor = color*barva;//* diffuse_value;
}	

Specs of both gpus:

https://www.notebookcheck.net/NVIDIA-GeForce-820M.108477.0.html

https://www.notebookcheck.net/NVIDIA-GeForce-GT-750M.90245.0.html

 

Which of these specs is this related to? I guess there must be a better way to load those constants. Please help.

Edited by Macin2


SOLVED! I split the big array into arrays of 128 elements, but the difference is that there is now one array for X, one for Y, one for X size, and one for Y size. I am no longer using 32-texture pages like before, so no ifs are needed.

My previous statement that ifs cannot slow things down that much was wrong. One else-if pair reduced the speed from 73 fps to 36 fps. With 3 else-ifs I had just 3 fps! Now that they are not needed, the speed is OK. Still, I am not satisfied, because this means I am limited to 128 textures. I guess the RIGHT solution is something different...
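As a rough illustration of the layout described in this post (one array per component, indexed directly, no paging branches), here is a minimal sketch. It assumes GLSL 1.30 and shows only the first and last entries from the original `tex_data` listing, so the arrays have 2 elements instead of 128. The names `tex_x`, `tex_y`, `tex_w`, `tex_h`, `atlas`, `v_texIndex`, `v_uv` and `fragColor` are hypothetical, and the padding offsets and mirroring from the original shader are left out.

```glsl
#version 130

// One const array per component instead of one interleaved array.
// Only the first and last entries of the original listing are shown;
// the elided middle entries are not reproduced here.
const float tex_x[2] = float[2](1., 0.);
const float tex_y[2] = float[2](1., 0.);
const float tex_w[2] = float[2](256., 1.);
const float tex_h[2] = float[2](256., 1.);

uniform sampler2D atlas;   // hypothetical name for the 4096x4096 atlas

flat in int v_texIndex;    // hypothetical: texture index from the vertex shader
in vec2 v_uv;              // hypothetical: unwrapped texture coordinates

out vec4 fragColor;

void main()
{
    int i = v_texIndex;

    // Start (xy) and size (zw) of the sub-texture, normalized to the atlas.
    vec4 rect = vec4(tex_x[i], tex_y[i], tex_w[i], tex_h[i]) / 4096.;

    // Plain repeat inside the sub-rectangle; no if/else paging needed.
    vec2 uv = fract(v_uv) * rect.zw + rect.xy;

    fragColor = texture(atlas, uv);
}
```

This still bakes the table into the shader as constants, so it shares the 128-texture concern raised above.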

Instead of using a "const float" array, try storing your data in a UBO or TBO.

Actual constant data will be implemented by instructions like:
set register0 to 1.0f
set register1 to 1.0f
set register2 to 255.0f
...
Which is a lot of instructions to add onto the front of your shader!
Also, you've got around 2KiB of constants here, which probably doesn't fit in the registers, so the compiler would have to do something even worse than this!

OpenGL is abstract though, so there are many ways the driver can compile your code. If you're lucky, it will internally make a hidden UBO containing your constants and bind it for you. Perhaps one of your GPU/driver combos was doing this, which would explain why it was faster.

Instead of relying on driver magic, try storing the data in some different ways yourself and see if it affects the performance.
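A minimal sketch of the UBO variant suggested here, assuming GLSL 1.40 (OpenGL 3.1) or later. The block name `AtlasRects`, the member `rects`, and the names `atlas`, `v_texIndex`, `v_uv`, `v_color` and `fragColor` are hypothetical. Each `vec4` packs one entry of the original table (start xy, size xy), and the padding offsets and mirroring of the original shader are left out for brevity.

```glsl
#version 140

// Atlas table stored in a uniform buffer instead of a const array.
// With std140 layout each vec4 element occupies 16 bytes, so 128
// entries take 2 KiB of buffer storage.
layout(std140) uniform AtlasRects {
    vec4 rects[128];       // xy = start, zw = size, in atlas pixels
};

uniform sampler2D atlas;   // hypothetical name for the 4096x4096 atlas

flat in int v_texIndex;    // hypothetical: texture index from the vertex shader
in vec2 v_uv;              // hypothetical: unwrapped texture coordinates
in vec4 v_color;           // hypothetical: per-vertex tint

out vec4 fragColor;

void main()
{
    // Fetch the sub-rectangle and normalize it to the atlas size.
    vec4 rect = rects[v_texIndex] / 4096.0;

    // Plain repeat inside the sub-rectangle.
    vec2 uv = fract(v_uv) * rect.zw + rect.xy;

    fragColor = texture(atlas, uv) * v_color;
}
```

On the application side, the buffer would typically be filled once and attached with `glGetUniformBlockIndex`, `glUniformBlockBinding` and `glBindBufferBase`. Whether this is actually faster on a given GPU and driver is something to measure, not assume.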


Just to add to what Hodgman said, you cannot treat the GPU like a general purpose CPU, as you will quickly run into some limits which are imposed for very good reason. If your data is not in buffers, then chances are it has to be stored in registers, which are a limited commodity on any hardware. Long story short, take a real deep look at what you are trying to accomplish and map that to a GPU paradigm instead of a CPU one.
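For the other buffer option mentioned in this thread, a TBO (buffer texture), a minimal sketch could look like the following. It assumes GLSL 1.40 or later and a buffer texture with an RGBA32F format holding one texel per atlas entry. The names `atlasRects`, `atlas`, `v_texIndex`, `v_uv` and `fragColor` are hypothetical, and the padding and mirroring of the original shader are omitted.

```glsl
#version 140

// Atlas table read from a buffer texture: one RGBA32F texel per entry,
// xy = start, zw = size, in atlas pixels.
uniform samplerBuffer atlasRects;

uniform sampler2D atlas;   // hypothetical name for the 4096x4096 atlas

flat in int v_texIndex;    // hypothetical: texture index from the vertex shader
in vec2 v_uv;              // hypothetical: unwrapped texture coordinates

out vec4 fragColor;

void main()
{
    // texelFetch on a samplerBuffer takes an integer texel index.
    vec4 rect = texelFetch(atlasRects, v_texIndex) / 4096.0;

    // Plain repeat inside the sub-rectangle.
    vec2 uv = fract(v_uv) * rect.zw + rect.xy;

    fragColor = texture(atlas, uv);
}
```

Because the table lives in a buffer object attached with `glTexBuffer`, its length is not fixed in the shader source, which may help with the 128-texture limit mentioned earlier in the thread.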
