Made a (looong) GLSL ComputeShader for Tiled-Deferred rendering. On my laptop with a 2013 nVidia graphics card, it works fine. But now I'm sending the program to some other guys, and as you know, that's always where the headaches start. I can't debug on their machines or anything, only guess. I need your experience or guessing powers to give me some directions!
Guy1 had an nVidia card. Not too old, certainly not new either. The video card driver hung / crashed when running this particular ComputeShader. Disabling the loops "fixed" it:
#version 430
#define TILE_SIZE 32
layout (local_size_x = TILE_SIZE, local_size_y = TILE_SIZE) in;
shared uint _indxLightsPoint[ MAXLIST_LIGHTS_POINT ]; // Found PointLights, indexes to UBO lightArray
...
// 1. Let each pixel inside a tile check ONE pointlight, see if it intersects the tile-frustum. If so, add it to a shared list
uint thrID = gl_LocalInvocationID.x + gl_LocalInvocationID.y * TILE_SIZE; // Each thread gets a number (0,1,2, ... 1023)
if ( thrID < counts1.x ) { // "counts1" comes from a UBO parameter. Would be "2" if there were 2 active lights in the scene
    if ( pointLightIntersects( tileFrustum, lightPoint[ thrID ].posRange ) ) {
        // Add the light index to the shared list
        uint index = atomicAdd( _cntLightsPoint, 1 );
        if ( index < MAXLIST_LIGHTS_POINT )
            _indxLightsPoint[ index ] = thrID;
    }
}
...
barrier();
...
// 2. Loop through the lights we found
for ( uint i = 0; i < _cntLightsPoint; i++ ) {
    uint index = _indxLightsPoint[ i ];
    addPointLight( brdf, surf, lightPoint[ index ] );
} // for
Compiles, starts, hangs the video driver. If I simplify all this code to a fixed " addPointLight( ... lightPoint[ 0 ] )", it works. And a whole lot faster as well (even though I only had 1 or 2 lights in the scene anyway). If I re-enable "barrier" or some of the atomic operations, the FPS crumbles again. My first thought was that the FOR loop went crazy, counting to an extremely high number. But even with a hard-coded loop count, it still crashes. The other suspect might be an out-of-range array read, but I can't see how.
Could it be that "older" cards (2010..2012) have issues with (GLSL) barriers or atomic operations? Or maybe the hard-coded tile size (32x32) is too big? Although I would expect a compile error in that case.
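The only way I can construct an out-of-range read from this snippet myself is if counts1.x ever exceeds MAXLIST_LIGHTS_POINT: the atomicAdd happily keeps counting past the array size, and the loop in step 2 then walks past the end of _indxLightsPoint. And if the "..." parts don't reset _cntLightsPoint to 0 (with a barrier after it) at the start of the tile, the loop count is plain garbage. I'll probably try a defensive variant anyway, roughly like this (just a sketch, assuming _cntLightsPoint is the shared counter declared in the omitted part):

// Sketch: reset the shared counter exactly once per tile, before anyone touches it
if ( thrID == 0 )
    _cntLightsPoint = 0;
barrier();
memoryBarrierShared();

// ... step 1 (culling) stays the same, the write is already guarded ...

barrier();
memoryBarrierShared();

// Step 2: clamp the loop bound, so a counter that overshot the list size
// can never drive a read past the end of _indxLightsPoint
uint lightCount = min( _cntLightsPoint, uint( MAXLIST_LIGHTS_POINT ) );
for ( uint i = 0; i < lightCount; i++ ) {
    addPointLight( brdf, surf, lightPoint[ _indxLightsPoint[ i ] ] );
}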
Guy1 now has a new AMD card, but it seems it doesn't support some OpenGL 4.5.0 features (even though all shaders use #version 430). Got stranded after that.
Guy2 had a 2011 nVidia card, I don't know which one exactly. Everything works, but the graphics seem more blurry (anisotropic / mipmapping settings?). Moreover, the framerate is horrible. Mine is ~50..60 FPS at a larger resolution, his is 5. I expected a drop, but not that much. As usual there could be a billion things wrong, but my main suspects are:
- ComputeShader setup (tile size 32x32 too big? see the sketch after this list)
- ComputeShader operations (atomicAdd / atomicMin / atomicMax / FOR LOOP / Barrier )
- I assume 24+ texture units are available (i.e. "layout(binding=20) uniform sampler2D gBufferXYZ;"). I know older cards only have 16 or so, but again I would expect a crash then.
- Not using glMemoryBarrier( GL_ALL_BARRIER_BITS ) (properly), prior to or after dispatching the CS
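For the first three suspects, the shader-side changes I have in mind would look roughly like this. Just a sketch: the 16x16 tile size and the binding number are examples, not what the program currently does. If I remember the spec right, 32x32 = 1024 is exactly the minimum GL 4.3 guarantees for MAX_COMPUTE_WORK_GROUP_INVOCATIONS, so it should compile everywhere but leaves older chips zero headroom, and 16 is the guaranteed minimum for texture image units per stage, so binding=20 already gambles on the card exposing more.

// Sketch: smaller tile = 256 invocations per work group instead of 1024,
// which leaves older chips more room for registers / shared memory
#define TILE_SIZE 16
layout (local_size_x = TILE_SIZE, local_size_y = TILE_SIZE) in;

// Sketch: keep sampler bindings below 16, the per-stage minimum GL 4.3 guarantees
layout(binding = 4) uniform sampler2D gBufferXYZ;

// And pair every barrier() with memoryBarrierShared(). In theory barrier()
// already synchronizes shared variables in a compute shader, but it costs
// next to nothing and rules out driver quirks on older cards
barrier();
memoryBarrierShared();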
My gut says to replace the ComputeShader with good old fragment shaders and such. Then again, it works just fine on my own computer. And since it's quite a job to change, it would suck if something very different turns out to be the party-crasher.
Ciao!