OpenGL Shader works on laptop, hangs on desktop.

Tocs1001

I recently completed my Single Pass Order Independent Transparency shader.

 

I decided to add some shadows to my lighting shader. Since my lighting is computed with Tiled Forward Shading, I put my shadow maps into a texture array of cube maps. When I added the line of code that samples the cube maps, the program started to hang in glDrawElements(). After some time the graphics driver kills the program for generating too many errors. However, it doesn't hang right away; it works correctly for a couple of seconds before it breaks.

 

I gave it a try on my laptop (NV 630m) and it works perfectly, and seemingly runs smoother than my desktop (NV 770) does without the shadows.

 

If I comment out the line sampling the cube map array for shadows, it works:

attenuation *= texture(ShadowMaps, vec4(WL, shadow), comparedepth);
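(For context, that call is the cube map array shadow overload of texture(); a minimal sketch of the declaration it implies, with the argument packing spelled out:)

uniform samplerCubeArrayShadow ShadowMaps; // texture array of cube maps with hardware depth compare

// In the call above, vec4(WL, shadow) packs the cube direction (xyz) and the
// array layer (w); comparedepth is the reference value for the depth comparison.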

Curiously, if I leave in the line sampling the shadows, skip my BRDF portion, and instead output the attenuation, the shader doesn't hang:

color += vec4(attenuation, attenuation, attenuation, 0.1);

What that looks like:

 

[Screenshot: eePlhJI.png]

 

Since they're kind of large, I've pasted my shaders here: http://pastie.org/8624432#85,92

 

And here's an OpenGL log, though CodeXL doesn't seem to want to capture the whole log for a single frame:

https://gist.github.com/LordTocs/c2a59de6c3d9fa811d2b

 

I'm hoping it's not the drivers, because I've already tried updating them to the latest version. Hopefully someone spots something I'm doing incorrectly. I know it's a lot to sift through, but I'm running out of ideas.

 

Screenshot from my laptop: http://i.imgur.com/9WspPLc.png

 

EDIT:

 

I've since added a debug callback to my OpenGL context. When the shader locks up, this comes over the debug output:

 

Debug(api, m): PERFORMANCE - Program/shader state performance warning: Fragment Shader is going to be recompiled because the shader key based on GL state mismatches.

 


SeanMiddleditch

I've never seen a hang as a result of it, but this is undefined behavior (at least in older versions of GL/D3D; maybe it's been changed in recent versions and I missed the news):

int lightCount = LightCountAndOffsets[index].x; // index is non-uniform so lightCount is non-uniform
int lightOffset = LightCountAndOffsets[index].y;

vec4 color = vec4(0.0, 0.0, 0.0, 0.0);

ShadePrep ();
for (int i = 0; i < lightCount; ++i) // non-uniform loop is slow
{
    int lightIndex = texelFetch(LightIndexLists, lightOffset + i).x; // texture read in non-uniform flow is undefined behavior; don't do this

Even if it were well-defined behavior, the non-uniform loop will be inefficient. Remember how GPU hardware works: multiple 'threads' run simultaneously with the same instruction pointer in blocks of 4-32 (or more). If only some threads are executing conditional code, the other threads are basically sitting idle (usually they still execute all of those instructions but ignore the results). You can't always avoid non-uniform flow, but you should keep it to small if blocks and avoid large conditional blocks or loops as much as you possibly can. Some algorithms just aren't suited for today's GPUs; algorithms with lots of conditions are sometimes better broken into multiple passes (with different passes for different cases).
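For example, something like this (just a sketch; MaxLightsPerTile and ShadeLight are made-up names) gives the loop a uniform trip count so the texture read executes in uniform flow, and masks out-of-range contributions instead of branching around a big block:

uniform int MaxLightsPerTile; // hypothetical uniform upper bound on lights in any tile

for (int i = 0; i < MaxLightsPerTile; ++i) // uniform trip count: every lane iterates together
{
    // Clamp so iterations past lightCount still fetch a valid texel.
    int fetchIndex = lightOffset + min(i, max(lightCount - 1, 0));
    int lightIndex = texelFetch(LightIndexLists, fetchIndex).x;
    // Keep the divergent part tiny: shade unconditionally, then mask the result.
    vec4 contribution = ShadeLight(lightIndex); // hypothetical per-light shading helper
    color += (i < lightCount) ? contribution : vec4(0.0);
}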

 

In particular, this is why texture reads in non-uniform flow were made undefined behavior. All the threads might execute that instruction whether they're supposed to or not; the result is just ignored for threads that aren't supposed to be running that code. Accessing textures is slow (so non-uniform texture access can really hurt), and accessing textures with possibly bogus data may do any number of things (modern hardware should be robust and cope with it... should be).
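(A quick sketch of the defined alternatives, with placeholder names: lookups that take an explicit LOD or explicit gradients don't need implicit derivatives, so they remain defined in non-uniform flow.)

uniform sampler2D SomeTexture; // placeholder

vec2 gx = dFdx(uv); // compute derivatives while control flow is still uniform
vec2 gy = dFdy(uv);
if (someDivergentCondition)
{
    vec4 a = textureLod(SomeTexture, uv, 0.0);     // explicit LOD: defined here
    vec4 b = textureGrad(SomeTexture, uv, gx, gy); // explicit gradients: defined here
    // texture(SomeTexture, uv) here would be undefined: its implicit LOD
    // depends on neighboring invocations that may have diverged.
}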

 

You should also try reducing this to a minimal test case that triggers the behavior. That would make it easier to comb through your code, or to file a bug report with your hardware vendor if the code seems correct.

Kaptein

It probably didn't even compile on the desktop, so if the shader is invalid you may just be rendering with the fixed-function pipeline without knowing it.

 

"If program does not
        contain shader objects of type GL_FRAGMENT_SHADER, an
        executable will be installed on the vertex, and possibly geometry processors,
        but the results of fragment shader execution will be undefined."

 

Do you have a robust shader loader? Errors can happen while compiling each shader separately (vertex + fragment) as well as when linking the program.

So in total there are 3 possible error scenarios, and thus 3 separate places where you have to query for the info log.

 

Though, I have had it happen several times where the shader compiled, linked, and clearly wasn't working.

In those cases it could be many things, such as invalid square root parameters, normalization of a zero-length vector, etc.


Tocs1001

Well, the loops aren't entirely slow. The lights are binned into 32x32 pixel tiles, so spatially, pixels in the same area use the same list of lights. That gives the same warp/wavefront (I think those are the NV/AMD terms, respectively) the potential to be iterating over the same list.

 

http://www.cse.chalmers.se/~olaolss/papers/tiled_shading_preprint.pdf
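Roughly, the per-tile lookup goes like this (a sketch; TileCountX is a uniform I'd upload, the other names are from the snippet you quoted):

uniform int TileCountX; // number of 32x32 tiles per screen row

ivec2 tile = ivec2(gl_FragCoord.xy) / 32;        // which tile this fragment lands in
int index  = tile.y * TileCountX + tile.x;       // flat tile index
int lightCount  = LightCountAndOffsets[index].x; // identical for every pixel in the tile
int lightOffset = LightCountAndOffsets[index].y;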

 

Though when I looked up the undefined-ness of texture sampling, you were correct. However, some texture lookups are OK, mainly the ones that don't rely on mip-map computation or filtering. Because I'm using texelFetch(), it should be defined... I think. (http://www.opengl.org/wiki/Sampler_(GLSL)#Non-uniform_flow_control)

 

That page also points out that my shadow map sampling happens in non-uniform flow and relies on filtering. Perhaps that's the cause of the issue, though I'm not really sure how to fix it.
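The only fix I can think of so far (a sketch, untested) is the masking idea from your reply: give the light loop a uniform trip count so the filtered shadow lookup executes on every invocation, and neutralize the out-of-range iterations:

// Inside a light loop with a uniform trip count, as sketched in your reply:
float shadowFactor = texture(ShadowMaps, vec4(WL, shadow), comparedepth); // runs on all lanes
attenuation *= (i < lightCount) ? shadowFactor : 1.0; // mask instead of branching around it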

 

Thanks for your input.

 

EDIT: Missed the new post by Kaptein.

 

My shader loader checks for compile errors on every shader and link errors on every program. If anything comes up, it prints the result and calls assert(false); so I can see the error. It's compiling. I just didn't include the vertex shader because it didn't seem related and there was already an enormous amount of code to look through. The vertex shader also isn't doing anything interesting.
