Sign in to follow this  

OpenGL 0.5s lag after binding texture on Nvidia

Recommended Posts

KarimIO    271

Hey guys. I'm trying to get my application to work on my Nvidia GTX 970 desktop. It currently works on my Intel HD 3000 laptop, but on the desktop, every bind textures specifically from framebuffers, I get half a second of lag. This is done 4 times as I have three RGBA textures and one depth 32F buffer. I tried to use debugging software for the first time - RenderDoc only shows SwapBuffers() and no OGL calls, while Nvidia Nsight crashes upon execution, so neither are helpful. Without binding it runs regularly. This does not happen with non-framebuffer binds.

GLFramebuffer::GLFramebuffer(FramebufferCreateInfo createInfo) {
  glGenFramebuffers(1, &fbo);
	glBindFramebuffer(GL_FRAMEBUFFER, fbo);

	textures = new GLuint[createInfo.numColorTargets];
	glGenTextures(createInfo.numColorTargets, textures);
	GLenum *DrawBuffers = new GLenum[createInfo.numColorTargets];
	for (uint32_t i = 0; i < createInfo.numColorTargets; i++) {
		glBindTexture(GL_TEXTURE_2D, textures[i]);

		GLint internalFormat;
		GLenum format;
		TranslateFormats(createInfo.colorFormats[i], format, internalFormat); // returns GL_RGBA and GL_RGBA

		glTexImage2D(GL_TEXTURE_2D, 0, internalFormat, createInfo.width, createInfo.height, 0, format, GL_FLOAT, 0);


		DrawBuffers[i] = GL_COLOR_ATTACHMENT0 + i;
		glBindTexture(GL_TEXTURE_2D, 0);
		glFramebufferTexture(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0 + i, textures[i], 0);

	if (createInfo.depthFormat != FORMAT_DEPTH_NONE) {
		GLenum depthFormat;
		switch (createInfo.depthFormat) {
		case FORMAT_DEPTH_16:
			depthFormat = GL_DEPTH_COMPONENT16;
		case FORMAT_DEPTH_24:
			depthFormat = GL_DEPTH_COMPONENT24;
		case FORMAT_DEPTH_32:
			depthFormat = GL_DEPTH_COMPONENT32;
			depthFormat = GL_DEPTH24_STENCIL8;
			depthFormat = GL_DEPTH32F_STENCIL8;

		glGenTextures(1, &depthrenderbuffer);
		glBindTexture(GL_TEXTURE_2D, depthrenderbuffer);
		glTexImage2D(GL_TEXTURE_2D, 0, depthFormat, createInfo.width, createInfo.height, 0, GL_DEPTH_COMPONENT, GL_FLOAT, 0);

		glBindTexture(GL_TEXTURE_2D, 0);

		glFramebufferTexture(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, depthrenderbuffer, 0);

	if (createInfo.numColorTargets > 0)
		glDrawBuffers(createInfo.numColorTargets, DrawBuffers);

		std::cout << "Framebuffer Incomplete\n";

	glBindFramebuffer(GL_FRAMEBUFFER, 0);

	width = createInfo.width;
	height = createInfo.height;
	// ...
	// FBO Creation
	FramebufferCreateInfo gbufferCI;
	gbufferCI.colorFormats =;
	gbufferCI.depthFormat = FORMAT_DEPTH_32;
	gbufferCI.numColorTargets = gbufferCFs.size();
	gbufferCI.width = engine.settings.resolutionX;
	gbufferCI.height = engine.settings.resolutionY;
	gbufferCI.renderPass = nullptr;
	gbuffer = graphicsWrapper->CreateFramebuffer(gbufferCI);
    // Bind
	glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fbo);
	// Draw here...

	// Bind to textures
	glBindTexture(GL_TEXTURE_2D, textures[0]);
	glBindTexture(GL_TEXTURE_2D, textures[1]);
	glBindTexture(GL_TEXTURE_2D, textures[2]);
	glBindTexture(GL_TEXTURE_2D, depthrenderbuffer);

Here is an extract of my code. I can't think of anything else to include. I've really been butting my head into a wall trying to think of a reason but I can think of none and all my research yields nothing. Thanks in advance!

Share this post

Link to post
Share on other sites
KarimIO    271
8 minutes ago, Hodgman said:

Try profiling it on GPU View and seeing if there's anything in that data to explain what's going on.

Okay I'll try figure it out tomorrow, as I've never used it before and it looks super confusing.

Share this post

Link to post
Share on other sites
Hodgman    51339

I'm pretty new to GPU View too, so I'm not really sure what to look for :D:| 

I looked through the capture that you PM'ed me, hoping to maybe find that some NVidia thread was busy while your game was stalled, or the GPU was busy with some kind of DMA command or something... but all I can understand from this is that the GPU is idling a lot, and your game's main thread is extremely busy :( 

This is what a well-performing capture should look like though -- notice the HW queue and the game's device context are constantly full of queued up work.


Have you tried adding manual timing code to your game, to try and locate exactly which functions are blocking the CPU? You say that if you disable some code, the performance issue is gone... but try timing different bits of code to see if you can find where the time is going.

Share this post

Link to post
Share on other sites
KarimIO    271

Like I said the only issue comes from glbindtexture of a framebuffer texture. Regular textures work fine. Bitting works fine so I know it's not an issue of populating the framebuffer asynchronously. I used breakpoints to figure out the timing and what causes the issue.  Keep in mind vsync is on so that'd why there's not much work to be done. The scene is a simple crytek sponza with no lighting yet (it's enabled on my Intel but I disabled it for now) so there's not many commands.  I haven't multithreaded anything yet so it shouldn't matter if it's idle. This happens no matter how many times I restart so not an issue of Nvidia working on something else. The rest of the frame takes 12ms due to vsync. Also I had all this and so much more running before I improved the rendering architecture (I wrote my own parser and exporter for faster loading, redesigned the rendering wrappers to make vulkan work better, and I made everything draw based on shader and material first rather than object) 


Edit: Oh and thank you so much for helping me so far! 

Edited by KarimIO

Share this post

Link to post
Share on other sites
KarimIO    271
9 hours ago, TheChubu said:

Are you checking for glErrors? Or better, using arb_debug_output or khr_debug_output?

Yeah, the debug output. I only get info and one low warning which is just giving me buffer sizes I think. The latter shows up every frame. 

8 hours ago, Hodgman said:

Which GL function calls contain these massive stalls?

Like I said, glbindtexture but only when used with a framebuffer texture. I've checked the creation of it a hundred times over and don't think there's any problems. 

Share this post

Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this  

  • Similar Content

    • By pseudomarvin
      I assumed that if a shader is computationally expensive then the execution is just slower. But running the following GLSL FS instead just crashes
      void main() { float x = 0; float y = 0; int sum = 0; for (float x = 0; x < 10; x += 0.00005) { for (float y = 0; y < 10; y += 0.00005) { sum++; } } fragColor = vec4(1, 1, 1 , 1.0); } with unhandled exception in nvoglv32.dll. Are there any hard limits on the number of steps/time that a shader can take before it is shut down? I was thinking about implementing some time intensive computation in shaders where it would take on the order of seconds to compute a frame, is that possible? Thanks.
    • By Arulbabu Donbosco
      There are studios selling applications which is just copying any 3Dgraphic content and regenerating into another new window. especially for CAVE Virtual reality experience. so that the user opens REvite or CAD or any other 3D applications and opens a model. then when the user selects the rendered window the VR application copies the 3D model information from the OpenGL window. 
      I got the clue that the VR application replaces the windows opengl32.dll file. how this is possible ... how can we copy the 3d content from the current OpenGL window.
      anyone, please help me .. how to go further... to create an application like VR CAVE. 
    • By cebugdev
      hi all,

      i am trying to build an OpenGL 2D GUI system, (yeah yeah, i know i should not be re inventing the wheel, but this is for educational and some other purpose only),
      i have built GUI system before using 2D systems such as that of HTML/JS canvas, but in 2D system, i can directly match a mouse coordinates to the actual graphic coordinates with additional computation for screen size/ratio/scale ofcourse.
      now i want to port it to OpenGL, i know that to render a 2D object in OpenGL we specify coordiantes in Clip space or use the orthographic projection, now heres what i need help about.
      1. what is the right way of rendering the GUI? is it thru drawing in clip space or switching to ortho projection?
      2. from screen coordinates (top left is 0,0 nd bottom right is width height), how can i map the mouse coordinates to OpenGL 2D so that mouse events such as button click works? In consideration ofcourse to the current screen/size dimension.
      3. when let say if the screen size/dimension is different, how to handle this? in my previous javascript 2D engine using canvas, i just have my working coordinates and then just perform the bitblk or copying my working canvas to screen canvas and scale the mouse coordinates from there, in OpenGL how to work on a multiple screen sizes (more like an OpenGL ES question).
      lastly, if you guys know any books, resources, links or tutorials that handle or discuss this, i found one with marekknows opengl game engine website but its not free,
      Just let me know. Did not have any luck finding resource in google for writing our own OpenGL GUI framework.
      IF there are no any available online, just let me know, what things do i need to look into for OpenGL and i will study them one by one to make it work.
      thank you, and looking forward to positive replies.
    • By fllwr0491
      I have a few beginner questions about tesselation that I really have no clue.
      The opengl wiki doesn't seem to talk anything about the details.
      What is the relationship between TCS layout out and TES layout in?
      How does the tesselator know how control points are organized?
          e.g. If TES input requests triangles, but TCS can output N vertices.
             What happens in this case?
      In this article,
      the isoline example TCS out=4, but TES in=isoline.
      And gl_TessCoord is only a single one.
      So which ones are the control points?
      How are tesselator building primitives?
    • By Orella
      I've been developing a 2D Engine using SFML + ImGui.
      Here you can see an image
      The editor is rendered using ImGui and the scene window is a sf::RenderTexture where I draw the GameObjects and then is converted to ImGui::Image to render it in the editor.
      Now I need to create a 3D Engine during this year in my Bachelor Degree but using SDL2 + ImGui and I want to recreate what I did with the 2D Engine. 
      I've managed to render the editor like I did in the 2D Engine using this example that comes with ImGui. 
      3D Editor preview
      But I don't know how to create an equivalent of sf::RenderTexture in SDL2, so I can draw the 3D scene there and convert it to ImGui::Image to show it in the editor.
      If you can provide code will be better. And if you want me to provide any specific code tell me.
  • Popular Now