Can anyone explain why this seems to be the source of intermittent slowdown?

9 comments, last by Sean_Seanston 9 years, 2 months ago

I noticed my game (really just a barebones program at this stage) was experiencing some unexplained slowdown and when I printed out the time between logic frames there was the occasional one every so often that was around 0.5 of a second longer than most.
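The kind of check described above, printing the time between logic frames and spotting the odd long one, could be sketched roughly as follows. This is only an illustration: `FrameSpikeDetector`, `addFrame` and the spike factor are hypothetical names and values, not the code actually used in the game. The frame duration is assumed to come from whatever clock the game loop already uses.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: fed one frame duration (in seconds) per logic frame,
// it reports frames that took much longer than the running average.
class FrameSpikeDetector {
public:
    // A frame counts as a spike if it exceeds spikeFactor * mean of earlier normal frames.
    explicit FrameSpikeDetector(double spikeFactor)
        : factor_(spikeFactor), sum_(0.0), count_(0) {
        assert(spikeFactor > 1.0);
    }

    // Returns true if this frame looks like a spike.
    bool addFrame(double seconds) {
        bool spike = count_ > 0 && seconds > factor_ * (sum_ / count_);
        if (!spike) {
            // Keep spikes out of the baseline so one bad frame does not hide the next.
            sum_ += seconds;
            ++count_;
        }
        return spike;
    }

private:
    double factor_;
    double sum_;
    std::size_t count_;
};
```

Logging only the frames where `addFrame` returns true makes it easier to see how often the long frames occur than reading every frame time.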

At first I thought it was the way the game loop was set up, but I decided to investigate it properly when I changed the game loop to a different system and was still getting the same issue.

I isolated it to the rendering, then I found it was due to the draw function of an object of my BitmapFont class. I'm using OpenGL and my custom BitmapFont class which is essentially a specialized wrapper for my SpriteSheet class. As you'd expect, it loads an image to use as a spritesheet for a font, and based on the std::string input text it's given, it draws the appropriate letters to the screen.

All seems to be working, and I don't see anything that looks unreasonably costly or like a loop that would be at risk of hanging or anything. Either way, the stuttering only happens once in many many frames (even though the text/position may be completely static) so I can't see why it would take longer some times and not others.

HOWEVER, I seem to have narrowed the behaviour down to a very specific situation and I have no idea why this would be the case:

I'm currently displaying time and date values using the BitmapFont class. I had a single object called timeFont, which I'd initialize by loading the font image etc., and then on every rendering pass I'd call the following functions:


timeFont.setText( timeString );
timeFont.setPos( 125, 50 );
timeFont.drawFont();

- setText() simply changes the value of a string in the BitmapFont object (in this case timeFont) which is used to store the text, in this case the time.

- setPos() simply changes X and Y position values.

- drawFont() draws the font by wrapping the underlying SpriteSheet object's draw functions and calculating/using the appropriate font-specific values.

Then I would calculate a string to represent the date I wanted to draw, and call the following:


timeFont.setText( dateString );
timeFont.setPos( 0, 50 );
timeFont.drawFont();

Here I'm obviously intending to change the timeFont object's text value, change its position and then draw it again.

If I do this, I experience the occasional half second or so extra time between some updates like I described.

BUT, what is interesting to me is that the stuttering only seems to happen if I draw some text and then set the text to something else AND then draw it. Even if I reset the text it doesn't seem to happen unless I draw it again, and drawing the same text multiple times doesn't seem to cause it either.

So then I decided to create another BitmapFont object called dateFont, set it up exactly like timeFont, and then when it came to printing the date I'd simply call:


dateFont.setText( dateString );
dateFont.setPos( 0, 50 );
dateFont.drawFont();

The result is that the problem seems not to happen.

Why might this be? I find it hard to understand when the situation seems to be 100% repeatable in the ways I've described, and doesn't seem to be caused by extremely faulty or inefficient code (at least in the sense it works fine most of the time either way), AND simply doing the same thing but using 2 objects to do it appears to avoid it altogether.

In my ignorance, the only thing I can think of is maybe some sort of strange attempt at a compiler optimization? Like it isn't expecting the text to change and so it does something it shouldn't that slows it down sometimes? Even still, why only sometimes and not most of the time? I can't understand that explanation either.

I'll provide more specific code if anyone has an idea about what it could be and where to look. Probably no point spamming tons of code here yet when I have no idea what is relevant.

Have you tried profiling your code? Actual measurements are going to really help you narrow this down rather than just guessing what is and isn't slow.
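If no profiler is at hand, a crude form of measurement is to time suspect functions by hand. Below is a minimal sketch, assuming a compiler with C++11 `<chrono>`; `ScopedTimer` and `TimingTable` are hypothetical names made up for this example, and a real sampling profiler will give a far more complete picture.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

// Hypothetical manual instrumentation: total seconds spent per label.
typedef std::map<std::string, double> TimingTable;

// Adds the lifetime of the object to table[label] when it goes out of scope.
class ScopedTimer {
public:
    ScopedTimer(TimingTable& table, const std::string& label)
        : table_(table), label_(label), start_(std::chrono::steady_clock::now()) {
        assert(!label.empty());
    }

    ~ScopedTimer() {
        std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start_;
        table_[label_] += elapsed.count();
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    TimingTable& table_;
    std::string label_;
    std::chrono::steady_clock::time_point start_;
};
```

Declaring a `ScopedTimer` as the first statement of a function body accumulates that function's inclusive time under its label, so the totals can be compared after running for a fixed period.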

Does your BitmapFont or SpriteSheet class write some instances/vertices/indices/something to a buffer and then hand that to a draw call? The second write might be waiting for the GPU to be finished with the buffer before letting you write to it again. The stall would happen when you try to lock/acquire/(whatever it's called in OpenGL) the buffer.

Edit: that still doesn't explain a half-second, though, so I also vote for the profiler.

Could be allocations. If you're running on Windows through Visual Studio, note that the debug heap will be enabled by default. Every X number of operations (1000 frees, I think) it will do an extra strenuous debug scan. Turning this off (along with iterator debugging) can take a game from running at 20 FPS with massive spikes to a smooth 60 FPS even in debug builds. Note that the debug heap will be turned on even for release apps; it's entirely dependent on how you start the process (e.g. from VS without the appropriate env flag to disable it) and not how you built it.

Profiling would help you find out that's the problem (if you don't know how to use a profiler, stop everything you're doing and don't write another line of code until you've become an expert in profilers) but it wouldn't tell you how to fix it. Post questions with targeted problems and you can get more accurate responses rather than big lists of people guessing like we are right now.

Sean Middleditch – Game Systems Engineer – Join my team!

Have you tried profiling your code?

No. I suppose I should take the opportunity to learn how to do that now...

Does your BitmapFont or SpriteSheet class write some instances/vertices/indices/something to a buffer and then hand that to a draw call? The second write might be waiting for the GPU to be finished with the buffer before letting you write to it again. The stall would happen when you try to lock/acquire/(whatever it's called in OpenGL) the buffer.

Edit: that still doesn't explain a half-second, though, so I also vote for the profiler.

Interesting. Though as you say, a half second (might even be more) still seems like a lot.

Could be allocations. If you're running on Windows through Visual Studio, note that the debug heap will be enabled by default. Every X number of operations (1000 frees, I think) it will do an extra strenuous debug scan. Turning this off (along with iterator debugging) can take a game from running at 20 FPS with massive spikes to a smooth 60 FPS even in debug builds. Note that the debug heap will be turned on even for release apps; it's entirely dependent on how you start the process (e.g. from VS without the appropriate env flag to disable it) and not how you built it.

Yeah, debug mode vs release or some sort of compiler activity did cross my mind so I tried it in release to no avail.

Still, would the fact that using 2 different objects eliminates the problem suggest that maybe it isn't that? Or would VS be doing something that might only slow it down when using the single object?

Profiling would help you find out that's the problem (if you don't know how to use a profiler, stop everything you're doing and don't write another line of code until you've become an expert in profilers) but it wouldn't tell you how to fix it.

Will do. What/where would be a good starting point for someone who isn't familiar with profiling? I'm also using the old Visual C++ 2008 Express Edition if that makes things more complicated. I've been meaning to switch to the newer version but I'm still using this old Vista machine for now...

http://www.gamedev.net/blog/355/entry-2260109-a-quick-introduction-to-sampler-based-profiling/

http://www.codersnotes.com/sleepy

Excellent, thanks for that.

I'll give all that a look and see what I come up with.

Ok, well I've run Very Sleepy on both versions of the code, one with a single object and one with 2 objects, and I'm definitely seeing a huge difference in the function that's used to draw the date and time.

I ran it for 20 seconds for both, and in the non-stuttering version GameDateTime::draw has 2.21s listed for Inclusive, but the stuttering one has a massive 8.11s. That's over 3 and a half times as long, almost 6 seconds over a sample time of only 20 seconds.

Delving a bit deeper...

BitmapFont::drawFont is taking 7.79s in the stuttery one, 7.36s of that is SpriteSheet::drawSprites...

Getting somewhere specific now, the uploadDataToGPU() function of my VBO wrapper class is taking 6.92s compared to 0.63s when using 2 objects. That's quite a difference.

Ok, just had a thought. Maybe this is the answer and I was being presumptive when I chose this, or maybe it's not, but here's the code for uploadDataToGPU():


void VBO::uploadDataToGPU( int usageHint )
{
	glBufferData( bufferType, data.size(), &data[0], usageHint );
	dataUploaded = true;
	data.clear();
}

Where data is an array of bytes holding the data to be uploaded, and usageHint is always set to GL_STATIC_DRAW for sprites.

COULD the problem be that I'm using GL_STATIC_DRAW, which IIRC suggests that the data won't be modified and optimizes for that, because I'm modifying it when the text changes to use different coordinates on the texture?

...

Well I just changed it to GL_DYNAMIC_DRAW for both the modelview matrices and texture coordinates, but the same function still had a time well over 6 seconds... and that about uses up all my ideas for solutions/causes.

Here's the whole code from the SpriteSheet drawing function:


void SpriteSheet::drawSprites()
{
	glEnable( GL_BLEND );
	glDisable( GL_DEPTH_TEST );
	glBlendFunc( GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA ); 

	//Set up VAO attributes
	glBindVertexArray( vao );

	//ModelView Matrix (Instanced, same for all vertices of each quad)
	vboMat.bind();
	vboMat.uploadDataToGPU( GL_STATIC_DRAW );
	for ( int i = 0; i < 4; i++ )
	{
		glEnableVertexAttribArray( 1 + i );
		glVertexAttribPointer( 1 + i, 4, GL_FLOAT, GL_FALSE, sizeof( glm::mat4 ),
								(const GLvoid*)(sizeof(GLfloat) * i * 4));
		glVertexAttribDivisor( 1 + i, 1 );
	}

	//Texture coordinate offsets (Instanced, same for all vertices of each quad)
	vboTex.bind();
	vboTex.uploadDataToGPU( GL_STATIC_DRAW );

	//Note: sizeof( GLfloat ), not sizeof( GL_FLOAT ); GL_FLOAT is an enum constant, not a type
	glEnableVertexAttribArray( 5 );
	glVertexAttribPointer( 5, 1, GL_FLOAT, GL_FALSE, sizeof( GLfloat ) * 2, 0 );
	glVertexAttribDivisor( 5, 1 );

	glEnableVertexAttribArray( 6 );
	glVertexAttribPointer( 6, 1, GL_FLOAT, GL_FALSE, sizeof( GLfloat ) * 2, reinterpret_cast<void*>( sizeof( GLfloat ) ) );
	glVertexAttribDivisor( 6, 1 );

	glBindBuffer( GL_ARRAY_BUFFER, 0 );

	spriteTexture.bindTexture();

	glDrawArraysInstanced( GL_TRIANGLES, 0, numVertices, numVertices/6 );

	numVertices = 0;

	glBindVertexArray( 0 );

	glDisable( GL_BLEND );
	glEnable( GL_DEPTH_TEST );
}

Should be reasonably self-explanatory I hope.

Are you uploading all the data every time you want to draw an object? Don't.

The point of a VBO is to upload once (preferably on load time) into one or many of them, and then only draw.
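One way to get closer to "upload once, then only draw" for text is to re-upload only when the string actually changes. The following is a rough sketch of that idea with the OpenGL calls left out; `CachedText` and its upload counter are hypothetical stand-ins, not the BitmapFont class from this thread.

```cpp
#include <string>

// Hypothetical stand-in for a text object that owns a GPU buffer.
// uploads() counts how often the expensive upload step would have run.
class CachedText {
public:
    CachedText() : dirty_(false), uploads_(0) {}

    void setText(const std::string& text) {
        if (text != text_) {  // only mark dirty when the content really changes
            text_ = text;
            dirty_ = true;
        }
    }

    void draw() {
        if (dirty_) {
            // Real code would rebuild the vertex data and call glBufferData here.
            ++uploads_;
            dirty_ = false;
        }
        // Real code would issue the draw call here, every frame.
    }

    int uploads() const { return uploads_; }

private:
    std::string text_;
    bool dirty_;
    int uploads_;
};
```

Note that this only helps if each piece of text has its own object. A single object that alternates between a time string and a date string sees a changed string on every call, so it would still upload twice per frame.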

So, the slowdown is in the function that writes to the buffer, and you do have one VBO object per sprite drawing object, right? This sounds like what I had guessed earlier. Do you understand how the GPU works? When you call the glDraw function, drawing is not finished when the function returns. It gets queued up as a GPU job. All of the memory involved with that draw call cannot be overwritten until the GPU job finishes. You don't get stuck waiting until you try to reuse it.

I'm not really familiar with OpenGL. I've worked with this in Direct3D. In Direct3D, for performing a dynamic draw there is a map flag that promises not to overwrite the data in use, while you are still able to append. This allows you to allocate a buffer big enough to write everything you're going to do in one frame, discard at the beginning, and then remap with no-overwrite as you move forward through the buffer rendering your scene. I would expect OpenGL to have something like this as well. Here's a page I found that seems to be helpful:

https://www.opengl.org/wiki/Buffer_Object_Streaming

In the end, if you want to render two dynamic things at the same time, they need to use two separate buffers, two separate parts of the same buffer with the correct flags, or the second one needs to wait for the first one to finish.
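The "separate parts of the same buffer" option can be sketched as plain bookkeeping, without any OpenGL calls. This is an assumption-laden illustration: `StreamRing`, `Range` and `reserve` are hypothetical names, and the comments only indicate where the real buffer calls would go.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical bookkeeping for streaming many small writes into one large buffer.
class StreamRing {
public:
    struct Range {
        std::size_t offset;  // where this write may start
        bool orphaned;       // true when the buffer wrapped around
    };

    explicit StreamRing(std::size_t capacity) : capacity_(capacity), head_(0) {
        assert(capacity > 0);
    }

    // Reserve `bytes` for the next write, never overlapping earlier ranges
    // handed out since the last wrap.
    Range reserve(std::size_t bytes) {
        assert(bytes <= capacity_);
        Range r;
        r.orphaned = (head_ + bytes > capacity_);
        if (r.orphaned) {
            // Real code would request fresh storage here, for example
            // glBufferData( target, capacity, NULL, usage ), so the GPU can
            // keep reading the old storage while new data is written.
            head_ = 0;
        }
        // Real code would write at r.offset, for example through
        // glMapBufferRange with GL_MAP_UNSYNCHRONIZED_BIT.
        r.offset = head_;
        head_ += bytes;
        return r;
    }

private:
    std::size_t capacity_;
    std::size_t head_;
};
```

Each draw then reads from its own range, so writing the next piece of text does not have to wait for the previous draw to finish.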
