glFlush and glutSwapBuffers cause lag... sometimes?

4 comments, last by L. Spiro 11 years, 6 months ago
Hello again GameDev.net!

You guys have been the best technical source out there yet (you know the problems AND other related things that are helpful to 3D game development), so I'd like to ask if you know what's up with these two functions. Every few frames, glFlush or glutSwapBuffers will have a big surge in the amount of time it takes (e.g. 20x the normal amount), and I guess I just don't understand these functions well enough to know why.

glFlush pushes everything through the graphics card, but that seems like it would just be a consistently high number.

glutSwapBuffers switches where you are drawing to and what the screen is reading from, to prevent things like tearing.

At least, that's my weak impression of the two events. Do you guys know what they're actually doing that might be causing this?

Thanks!

1) glFlush pushes everything through the graphics card, but that seems like it would just be a consistently high number.
2) glutSwapBuffers switches where you are drawing to and what the screen is reading from, to prevent things like tearing.


1) Not quite. glFlush ensures commands do not sit in some buffer waiting for more commands (for example, when the GPU is not on the same machine as the CPU generating the draw commands; not creating a new full-blown packet for every command sent over the network is highly desirable). It is completely useless in a usual desktop / game context, as there is no such worry (the driver, which runs on the same CPU, is in a perfect position to do it for you at the most appropriate time). Do not use glFlush if you are not talking to the GPU over a network (you almost certainly are not).

2) Yes. It also ensures there are not too many frames' worth of commands in the command buffer: if there are too many, it will stall until all commands from the oldest frame have finished.

edit: might want to check this also: http://www.opengl.org/wiki/Common_Mistakes#glFinish_and_glFlush
Read this: http://tomsdxfaq.blogspot.com/2006_04_01_archive.html#114482869432550076#114482869432550076

It's for D3D but the principle is the very same.

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

Wow, great answers! Well glFlush is gone now (yeah, no network rendering for me) and I'm trying to figure out which state changes I'm sending to my video card that are causing lag.

Right now, my game has no shaders, but uses a customSetter, glPushMatrix, glTranslate, glScale, customDraw, and glPopMatrix a couple thousand times every frame. glScale always gets sent the same variables, so it seems like there should be a way to cut down on that. I already use a GLfloat array of vertices and a GLubyte array for my indices along with glDrawElements... eh, maybe posting the customDraw would be easier. This is for drawing an unmoving, unrotated, six-sided cube in my scene (it may or may not have neighbors, in which case I wouldn't draw those faces):

void CubeShape::draw() {

    // Did not want to have to make an array every draw,
    // but otherwise all the arrays meld into one,
    // and all turn the last color submitted -
    // red for the goal. So here we are!
    GLfloat newColors[] = { r1, g1, b1, // front top left
                            r1, g1, b1, // front top right
                            r2, g2, b2, // front bottom left
                            r2, g2, b2, // front bottom right
                            r1, g1, b1, // back top left
                            r1, g1, b1, // back top right
                            r2, g2, b2, // back bottom left
                            r2, g2, b2  // back bottom right
    };
    // These code blocks modified from work on songho.ca
    glEnableClientState(GL_COLOR_ARRAY);
    // activate and specify pointer to vertex array
    glEnableClientState(GL_VERTEX_ARRAY);
    glColorPointer(3, GL_FLOAT, 0, newColors);
    glVertexPointer(3, GL_FLOAT, 0, vertices);
    // Draw vertices
    for (int i=0; i<6; i++) {
        if ((!useNeighbors) || ((!neighbors[i]) // 0 left, 1 right, 2 top, 3 bot, 4 front, 5 rear
            // Plot twist! When the castle gets 17fps on average, this line below bumps it up to 20!
            && (i!=3 || aboveCam) && (i!=2 || !aboveCam))) { // camera can't see top/bot of what's above/below it

            // Top is special
            if (i==2) {
                // Disable array colors,
                glDisableClientState(GL_COLOR_ARRAY);
                // use a special color
                glColor3f(r3,g3,b3);
            }
            // Trying new drawElements approach
            // used to be DrawRangeElements, but Windows didn't like that. Using drawElements now... is this more inefficient?
            //glDrawRangeElements(GL_TRIANGLES, 0, 3, 6, GL_UNSIGNED_BYTE, indices+6*i);
            glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_BYTE, indices+6*i);
            // Top has special cleanup too
            if (i==2) {
                // and re-enable array colors
                glEnableClientState(GL_COLOR_ARRAY);
            }
        }
    }
    // deactivate vertex arrays after drawing
    glDisableClientState(GL_VERTEX_ARRAY);
    glDisableClientState(GL_COLOR_ARRAY);
}


So given that, and that it's a draw call from this:

void drawCube(int n, int player) {

    // ALWAYS start by setting abovecam stat
    cubeShape[n].setAboveCam(cubeY[n]-cameraPointer[player]->getMeanY()>0);

    glPushMatrix();

    // Position cube
    glTranslated(cubeX[n], cubeY[n], cubeZ[n]);

    // And make cube bigger
    glScaled(100,100,100);

    cubeShape[n].draw();

    glPopMatrix();
}


...any idea what might be causing the most lag here?

... I'm trying to figure out which state changes I'm sending to my video card that are causing lag.

Erm, what? Let's try that again ...


SwapBuffers also ensures there are not too many frames' worth of commands in the command buffer: if there are too many, it will stall until all commands from the oldest frame have finished.


... trying to find an OpenGL command that is causing SwapBuffers to take too much time is hence a bit silly. SwapBuffers stalls when too many frames are queued, and SwapBuffers itself is the frame marker. In other words, too many SwapBuffers calls sitting in the command buffer is what causes SwapBuffers to stall.

What you should ask is: why are there so many SwapBuffers still in the command buffer? There is only one logical reason for it:
* CPU is producing frames faster than GPU finishes them.
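
To make the stall concrete, here is a toy simulation, not a measurement of any real driver. It assumes the driver lets the CPU queue a fixed number of unfinished frames and blocks the swap call once that limit is reached. The names SwapStallSim and maxQueuedFrames and all the timing values are made up for illustration; real drivers pick their own limits.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Toy model only: times are simulated milliseconds, and maxQueuedFrames is a
// hypothetical driver limit, not a value any real driver is known to use.
struct SwapStallSim {
    double cpuFrameMs;            // CPU time to record one frame of commands
    double gpuFrameMs;            // GPU time to finish one frame (e.g. one vsync interval)
    std::size_t maxQueuedFrames;  // how many unfinished frames may be queued

    // Returns, per frame, how long the simulated SwapBuffers call blocked.
    std::vector<double> run(int frames) const {
        assert(maxQueuedFrames >= 1);
        std::vector<double> swapWaitMs;
        std::deque<double> finishTimes;  // completion time of each unfinished frame
        double cpuNow = 0.0;
        double gpuFreeAt = 0.0;          // when the GPU finishes its queued work
        for (int f = 0; f < frames; ++f) {
            cpuNow += cpuFrameMs;        // CPU records this frame's commands
            // Frames the GPU has already finished no longer occupy the queue.
            while (!finishTimes.empty() && finishTimes.front() <= cpuNow)
                finishTimes.pop_front();
            double wait = 0.0;
            // Swap: if the queue is full, block until the oldest frame is done.
            if (finishTimes.size() >= maxQueuedFrames) {
                wait = finishTimes.front() - cpuNow;
                cpuNow = finishTimes.front();
                finishTimes.pop_front();
            }
            // The GPU starts this frame once it is idle and the frame is submitted.
            gpuFreeAt = std::max(gpuFreeAt, cpuNow) + gpuFrameMs;
            finishTimes.push_back(gpuFreeAt);
            swapWaitMs.push_back(wait);
        }
        return swapWaitMs;
    }
};
```

With SwapStallSim{1.0, 16.0, 3} the first three swaps return immediately, the fourth blocks for 13 ms and later ones for 15 ms, even though no individual draw command got slower. With a GPU that finishes faster than the CPU submits (for example SwapStallSim{20.0, 5.0, 3}) the swap never blocks.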

What can one do about it?
* Use more CPU cycles to do other useful stuff - AI / Sound / etc.
* Lessen GPU load (only useful if one is overtaxing the GPU).
* Disable vertical sync (only useful at very low frame rates, and provided that the end user has declared that they do not mind the resulting tearing).

So, most likely situation for you:
* GPU is bored to death (unless one does something amazingly/disastrously wrong; I seriously doubt you have managed to overtax the GPU).
* You have vertical sync enabled (to avoid tearing; SwapBuffers is there to swap buffers, and dealing with tearing, i.e. enforcing a specific time for the swap, is an optional extra), hence not allowing the GPU to proceed faster than your monitor's refresh rate. PS: vertical sync is usually enabled by default (unless forced off via the GPU control panel).

What should you do about it?
* absolutely nothing if your framerate is good enough (~current monitor refresh rate).
* disable vertical sync, but only for development (to have reasonable estimate of workload).

--------------------------------
PS. I have to note, just in case / prophylactically, that using legacy OpenGL without an amazingly good justification is strongly discouraged. Do you have one? One should avoid wasting time with the legacy garbage if at all possible.

Wow, great answers! Well glFlush is gone now (yeah, no network rendering for me) and I'm trying to figure out which state changes I'm sending to my video card that are causing lag.

You should also be aware that glutSwapBuffers() calls ::glFlush() internally anyway.
This just reinforces the point that calling ::glFlush() manually serves no purpose.



Right now, my game has no shaders, but uses a customSetter, glPushMatrix, glTranslate, glScale, customDraw, and glPopMatrix a couple thousand times every frame.

This may very well be your problem.
Firstly, using shaders is, when done properly, much more efficient than not.
OpenGL vendors generally do not take the time needed to fully optimize every case with their drivers because there are just too many graphics cards out there.
The performance of the original OpenGL matrix stack is fully dependent on this and in general it should be assumed that it has a very low performance (much lower than what you could write on your own).
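
As a rough sketch of "writing it on your own": the translate-then-scale pair from drawCube can be built directly as one column-major matrix, and because the cubes are described as unmoving and unrotated, their positions could even be baked once at load time. Mat4, makeTranslateScale and bakePositions are hypothetical names made up here, and baking is only an assumption about what fits this scene.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Mat4 = std::array<float, 16>;  // column-major, the layout glLoadMatrixf expects

// Equivalent of glTranslated(tx, ty, tz) followed by glScaled(s, s, s):
// a vertex v ends up at s * v + t.
Mat4 makeTranslateScale(float tx, float ty, float tz, float s) {
    return Mat4{{ s,    0.0f, 0.0f, 0.0f,
                  0.0f, s,    0.0f, 0.0f,
                  0.0f, 0.0f, s,    0.0f,
                  tx,   ty,   tz,   1.0f }};
}

// Applies m to tightly packed xyz positions (w is assumed to be 1).
// For static geometry this can run once instead of every frame.
std::vector<float> bakePositions(const Mat4& m, const std::vector<float>& xyz) {
    assert(xyz.size() % 3 == 0);
    std::vector<float> out(xyz.size());
    for (std::size_t i = 0; i < xyz.size(); i += 3) {
        const float x = xyz[i], y = xyz[i + 1], z = xyz[i + 2];
        out[i]     = m[0] * x + m[4] * y + m[8]  * z + m[12];
        out[i + 1] = m[1] * x + m[5] * y + m[9]  * z + m[13];
        out[i + 2] = m[2] * x + m[6] * y + m[10] * z + m[14];
    }
    return out;
}
```

The resulting matrix can be handed to glMultMatrixf (legacy) or uploaded as a shader uniform. With baked positions, the per-cube glPushMatrix / glTranslated / glScaled / glPopMatrix calls disappear entirely.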




glScale always gets sent the same variables, so it seems like there should be a way to cut down on that.

This is called “redundant state checking” and is essential on all platforms. Whether you are using OpenGL or DirectX, this is necessary.
The following example illustrates what you need to do.
/**
 * Sets the level of anisotropy on a given texture unit.
 *
 * \param _ui32Slot Slot in which to set the level of anisotropy.
 * \param _ui32Level The amount of anisotropy.
 */
LSE_INLINE LSVOID LSE_FCALL CDirectX9::SetAnisotropy( LSUINT32 _ui32Slot, LSUINT32 _ui32Level ) {
    _ui32Level = CStd::Clamp<LSUINT32>( _ui32Level, 1UL, CFndBase::m_mMetrics.ui32MaxAniso );
    if ( _ui32Level != m_ui32AnisoLevel[_ui32Slot] ) {
        m_ui32AnisoLevel[_ui32Slot] = _ui32Level;
        m_pd3dDevice->SetSamplerState( _ui32Slot, D3DSAMP_MAXANISOTROPY, _ui32Level );
    }
}
/**
 * Sets the texture wrapping modes for a given slot.
 *
 * \param _ui32Slot Slot in which to set the texture wrapping modes.
 * \param _taU The U wrapping mode.
 * \param _taV The V wrapping mode.
 */
LSE_INLINE LSVOID LSE_FCALL CDirectX9::SetTextureWrapModes( LSUINT32 _ui32Slot, D3DTEXTUREADDRESS _taU, D3DTEXTUREADDRESS _taV ) {
    if ( _taU != m_taWrapU[_ui32Slot] ) {
        m_taWrapU[_ui32Slot] = _taU;
        m_pd3dDevice->SetSamplerState( _ui32Slot, D3DSAMP_ADDRESSU, _taU );
    }
    if ( _taV != m_taWrapV[_ui32Slot] ) {
        m_taWrapV[_ui32Slot] = _taV;
        m_pd3dDevice->SetSamplerState( _ui32Slot, D3DSAMP_ADDRESSV, _taV );
    }
}


Notice that every state has a local copy, and a call is actually issued to DirectX 9 only if the new state does not match the local copy.
The same approach works just as well for OpenGL.
/**
 * Sets depth writing to on or off.
 *
 * \param _bVal Whether depth writing is enabled or disabled.
 */
LSE_INLINE LSVOID LSE_FCALL COpenGl::SetDepthWrite( LSBOOL _bVal ) {
    _bVal = _bVal != false;
    if ( CFndBase::m_bDepthWrite != _bVal ) {
        CFndBase::m_bDepthWrite = _bVal;
        ::glDepthMask( _bVal ? GL_TRUE : GL_FALSE );
    }
}


Redundancy checking is one of the most valuable things you can do to enhance your performance. Never set the same state twice in any API, OpenGL, DirectX, or otherwise.
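
If writing one setter per state gets tedious, the same pattern can be factored into a small generic wrapper. The following is only a sketch: CachedState is a made-up name (it is not part of the engine code above or of any graphics API), and the callback stands in for whatever real API call you want to filter.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Remembers the last value sent and forwards a new value to the API only
// when it differs. T must be default-constructible and comparable with ==.
template <typename T>
class CachedState {
public:
    explicit CachedState(std::function<void(const T&)> apply)
        : apply_(std::move(apply)) {
        assert(apply_ != nullptr);
    }

    // Returns true if the value was actually forwarded to the API.
    bool set(const T& value) {
        if (valid_ && last_ == value) {
            return false;  // redundant: the API already has this value
        }
        last_ = value;
        valid_ = true;
        apply_(value);
        return true;
    }

    // Call this if something outside the cache may have changed the state.
    void invalidate() { valid_ = false; }

private:
    std::function<void(const T&)> apply_;
    T last_{};
    bool valid_ = false;  // false until the first set() after construction or invalidate()
};
```

For example, a CachedState<bool> whose callback calls glDepthMask would skip every repeated request for the same depth-write setting. The cache is only correct if every change to that state goes through it, which is why invalidate() exists for the cases where that cannot be guaranteed.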


L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid
