300 000 fps

6 comments, last by Dark Helmet 10 years, 3 months ago

I've never run a newer version of OpenGL; back then I used only OpenGL 1, and I never got more than about 900 fps for a simple cube test with it. But yesterday I tried it with my new OpenGL/freeglut framework and it seems I got 300,000 fps - at least the successive timer calls in the display method report a delta time of 3 microseconds per display. Is this really flushing 300 thousand screens per second, or am I mistaken about something (something asynchronous is being called, or similar) and should I measure it in a different way?

and should I measure it in a different way?

This in the first place! FPS is a reciprocal measure and as such not useful once the range gets bigger than some tens of FPS, perhaps up to 100 FPS or so. The 900 is already a less meaningful number. It's better to use a linear measure: compute the mean time per frame over a couple of frames as an absolute measure, and perhaps a percentage between such values for a comparative one.
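For illustration, here is a minimal sketch of such an averaged measurement using freeglut's millisecond clock; the function name reportMeanFrameTime and the window of 100 frames are just placeholders, not anything from the original post:

#include <stdio.h>
#include <GL/freeglut.h>

#define FRAME_WINDOW 100   /* average over this many frames */

static int frameCount  = 0;
static int windowStart = 0;   /* milliseconds, from glutGet */

/* Call once per frame, e.g. at the end of the display callback. */
void reportMeanFrameTime(void)
{
    if (frameCount == 0)
        windowStart = glutGet(GLUT_ELAPSED_TIME);

    if (++frameCount == FRAME_WINDOW) {
        int elapsedMs = glutGet(GLUT_ELAPSED_TIME) - windowStart;
        printf("mean frame time: %.2f ms\n", (double)elapsedMs / FRAME_WINDOW);
        frameCount = 0;
    }
}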

Also: a cube is most probably not a meaningful test at all. In a real-world scene you may run into several limits: DMA transfer, texture sampling, pixel fill rate, shader complexity, …; none of them is stressed by a cube example (assuming you don't mean a Koch cube ;)).

Regarding the question of the performance boost itself: OpenGL 1 is very, VERY old. None of the (more or less) modern techniques were supported. A modern OpenGL is much better adapted to existing graphics cards. So yes, it is possible in principle, of course.

Try measuring the FPS across multiple frames; it might be that the timer resolution is not fine enough for the tiny duration of a single frame and is for some reason giving results smaller than reality.

o3o


No, the timer is good; I think it gives correct results. This is what I have:

IdleLoop()
{
    double timeDelta = TakeTime();
    displayScene();
}

The time delta (3 microseconds) is reported properly, but I'm not sure whether displayScene here is doing all the work of producing a whole new pixel buffer and showing it, or maybe not?

The display code itself is:

 
void draw()
{
    /* cube geometry supplied via client-side vertex arrays */
    glEnableClientState(GL_NORMAL_ARRAY);
    glEnableClientState(GL_COLOR_ARRAY);
    glEnableClientState(GL_VERTEX_ARRAY);
    glNormalPointer(GL_FLOAT, 0, normals2);
    glColorPointer(3, GL_FLOAT, 0, colors2);
    glVertexPointer(3, GL_FLOAT, 0, vertices2);

    glPushMatrix();

    /* 36 indices = 12 triangles, i.e. the 6 faces of the cube */
    glDrawElements(GL_TRIANGLES, 36, GL_UNSIGNED_BYTE, indices);

    glPopMatrix();

    glDisableClientState(GL_VERTEX_ARRAY);
    glDisableClientState(GL_COLOR_ARRAY);
    glDisableClientState(GL_NORMAL_ARRAY);
}
 
 
 
////////////////////////////////////////
 
void display()
{
    frame++;
    glClear(GL_COLOR_BUFFER_BIT);
    glPushMatrix();
    glRotatef(frame / 1000.0f, 1, 0, 0);   /* divide by a float; plain frame/1000 truncates to an int */
    draw();
    glPopMatrix();
    glFlush();   /* single-buffered: no glutSwapBuffers call */
}
 

Can it be so fast? Does it really do whole-frame generation? As a result I see a rotating rectangle (with some tearing on the surface too), but of course I can't be sure whether this is 300 thousand frames per second or maybe just 300 or so.

You are measuring the speed at which you can submit render commands, not the speed at which your scene is drawn and displayed. Basically what you measure is the memcpy that OpenGL does on your vertex array (to a vertex buffer that you don't know about) when you call glDrawElements, plus the overhead of a dozen library calls. It's not very surprising that this is fast.

You are not swapping buffers, so there is really no notion of a "frame" at all. You do call glFlush, but that isn't the same thing (for the most part, glFlush is pretty useless).
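(For reference, a double-buffered freeglut setup where glutSwapBuffers marks the frame boundary might look roughly like the sketch below; the window title and callback bodies are just placeholders, not the poster's actual code.)

#include <GL/freeglut.h>

void display(void)
{
    glClear(GL_COLOR_BUFFER_BIT);
    /* ... draw the scene ... */
    glutSwapBuffers();    /* present the back buffer - this is the frame boundary */
    glutPostRedisplay();  /* keep rendering continuously */
}

int main(int argc, char **argv)
{
    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGB);   /* request a double-buffered context */
    glutCreateWindow("cube");
    glutDisplayFunc(display);
    glutMainLoop();
    return 0;
}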


Alright, thanks (I suspected things related to asynchronicity). So how do I measure the real frame-making speed? I am using OpenGL + freeglut, but I don't have much experience with it yet.

I used glFinish and got only 9000 fps :C

With glutSwapBuffers() I got 75 fps (oscillating between 74.7 and 75.3), which is my screen refresh rate setting.

I'm using an LCD right now - does this monitor have a refresh rate like CRT ones do? Is it better to set 60 or 75?

glFinish is much closer to what one would want to use since it blocks until all command execution has finished (glFlush doesn't wait for anything). It comes with a serious performance impact, however, since it causes a pipeline stall.

glutSwapBuffers, on the other hand, is the real, true thing. It actually swaps buffers, so there is really a notion of "frame". It also blocks, but synchronized to the actual hardware update frequency, and in a somewhat less rigid way (usually drivers will let you pre-render 2 or 3 frames or will only block at the next draw command after swap, or something else).

The reason why you only see 75 fps is that you have vertical sync enabled (in your driver settings). If you can "comfortably" get those 75 fps at all times (i.e. your frame time (worst, not average) is below 13.3 ms), it doesn't really matter how much faster you can render since that's all the monitor will display anyway. Rendering more frames than those displayed is only a waste of energy (and wearing down components due to heat development).

Now of course, if you only ever get at most 75 (or 60 on other monitors) frames per second displayed, it seems a bit hard to measure the actual frame time accurately. You might have a frame time of 13.3 ms or 10 ms or 8 ms and it would make no difference, since it all comes out as 75 fps because the driver syncs to that after finishing your drawing commands.

glQueryCounter can be of help here. It lets you get accurate timing without having to stall as when using glFinish. So you can measure the actual time it takes to draw, regardless of how long the driver blocks thereafter to sync.

(Another less elegant but nevertheless effective solution would be to disable vertical sync during development.)
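A rough sketch of what such a timer-query measurement could look like (requires GL 3.3 / ARB_timer_query and a loader for the query functions; drawScene() is a placeholder for whatever rendering you want to time, and error checking is omitted):

#include <stdio.h>

void timeDrawCallsWithQueries(void)
{
    GLuint queries[2];
    GLuint64 t0, t1;

    glGenQueries(2, queries);
    glQueryCounter(queries[0], GL_TIMESTAMP);   /* timestamp before the draw calls */
    drawScene();                                /* placeholder: the rendering to time */
    glQueryCounter(queries[1], GL_TIMESTAMP);   /* timestamp after the draw calls */

    /* GL_QUERY_RESULT blocks until the result is available; in real code you
       would read the results a frame or two later to avoid the stall. */
    glGetQueryObjectui64v(queries[0], GL_QUERY_RESULT, &t0);
    glGetQueryObjectui64v(queries[1], GL_QUERY_RESULT, &t1);
    printf("GPU time: %.3f ms\n", (double)(t1 - t0) / 1.0e6);   /* timestamps are in nanoseconds */

    glDeleteQueries(2, queries);
}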


Alright, thanks for the explanation. I disabled vsync in the NVIDIA control panel (I don't know why the per-program setting doesn't work but the global disable does) and got about 9000 fps with the buffer swap, close to the same as with glFinish.

One remaining doubt: if glFlush is not drawing all the calls, what is it doing with them? Does it skip them or queue them?

glFlush is supposed to mean "start processing all pending GL commands now and return immediately". It doesn't wait for the commands to finish processing, it just signals to the driver that it can start processing them. There are actually a lot of implicit glFlush cases in normal code, with the most obvious one being when the command buffer is full - the driver must start emptying the buffer before new commands can go in.

I see that Carmack has noted on his Twitter that with some drivers glFlush is a nop. If this is the case, then calling glFlush at the end of a frame (or wherever in the frame) will have no effect and the actual flush won't occur until the command buffer fills. Depending on how much work you do in a frame, and on how big the command buffer is (that's driver-dependent so don't ask) it means that you may get 10, 20, or even hundreds of frames worth of commands in there before anything actually happens.

It's easy to see how this kind of behaviour can seriously mislead you into thinking that you're running crazy-fast. A large part of the blame here must seriously go to old GLUT tutorials that always create a single-buffered context. That's just so unrepresentative of how things work in real programs.

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

glutSwapBuffers, on the other hand, is the real, true thing. It actually swaps buffers, so there is really a notion of "frame". It also blocks, but synchronized to the actual hardware update frequency, and in a somewhat less rigid way (usually drivers will let you pre-render 2 or 3 frames or will only block at the next draw command after swap, or something else).



When timing with just SwapBuffers though, be careful. The problem is that the driver typically queues up the request quickly on the CPU and returns immediately (i.e. the CPU does not block), after which it lets you start queuing up render commands for future frames. At some random point in the middle of queuing one of those frames, when the FIFO fills, *then* the CPU blocks, waiting on some VSYNC event in the middle of a frame. This causes really odd timing spikes, leaving you puzzled as to what's going on.

If you want reasonable full-frame timings, after SwapBuffers(), put a glFinish(), and then stop your timer.
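In code that could look something like the fragment below; highResTimeSeconds() stands in for whatever high-resolution timer you already use, it is not a real API:

    double t0 = highResTimeSeconds();

    display();            /* submit all draw commands for this frame */
    glutSwapBuffers();    /* queue the buffer swap */
    glFinish();           /* block until the GPU has actually executed everything */

    double frameSeconds = highResTimeSeconds() - t0;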

