Major OpenGL happiness =)

11 comments, last by Frederick 15 years, 10 months ago
Yes, it's true, it's making me unhappy, and plenty of it =))) Joking aside, I have horrible performance problems and don't know what I am doing wrong. I spent some time setting everything up with shaders, vertex buffers etc. and have focused more on framework development than on pumping triangles to the graphics card, trusting everything would be fine. Nevertheless I was curious and did a stress test to find out how many batches I can actually push per frame, inspired by threads that were floating around here in the forum in the past days *wink* Unfortunately NOT many!

What I do is put a simple texture-mapped (shader) cube into a vertex buffer, then create an index buffer that repeats the cube's indices 400 times. So a cube with 12 triangles is rendered with 400x overdraw. The reason for that is I haven't got a mesh loader yet :-) Then I render these buffers with 10 calls to glDrawRangeElements, so we get 4000x cube overdraw. A piece of cake, isn't it!? Well, no... performance is as low as 10 fps. I have got an AMD 2GHz / GeForce 6800LE 128MB. 400 x 12 x 10 = 48,000 triangles <- that's ridiculous, I would bet glBegin/glEnd could do THAT, damn! Life is so hard sometimes.

I feared it could be Java being slow (although I know it is definitely not slow), but a quick profiling convinced me that the application spends over 80% of its time in glFlush(). Is this a typical performance pattern for OpenGL? I can hardly imagine it. At this point I would really appreciate some help... I have tried to apply everything I read here on gamedev and elsewhere about VBO usage etc., but performance is still that bad. Here comes a GLIntercept log, so you can check whether I am doing something really stupid =)
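The figures above can be cross-checked against the GLIntercept log that follows: every draw uses 14400 indices with GL_TRIANGLES, and the bound vertex buffer is 160 bytes with a 20-byte stride (3 position floats plus 2 texture-coordinate floats, matching the 0x000c attribute offset). A minimal C sketch of that arithmetic; the helper names are made up for illustration and are not part of OpenGL:

```c
#include <stddef.h>

/* GL_TRIANGLES consumes three indices per triangle, so one
   glDrawElements(GL_TRIANGLES, count, ...) call draws count / 3 triangles. */
size_t triangles_per_draw(size_t index_count)
{
    return index_count / 3;
}

/* Number of vertices stored in an interleaved vertex buffer (bytes / stride). */
size_t vertices_in_buffer(size_t buffer_bytes, size_t stride_bytes)
{
    return buffer_bytes / stride_bytes;
}

/* Triangles per frame when the same draw call is issued `draws` times. */
size_t triangles_per_frame(size_t index_count, size_t draws)
{
    return triangles_per_draw(index_count) * draws;
}
```

With the logged values each call draws 4800 triangles (400 cubes of 12 triangles), 10 calls give the 48,000 triangles per frame quoted above, and the shared vertex buffer holds only 8 vertices.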

===============================================================================
 GLIntercept version 0.5 Log generated on: Mon Jun 16 22:31:17 2008
 
===============================================================================

wglChoosePixelFormat(0x95010de0,0x90fb20)
----->wglDescribePixelFormat(0x95010de0,1,0,0x0000)=102 =7 
wglSetPixelFormat(0x95010de0,7,0x90fb20)
----->wglDescribePixelFormat(0x95010de0,7,40,0x90f9c0)=102 =true 
wglCreateContext(0x95010de0)
----->wglGetPixelFormat(0x95010de0)=7 
----->wglDescribePixelFormat(0x95010de0,7,40,0x90f9c8)=102 
----->wglGetPixelFormat(0x95010de0)=7 
----->wglDescribePixelFormat(0x95010de0,1,0,0x0000)=102 =0x10000 
wglMakeCurrent(0x95010de0,0x10000)
----->wglGetPixelFormat(0x95010de0)=7 
----->wglGetPixelFormat(0x95010de0)=7 
----->wglDescribePixelFormat(0x95010de0,1,0,0x0000)=102 =true 
wglGetProcAddress("glCreateShader")=0x100299f0 
wglGetProcAddress("glDeleteShader")=0x10029a50 
wglGetProcAddress("glCreateProgram")=0x10029ab0 
wglGetProcAddress("glDeleteProgram")=0x10029b10 
wglGetProcAddress("glAttachShader")=0x10029b70 
wglGetProcAddress("glDetachShader")=0x10029bd0 
wglGetProcAddress("glShaderSource")=0x10029c30 
wglGetProcAddress("glCompileShader")=0x10029c90 
wglGetProcAddress("glLinkProgram")=0x10029cf0 
wglGetProcAddress("glUseProgram")=0x10029d50 
wglGetProcAddress("glGetUniformLocation")=0x10029db0 
wglGetProcAddress("glUniform1i")=0x10029e10 
wglGetProcAddress("glUniform1f")=0x10029e70 
wglGetProcAddress("glUniform2f")=0x10029ed0 
wglGetProcAddress("glUniform3f")=0x10029f30 
wglGetProcAddress("glUniform4f")=0x10029f90 
wglGetProcAddress("glGetAttribLocation")=0x10029ff0 
wglGetProcAddress("glUniformMatrix4fv")=0x1002a050 
wglGetProcAddress("glVertexAttrib3f")=0x1002a0b0 
wglGetProcAddress("glVertexAttrib4f")=0x1002a110 
wglGetProcAddress("glActiveTexture")=0x1002a170 
wglGetProcAddress("glVertexAttribPointer")=0x1002a1d0 
wglGetProcAddress("glEnableVertexAttribArray")=0x1002a230 
wglGetProcAddress("glGenBuffers")=0x1002a290 
wglGetProcAddress("glDeleteBuffers")=0x1002a2f0 
wglGetProcAddress("glBindBuffer")=0x1002a350 
wglGetProcAddress("glBufferData")=0x1002a3b0 
wglGetProcAddress("glDrawRangeElements")=0x1002a410 
glEnable(GL_DEPTH_TEST)
glEnable(GL_TEXTURE_2D)
glTexParameterf(GL_TEXTURE_2D,GL_TEXTURE_WRAP_S,10497.000000)
glTexParameterf(GL_TEXTURE_2D,GL_TEXTURE_WRAP_T,10497.000000)
glTexEnvf(GL_TEXTURE_ENV,GL_TEXTURE_ENV_MODE,8448.000000)
glDepthFunc(GL_LESS)
glDisable(GL_CULL_FACE)
glEnableClientState(GL_INDEX_ARRAY)
glGetIntegerv(GL_MAX_ELEMENTS_VERTICES,0x2ef6090)
glGetString(GL_VENDOR)="NVIDIA Corporation" 
glViewport(0,0,640,480)
glClearColor(0.000000,0.000000,0.500000,0.000000)
glCreateShader(GL_VERTEX_SHADER)=1 
glGetError()=GL_NO_ERROR 
glCreateShader(GL_FRAGMENT_SHADER)=2 
glGetError()=GL_NO_ERROR 
glShaderSource(1,1,0x90fb3c,0x0000)
glGetError()=GL_NO_ERROR 
glShaderSource(2,1,0x90fb40,0x0000)
glGetError()=GL_NO_ERROR 
glCompileShader(1)
glGetError()=GL_NO_ERROR 
glCompileShader(2)
glGetError()=GL_NO_ERROR 
glCreateProgram()=3 
glGetError()=GL_NO_ERROR 
glAttachShader(3,1)
glAttachShader(3,2)
glLinkProgram(3)
glGetError()=GL_NO_ERROR 
glGetUniformLocation(3,"modelviewProjection")=0 
glGetAttribLocation(3,"color")=1 
glCreateShader(GL_VERTEX_SHADER)=4 
glGetError()=GL_NO_ERROR 
glCreateShader(GL_FRAGMENT_SHADER)=5 
glGetError()=GL_NO_ERROR 
glShaderSource(4,1,0x90fb3c,0x0000)
glGetError()=GL_NO_ERROR 
glShaderSource(5,1,0x90fb40,0x0000)
glGetError()=GL_NO_ERROR 
glCompileShader(4)
glGetError()=GL_NO_ERROR 
glCompileShader(5)
glGetError()=GL_NO_ERROR 
glCreateProgram()=6 
glGetError()=GL_NO_ERROR 
glAttachShader(6,4)
glAttachShader(6,5)
glLinkProgram(6)
glGetError()=GL_NO_ERROR 
glGetUniformLocation(6,"modelviewProjection")=0 
glGetAttribLocation(6,"texcoord")=1 
glActiveTexture(GL_TEXTURE0)
glGenTextures(1,0x90fb58)
glBindTexture(GL_TEXTURE_2D,1)
glTexParameterf(GL_TEXTURE_2D,GL_TEXTURE_MAG_FILTER,9729.000000)
glTexParameterf(GL_TEXTURE_2D,GL_TEXTURE_MIN_FILTER,9729.000000)
glTexImage2D(GL_TEXTURE_2D,0,GL_RGBA,128,128,0,GL_RGBA,GL_UNSIGNED_BYTE,0x22a237c4)
glGenBuffers(1,0x90fbf8)
glBindBuffer(GL_ARRAY_BUFFER,1)
glBufferData(GL_ARRAY_BUFFER,160,0x2a74000,GL_STATIC_DRAW)
glGenBuffers(1,0x90fbf8)
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER,2)
glBufferData(GL_ELEMENT_ARRAY_BUFFER,57600,0x2aca000,GL_STATIC_DRAW)
glGenBuffers(1,0x90fbf8)
glBindBuffer(GL_ARRAY_BUFFER,3)
glBufferData(GL_ARRAY_BUFFER,192,0x2ac9000,GL_STATIC_DRAW)
glBindBuffer(GL_ARRAY_BUFFER,1)
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER,2)
glEnableClientState(GL_VERTEX_ARRAY)
glVertexPointer(3,GL_FLOAT,20,0x0000)
glEnableVertexAttribArray(1)
glVertexAttribPointer(1,2,GL_FLOAT,false,20,0x000c)
glUseProgram(6)
glActiveTexture(GL_TEXTURE0)
glBindTexture(GL_TEXTURE_2D,1)
glUniformMatrix4fv(0,1,false,[1.000000,0.000000,0.000000,0.000000,0.000000,1.000000,
0.000000,0.000000,0.000000,0.000000,1.000000,1.000000,0.000000,0.000000,0.000000,1.000000])
glDrawElements(GL_TRIANGLES,14400,GL_UNSIGNED_SHORT,0x0000) GLSL=6  Textures[ (0,1) ] 
glDrawElements(GL_TRIANGLES,14400,GL_UNSIGNED_SHORT,0x0000) GLSL=6  Textures[ (0,1) ] 
glDrawElements(GL_TRIANGLES,14400,GL_UNSIGNED_SHORT,0x0000) GLSL=6  Textures[ (0,1) ] 
glDrawElements(GL_TRIANGLES,14400,GL_UNSIGNED_SHORT,0x0000) GLSL=6  Textures[ (0,1) ] 
glDrawElements(GL_TRIANGLES,14400,GL_UNSIGNED_SHORT,0x0000) GLSL=6  Textures[ (0,1) ] 
glDrawElements(GL_TRIANGLES,14400,GL_UNSIGNED_SHORT,0x0000) GLSL=6  Textures[ (0,1) ] 
glDrawElements(GL_TRIANGLES,14400,GL_UNSIGNED_SHORT,0x0000) GLSL=6  Textures[ (0,1) ] 
glDrawElements(GL_TRIANGLES,14400,GL_UNSIGNED_SHORT,0x0000) GLSL=6  Textures[ (0,1) ] 
glDrawElements(GL_TRIANGLES,14400,GL_UNSIGNED_SHORT,0x0000) GLSL=6  Textures[ (0,1) ] 
glDrawElements(GL_TRIANGLES,14400,GL_UNSIGNED_SHORT,0x0000) GLSL=6  Textures[ (0,1) ] 
glFlush()
wglSwapBuffers(0x95010de0)=true 
glClear(GL_DEPTH_BUFFER_BIT | GL_COLOR_BUFFER_BIT)
glUniformMatrix4fv(0,1,false,[0.998401,0.001599,0.056508,0.056508,0.001599,0.998401,
-0.056508,-0.056508,-0.056508,0.056508,0.996802,0.996802,0.000000,0.000000,2.000000,3.000000])
glDrawElements(GL_TRIANGLES,14400,GL_UNSIGNED_SHORT,0x0000) GLSL=6  Textures[ (0,1) ] 
glDrawElements(GL_TRIANGLES,14400,GL_UNSIGNED_SHORT,0x0000) GLSL=6  Textures[ (0,1) ] 
glDrawElements(GL_TRIANGLES,14400,GL_UNSIGNED_SHORT,0x0000) GLSL=6  Textures[ (0,1) ] 
glDrawElements(GL_TRIANGLES,14400,GL_UNSIGNED_SHORT,0x0000) GLSL=6  Textures[ (0,1) ] 
glDrawElements(GL_TRIANGLES,14400,GL_UNSIGNED_SHORT,0x0000) GLSL=6  Textures[ (0,1) ] 
glDrawElements(GL_TRIANGLES,14400,GL_UNSIGNED_SHORT,0x0000) GLSL=6  Textures[ (0,1) ] 
glDrawElements(GL_TRIANGLES,14400,GL_UNSIGNED_SHORT,0x0000) GLSL=6  Textures[ (0,1) ] 
glDrawElements(GL_TRIANGLES,14400,GL_UNSIGNED_SHORT,0x0000) GLSL=6  Textures[ (0,1) ] 
glDrawElements(GL_TRIANGLES,14400,GL_UNSIGNED_SHORT,0x0000) GLSL=6  Textures[ (0,1) ] 
glDrawElements(GL_TRIANGLES,14400,GL_UNSIGNED_SHORT,0x0000) GLSL=6  Textures[ (0,1) ] 
glFlush()
...
I even started to think it may be running in software... although the vendor string is clearly NVIDIA, so that should be fine. Other GL applications (that means ones not written by me *lol*) run fine. One even states that it performs 10000 draw calls per frame and does well... and there I am with my 10 calls... I would really appreciate help, because this damn performance thing is sucking the motivation out of my gaming project. You won't let that happen, will you ;-) Frederick [Edited by - Frederick on June 19, 2008 2:44:36 AM]
You shouldn't be doing glFlush before a swap (or pretty much ever, really). The driver is happiest when it can buffer a few frames forward. I dunno if that would cause that much of a slowdown, though.

Also, how big are these cubes? Are they all in the center of the screen? If each one takes up 400x400=160000 pixels, with 4000x overdraw, that's 640 megapixels per frame, and 10 fps sounds downright speedy for that.
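The estimate above is plain multiplication. A small C sketch, assuming (as the post does) that every one of the 4000 cube copies covers a full 400x400-pixel area; the function names are illustrative only:

```c
#include <stdint.h>

/* Pixels touched per frame when `copies` overlapping objects each cover
   width x height pixels on screen. */
uint64_t pixels_per_frame(uint64_t width, uint64_t height, uint64_t copies)
{
    return width * height * copies;
}

/* Pixel throughput (pixels per second) implied by a measured frame rate. */
uint64_t pixels_per_second(uint64_t pixels_in_frame, uint64_t fps)
{
    return pixels_in_frame * fps;
}
```

At the reported 10 fps this works out to 6.4 billion pixels per second, a pixel load large enough that fill rate, not triangle count, is the likely bottleneck.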
Yeah, sounds like you're fill-rate limited. Try reducing the size of the cube so that it's the size of a postage stamp on the screen, then repeat the test and see if performance improves.

-=[Megahertz]=-
glEnableClientState(GL_INDEX_ARRAY)

That is probably a mistake: GL_INDEX_ARRAY is for color-index mode, not for index buffers.
Sig: http://glhlib.sourceforge.net
an open source GLU replacement library. Much more modern than GLU.
float matrix[16], inverse_matrix[16];
glhLoadIdentityf2(matrix);
glhTranslatef2(matrix, 0.0, 0.0, 5.0);
glhRotateAboutXf2(matrix, angleInRadians);
glhScalef2(matrix, 1.0, 1.0, -1.0);
glhQuickInvertMatrixf2(matrix, inverse_matrix);
glUniformMatrix4fv(uniformLocation1, 1, FALSE, matrix);
glUniformMatrix4fv(uniformLocation2, 1, FALSE, inverse_matrix);
You're making a fundamental mistake here: you're stressing fillrate, and you complain about geometry performance.

By repeating the same cube, the pixels on the cube will be drawn over and over. In Sneftel's scenario, it's the equivalent of drawing 813 full screens @ 1024x768... *every frame*. Hardly slow.
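The 813 comes from dividing 640 megapixels by the 786,432 pixels of a 1024x768 screen. A short C sketch of that division (integer division truncates, so a partial screen is dropped); the helper name is hypothetical:

```c
#include <stdint.h>

/* How many full screens of width x height pixels a pixel count corresponds to.
   Integer division, so a partially covered screen is not counted. */
uint64_t full_screens(uint64_t pixels, uint64_t width, uint64_t height)
{
    return pixels / (width * height);
}
```

Against the 640x480 viewport set in the log, the same 640 megapixels correspond to about 2083 screens per frame.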

Y.
Good morning everybody, at least on this side of the planet =)
And thanks for the answers! I apologize, I was out last night and could not answer...

Quote:
Also, how big are these cubes? Are they all in the center of the screen? If each one takes up 400x400=160000 pixels, with 4000x overdraw, that's 640 megapixels per frame, and 10 fps sounds downright speedy for that.


Yesss, exactly - you even guessed the size very well =)) Actually it is a little bit smaller, but the back side gets drawn too. Very good observation, thank you very much. When I was setting this up, I wanted to test how many game objects I would be able to draw, and thought that 4800 triangles is kind of okay for a game model.
What I forgot is that actual game polygons will (or should) never be that large; yes, I believe I fill the screen several times =)
Is it right to say that modern hardware may draw millions of triangles, but when using that many, the triangles should be more or less pixel-sized?
So do I actually have to choose between 1) detail achieved by lots and lots of vertices and 2) detail achieved by fewer vertices but heavier shaders?
Ok enough questions...

Quote:
You shouldn't be doing glFlush before a swap (or pretty much ever, really). The driver is happiest when it can buffer a few frames forward. I dunno if that would cause that much of a slowdown, though.


Unfortunately that does not work: I removed the glFlush and nothing is drawn at all. Should I trigger it every third frame or so, or is there another option?


Quote:
glEnableClientState(GL_INDEX_ARRAY)

probably a mistake there. This is for color indexing.


Thanks a lot for checking the GL calls, V-Man =) Yes, that is indeed a mistake... I wonder why the program doesn't crash.


Quote:
Yeah sounds like you're fill rate limited. Try reducing the size of the cube so that its the size of a postage stamp on the screen, and repeat the test and see if performance improves.


I tried that and it actually went faster. I am relieved: so it is not my bad coding, and a graphics card is not the wonder-cake that will just eat everything one throws at it, as I believed =)
So I will continue my work and hopefully everything will be alright.


Quote:
By repeating the same cube, the pixels on the cube will be drawn over and over. In Sneftel's scenario, it's the equivalent of drawing 813 full screens @ 1024x768... *every frame*. Hardly slow.


*looool*

813 screens =))))
Actually I did not realize that; believe me, the result is an unimpressive rotating cube, with no sign that enough pixels to fill 813 screens are being pushed ;-)))

Cool, so I learned another thing. Hope you don't mind; everybody needs to make his or her own experiences, and filling 813 screens is quite funny, isn't it?

So one question remains: what to do with the glFlush()? Maybe I need to set a parameter that lets the driver decide sovereignly (hope that's an English word :-P) when to flush its queue?

Cheers and lots of thanks to you,
Frederick
There shouldn't be any reason to call glFlush unless you have a very specific reason to. Just delete it.
IIRC SwapBuffers() calls it implicitly? Can someone verify this?
Quote:Original post by MARS_999
There shouldn't be any reason to call glFlush unless you have a very specific reason to. Just delete it.
IIRC SwapBuffers() calls it implicitly? Can someone verify this?


SwapBuffers doesn't call glFlush. SwapBuffers sends a command to the driver that it is time to swap.
Obviously you can't just swap the buffer all of a sudden. All commands sitting in the command buffer must be flushed, in other words sent to the driver, and at the end a swap command is inserted.
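The ordering described above (pending commands are submitted first, then the swap is queued) can be illustrated with a toy model. This is purely conceptual: `toy_context` and its functions are invented for illustration and do not reflect how any real driver is written. It shows why, with double buffering, an explicit flush right before the swap adds nothing, while a single-buffered setup has no swap and relies on the flush.

```c
/* Toy model of a GL command queue; conceptual only, not a real driver. */
typedef struct {
    int pending;    /* commands recorded but not yet sent to the GPU */
    int submitted;  /* commands sent to the GPU so far */
    int swaps;      /* buffer swaps performed */
} toy_context;

/* Recording a draw only appends it to the pending queue. */
void toy_draw(toy_context *ctx)
{
    ctx->pending++;
}

/* Flushing submits everything that is pending (what glFlush asks for). */
void toy_flush(toy_context *ctx)
{
    ctx->submitted += ctx->pending;
    ctx->pending = 0;
}

/* Swapping first submits the pending commands, then queues the swap itself. */
void toy_swap(toy_context *ctx)
{
    toy_flush(ctx);
    ctx->submitted++;   /* the swap command */
    ctx->swaps++;
}

/* End of frame: a double-buffered context swaps, a single-buffered one flushes. */
void toy_end_frame(toy_context *ctx, int double_buffered)
{
    if (double_buffered)
        toy_swap(ctx);
    else
        toy_flush(ctx);
}
```

In this model, ten draws followed by a swap submit exactly the same eleven commands whether or not an extra flush is inserted before the swap.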
Quote:Unfortunately that does not work: I removed the glFlush and nothing is drawn at all. Should I trigger it every third frame or so, or is there another option?


You need to use glFlush if you are doing single buffered rendering.
With double buffer, just use SwapBuffers.
I imagine there are a lot of bugs in your program and you need to spend a few days analyzing before you talk about performance issues.
Quote:
You need to use glFlush if you are doing single buffered rendering.
With double buffer, just use SwapBuffers.


I am sorry, that was my mistake: I did the last test in a hurry... and some parts of the system were unfortunately not recompiled by my IDE. So nothing was drawn because it was a horribly slow version I had running before =) And yes, it is double-buffered. No problem with removing the glFlush().

...and the performance characteristics have changed: now 80 percent of the runtime is spent in glDrawRangeElements. Does that sound right?

I am able to push 4000 glDrawRangeElements calls at about 5 fps, and this time it is not fill-rate limited (I moved the cubes far away and put only one cube in each buffer =)). Still not too good, I believe.

Quote:
I imagine there are a lot of bugs in your program and you need to spend a few days analyzing before you talk about performance issues.


Well, maybe, but let me assure you that I have written every part of it very carefully. I know every line and do lots of error checking. This is definitely not a NeHe-type of project. Also, I am a CS student in my last year, so my programming is hopefully not too bad; at least I know a thing or two.
Nevertheless there are unknowns in the project, as I never really pushed OpenGL before. Second, it is a mixed-language project: C++/OpenGL for the very low-level part and Java for everything else. Of course there may always be errors, but I have already spent some time on this. The mistake with GL_INDEX_ARRAY may be the result of a thoughtless moment (it is confusing, isn't it?), but that doesn't apply to the project in general.

I am still not too happy... 1000 draw calls -> 20 fps, so realistically I believe I will be restricted to the 300 suggested by NVIDIA... Is this good enough, or is there room for improvement?
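The two measurements in this post point to the same rate: 4000 calls at 5 fps and 1000 calls at 20 fps are both about 20,000 draw calls per second. A rough C sketch of that back-of-the-envelope budget, assuming frame time is dominated by a fixed per-call CPU/driver cost (an assumption, not something the profile proves); the helpers are hypothetical:

```c
/* Draw calls issued per second, given calls per frame and the measured fps. */
double calls_per_second(double calls_per_frame, double fps)
{
    return calls_per_frame * fps;
}

/* Average cost of one draw call in microseconds, assuming the frame time is
   dominated by per-call overhead. */
double microseconds_per_call(double calls_per_sec)
{
    return 1.0e6 / calls_per_sec;
}

/* Draw calls that fit in one frame at a target frame rate, under the same
   assumption (truncated to a whole number of calls). */
int call_budget(double calls_per_sec, double target_fps)
{
    return (int)(calls_per_sec / target_fps);
}
```

Under that assumption each call costs about 50 microseconds, and a 60 fps target leaves room for roughly 333 calls per frame, in line with the ~300 figure mentioned above.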
