Performance test: why is a VBO slower than a display list?

4 comments, last by Peti29 12 years, 3 months ago
riruilo
January 11, 2010 04:06 PM
Hi all! I'm doing a performance test (an exercise for university), but I'm getting results I don't understand. I render a torus with different techniques, and these are the FPS figures (full screen):

Immediate mode: 50
Vertex array: 40
Indexed vertex array: 60
VBO: 400
Display list: 500

Why is the VBO slower than the display list? And why is the vertex array slower than immediate mode?

BTW: I have an ATI 3870 and a quad-core 6600. Thanks a lot for any help.
I've seen things you people wouldn't believe. Attack ships on fire off the shoulder of Orion. I watched C-beams glitter in the dark near the Tannhauser gate. All those moments will be lost in time, like tears in rain. Time to die.
Codeka
January 11, 2010 04:29 PM
Don't measure performance in FPS. If you convert your numbers to seconds, you'll see:

VBO: 0.0025s
Display lists: 0.002s

So it's a difference of half a millisecond. How many triangles are in the scene? It doesn't look like much, so there might be overhead in setting up the VBO that doesn't have as much effect for larger numbers of triangles.
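The conversion Codeka describes is simply frame time in milliseconds = 1000 / FPS. A minimal C sketch (the helper names are hypothetical) that compares the techniques by time per frame instead of frames per second:

```c
/* Frame time in milliseconds for a given frame rate (frames per second). */
double frame_time_ms(double fps)
{
    return 1000.0 / fps;
}

/* Extra milliseconds spent per frame when dropping from fps_fast to fps_slow. */
double frame_time_cost_ms(double fps_fast, double fps_slow)
{
    return frame_time_ms(fps_slow) - frame_time_ms(fps_fast);
}
```

With the figures from the first post, frame_time_cost_ms(500.0, 400.0) is 0.5 ms, while frame_time_cost_ms(60.0, 50.0) is about 3.33 ms. The 10 FPS gap at the low end therefore costs over six times more time than the 100 FPS gap at the high end, which is why FPS differences are misleading.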

You also don't describe how you set up the VBO. If you repopulate it every frame, it's obviously not going to be very efficient (I'm not saying you're doing that, but with no information it's hard to guess where the performance is going).
riruilo
January 11, 2010 04:53 PM
Thanks for the reply. This is my drawing code:
nIndicesPerStrip=10002
nStrips=80
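For reference, these two numbers answer Codeka's question about scene size. A triangle strip with n indices produces n - 2 triangles, so a small C sketch (hypothetical helper names; it assumes no strip is shorter than 3 indices unless it is empty) gives the per-frame load:

```c
#include <stddef.h>

/* Triangles produced by `strips` triangle strips of `indices_per_strip`
   indices each. A strip with n >= 3 indices yields n - 2 triangles
   (degenerate ones included, if the strips use any). */
size_t strip_triangle_count(size_t strips, size_t indices_per_strip)
{
    if (indices_per_strip < 3)
        return 0;
    return strips * (indices_per_strip - 2);
}

/* Bytes of index data for the same strips, stored as unsigned int
   (the type the post passes as GL_UNSIGNED_INT). */
size_t strip_index_bytes(size_t strips, size_t indices_per_strip)
{
    return strips * indices_per_strip * sizeof(unsigned int);
}
```

With nStrips = 80 and nIndicesPerStrip = 10002, strip_triangle_count gives 800,000 triangles per frame and strip_index_bytes gives 3,200,640 bytes of index data (assuming a 4-byte unsigned int). At the reported 500 FPS that is 400 million triangles per second.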
The display-list code isn't necessary, so I've left it out.
Can you see why the VBO is slower than the display list?
Thanks.

switch (mode) {
case IMMEDIATE:
    {
        // One glBegin/glEnd pair per strip, one glNormal/glVertex call per index.
        unsigned int off = 0;
        for (unsigned int i = 0; i < nStrips; i++) {
            if (!lightOn) glColor3fv(colors);
            glBegin(GL_TRIANGLE_STRIP);
            for (unsigned int j = 0; j < nIndicesPerStrip; j++) {
                if (lightOn) glNormal3fv(&torus[indices_torus[off]].nx);
                glVertex3fv(&torus[indices_torus[off]].x);
                off++;
            }
            glEnd();
        }
        break;
    }
case VERTEX_ARRAY:
    {
        // Interleaved data: normal (nx, ny, nz) then position (x, y, z), stride 6 floats.
        if (lightOn) glEnableClientState(GL_NORMAL_ARRAY);
        glEnableClientState(GL_VERTEX_ARRAY);
        float* p = (float*)torus;
        if (lightOn) glNormalPointer(GL_FLOAT, 6*sizeof(float), p);
        glVertexPointer(3, GL_FLOAT, 6*sizeof(float), &p[3]);
        for (unsigned int i = 0; i < nStrips; i++) {
            if (!lightOn) glColor3fv(colors);
            // glArrayElement inside glBegin/glEnd still issues one call per vertex.
            glBegin(GL_TRIANGLE_STRIP);
            for (unsigned int j = 0; j < nIndicesPerStrip; j++)
                glArrayElement(indices_torus[i*nIndicesPerStrip + j]);
            glEnd();
        }
        if (lightOn) glDisableClientState(GL_NORMAL_ARRAY);
        glDisableClientState(GL_VERTEX_ARRAY);
        break;
    }
case INDEXED_VERTEX_ARRAY:
    {
        if (lightOn) glEnableClientState(GL_NORMAL_ARRAY);
        glEnableClientState(GL_VERTEX_ARRAY);
        float* p = (float*)torus;
        if (lightOn) glNormalPointer(GL_FLOAT, 6*sizeof(float), p);
        glVertexPointer(3, GL_FLOAT, 6*sizeof(float), &p[3]);
        for (unsigned int i = 0; i < nStrips; i++) {
            if (!lightOn) glColor3fv(colors);
            // One draw call per strip; indices are read from client memory.
            glDrawElements(GL_TRIANGLE_STRIP, nIndicesPerStrip, GL_UNSIGNED_INT,
                           &indices_torus[i*nIndicesPerStrip]);
        }
        if (lightOn) glDisableClientState(GL_NORMAL_ARRAY);
        glDisableClientState(GL_VERTEX_ARRAY);
        break;
    }
case VBO:
    {
        // Vertex and index data live in buffer objects, so the pointer
        // arguments below are byte offsets into the bound buffers.
        glBindBufferARB(GL_ARRAY_BUFFER_ARB, m_vbo_vertices);
        glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, m_vbo_indices);
        if (lightOn) glEnableClientState(GL_NORMAL_ARRAY);
        glEnableClientState(GL_VERTEX_ARRAY);
        if (lightOn) glNormalPointer(GL_FLOAT, 6*sizeof(float), 0);
        glVertexPointer(3, GL_FLOAT, 6*sizeof(float), BUFFER_OFFSET(3*sizeof(float)));
        for (unsigned int i = 0; i < nStrips; i++) {
            if (!lightOn) glColor3fv(colors);
            glDrawElements(GL_TRIANGLE_STRIP, nIndicesPerStrip, GL_UNSIGNED_INT,
                           BUFFER_OFFSET(i*nIndicesPerStrip*sizeof(unsigned int)));
        }
        // Disable the arrays enabled above, then unbind both buffers.
        if (lightOn) glDisableClientState(GL_NORMAL_ARRAY);
        glDisableClientState(GL_VERTEX_ARRAY);
        glBindBufferARB(GL_ARRAY_BUFFER_ARB, 0);
        glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, 0);
        break;
    }
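The VBO path relies on a BUFFER_OFFSET macro that the post doesn't show. When a buffer object is bound, the pointer arguments of glVertexPointer and glDrawElements are interpreted as byte offsets into that buffer, so the macro only has to turn an offset into a pointer-typed value. A sketch, assuming a widely used definition of the macro and a hypothetical helper for the per-strip offset:

```c
#include <stddef.h>

/* Assumed definition (not shown in the post): turn a byte offset into the
   pointer-typed argument GL expects while a buffer object is bound. */
#define BUFFER_OFFSET(bytes) ((char *)NULL + (bytes))

/* Byte offset of strip `i` inside the element buffer, where every strip
   stores `indices_per_strip` unsigned int indices back to back. */
size_t strip_offset_bytes(size_t i, size_t indices_per_strip)
{
    return i * indices_per_strip * sizeof(unsigned int);
}
```

Because all 80 strips share one bound index buffer, each glDrawElements call in the loop just starts reading at strip_offset_bytes(i, nIndicesPerStrip).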
BlackSeeds
January 12, 2010 05:46 AM
It's hard to know exactly what a display list does when you compile it, but it can optimise your OpenGL calls. It's extremely common for display lists to beat VBOs in speed, although only marginally...

The main thing is that display lists are now deprecated and don't give you much flexibility. They aren't suitable for dynamic geometry because of the time it takes to compile a DL. During that compilation, the driver can remove redundant OpenGL calls, possibly re-index the vertices more efficiently, and apply all sorts of other optimisations. In addition, you may find that DL performance varies from GPU to GPU, because it depends on the driver and how it compiles the list; this is a common scenario, and many people have noticed it before.
V-man
January 12, 2010 08:50 AM
Perhaps you have too many glDrawElements calls. Try rendering the entire thing with one call, using GL_TRIANGLES instead of strips.
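Following this suggestion means converting each strip into an explicit triangle list and concatenating the results, so a single glDrawElements(GL_TRIANGLES, ...) replaces the 80 strip calls. A minimal C sketch of the conversion (the function name is hypothetical). It follows the strip rule that every odd triangle swaps its first two vertices, which keeps all triangles in the same winding, and it drops degenerate triangles, since they draw nothing:

```c
#include <stddef.h>

/* Convert one GL_TRIANGLE_STRIP index sequence into GL_TRIANGLES indices.
   Triangle k uses strip indices (k, k+1, k+2); for odd k the first two are
   swapped so every triangle keeps the strip's winding order.
   Degenerate triangles (two equal indices) are skipped.
   `out` must have room for 3 * (n - 2) indices. Returns the count written. */
size_t strip_to_triangles(const unsigned int *strip, size_t n, unsigned int *out)
{
    size_t written = 0;
    for (size_t k = 0; k + 2 < n; k++) {
        unsigned int a = strip[k], b = strip[k + 1], c = strip[k + 2];
        if (a == b || b == c || a == c)
            continue;                      /* zero-area triangle */
        if (k % 2 == 1) {                  /* odd triangle: restore winding */
            unsigned int t = a;
            a = b;
            b = t;
        }
        out[written++] = a;
        out[written++] = b;
        out[written++] = c;
    }
    return written;
}
```

For example, the strip {0, 1, 2, 3, 4} becomes {0, 1, 2, 2, 1, 3, 2, 3, 4}. The trade-off is size: a 10002-index strip without degenerates becomes 30,000 indices, so the 80 strips grow from 800,160 to 2,400,000 indices, and the vertex cache is what keeps the repeated indices cheap.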
Sig: http://glhlib.sourceforge.net
an open source GLU replacement library. Much more modern than GLU.
float matrix[16], inverse_matrix[16];
glhLoadIdentityf2(matrix);
glhTranslatef2(matrix, 0.0, 0.0, 5.0);
glhRotateAboutXf2(matrix, angleInRadians);
glhScalef2(matrix, 1.0, 1.0, -1.0);
glhQuickInvertMatrixf2(matrix, inverse_matrix);
glUniformMatrix4fv(uniformLocation1, 1, GL_FALSE, matrix);
glUniformMatrix4fv(uniformLocation2, 1, GL_FALSE, inverse_matrix);
Peti29
January 18, 2010 05:42 AM
Yes, I think the GL_TRIANGLES version could be more efficient (it should make use of the vertex cache anyway).
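The vertex cache keeps recently transformed vertices, so an index that repeats shortly afterwards is not transformed again. A rough way to compare index orderings is to simulate a small cache and count misses; dividing misses by the triangle count gives the average cache miss ratio (ACMR), where lower is better. A sketch in C, assuming a FIFO replacement policy and a caller-chosen cache size; both are modelling assumptions, since real GPUs differ:

```c
#include <stddef.h>

#define MAX_CACHE 64

/* Count vertex-cache misses for a GL_TRIANGLES index list, modelling the
   post-transform cache as a FIFO with `cache_size` entries.
   A size of 0 means no cache; sizes above MAX_CACHE are clamped. */
size_t fifo_cache_misses(const unsigned int *indices, size_t count, size_t cache_size)
{
    unsigned int cache[MAX_CACHE];
    size_t filled = 0, next = 0, misses = 0;

    if (cache_size == 0)
        return count;                      /* every index is a miss */
    if (cache_size > MAX_CACHE)
        cache_size = MAX_CACHE;

    for (size_t i = 0; i < count; i++) {
        int hit = 0;
        for (size_t j = 0; j < filled; j++) {
            if (cache[j] == indices[i]) {
                hit = 1;
                break;
            }
        }
        if (!hit) {
            misses++;
            cache[next] = indices[i];      /* replaces the oldest entry once full */
            next = (next + 1) % cache_size;
            if (filled < cache_size)
                filled++;
        }
    }
    return misses;
}
```

For the list {0, 1, 2, 2, 1, 3, 2, 3, 4} (three triangles), a 16-entry cache misses 5 times (ACMR about 1.67), while a 1-entry cache misses 8 times.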

But even then, the display list will most likely still be faster. That's normal. You should also look at memory consumption, though: a DL uses roughly twice as much memory.

What I find strange is that the immediate mode was faster than the VA...