Performance test: why is a VBO slower than a display list?
Hi all!
I'm doing a performance test (an exercise for university), but I'm getting some results that I don't understand.
It is a torus implemented with different techniques, and these are the frame rates (in full screen):
Immediate mode: 50
vertex array: 40
indexed vertex array: 60
VBO: 400
Display lists: 500
Why is the VBO slower than the display list?
Why is the vertex array slower than immediate mode?
BTW: I have an ATI 3870 and a QuadCore 6600.
Thanks a lot for your help.
Don't measure performance in FPS. If you convert your numbers to seconds, you'll see:
VBO: 0.0025s
Display lists: 0.002s
So it's a difference of half a millisecond. How many triangles are in the scene? It doesn't look like much, so there might be overhead in setting up the VBO that doesn't have as much effect for larger numbers of triangles.
You also don't describe how you set up the VBO. If you repopulate it every frame, then obviously it's not going to be very efficient (I'm not saying you're doing that, but with no information, it's hard to guess where the performance is going).
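The FPS-to-seconds conversion above is easy to redo for all five measurements: frame time in milliseconds is 1000 divided by the frame rate. A minimal C sketch; the function names are just for illustration, and the FPS values are the ones reported in the first post.

```c
#include <stdio.h>

/* Convert a frame rate (frames per second) into a frame time in milliseconds. */
double fps_to_ms(double fps)
{
    return 1000.0 / fps;
}

/* Print the frame time for each technique measured in the first post. */
void print_frame_times(void)
{
    const char *names[] = { "immediate", "vertex array", "indexed vertex array",
                            "VBO", "display list" };
    const double fps[] = { 50.0, 40.0, 60.0, 400.0, 500.0 };
    for (int i = 0; i < 5; i++)
        printf("%-22s %4.0f FPS = %6.2f ms/frame\n",
               names[i], fps[i], fps_to_ms(fps[i]));
}
```

In these units the VBO/display list gap is 2.5 ms - 2.0 ms = 0.5 ms per frame, while the vertex array/immediate gap is 25 ms - 20 ms = 5 ms, ten times larger, even though its FPS difference looks much smaller.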
Quote: Original post by Codeka (quoted in full above).
Thanks for the reply. This is my drawing code:
nIndicesPerStrip=10002
nStrips=80
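These two parameters answer the "how many triangles?" question. A triangle strip of n indices produces n - 2 triangles, so the per-frame count and the implied throughput follow directly. A small C sketch, assuming every strip has exactly nIndicesPerStrip indices and no degenerate triangles, as the drawing code suggests; the function names are hypothetical.

```c
/* Triangles produced by nStrips triangle strips of nIndicesPerStrip indices each.
   A strip of n indices yields n - 2 triangles (assumes n >= 3, no degenerates). */
unsigned long triangles_per_frame(unsigned long nStrips, unsigned long nIndicesPerStrip)
{
    return nStrips * (nIndicesPerStrip - 2);
}

/* Triangles per second at a given frame rate. */
double triangles_per_second(unsigned long trisPerFrame, double fps)
{
    return (double)trisPerFrame * fps;
}
```

With nStrips = 80 and nIndicesPerStrip = 10002 this gives 800,000 triangles per frame, i.e. about 4.0e8 triangles per second at 500 FPS (display list) and 3.2e8 at 400 FPS (VBO). The index data alone is 80 × 10002 × 4 = 3,200,640 bytes (about 3 MiB) of GL_UNSIGNED_INT indices.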
I've left out the display list code, since it isn't necessary here.
Can you see why the VBO is slower than the display list?
Thanks.
switch (mode)
{
case IMMEDIATE:
{
    unsigned int off = 0;
    for (unsigned int i = 0; i < nStrips; i++)
    {
        if (!lightOn) glColor3fv(colors);
        glBegin(GL_TRIANGLE_STRIP);
        for (unsigned int j = 0; j < nIndicesPerStrip; j++)
        {
            if (lightOn) glNormal3fv(&torus[indices_torus[off]].nx);
            glVertex3fv(&torus[indices_torus[off]].x);
            off++;
        }
        glEnd();
    }
    break;
}
case VERTEX_ARRAY:
{
    if (lightOn) glEnableClientState(GL_NORMAL_ARRAY);
    glEnableClientState(GL_VERTEX_ARRAY);
    // Interleaved vertices: nx, ny, nz, x, y, z (stride = 6 floats)
    float* p = (float*)torus;
    if (lightOn) glNormalPointer(GL_FLOAT, 6*sizeof(float), p);
    glVertexPointer(3, GL_FLOAT, 6*sizeof(float), &p[3]);
    for (unsigned int i = 0; i < nStrips; i++)
    {
        if (!lightOn) glColor3fv(colors);
        glBegin(GL_TRIANGLE_STRIP);
        for (unsigned int j = 0; j < nIndicesPerStrip; j++)
            glArrayElement(indices_torus[i*nIndicesPerStrip + j]);  // one call per vertex
        glEnd();
    }
    if (lightOn) glDisableClientState(GL_NORMAL_ARRAY);
    glDisableClientState(GL_VERTEX_ARRAY);
    break;
}
case INDEXED_VERTEX_ARRAY:
{
    if (lightOn) glEnableClientState(GL_NORMAL_ARRAY);
    glEnableClientState(GL_VERTEX_ARRAY);
    float* p = (float*)torus;
    if (lightOn) glNormalPointer(GL_FLOAT, 6*sizeof(float), p);
    glVertexPointer(3, GL_FLOAT, 6*sizeof(float), &p[3]);
    for (unsigned int i = 0; i < nStrips; i++)
    {
        if (!lightOn) glColor3fv(colors);
        // one draw call per strip, indices read from client memory
        glDrawElements(GL_TRIANGLE_STRIP, nIndicesPerStrip, GL_UNSIGNED_INT,
                       &indices_torus[i*nIndicesPerStrip]);
    }
    if (lightOn) glDisableClientState(GL_NORMAL_ARRAY);
    glDisableClientState(GL_VERTEX_ARRAY);
    break;
}
case VBO:
{
    // BUFFER_OFFSET(i) is assumed to be the usual ((char*)NULL + (i)) macro
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, m_vbo_vertices);
    glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, m_vbo_indices);
    if (lightOn) glEnableClientState(GL_NORMAL_ARRAY);
    glEnableClientState(GL_VERTEX_ARRAY);
    // with a buffer bound, the "pointers" are byte offsets into the VBO
    if (lightOn) glNormalPointer(GL_FLOAT, 6*sizeof(float), 0);
    glVertexPointer(3, GL_FLOAT, 6*sizeof(float), BUFFER_OFFSET(3*sizeof(float)));
    for (unsigned int i = 0; i < nStrips; i++)
    {
        if (!lightOn) glColor3fv(colors);
        glDrawElements(GL_TRIANGLE_STRIP, nIndicesPerStrip, GL_UNSIGNED_INT,
                       BUFFER_OFFSET(i*nIndicesPerStrip*sizeof(unsigned int)));
    }
    if (lightOn) glDisableClientState(GL_NORMAL_ARRAY);
    glDisableClientState(GL_VERTEX_ARRAY);
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, 0);
    glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, 0);
    break;
}
}
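One way to compare the code paths above is to count the OpenGL calls each one issues per frame inside the strip loop. The C sketch below does only that arithmetic: it counts glColor3fv, glBegin/glEnd, glNormal3fv/glVertex3fv, glArrayElement and glDrawElements as they appear in the code, and ignores the constant setup and teardown calls. The enum and function names are hypothetical.

```c
/* Hypothetical labels for the code paths in the switch statement above. */
enum draw_path { PATH_IMMEDIATE, PATH_VERTEX_ARRAY, PATH_DRAW_ELEMENTS };

/* OpenGL calls issued per frame inside the strip loop.
   Setup/teardown calls (client state, pointers, buffer binds) are not counted.
   PATH_DRAW_ELEMENTS covers both the indexed vertex array and the VBO path. */
unsigned long calls_per_frame(enum draw_path path, unsigned long nStrips,
                              unsigned long nIndicesPerStrip, int lightOn)
{
    unsigned long perStrip = lightOn ? 0 : 1;   /* glColor3fv when lighting is off */
    switch (path) {
    case PATH_IMMEDIATE:
        /* glBegin + glEnd, then glVertex3fv (plus glNormal3fv if lit) per index */
        perStrip += 2 + nIndicesPerStrip * (lightOn ? 2 : 1);
        break;
    case PATH_VERTEX_ARRAY:
        /* glBegin + glEnd, then one glArrayElement per index */
        perStrip += 2 + nIndicesPerStrip;
        break;
    case PATH_DRAW_ELEMENTS:
        /* a single glDrawElements per strip */
        perStrip += 1;
        break;
    }
    return nStrips * perStrip;
}
```

With lighting on, this gives 1,600,480 calls per frame for immediate mode, 800,320 for the glArrayElement path and 80 for either glDrawElements path. By this count alone the vertex array path should not be slower than immediate mode, so that result probably comes from elsewhere (for example per-call cost inside the driver), though that is only a guess.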
It's hard to know exactly what the driver does when you compile a display list, but it can optimise your OpenGL calls. It's extremely common for display lists to beat VBOs in speed, although only marginally.
The main thing is that display lists are now deprecated (as of OpenGL 3.0) and don't give you much flexibility. They aren't suitable for dynamic geometry because compiling a display list takes time: during compilation the driver may remove redundant OpenGL calls, possibly reindex the vertices better, and apply all sorts of other optimisations. In addition, display list performance can vary from GPU to GPU, depending on the driver and how it compiles the list; this is a common scenario that many people have noticed before.
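As a concrete case of a redundant call: in the IMMEDIATE path, with lighting off, glColor3fv(colors) is issued once per strip with the same color every time, so after the first call the other 79 change nothing. The C sketch below is a hypothetical, much-simplified model of a pass that drops such calls from a recorded command list; it is not how any particular driver compiles display lists, and the cmd type and its fields are invented for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical recorded command: either a color change or a draw of one strip. */
typedef enum { CMD_COLOR, CMD_DRAW_STRIP } cmd_type;

typedef struct {
    cmd_type type;
    float rgb[3];   /* used by CMD_COLOR */
    int strip;      /* used by CMD_DRAW_STRIP */
} cmd;

/* Remove color commands that set the color that is already current.
   Compacts the list in place and returns the new command count.
   Colors are compared bitwise, which is conservative (never drops a real change). */
size_t drop_redundant_colors(cmd *cmds, size_t n)
{
    int haveColor = 0;
    float current[3] = { 0.0f, 0.0f, 0.0f };
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (cmds[i].type == CMD_COLOR) {
            if (haveColor && memcmp(current, cmds[i].rgb, sizeof current) == 0)
                continue;                       /* same color as before: skip it */
            memcpy(current, cmds[i].rgb, sizeof current);
            haveColor = 1;
        }
        cmds[out++] = cmds[i];
    }
    return out;
}
```

Recording "color, strip 0, color, strip 1, color, strip 2" with the same color each time leaves "color, strip 0, strip 1, strip 2".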
Perhaps you have too many calls to glDrawElements. Try to render the entire thing with one call, using GL_TRIANGLES instead of strips.
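Drawing everything with a single GL_TRIANGLES call means converting the strip indices into a plain triangle list first. A minimal C sketch, assuming the layout used in the drawing code (nStrips consecutive strips of nIndicesPerStrip indices in one array). Every odd triangle in a strip has its first two indices swapped so the winding order matches what GL_TRIANGLE_STRIP produces, and degenerate triangles (two equal indices) are dropped. The function name is hypothetical.

```c
#include <stddef.h>

/* Convert nStrips consecutive triangle strips (stripLen indices each) into an
   independent-triangle index list for one glDrawElements(GL_TRIANGLES, ...) call.
   out must have room for nStrips * (stripLen - 2) * 3 indices.
   Returns the number of indices written. */
size_t strips_to_triangles(const unsigned int *strips, size_t nStrips,
                           size_t stripLen, unsigned int *out)
{
    size_t w = 0;
    for (size_t s = 0; s < nStrips; s++) {
        const unsigned int *v = strips + s * stripLen;
        for (size_t k = 0; k + 2 < stripLen; k++) {
            unsigned int a = v[k], b = v[k + 1], c = v[k + 2];
            if (a == b || b == c || a == c)
                continue;                  /* degenerate: contributes no area */
            if (k % 2 == 1) {              /* odd triangle: swap to keep winding */
                unsigned int t = a; a = b; b = t;
            }
            out[w++] = a; out[w++] = b; out[w++] = c;
        }
    }
    return w;
}
```

The result could be uploaded to the element buffer once and drawn with a single glDrawElements(GL_TRIANGLES, count, GL_UNSIGNED_INT, BUFFER_OFFSET(0)). For this torus it would hold 80 × 10000 × 3 = 2,400,000 indices instead of 800,160, so the index buffer grows roughly threefold; whether one call outweighs that is something to measure.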
Yes, I think the GL_TRIANGLES version could be more efficient (it should still make use of the vertex cache).
But even then, the display list would most likely still be faster. That's normal. However, you should look at memory consumption: a display list uses roughly twice as much memory.
What I find strange is that immediate mode was faster than the vertex array...
