Some questions about VBOs and Vertex Arrays...
Hello
I am modifying my Quake III level viewer to add VBO support, and it seems to be going fine. However, after adding VBOs I did not notice any speed improvement over my vertex array implementation; in some cases the VBO version even seems a little slower than the vertex array version.
So, after some deep debugging (using the wonderful tool gDebugger), I found that when I use the VBO render path, my shader code (Quake III shaders) sometimes mixes VBO vertex data with vertex array data. I mean that some vertex data, like position, comes from a VBO while the color array comes from a vertex array.
Here is a small example of where my vertex data arrays come from:
posArray -> VBO
colorArray -> vertex array, or sometimes a glColor call when the color is constant
tex0Array -> VBO
tex1Array -> vertex array
So, to close, my question is: could this behavior be causing my "low" VBO performance? Should I aim to use VBOs for all my arrays, without mixing data sources?
PS: I am using a GeForce3 Ti 500 with the latest NVIDIA drivers.
Thanks in advance,
Oscar
The best way to use VBOs is to work with a moderately large INTERLEAVED memory buffer containing all your vertex info.
That way your graphics card driver won't have to reach all over the place to grab the bits of each primitive's data, as they sit right next to each other.
It also helps a bit to use only static data, as uploading elements over the AGP or PCI-E bus could seriously hurt the performance of your application.
PS: Let me know if you would like to know how to set up your interleaved arrays for max performance :)
Hi, thanks for your reply.
So from your post, my current method is far from optimal :/. Should I avoid mixing vertex array data with VBO data (separate arrays)?
As for my current method, I need separate arrays for my vertex data because Quake III shaders are really flexible, so the vertex data differs a lot from shader to shader. Some shaders just need the vertex position array plus a constant color (glColor); others need the vertex position array, multiple texture coordinate arrays, a color array, etc. That's why I am using separate arrays to store my vertex data.
PS: I would really appreciate it if you could comment on your interleaved method.
Thanks in advance,
Oscar
    bool Geometry::compile()
    {
        if(VERTEX_FORMAT & RESIDE_ON_GFX_DEVICE)
            return true;

        Logger::writeInfoLog(String("Compiling geometry -> ") + name);

        if(!GLEE_ARB_vertex_buffer_object)
            return Logger::writeErrorLog("Failed: VBO extension not supported");

        if(!getVertices())
            return Logger::writeErrorLog("Failed: NULL vertices");

        float *GPUInterleavedArray = NULL;
        int    interleavedBufferSize = 0,
              *GPUIndices = NULL,
               supported = 0;
        GLuint vboID = 0,
               indicesVBOID = 0;

        for(int i = 0; i < 8; i++)
        {
            supported = VERTEX_FORMAT & (TEXTURE0 << i);
            if(supported)
                interleavedBufferSize += getTextureElementsCount(i);
            else
                break;
        }

        if((VERTEX_FORMAT & VERTICES) && getVertices()) interleavedBufferSize += 3;
        if((VERTEX_FORMAT & NORMALS ) && getNormals() ) interleavedBufferSize += 3;
        if((VERTEX_FORMAT & COLOR   ) && getColors()  ) interleavedBufferSize += 3;

        setStrideSize(interleavedBufferSize*4);
        interleavedBufferSize *= getVerticesCount();

        glGenBuffersARB(1, &vboID);
        glBindBufferARB(GL_ARRAY_BUFFER_ARB, vboID);
        glBufferDataARB(GL_ARRAY_BUFFER_ARB, interleavedBufferSize * sizeof(float),
                        NULL, GL_STATIC_DRAW_ARB);

        if(glGetError() == GL_NO_ERROR)
            GPUInterleavedArray = (float *)glMapBufferARB(GL_ARRAY_BUFFER_ARB, GL_WRITE_ONLY_ARB);
        else
            return Logger::writeErrorLog("Not enough memory for the interleaved geometry arrays");

        float *texCoorArray = NULL,
              *vertexArray  = getVertices(),
              *normalArray  = getNormals(),
              *colorArray   = getColors();
        int    texElemSize  = 0, s;

        for(int i = 0, t = 0; i < interleavedBufferSize; t++)
        {
            for(s = 0; s < 8; s++)
            {
                texElemSize  = getTextureElementsCount(s);
                texCoorArray = getTextureCoords(s);

                if(VERTEX_FORMAT & (TEXTURE0 << s))
                {
                    switch(texElemSize)
                    {
                        case 1:
                            GPUInterleavedArray[i++] = texCoorArray[t*1 + 0];
                            break;
                        case 2:
                            GPUInterleavedArray[i++] = texCoorArray[t*2 + 0];
                            GPUInterleavedArray[i++] = texCoorArray[t*2 + 1];
                            break;
                        case 3:
                            GPUInterleavedArray[i++] = texCoorArray[t*3 + 0];
                            GPUInterleavedArray[i++] = texCoorArray[t*3 + 1];
                            GPUInterleavedArray[i++] = texCoorArray[t*3 + 2];
                            break;
                    }
                }
                else
                    break;
            }

            if((VERTEX_FORMAT & COLOR) && colorArray)
            {
                GPUInterleavedArray[i++] = colorArray[t*3 + 0];
                GPUInterleavedArray[i++] = colorArray[t*3 + 1];
                GPUInterleavedArray[i++] = colorArray[t*3 + 2];
            }

            if((VERTEX_FORMAT & NORMALS) && normalArray)
            {
                GPUInterleavedArray[i++] = normalArray[t*3 + 0];
                GPUInterleavedArray[i++] = normalArray[t*3 + 1];
                GPUInterleavedArray[i++] = normalArray[t*3 + 2];
            }

            if((VERTEX_FORMAT & VERTICES) && vertexArray)
            {
                GPUInterleavedArray[i++] = vertexArray[t*3 + 0];
                GPUInterleavedArray[i++] = vertexArray[t*3 + 1];
                GPUInterleavedArray[i++] = vertexArray[t*3 + 2];
            }
        }

        glUnmapBufferARB(GL_ARRAY_BUFFER_ARB);
        glBindBufferARB(GL_ARRAY_BUFFER_ARB, 0);

        if(indices)
        {
            glGenBuffersARB(1, &indicesVBOID);
            glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, indicesVBOID);
            glBufferDataARB(GL_ELEMENT_ARRAY_BUFFER_ARB, getIndicesCount()*sizeof(int),
                            NULL, GL_STATIC_DRAW_ARB);

            if(glGetError() == GL_NO_ERROR)
                GPUIndices = (int *)glMapBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, GL_WRITE_ONLY_ARB);
            else
                return Logger::writeErrorLog("Not enough memory for the indices");

            memcpy(GPUIndices, getIndices(), getIndicesCount()*sizeof(int));

            glUnmapBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB);
            glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, 0);
            setIndicesVBOID(indicesVBOID);
        }

        setElementsVBOID(vboID);
        VERTEX_FORMAT |= RESIDE_ON_GFX_DEVICE;
        return true;
    }
PS: I did not include support for either color or texture elements of size four... Silly goose :D
[Edited by - JavaCoolDude on June 1, 2005 9:24:02 AM]
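For readers who also need four-component colors or texture coordinates, the interleaving step can be written generically on the CPU side before the data is copied into the mapped buffer. This is only a minimal sketch, not JavaCoolDude's code: the `Stream` type and the `interleave` function are hypothetical names introduced for illustration, and every stream is assumed to be a tightly packed float array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One source array: tightly packed floats, `components` floats per vertex.
// (Hypothetical helper type, introduced only for this sketch.)
struct Stream {
    const float* data;
    std::size_t  components;  // 1 to 4
};

// Interleave the streams vertex by vertex, in the order they are listed.
// The result holds vertexCount * (sum of components) floats.
std::vector<float> interleave(const std::vector<Stream>& streams,
                              std::size_t vertexCount)
{
    std::size_t floatsPerVertex = 0;
    for (const Stream& s : streams) {
        assert(s.data != nullptr && s.components >= 1 && s.components <= 4);
        floatsPerVertex += s.components;
    }

    std::vector<float> out;
    out.reserve(floatsPerVertex * vertexCount);

    for (std::size_t v = 0; v < vertexCount; ++v)
        for (const Stream& s : streams)
            for (std::size_t c = 0; c < s.components; ++c)
                out.push_back(s.data[v * s.components + c]);

    return out;
}
```

The stride in bytes is then the sum of the component counts times sizeof(float), and the returned vector could be handed to glBufferDataARB directly instead of being written through a mapped pointer.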
I was once told by a great comp sci teacher: "Your program only runs as fast as its slowest part." In this case (assuming that your bottleneck is vertex processing), yes, mixing VBOs and vertex arrays could slow you down. Everything that you are sending here can, and if at all possible should, be converted to VBOs if you really want to get the performance boost out of them.
As for the interleaved arrays, here's how they would work:
When storing the vertex data initially, write it in system memory as such:
Position-Normal-Texture-Color-Position-Normal-Texture-Color-Position...
Then when setting up your gl*Pointer()s, you don't specify a separate array, but instead an offset into the interleaved one. For example, if we have the data arranged as above, you could do this:
    vertexSize = sizeof(float)*12; // Used to specify the stride.
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, myVBO);
    glNormalPointer(GL_FLOAT, vertexSize, (float *)NULL + 3);
    glTexCoordPointer(2, GL_FLOAT, vertexSize, (float *)NULL + 6);
    glColorPointer(4, GL_FLOAT, vertexSize, (float *)NULL + 8);
    glVertexPointer(3, GL_FLOAT, vertexSize, NULL); // NVIDIA recommends you do this one last. ATI doesn't care.
And then render as normal. This is more efficient for the graphics card, as it can naturally stream the data, and it makes storage easier, because you now have 1 VBO instead of 4.
Now, I've worked with Q3 maps before, so I know how crazy they can be, but it's certainly possible to get a reasonable Buffer-based rendering system out of it. (Heck, I was using D3D, which is even more rigid when it comes to this stuff.) For example, in the above code if you wanted to render the mesh without any color information, simply call glDisableClientState(GL_COLOR_ARRAY) before rendering, and glEnableClientState(GL_COLOR_ARRAY) afterwards. It will effectively ignore the color stream entirely, and you can override it with an immediate mode call (glColor4f) to give your entire mesh a single color.
This way you have all of your critical information in optimal VBO form, and can manipulate it at will to suit your shader needs.
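One way to organise the per-shader enabling and disabling is a small bitmask that records which streams the shared interleaved VBO contains and which ones the current shader wants. The sketch below is only an illustration under that assumption; the `Attrib` bits, `ArrayState` and `selectArrays` are made-up names, and the actual glEnableClientState/glDisableClientState and glColor4f calls are left to the renderer.

```cpp
#include <cstdint>

// Hypothetical attribute bits; the names are illustrative only.
enum Attrib : std::uint32_t {
    ATTR_POSITION  = 1u << 0,
    ATTR_NORMAL    = 1u << 1,
    ATTR_TEXCOORD0 = 1u << 2,
    ATTR_COLOR     = 1u << 3
};

struct ArrayState {
    std::uint32_t enable;            // streams to switch on with glEnableClientState
    bool          useConstantColor;  // true: issue a single glColor4f instead
};

// Decide, per draw call, which streams of the interleaved VBO to enable.
// Streams not listed in `enable` would be disabled before rendering.
ArrayState selectArrays(std::uint32_t inBuffer,
                        std::uint32_t wantedByShader,
                        bool shaderUsesConstantColor)
{
    std::uint32_t enable = inBuffer & wantedByShader;
    if (shaderUsesConstantColor)
        enable &= ~static_cast<std::uint32_t>(ATTR_COLOR);
    return ArrayState{enable, shaderUsesConstantColor};
}
```

A shader that only needs positions plus a constant color would end up with just ATTR_POSITION enabled, while the color data stays in the buffer untouched for the shaders that do use it.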
EDIT: Whoops! JCD beat me to it! And his code is a lot more in depth too... :( Ah well, whatever helps!
[Edited by - Toji on June 1, 2005 10:32:13 AM]
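The stride and offsets for the Position-Normal-Texture-Color layout can also be derived from a struct rather than counted by hand, which may be less error-prone when the layout changes. This is a sketch under the assumption that the compiler adds no padding to a struct of floats (the static_assert checks that); `Vertex`, the `k...` constants and `bufferOffset` are names invented here, not part of the post.

```cpp
#include <cstddef>

// Position(3) Normal(3) TexCoord(2) Color(4), all floats: 12 floats per vertex.
struct Vertex {
    float position[3];
    float normal[3];
    float texCoord[2];
    float color[4];
};

static_assert(sizeof(Vertex) == 12 * sizeof(float), "unexpected padding in Vertex");

// Stride and byte offsets, as they would be passed to the gl*Pointer() calls.
constexpr std::size_t kStride         = sizeof(Vertex);
constexpr std::size_t kPositionOffset = offsetof(Vertex, position);
constexpr std::size_t kNormalOffset   = offsetof(Vertex, normal);
constexpr std::size_t kTexCoordOffset = offsetof(Vertex, texCoord);
constexpr std::size_t kColorOffset    = offsetof(Vertex, color);

// With a VBO bound, the "pointer" argument is really a byte offset into it.
inline const void* bufferOffset(std::size_t bytes)
{
    return reinterpret_cast<const void*>(bytes);
}
```

With 4-byte floats the offsets come out as 0, 12, 24 and 32 bytes, matching the +3, +6 and +8 float offsets used in the post, and a call such as glColorPointer(4, GL_FLOAT, kStride, bufferOffset(kColorOffset)) would be the equivalent form.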
Correct me if I'm wrong, Toji brother, but I've always thought the optimal approach to creating interleaved arrays was to store texture elements first, then colors and normals, and finally your vertices.
It makes sense if you take a look at how the fixed glInterleavedArrays function expects its arguments (GL_T4F_C4F_N3F_V4F, for example); that, and the fact that glVertexPointer should always be called last since it causes a major state change.
Come to think of it, isn't that why some folks use vertex attributes to store their data instead of the regular tex/col/nor/vert pointers? Hmmm, something to investigate a little more.
I'm not sure how much of a bearing the order has on it, but I think you may be right. In my own programs I store them as Textures, Normals, Vertices so I can simply call glInterleavedArrays() with GL_T2F_N3F_V3F and be done with it. (No color element, obviously.) I may have to experiment a bit and see if changing the data order has any effect. In any case, I think it would probably depend on the driver more than anything else.
I guess it never hurts to be on the safe side, though.
Not to argue, but I seem to remember reading somewhere that the order doesn't matter, AS LONG AS glVertexPointer() or glVertexAttribPointer() (with index 0) is called last (the actual order in the structure doesn't matter). (Gotta go to school now; I'll see if I can find a source for you when I get home.)
hope that helps
-Dan
It probably used to matter when the FFP was the only option, but it would be very strange if it mattered now that the GPU has become more generalised; a fixed order of data wouldn't make a great deal of sense in a world where you can define for yourself what the values passed in mean.
More important is the fact that the data is interleaved and that, as vertices are drawn, they have good spatial locality within RAM, so that the GPU doesn't have to jump around memory to render (a vertex size that is a multiple of 32 bytes is also a bonus here, although it's not as important, as the pre-T&L cache will handle that end of things).
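To make the 32-byte remark concrete, here is a small sketch that checks a vertex struct against that size and rounds an arbitrary stride up to the next multiple of 32. The helper `padTo32` and the struct name are assumptions made for illustration, and whether padding actually pays off would have to be measured on the target hardware and driver.

```cpp
#include <cstddef>

// Round a vertex size in bytes up to the next multiple of 32.
constexpr std::size_t padTo32(std::size_t bytes)
{
    return (bytes + 31) / 32 * 32;
}

// Same element order as GL_T2F_N3F_V3F: 2 + 3 + 3 floats = 8 floats.
struct VertexT2N3V3 {
    float texCoord[2];
    float normal[3];
    float position[3];
};

// With 4-byte floats this layout is already exactly 32 bytes, so no padding is needed.
static_assert(sizeof(VertexT2N3V3) == 32, "expected a 32-byte vertex");

// A 12-float vertex (48 bytes) would need 16 bytes of padding to reach 64.
constexpr std::size_t kPaddedStride48 = padTo32(48);
```

The padded value would be used as the stride for every gl*Pointer() call, with the extra bytes at the end of each vertex simply left unused.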
Yeah, I've just run my code through several odd tests and it seems to make no difference what order your data is in, as long as you set the offsets correctly. Although I did notice something interesting: I got a small bump in speed when I used glInterleavedArrays() to set the pointer as opposed to gl*Pointer(). Not sure what would make the difference. (And no, I'm not calling a lot either. I was rendering the same mesh approx 5000 times, so it only got called once.)