Some questions about VBOs and Vertex Arrays...

16 comments, last by ogracian 18 years, 10 months ago
Hello,

I am modifying my Quake III level viewer to add support for VBOs, and it seems to be going fine. But after adding VBO support I did not notice any speed improvement over my vertex array implementation; in some cases the VBO version even seems a little slower than the vertex array version.

After some deep debugging (using the wonderful tool gDebugger), I found that when I use the VBO render path, my shader code (Quake III shaders) sometimes mixes VBO vertex data with vertex array data. That is, some vertex data, like position, comes from a VBO while the color array comes from a vertex array. Here is a small example of where my vertex data arrays come from:

posArray   -> VBO
colorArray -> VertexArray (or sometimes a glColor call when the color is constant)
tex0Array  -> VBO
tex1Array  -> VertexArray

So my question is: could this behavior be causing my "low" VBO performance? Should I aim to use VBOs for all my arrays, without mixing data sources?

PS: I am using a GeForce3 Ti 500 with the latest NVIDIA drivers.

Thanks in advance,
Oscar
The best way to use VBOs is to work with a moderately large INTERLEAVED memory buffer containing all your info.
That way your graphics card driver won't have to reach all over the place to grab bits of primitive info, as they're stored right next to each other.
It also helps a bit to use only static data, as uploading elements over the AGP or PCI-E bus could seriously hurt the performance of your application.
PS: Let me know if you would like to know how to set up your interleaved arrays for max performance :)
Hi, thanks for your reply.

So from your post, my current method is far from optimal : /. Should I avoid mixing vertex array data with VBO data (separate arrays)?

As for my current method, I need to use separate arrays for my vertex data because Quake III shaders are really flexible, so my vertex data differs a lot from shader to shader. Some shaders just need a vertex position array plus a constant color (sglColor); others need a vertex position array, multiple texture coordinate arrays, a color array, etc. That's why I am using separate arrays to store my vertex data.

PS: I would really appreciate it if you could comment on your interleaved method.

Thanks in advance,
Oscar
bool Geometry::compile()
{
  if(VERTEX_FORMAT & RESIDE_ON_GFX_DEVICE)
    return true;

  Logger::writeInfoLog(String("Compiling geometry -> ") + name);

  if(!GLEE_ARB_vertex_buffer_object)
    return Logger::writeErrorLog("Failed: VBO extension not supported");

  if(!getVertices())
    return Logger::writeErrorLog("Failed: NULL vertices");

  float      *GPUInterleavedArray   = NULL;
  int         interleavedBufferSize = 0,
             *GPUIndices            = NULL,
              supported             = 0;
  GLuint      vboID                 = 0,
              indicesVBOID          = 0;

  for(int i = 0; i < 8; i++)
  {
    supported = VERTEX_FORMAT & (TEXTURE0 << i);
    if(supported)
      interleavedBufferSize += getTextureElementsCount(i);
    else
      break;
  }

  if((VERTEX_FORMAT & VERTICES) && getVertices()) interleavedBufferSize += 3;
  if((VERTEX_FORMAT & NORMALS ) && getNormals() ) interleavedBufferSize += 3;
  if((VERTEX_FORMAT & COLOR   ) && getColors()  ) interleavedBufferSize += 3;

  setStrideSize(interleavedBufferSize*4);
  interleavedBufferSize *= getVerticesCount();

  glGenBuffersARB(1, &vboID);
  glBindBufferARB(GL_ARRAY_BUFFER_ARB, vboID);
  glBufferDataARB(GL_ARRAY_BUFFER_ARB, interleavedBufferSize * sizeof(float),
                  NULL, GL_STATIC_DRAW_ARB);

  if(glGetError() == GL_NO_ERROR)
    GPUInterleavedArray = (float *)glMapBufferARB(GL_ARRAY_BUFFER_ARB, GL_WRITE_ONLY_ARB);
  else
    return Logger::writeErrorLog("Not enough memory for the interleaved geometry arrays");

  float  *texCoorArray  = NULL,
         *vertexArray   = getVertices(),
         *normalArray   = getNormals(),
         *colorArray    = getColors();

  int     texElemSize   = 0,
          s;

  for(int i = 0, t = 0; i < interleavedBufferSize; t++)
  {
    for(s = 0; s < 8; s++)
    {
      texElemSize  = getTextureElementsCount(s);
      texCoorArray = getTextureCoords(s);

      if(VERTEX_FORMAT & (TEXTURE0 << s))
      {
        switch(texElemSize)
        {
          case 1: GPUInterleavedArray[i++] = texCoorArray[t*1 + 0]; break;
          case 2: GPUInterleavedArray[i++] = texCoorArray[t*2 + 0];
                  GPUInterleavedArray[i++] = texCoorArray[t*2 + 1]; break;
          case 3: GPUInterleavedArray[i++] = texCoorArray[t*3 + 0];
                  GPUInterleavedArray[i++] = texCoorArray[t*3 + 1];
                  GPUInterleavedArray[i++] = texCoorArray[t*3 + 2]; break;
        }
      }
      else
        break;
    }

    if((VERTEX_FORMAT & COLOR) && colorArray)
    {
      GPUInterleavedArray[i++] = colorArray[t*3 + 0];
      GPUInterleavedArray[i++] = colorArray[t*3 + 1];
      GPUInterleavedArray[i++] = colorArray[t*3 + 2];
    }

    if((VERTEX_FORMAT & NORMALS) && normalArray)
    {
      GPUInterleavedArray[i++] = normalArray[t*3 + 0];
      GPUInterleavedArray[i++] = normalArray[t*3 + 1];
      GPUInterleavedArray[i++] = normalArray[t*3 + 2];
    }

    if((VERTEX_FORMAT & VERTICES) && vertexArray)
    {
      GPUInterleavedArray[i++] = vertexArray[t*3 + 0];
      GPUInterleavedArray[i++] = vertexArray[t*3 + 1];
      GPUInterleavedArray[i++] = vertexArray[t*3 + 2];
    }
  }

  glUnmapBufferARB(GL_ARRAY_BUFFER_ARB);
  glBindBufferARB(GL_ARRAY_BUFFER_ARB, 0);

  if(indices)
  {
    glGenBuffersARB(1, &indicesVBOID);
    glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, indicesVBOID);
    glBufferDataARB(GL_ELEMENT_ARRAY_BUFFER_ARB, getIndicesCount()*sizeof(int),
                    NULL, GL_STATIC_DRAW_ARB);

    if(glGetError() == GL_NO_ERROR)
      GPUIndices = (int *)glMapBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, GL_WRITE_ONLY_ARB);
    else
      return Logger::writeErrorLog("Not enough memory for the indices");

    memcpy(GPUIndices, getIndices(), getIndicesCount()*sizeof(int));

    glUnmapBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB);
    glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, 0);
    setIndicesVBOID(indicesVBOID);
  }

  setElementsVBOID(vboID);
  VERTEX_FORMAT |= RESIDE_ON_GFX_DEVICE;
  return true;
}

PS: I did not include support for either color or texture elements of size four... Silly goose :D
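Once a buffer has been filled in the order the compile() routine uses (texture units first, then color, normal, and position), each gl*Pointer call needs the offset of its component within one vertex. Here is a minimal sketch of that bookkeeping. The `FMT_*` flags, `InterleavedLayout`, and `computeLayout` are hypothetical names, not part of the posted code, and the sketch assumes the same number of floats for every texture unit instead of the per-unit getTextureElementsCount() lookup:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical format flags for this sketch; the real VERTEX_FORMAT bits may differ.
enum : unsigned {
    FMT_VERTICES = 1u << 0,
    FMT_NORMALS  = 1u << 1,
    FMT_COLOR    = 1u << 2,
    FMT_TEXTURE0 = 1u << 3   // texture unit n uses bit (FMT_TEXTURE0 << n)
};

// Offsets are counted in floats from the start of one vertex; -1 means "not present".
struct InterleavedLayout {
    int texOffset[8];
    int colorOffset;
    int normalOffset;
    int vertexOffset;
    int floatsPerVertex;

    std::size_t strideBytes() const {
        return static_cast<std::size_t>(floatsPerVertex) * sizeof(float);
    }
};

// Walks the components in the same order the fill loop writes them:
// texture units, then color, normal, and position (3 floats each).
// Assumption: every texture unit uses the same number of floats (texElems).
inline InterleavedLayout computeLayout(unsigned format, int texElems = 2)
{
    assert(texElems >= 1 && texElems <= 3);

    InterleavedLayout layout;
    int cursor = 0;

    // Reserve `count` floats for a component if it is present.
    auto place = [&cursor](bool present, int count) -> int {
        if (!present) return -1;
        int offset = cursor;
        cursor += count;
        return offset;
    };

    // Like the original loop, stop at the first texture unit that is missing.
    bool moreUnits = true;
    for (int unit = 0; unit < 8; ++unit) {
        moreUnits = moreUnits && (format & (FMT_TEXTURE0 << unit)) != 0;
        layout.texOffset[unit] = place(moreUnits, texElems);
    }

    layout.colorOffset     = place((format & FMT_COLOR)    != 0, 3);
    layout.normalOffset    = place((format & FMT_NORMALS)  != 0, 3);
    layout.vertexOffset    = place((format & FMT_VERTICES) != 0, 3);
    layout.floatsPerVertex = cursor;
    return layout;
}
```

At draw time the offsets would be passed as `(float *)NULL + layout.normalOffset` and so on, with `layout.strideBytes()` as the stride.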

[Edited by - JavaCoolDude on June 1, 2005 9:24:02 AM]
I was once told by a great comp sci teacher: "Your program only runs as fast as its slowest part." In this case (assuming that your bottleneck is vertex processing), yes, mixing VBOs and vertex arrays could slow you down. Everything that you are sending here can and should be converted to VBOs, if at all possible, if you really want to get the performance boost out of it.

As for the interleaved arrays, here's how they would work:

When storing the vertex data initially, write it in system memory as such:

Position-Normal-Texture-Color-Position-Normal-Texture-Color-Position...

Then, when setting up your gl*Pointer() calls, you don't specify a separate array for each one, but instead an offset into the interleaved one. For example, if we have the data arranged as above, you could do this:

vertexSize = sizeof(float)*12; // Used to specify the stride.

glBindBufferARB( GL_ARRAY_BUFFER_ARB, myVBO );
glNormalPointer(GL_FLOAT, vertexSize, (float *)NULL + 3);
glTexCoordPointer(2, GL_FLOAT, vertexSize, (float *)NULL + 6);
glColorPointer(4, GL_FLOAT, vertexSize, (float *)NULL + 8);
glVertexPointer(3, GL_FLOAT, vertexSize, NULL); // NVIDIA recommends you do this one last. ATI doesn't care.


And then render as normal. This is more efficient for the graphics card, as it can naturally stream the data, and it makes storage easier, because you now have 1 VBO instead of 4.
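The stride and offsets used above can also be derived from a struct instead of being counted by hand. A minimal sketch, assuming 4-byte floats and no padding; `Vertex` and `bufferOffset` are illustrative names, not something from the posts:

```cpp
#include <cassert>
#include <cstddef>

// One vertex in the Position-Normal-Texture-Color order described above.
struct Vertex {
    float position[3];
    float normal[3];
    float texCoord[2];
    float color[4];
};

// 12 floats per vertex, matching the sizeof(float)*12 stride.
static_assert(sizeof(Vertex) == 12 * sizeof(float), "unexpected padding in Vertex");

// Converts "n floats into the vertex" to the pointer-typed byte offset that
// gl*Pointer() expects while a VBO is bound. It plays the same role as
// (float *)NULL + n without doing arithmetic on a null pointer.
inline const void* bufferOffset(std::size_t floatIndex)
{
    assert(floatIndex * sizeof(float) < sizeof(Vertex));
    return reinterpret_cast<const void*>(floatIndex * sizeof(float));
}
```

With this layout, `sizeof(Vertex)` is the stride, and `offsetof(Vertex, normal)`, `offsetof(Vertex, texCoord)`, and `offsetof(Vertex, color)` correspond to the float offsets 3, 6, and 8.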

Now, I've worked with Q3 maps before, so I know how crazy they can be, but it's certainly possible to get a reasonable Buffer-based rendering system out of it. (Heck, I was using D3D, which is even more rigid when it comes to this stuff.) For example, in the above code if you wanted to render the mesh without any color information, simply call glDisableClientState(GL_COLOR_ARRAY) before rendering, and glEnableClientState(GL_COLOR_ARRAY) afterwards. It will effectively ignore the color stream entirely, and you can override it with an immediate mode call (glColor4f) to give your entire mesh a single color.

This way you have all of your critical information in optimal VBO form, and can manipulate it at will to suit your shader needs.
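One way to organise this per shader is to compare what the VBO stores with what the shader actually reads, and toggle client states accordingly. This is only a rough sketch of that decision, with no GL calls in it; the `ATTR_*` bits, `ArrayState`, and `planArrays` are hypothetical names:

```cpp
#include <cassert>

// Hypothetical attribute bits for this sketch (not from the original posts).
enum : unsigned {
    ATTR_POSITION = 1u << 0,
    ATTR_NORMAL   = 1u << 1,
    ATTR_TEXCOORD = 1u << 2,
    ATTR_COLOR    = 1u << 3
};

struct ArrayState {
    unsigned enable;        // arrays to glEnableClientState() for this draw
    unsigned disable;       // arrays stored in the VBO that this shader ignores
    bool useConstantColor;  // true: set the color with glColor4f() instead
};

// inBuffer:    attributes stored in the interleaved VBO
// shaderNeeds: attributes the current shader stage actually reads
inline ArrayState planArrays(unsigned inBuffer, unsigned shaderNeeds)
{
    assert(inBuffer & ATTR_POSITION);  // nothing to draw without positions

    ArrayState state;
    state.enable  = (inBuffer & shaderNeeds) | ATTR_POSITION;
    state.disable = inBuffer & ~state.enable;
    // With the color array disabled, GL falls back to the current color.
    state.useConstantColor = (state.enable & ATTR_COLOR) == 0;
    return state;
}
```

The renderer would then call glEnableClientState() or glDisableClientState() for each bit, and glColor4f() when `useConstantColor` is set, while the buffer itself stays untouched.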

EDIT: Whoops! JCD beat me to it! And his code is a lot more in depth too... :( Ah well, whatever helps!

[Edited by - Toji on June 1, 2005 10:32:13 AM]
// The user formerly known as Tojiro67445, formerly known as Toji [smile]
Correct me if I'm wrong, Toji brother, but I've always thought the optimal approach to creating interleaved arrays was to store texture elements first, then colors and normals, and finally your vertices.

It makes sense if you take a look at how the fixed glInterleavedArrays function expects its arguments, GL_T4F_C4F_N3F_V4F, that and the fact that the glVertexPointer should always be called last since it causes a major state change.

Come to think of it, isn't that why some folks use vertex attributes to store their data instead of the regular tex/col/nor/vert pointers? Hmmm, something to investigate a little more.
I'm not sure how much of a bearing the order has on it, but I think you may be right. In my own programs I store them as Textures, Normals, Vertices so I can simply call glInterleavedArrays() with GL_T2F_N3F_V3F and be done with it. (No color element, obviously.) I may have to experiment a bit and see if changing the data order has any effect. In any case, I think it would probably depend on the driver more than anything else.
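For reference, the sizes of the two interleaved formats mentioned in this exchange can be worked out as follows. A small sketch, assuming 4-byte floats and tight packing; `VertexT2N3V3` and `packedStride` are names made up for illustration:

```cpp
#include <cassert>
#include <cstddef>

// In-memory order implied by GL_T2F_N3F_V3F: texture coords, normal, position.
struct VertexT2N3V3 {
    float texCoord[2];
    float normal[3];
    float position[3];
};

static_assert(sizeof(VertexT2N3V3) == 8 * sizeof(float),
              "VertexT2N3V3 must be tightly packed");

// Bytes per vertex for a tightly packed, all-float interleaved format,
// given the number of floats in each component.
inline std::size_t packedStride(int tex, int color, int normal, int position)
{
    assert(tex >= 0 && color >= 0 && normal >= 0 && position >= 0);
    return static_cast<std::size_t>(tex + color + normal + position) * sizeof(float);
}
```

Here `packedStride(2, 0, 3, 3)` gives 32 bytes per vertex for GL_T2F_N3F_V3F, while `packedStride(4, 4, 3, 4)` gives 60 bytes for GL_T4F_C4F_N3F_V4F.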

I guess it never hurts to be on the safe side, though.
Not to argue, but I seem to remember reading somewhere that the order doesn't matter, AS LONG AS glVertexPointer() or glVertexAttribPointer() (with index 0) is called last (the actual order in the structure doesn't matter). (Gotta go to school now; I'll see if I can find a source for you when I get home.)

hope that helps
-Dan
It probably used to matter when the FFP was the only option, but it would be very strange if it mattered now that the GPU has become more generalised. A certain order of data wouldn't really make a great deal of sense in a world where you can generalise what the values passed in mean.

More important is the fact that the data is interleaved and that, as vertices are drawn, they have good spatial locality within RAM, so that the GPU doesn't have to jump around memory to render (multiples of 32 bytes are also a bonus here, although it's not as important, as the pre-T&L cache will handle that end of things).
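If you want to try the 32-byte suggestion, the stride can be rounded up and the extra bytes left unused at the end of each vertex. A minimal sketch; `padStride` is a hypothetical helper, and whether the padding pays off depends on the hardware, since it also increases the size of the buffer:

```cpp
#include <cassert>
#include <cstddef>

// Rounds a vertex stride up to the next multiple of `alignment` bytes.
inline std::size_t padStride(std::size_t strideBytes, std::size_t alignment = 32)
{
    assert(alignment != 0);
    return (strideBytes + alignment - 1) / alignment * alignment;
}
```

For example, the 12-float (48-byte) layout from earlier in the thread would be padded to 64 bytes, while an 8-float (32-byte) layout is already a multiple of 32 and stays as it is.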
Yeah, I've just run my code through several odd tests and it seems to make no difference what order your data is in, as long as you set the offsets correctly. Although I did notice something interesting: I got a small bump in speed when I used glInterleavedArrays() to set the pointer as opposed to gl*Pointer(). Not sure what would make the difference. (And no, I'm not calling a lot either. I was rendering the same mesh approx 5000 times, so it only got called once.)
