
Some questions about VBOs and Vertex Arrays...


Hello,

I am modifying my Quake III level viewer to add support for VBOs, and it seems to be going fine. However, after adding VBO support I did not notice any speed improvement over my vertex array implementation; in some cases the VBO version even seems a little slower.

After some deep debugging (using the wonderful tool gDebugger), I found that when I use the VBO render path, my shader code (Quake III shaders) sometimes mixes VBO vertex data with vertex array data. That is, some vertex data, such as position, comes from a VBO while the color array comes from a vertex array. Here is a small example of where my vertex data comes from:

posArray -> VBO
colorArray -> vertex array (or sometimes a glColor call when the color is constant)
tex0Array -> VBO
tex1Array -> vertex array

So my question is: could this behavior be causing my "low" VBO performance? Should I aim to use VBOs for all my arrays, without mixing data sources?

PS: I am using a GeForce3 Ti 500 with the latest NVIDIA drivers.

Thanks in advance,
Oscar

The best way to use VBOs is to work with a moderately large INTERLEAVED memory buffer containing all your vertex data.
That way your graphics card driver won't have to reach all over the place to gather each primitive's data, since the attributes sit right next to each other.
It also helps to use only static data, as uploading elements over the AGP or PCI-E bus every frame could seriously hurt the performance of your application.
PS: Let me know if you would like to know how to set up your interleaved arrays for max performance :)

Hi, thanks for your reply.

So from your post, my current method is far from optimal : /. Should I avoid mixing vertex array data with VBO data (separate arrays)?

About my current method: I need to use separate arrays for my vertex data because Quake III shaders are really flexible, so my vertex data differs a lot from shader to shader. Some shaders need just a vertex position array plus a constant color (sglColor); others need a vertex position array, multiple texture coordinate sets, a color array, etc. That's why I am using separate arrays to store my vertex data.

PS: I would really appreciate it if you could comment on your interleaved method.

Thanks in advance,
Oscar


bool Geometry::compile()
{
  // Nothing to do if this geometry already lives on the graphics card.
  if(VERTEX_FORMAT & RESIDE_ON_GFX_DEVICE)
    return true;

  Logger::writeInfoLog(String("Compiling geometry -> ") + name);

  if(!GLEE_ARB_vertex_buffer_object)
    return Logger::writeErrorLog("Failed: VBO extension not supported");

  if(!getVertices())
    return Logger::writeErrorLog("Failed: NULL vertices");

  float *GPUInterleavedArray = NULL;
  int interleavedBufferSize = 0,
      *GPUIndices = NULL,
      supported = 0;

  GLuint vboID = 0,
         indicesVBOID = 0;

  // Count the floats per vertex: texture units first (stopping at the
  // first unused unit), then position, normal and color.
  for(int i = 0; i < 8; i++)
  {
    supported = VERTEX_FORMAT & (TEXTURE0 << i);
    if(supported)
      interleavedBufferSize += getTextureElementsCount(i);
    else
      break;
  }

  if((VERTEX_FORMAT & VERTICES) && getVertices()) interleavedBufferSize += 3;
  if((VERTEX_FORMAT & NORMALS ) && getNormals() ) interleavedBufferSize += 3;
  if((VERTEX_FORMAT & COLOR   ) && getColors()  ) interleavedBufferSize += 3;

  // The stride is expressed in bytes.
  setStrideSize(interleavedBufferSize * sizeof(float));

  // From here on, interleavedBufferSize is the total float count.
  interleavedBufferSize *= getVerticesCount();

  glGenBuffersARB(1, &vboID);
  glBindBufferARB(GL_ARRAY_BUFFER_ARB, vboID);
  glBufferDataARB(GL_ARRAY_BUFFER_ARB, interleavedBufferSize * sizeof(float),
                  NULL, GL_STATIC_DRAW_ARB);

  if(glGetError() == GL_NO_ERROR)
    GPUInterleavedArray = (float *)glMapBufferARB(GL_ARRAY_BUFFER_ARB, GL_WRITE_ONLY_ARB);
  else
    return Logger::writeErrorLog("Not enough memory for the interleaved geometry arrays");

  // Mapping can fail even when the allocation succeeded.
  if(!GPUInterleavedArray)
    return Logger::writeErrorLog("Failed to map the interleaved geometry buffer");

  float *texCoorArray = NULL,
        *vertexArray  = getVertices(),
        *normalArray  = getNormals(),
        *colorArray   = getColors();
  int texElemSize = 0,
      s;

  // i walks the interleaved buffer (in floats), t is the vertex index.
  // Per-vertex order: texture coords, color, normal, position.
  for(int i = 0, t = 0; i < interleavedBufferSize; t++)
  {
    for(s = 0; s < 8; s++)
    {
      texElemSize  = getTextureElementsCount(s);
      texCoorArray = getTextureCoords(s);

      if(VERTEX_FORMAT & (TEXTURE0 << s))
      {
        // Only 1, 2 or 3 components per texture coordinate are handled.
        switch(texElemSize)
        {
          case 1: GPUInterleavedArray[i++] = texCoorArray[t*1 + 0]; break;
          case 2: GPUInterleavedArray[i++] = texCoorArray[t*2 + 0];
                  GPUInterleavedArray[i++] = texCoorArray[t*2 + 1]; break;
          case 3: GPUInterleavedArray[i++] = texCoorArray[t*3 + 0];
                  GPUInterleavedArray[i++] = texCoorArray[t*3 + 1];
                  GPUInterleavedArray[i++] = texCoorArray[t*3 + 2]; break;
        }
      }
      else
        break;
    }

    if((VERTEX_FORMAT & COLOR) && colorArray)
    {
      GPUInterleavedArray[i++] = colorArray[t*3 + 0];
      GPUInterleavedArray[i++] = colorArray[t*3 + 1];
      GPUInterleavedArray[i++] = colorArray[t*3 + 2];
    }

    if((VERTEX_FORMAT & NORMALS) && normalArray)
    {
      GPUInterleavedArray[i++] = normalArray[t*3 + 0];
      GPUInterleavedArray[i++] = normalArray[t*3 + 1];
      GPUInterleavedArray[i++] = normalArray[t*3 + 2];
    }

    if((VERTEX_FORMAT & VERTICES) && vertexArray)
    {
      GPUInterleavedArray[i++] = vertexArray[t*3 + 0];
      GPUInterleavedArray[i++] = vertexArray[t*3 + 1];
      GPUInterleavedArray[i++] = vertexArray[t*3 + 2];
    }
  }

  glUnmapBufferARB(GL_ARRAY_BUFFER_ARB);
  glBindBufferARB(GL_ARRAY_BUFFER_ARB, 0);

  // Upload the index list into its own element array buffer.
  if(indices)
  {
    glGenBuffersARB(1, &indicesVBOID);
    glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, indicesVBOID);
    glBufferDataARB(GL_ELEMENT_ARRAY_BUFFER_ARB, getIndicesCount()*sizeof(int),
                    NULL, GL_STATIC_DRAW_ARB);

    if(glGetError() == GL_NO_ERROR)
      GPUIndices = (int *)glMapBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, GL_WRITE_ONLY_ARB);
    else
      return Logger::writeErrorLog("Not enough memory for the indices");

    if(!GPUIndices)
      return Logger::writeErrorLog("Failed to map the index buffer");

    memcpy(GPUIndices, getIndices(), getIndicesCount()*sizeof(int));

    glUnmapBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB);
    glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, 0);
    setIndicesVBOID(indicesVBOID);
  }

  setElementsVBOID(vboID);

  VERTEX_FORMAT |= RESIDE_ON_GFX_DEVICE;

  return true;
}


PS: I did not include support for color or texture elements with four components... Silly goose :D

[Edited by - JavaCoolDude on June 1, 2005 9:24:02 AM]
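
The compile() routine above hard-codes three-float colors and one to three texture components. Below is a minimal sketch of a more general CPU-side interleave step, assuming every attribute is a tightly packed float array with a known component count. `Attribute` and `interleave` are hypothetical names for illustration, not part of JavaCoolDude's code, and the result would still have to be uploaded (for example with glBufferDataARB) once a GL context is available.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical description of one source array: tightly packed floats,
// `components` values per vertex (for example 2 for UVs, 4 for RGBA).
struct Attribute {
    const float* data;
    std::size_t  components;
};

// Interleaves the attributes vertex by vertex, in the order they are listed.
// strideFloats receives the number of floats stored per vertex.
std::vector<float> interleave(const std::vector<Attribute>& attrs,
                              std::size_t vertexCount,
                              std::size_t& strideFloats)
{
    strideFloats = 0;
    for (const Attribute& a : attrs)
        strideFloats += a.components;

    std::vector<float> out;
    out.reserve(strideFloats * vertexCount);

    for (std::size_t v = 0; v < vertexCount; ++v)
        for (const Attribute& a : attrs)
            for (std::size_t c = 0; c < a.components; ++c)
                out.push_back(a.data[v * a.components + c]);

    return out;
}
```

Listing the attributes as texture coordinates, color, normal, position would reproduce the ordering used in the post; the byte stride for the gl*Pointer calls is strideFloats * sizeof(float).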

I was once told by a great comp sci teacher: "Your program only runs as fast as its slowest part." In this case (assuming that your bottleneck is vertex processing), yes, mixing VBOs and vertex arrays could slow you down. Everything that you are sending here can and should be converted to VBOs if at all possible, if you really want to get the performance boost out of them.

As for the interleaved arrays, here's how they would work:

When storing the vertex data initially, write it in system memory as such:

Position-Normal-Texture-Color-Position-Normal-Texture-Color-Position...

Then when setting up your gl*Pointer() calls, you don't specify a separate array for each one, but instead an offset into the interleaved one. For example, if we have the data arranged as above, you could do this:


GLsizei vertexSize = sizeof(float)*12; // Stride: 3 position + 3 normal + 2 texture + 4 color floats.

glBindBufferARB( GL_ARRAY_BUFFER_ARB, myVBO );

glNormalPointer(GL_FLOAT, vertexSize, (float *)NULL + 3);
glTexCoordPointer(2, GL_FLOAT, vertexSize, (float *)NULL + 6);
glColorPointer(4, GL_FLOAT, vertexSize, (float *)NULL + 8);
glVertexPointer(3, GL_FLOAT, vertexSize, NULL); // NVIDIA recommends you do this one last. ATI doesn't care.

And then render as normal. This is more efficient for the graphics card, as it can naturally stream the data, and it makes storage easier, because you now have 1 VBO instead of 4.
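
The stride and offsets above can also be derived from a struct instead of being counted by hand. This is only a sketch under assumptions: `Vertex`, `bufferOffset` and the `k...` constants are hypothetical names, and the layout assumes the Position-Normal-Texture-Color order with a four-float color, as in the example.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical vertex matching the Position-Normal-Texture-Color layout.
struct Vertex {
    float position[3];
    float normal[3];
    float texCoord[2];
    float color[4];
};

// The stride math only holds if the compiler adds no padding.
static_assert(sizeof(Vertex) == 12 * sizeof(float), "unexpected padding in Vertex");

// With a VBO bound, the last gl*Pointer argument is a byte offset into the
// buffer, passed as a pointer-typed value.
inline const void* bufferOffset(std::size_t bytes)
{
    return reinterpret_cast<const void*>(static_cast<std::uintptr_t>(bytes));
}

const std::size_t kStride       = sizeof(Vertex);
const std::size_t kNormalOffset = offsetof(Vertex, normal);
const std::size_t kTexOffset    = offsetof(Vertex, texCoord);
const std::size_t kColorOffset  = offsetof(Vertex, color);

// With a live GL context (not compiled here) the calls would look like:
//   glNormalPointer(GL_FLOAT, kStride, bufferOffset(kNormalOffset));
//   glTexCoordPointer(2, GL_FLOAT, kStride, bufferOffset(kTexOffset));
//   glColorPointer(4, GL_FLOAT, kStride, bufferOffset(kColorOffset));
//   glVertexPointer(3, GL_FLOAT, kStride, bufferOffset(0));
```

The offsets come out as 3, 6 and 8 floats, the same values as the `(float *)NULL + n` expressions, but they follow the struct automatically if a field is added or reordered.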

Now, I've worked with Q3 maps before, so I know how crazy they can be, but it's certainly possible to get a reasonable Buffer-based rendering system out of it. (Heck, I was using D3D, which is even more rigid when it comes to this stuff.) For example, in the above code if you wanted to render the mesh without any color information, simply call glDisableClientState(GL_COLOR_ARRAY) before rendering, and glEnableClientState(GL_COLOR_ARRAY) afterwards. It will effectively ignore the color stream entirely, and you can override it with an immediate mode call (glColor4f) to give your entire mesh a single color.
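
One way to organise the enable/disable step per shader is to compare what the buffer holds with what the shader reads. The sketch below is an illustration only: the `VertexAttr` flags and `statesToDisable` are hypothetical, and the function returns the client-state names as strings so that it runs without a GL context. Real code would pass the matching GL enums to glDisableClientState before the draw call and re-enable them afterwards.

```cpp
#include <string>
#include <vector>

// Hypothetical flags describing which arrays are stored in the VBO and
// which ones a given Quake III shader actually reads.
enum VertexAttr : unsigned {
    ATTR_NORMAL   = 1u << 0,
    ATTR_TEXCOORD = 1u << 1,
    ATTR_COLOR    = 1u << 2
};

// Returns the client states that are present in the buffer but not needed
// by the shader, i.e. the ones to disable before drawing.
std::vector<std::string> statesToDisable(unsigned inBuffer, unsigned shaderNeeds)
{
    struct Entry { unsigned flag; const char* state; };
    static const Entry table[] = {
        { ATTR_NORMAL,   "GL_NORMAL_ARRAY" },
        { ATTR_TEXCOORD, "GL_TEXTURE_COORD_ARRAY" },
        { ATTR_COLOR,    "GL_COLOR_ARRAY" }
    };

    std::vector<std::string> result;
    for (const Entry& e : table)
        if ((inBuffer & e.flag) && !(shaderNeeds & e.flag))
            result.push_back(e.state);
    return result;
}
```

For a shader that needs only position and one texture coordinate set, this yields the normal and color arrays, and the constant color can then be supplied with glColor4f as described above.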

This way you have all of your critical information in optimal VBO form, and can manipulate it at will to suit your shader needs.

EDIT: Whoops! JCD beat me to it! And his code is a lot more in depth too... :( Ah well, whatever helps!

[Edited by - Toji on June 1, 2005 10:32:13 AM]

Correct me if I'm wrong, Toji brother, but I've always thought the optimal approach to creating interleaved arrays was to store texture elements first, then colors and normals, and finally your vertices.

It makes sense if you look at the formats the fixed glInterleavedArrays function expects, such as GL_T4F_C4F_N3F_V4F; that, and the fact that glVertexPointer should always be called last since it causes a major state change.

Come to think about it, isn't that why some folks use vertex attributes to store their data instead of the regular tex/col/nor/vert pointers? Hmmm, something to investigate a little more.

I'm not sure how much of a bearing the order has on it, but I think you may be right. In my own programs I store them as textures, normals, vertices so I can simply call glInterleavedArrays() with GL_T2F_N3F_V3F and be done with it. (No color element, obviously.) I may have to experiment a bit and see if changing the data order has any effect. In any case, I think it would probably depend on the driver more than anything else.

I guess it never hurts to be on the safe side, though.

Not to argue, but I seem to remember reading somewhere that the order doesn't matter, AS LONG AS glVertexPointer() or glVertexAttribPointer() (with index 0) is called last; the actual order in the structure doesn't matter. (Gotta go to school now; I'll see if I can find a source for you when I get home.)

hope that helps
-Dan

It probably used to matter when the FFP was the only option, but it would be very strange if it mattered now that the GPU has become more generalised; a fixed order of data wouldn't make a great deal of sense in a world where you can decide for yourself what the values passed in mean.

More important is the fact that the data is interleaved and that, as vertices are drawn, they have good spatial locality in RAM, so that the GPU doesn't have to jump around memory to render. (Vertex sizes that are multiples of 32 bytes are also a bonus here, although that's not as important, as the pre-T&L cache will handle that end of things.)
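
If you want to try the 32-byte suggestion, the padded stride is a simple round-up. A small sketch follows; `paddedStride` is a hypothetical helper, and whether the extra padding helps at all depends on the hardware and driver, so it is something to measure rather than assume.

```cpp
#include <cstddef>

// Rounds a vertex stride (in bytes) up to the next multiple of `alignment`.
// `alignment` must be non-zero; 32 follows the suggestion in the post.
inline std::size_t paddedStride(std::size_t stride, std::size_t alignment = 32)
{
    return (stride + alignment - 1) / alignment * alignment;
}
```

For the 12-float vertex used earlier in the thread (48 bytes), this gives 64 bytes, so each vertex would carry 16 bytes of unused padding.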

Yeah, I've just run my code through several odd tests and it seems to make no difference what order your data is in, as long as you set the offsets correctly. Although I did notice something interesting: I got a small bump in speed when I used glInterleavedArrays() to set the pointer as opposed to gl*Pointer(). Not sure what would make the difference. (And no, I'm not calling a lot either. I was rendering the same mesh approx 5000 times, so it only got called once.)

It's probably just some internal optimisation in the driver, as it knows about the vertex stream you'll be processing.
However, unless you are using something which properly mirrors one of its vertex formats, it's not a great deal of use [grin]

Unfortunately glInterleavedArrays is not flexible enough for the stuff I do, but then again the performance gain would be negligible.
[Offtopic] Yay for 1800 rate points my buddy Phantom [/Offtopic]

Hello,

Thank you, people, for your help. Based on your posts I noticed that in my code I am calling glVertexPointer() FIRST, which as you say is a bad thing (I need to test moving it to last).

About interleaved data, let me see if I got it: I pack all my model's data in a single VBO using a method like the one JavaCoolDude showed me here.

Then to render, when I set up my gl*Pointer functions, instead of calling them with the array pointer I call them with the OFFSET in the VBO where my data lives, right?

Last but not least : ), using this method, when I need to send a mesh to a shader that needs just Position + TexCoord0, I use the same approach and just disable the color part using glDisableClientState(GL_COLOR_ARRAY) before I render my mesh.

So to close, the order is as follows:

- Pack ALL my model vertex data in a single VBO (like JavaCoolDude showed here)
- Bind the VBO, set all gl*Pointers using an offset instead of a pointer, calling
glVertexPointer last to avoid NVIDIA troubles. Ex:


vertexSize = sizeof(float)*12;
glBindBufferARB( GL_ARRAY_BUFFER_ARB, myVBO );

glNormalPointer(GL_FLOAT, vertexSize, (float *)NULL + 3);
glTexCoordPointer(2, GL_FLOAT, vertexSize, (float *)NULL + 6);
glColorPointer(4, GL_FLOAT, vertexSize, (float *)NULL + 8);
glVertexPointer(3, GL_FLOAT, vertexSize, NULL);



- Disable any vertex data that is not needed by the shader, like color, using
glDisableClientState before I render it.

- Render my mesh using glDrawRangeElements.


Thanks in advance,
Oscar

Sounds like you've got it down pat! One thing that I want to mention, though: most of the documents I've seen don't treat it as such, but under Windows you need to load glDrawRangeElements as an extension. (It actually comes through in my program as "glDrawRangeElementsEXT".)

Also, just a quick blurb from the nVidia VBO whitepaper, to help explain the glVertexPointer thing:

Quote:
Avoid Calling glVertexPointer() more than once per VBO
The glVertexPointer function does a lot of setup in VBO, so to avoid
redundancy.
The most efficient way to do is to bind the VBO buffer, setup various array pointers (glNormalPointer etc) and then call glVertexPointer(). glVertexPointer should be called one time for one VBO.
You might think the essentials of VBO management are done in glBindBufferARB(), but its the opposite. VBO systems wait for the next upcoming important function (like glVertexPointer).
The binding operation is cheap compared to the setup of various pointers.
This advice fits any other function working in the same manner as glVertexPointer().


Good luck, and let us know how it works out for ya!

Cool! I really appreciate all your help here; now I just have to implement it : )

Ah! Before I forget to ask: this method looks pretty straightforward for STATIC data, but what about DYNAMIC data? How can it be handled if everything is packed in the same VBO?

For example, in my current code I sometimes need to modify the color array using a wave function. The way I modify it is by filling a temporary array (scratch buffer) with the new color data; then I use the following code to build a new VBO for the modified data:


sglBindBufferARB(GL_ARRAY_BUFFER_ARB, newVBO);
sglBufferDataARB(GL_ARRAY_BUFFER_ARB, numVerts * sizeof(byte) * 4, newColorArray, GL_STREAM_DRAW_ARB);



So as you can see, this method does not seem to work with the packed data because I would need to rebuild the entire VBO again. Maybe I could "map" the buffer, but I'm really not sure how to handle it.
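
For reference, the scratch-buffer step described above could look roughly like the sketch below. It is an assumption-heavy illustration: `fillWaveColors` and its `base`, `amplitude` and `frequency` parameters are hypothetical, a plain sine modulating brightness stands in for whatever wave function the shader specifies, and the colors are stored as four bytes per vertex to match the sizeof(byte) * 4 upload in the post.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fills an RGBA8 scratch buffer with a sine-modulated grey level.
// base, amplitude and frequency are hypothetical wave parameters.
void fillWaveColors(std::vector<std::uint8_t>& scratch, std::size_t numVerts,
                    float timeSeconds, float base, float amplitude, float frequency)
{
    const float twoPi = 6.28318530718f;
    float level = base + amplitude * std::sin(twoPi * frequency * timeSeconds);

    // Clamp to the displayable range before converting to a byte.
    if (level < 0.0f) level = 0.0f;
    if (level > 1.0f) level = 1.0f;
    const std::uint8_t value = static_cast<std::uint8_t>(level * 255.0f + 0.5f);

    scratch.resize(numVerts * 4);
    for (std::size_t v = 0; v < numVerts; ++v) {
        scratch[v * 4 + 0] = value; // R
        scratch[v * 4 + 1] = value; // G
        scratch[v * 4 + 2] = value; // B
        scratch[v * 4 + 3] = 255;   // A stays opaque
    }
}
```

The filled buffer (scratch.data(), numVerts * 4 bytes) is what would then be handed to the buffer upload call each frame.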

Regards,
Oscar

You may want to look into mapping, yes, but I'm afraid I've never used it so I won't be of much help. It essentially gives you a short-term pointer to the array for easy editing, but I doubt that will help with extracting the interleaved data.

I guess this is where functionality and speed begin to butt heads. As far as I can see the easiest way to keep things interleaved would be to keep the original data handy, rebuild the interleaved array, and upload it to the card every frame. Not pretty >_<

What may be a better, if ever so slightly slower, option would be to interleave the static data (vertex, normal, etc.) and then place any data that you need to update frequently into a straight array appended at the end of the buffer (use the glBufferSubDataARB function; it's slower than a basic BufferData, but I think it would be faster than constantly rebuilding the entire buffer). So in memory your buffer now looks like this:

|------Vertex/Normal/Texture--------|--Color Data--|

Memory pointers still work the same, just with a different offset, and for the color data the stride would be 0 (tightly packed). It's not going to be quite as fast as a fully interleaved buffer, but at least everything is still in VBO form.
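
The byte offsets for that layout are easy to get wrong, so here is a small sketch of the arithmetic. `SplitLayout` and `makeSplitLayout` are hypothetical names, and the example assumes the static block is interleaved floats while the color block is some fixed number of bytes per vertex.

```cpp
#include <cstddef>

// Hypothetical description of a buffer made of an interleaved static block
// followed by a tightly packed, frequently updated color block.
struct SplitLayout {
    std::size_t staticStride; // bytes per vertex in the interleaved block
    std::size_t colorOffset;  // byte offset where the color block starts
    std::size_t colorStride;  // 0 means "tightly packed" for gl*Pointer
    std::size_t totalBytes;   // size to allocate for the whole buffer
};

inline SplitLayout makeSplitLayout(std::size_t vertexCount,
                                   std::size_t staticFloatsPerVertex,
                                   std::size_t colorBytesPerVertex)
{
    SplitLayout layout;
    layout.staticStride = staticFloatsPerVertex * sizeof(float);
    layout.colorOffset  = vertexCount * layout.staticStride;
    layout.colorStride  = 0;
    layout.totalBytes   = layout.colorOffset + vertexCount * colorBytesPerVertex;
    return layout;
}
```

With this, the per-frame update would cover only the range starting at colorOffset (glBufferSubDataARB takes the target, a byte offset, a byte size and the source pointer), and the color pointer would be set with colorStride and colorOffset while the other pointers keep using staticStride.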

If anyone has a better suggestion I'd love to hear it!

EDIT: @Ademan - Heh, I guess I should have been a little more clear about that. You definitely would want to specify the buffer as dynamic whenever you're going to be changing any of the data in it with any regularity. How much that helps, though, is a simple matter of driver implementation. The drivers can completely ignore your usage hints if they want to. Still, thanks for pointing that out, Ademan.

[Edited by - Toji on June 1, 2005 9:00:54 PM]

Hrm, I like your idea there, Toji, but unfortunately I don't think it would be practical. If both are stored in the same vertex buffer, you are bound by the usage of that buffer, which will be sub-optimal for one of the vertex attributes. If you keep everything in a dynamic or streaming VBO, that would most likely eliminate the point of having the data in a VBO at all (what the driver does with your buffer is actually up to it), since you're moving to a VBO so that you can keep the static data server side (on the card) rather than client side. But if you keep everything in a static VBO, access to the dynamic data might be so slow that it negates all the advantages. (The best way to find out, however, would be to try it.)

I think the choice is either what Toji said, with a dynamic or static draw or copy VBO (my money's on static, provided you use a buffer sub data call),
OR
2 VBOs: static data in one (obviously with a static usage) and a second, dynamic VBO (dynamic or stream usage; this could also be replaced by just a vertex array, if you ask me).

hope that helps
-Dan

Hi,

I really appreciate all your help! I learned a lot. About dynamic data management: in my current implementation (separate VBOs) I use an approach similar to what Ademan555 describes. I use a second VBO with GL_STREAM_DRAW_ARB usage, then I use sglBufferDataARB to load my modified data into it.

I will try both methods (Toji's and Ademan555's) and see how things go : )

Best Regards,
Oscar
