On indexed VBO performance...

This topic is 4856 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Hi all,

I have a couple of questions about using indexed Vertex Buffer Objects. Specifically: should I add offsets to indices before using them, as I do now, or use them as-is and just rebind the vertex pointer with a different offset for each group of indices? It's slightly more complicated than that, but I'll need a little background.

A few weeks ago I started working on a geomipmapping project. It has been a learning experience, to say the least [rolleyes]

Now, I'm not trying to say NeHe hasn't been a very useful site for learning how to do a new technique in a quick-and-dirty fashion, but some of the stuff on there is misleading or just plain wrong. What got me off on the wrong track with VBOs is that, when explaining how to set the vertex pointer to use a bound VBO, the author says something like "...instead of providing a pointer to the data, we bind the VBO we want and set the pointer to zero", which completely fails to convey that the last parameter is meant to be an offset into the VBO itself. It also would have been nice to know that I could map pointers to the VBO memory itself.. *sigh* that's what I get for not just reading the extension registry docs in the first place..

Anyway, this limited perspective on VBOs led me to a kind of backwards solution..

What I have now:

*A static array of vertices generated from a heightmap (heightmap sizes range from 1025 to 4097 square). These are specified once at application start and are sent into a VBO created using the STATIC_DRAW_ARB flag. I don't touch them after they're uploaded.
*A static array of unsigned int indices that is the size of one mesh (currently 32x32 seems to yield the best performance), for each level of detail. These currently stay in system memory.
*A dynamic index array, per mesh, in system memory (unsigned ints as well). These represent the 'current' indices per mesh, and are updated if a mesh's LOD is updated.
*An unsigned integer offset per mesh, which gives the index of the first vertex used by that mesh.

How it works right now:

When a mesh's level of detail changes, it copies the correct indices for its new LOD from the static indices, which are global to the mesh object. Since the indices are the same for each mesh except that they start from a different offset, I use this to save a lot of memory by dynamically adding the mesh's offset to the static indices as I copy them to the mesh. Cracks are fixed with index twiddling (fixing cracks on possibly stale indices, and the bugs I needed to deal with for that, were... fun).

Since LOD updates aren't too frequent (performed on an as-needed basis, only when the camera moves a certain distance or an object becomes newly visible, and only for visible objects), it didn't seem like too bad a hit to just recopy the current indices from the static index array on each LOD update. This system is decently fast (upwards of 25,000,000 unlit/untextured tris per sec on my GF4 4400).

However, this system was formed under some assumed limitations which I now know do not exist, namely:

*that I couldn't just use the same indices for all meshes since there was no way to specify an offset into a VBO - wrong! (On a side note, this also lets me use unsigned shorts instead of unsigned int indices, since what was limiting me was the fact that I had a lot of vertices and needed to index all of them directly.)
*that I couldn't map pointers to VBOs for easy updating, which kept me from trying to store indices in video memory along with vertex data.

My new plan to make everything wonderful:

*Keep all vertices in a static VBO as before.
*Change indices to unsigned shorts. Use one VBO (STATIC_DRAW_ARB) to describe the static set of indices at each LOD for a mesh, and another (DYNAMIC_DRAW_ARB) which describes the current (at the correct LOD, and crack-fixed) indices for each mesh.
*Change the crack healing routine to something like the following: get a read-only mapped pointer to the static index data and a write-only mapped pointer to the current index data, then, for each mesh whose LOD has changed, copy the correct indices in and fix them pretty much as I did before.

Something I am a little concerned about: while I am getting my indices into video memory and they take up less space (and I no longer need to add the mesh offset to the indices as I copy them, but that cost is almost nothing), I now need to rebind the vertex pointer for each mesh instead of once before drawing all of them. Should this concern me? It still seems the advantages of my new system far outweigh this small fact. (Hey look, I made it back to my original topic! [smile])

I guess what I'm looking for is a sanity check, as you guys obviously know a lot more than I do about this stuff... Is there something crucial I've overlooked or won't be able to do? Am I just plain being silly about something?

Thanks for the help, I really appreciate it. If something isn't clear, just lemme know and I'll elucidate [smile] It's kinda hard trying to keep it short and explain everything at the same time..

thanks again!
darren fitzpatrick

[Edited by - darrenf on August 26, 2004 2:18:33 PM]
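
For readers following along, the two approaches weighed in this post can be compared on the CPU. This is a minimal sketch, not the poster's code: `rebaseIndices`, `vertexByteOffset`, `fetchX`, and the flat vertex array are hypothetical names chosen for illustration. It only shows that adding a mesh's first-vertex offset to every index selects the same vertices as leaving the shared indices alone and starting the vertex pointer at that mesh's byte offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Three floats (x, y, z) per vertex, matching glVertexPointer(3, GL_FLOAT, 0, ...).
constexpr std::size_t kFloatsPerVertex = 3;

// Approach 1: copy the shared per-LOD indices and add the mesh's first-vertex
// index to each one, producing indices into the whole vertex array.
std::vector<std::uint32_t> rebaseIndices(const std::vector<std::uint16_t>& shared,
                                         std::uint32_t firstVertex)
{
    std::vector<std::uint32_t> out;
    out.reserve(shared.size());
    for (std::uint16_t i : shared)
        out.push_back(firstVertex + i);
    return out;
}

// Approach 2: leave the shared indices alone and compute the byte offset that
// would be passed as the last argument of glVertexPointer while the VBO is bound.
std::size_t vertexByteOffset(std::uint32_t firstVertex)
{
    return static_cast<std::size_t>(firstVertex) * kFloatsPerVertex * sizeof(float);
}

// Helper for the comparison: reads the x coordinate of vertex `index`,
// counted from a base position given in bytes.
float fetchX(const std::vector<float>& vertices, std::size_t baseBytes, std::uint32_t index)
{
    assert(baseBytes % sizeof(float) == 0);
    const std::size_t pos = baseBytes / sizeof(float) + index * kFloatsPerVertex;
    assert(pos < vertices.size());
    return vertices[pos];
}
```

With approach 2 the shared indices never need to exceed the vertex count of a single mesh's footprint, which is what makes a narrower index type worth considering.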

Ok, wow, I think I just answered my performance difference question. I changed the dynamic offset adding code to simply rebind the vertex buffer with the correct offset (none of the other proposed changes implemented) and the tris/sec jumped to over 35 million! So I guess that the cost of dynamically adding the mesh offset to each index wasn't exactly "almost nothing" as I had thought [oh]

I'd still really appreciate any input on my design choices, especially if you have a better/faster/easier way of doing something - or even if it is just to let me know you've done something similar and it worked...

back to the code!

Hmm.. Ok, this seems interesting.

*Change indices to no longer require the addition of an offset before use by instead specifying the offset when binding the VBO - works well, definite speed boost.

*Change indices to unsigned shorts rather than unsigned ints - works fine, obviously decreases memory usage, no real noticeable performance difference.

However, when all I do is change this stuff:
*create a VBO and load the (unsigned short) indices into it. these are the indices which describe how to build one mesh at each LOD. this means there is one small group of them for all meshes (i'm not to the stage where i'll be copying the static indices over and patching for each mesh - ah yes, and I realized an alternative to that was to create a set of all possible patched vertices, i think that'd be better..)
*when binding the vertex VBO, I also bind the index VBO using the GL_ELEMENT_ARRAY_BUFFER_ARB target.
*change the glDrawElements offset passed to be relative to the VBO base address (i'm drawing multiple strips and not using glMultiDrawElements yet, so i have to pass offsets to each strip)

it KILLS my framerate! I go from chugging along in the 100's of fps to around 10 - 15! I thought, maybe it was the non word-aligned indices, but performance was fine using unsigned short indices until I tried moving the indices into a VBO and accessing them through it.. !? (I changed it back to unsigned ints just to be sure - didn't help.) Any insight? I'm kinda new to the whole indexed primitive thing in general, am I overlooking something basic? Is the GL state still not ready for my VBO'd indices without some other initial setup, and that's what's causing it to choke??

any help would be much appreciated

-darren

Quote:
can ya throw the setup code and code you use to draw (the relevant bits only) here? it might well help....

why, certainly :D

this function sets up all my VBOs. please excuse the ass backwards setup for the index buffer, i'm in the middle of changing the way that stuff works and my mesh index pointer setup isn't conducive to uploading to a VBO, hence the copying of the indices for each LOD into a single temporary buffer. again, these are the static set of indices which can be used for any mesh given the correct offset.


void HeightMap::BuildVBOs()
{
    if( !m_VBOSupported )
        return;

    // vertex data: uploaded once and never touched again
    glGenBuffersARB( 1, &m_VBOVertID );
    glBindBufferARB( GL_ARRAY_BUFFER_ARB, m_VBOVertID );
    glBufferDataARB( GL_ARRAY_BUFFER_ARB, m_DataSizeX*m_DataSizeY*3*sizeof(float), m_pVertices, GL_STATIC_DRAW_ARB );

    // pack the static indices for every LOD into one temporary buffer
    unsigned short * pTempBuff = new unsigned short[HMAP_TOTAL_NINDICES];
    unsigned int lod, idx;
    int buffidx = -1;
    for( lod = 0; lod < HMAP_NLOD; lod++ )
        for( idx = 0; idx < NVerts[lod]; idx++ )
            pTempBuff[++buffidx] = HeightMesh::s_ppIndices[lod][idx];

    glGenBuffersARB( 1, &m_VBOIndexID);
    glBindBufferARB( GL_ARRAY_BUFFER_ARB, m_VBOIndexID );
    glBufferDataARB( GL_ARRAY_BUFFER_ARB, HMAP_TOTAL_NINDICES*sizeof(unsigned short), pTempBuff, GL_STATIC_DRAW_ARB );
    delete[] pTempBuff; // allocated with new[], so it needs delete[]
}
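
The copy loop above packs every LOD's index list back to back into one buffer. A minimal sketch of the same idea with standard containers follows. `PackedIndices` and `packLodIndices` are hypothetical names, and the per-LOD start positions it records are assumed to play the role that `LODOffset` plays in the draw code later in the thread. Using `std::vector` also removes the manual `new[]`/`delete[]` pair.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct PackedIndices {
    std::vector<std::uint16_t> data;      // indices for all LODs, back to back
    std::vector<std::size_t>   lodOffset; // position of each LOD's first index in data
};

// Concatenates the per-LOD index lists and remembers where each one starts.
PackedIndices packLodIndices(const std::vector<std::vector<std::uint16_t>>& perLod)
{
    std::size_t total = 0;
    for (const auto& lod : perLod)
        total += lod.size();

    PackedIndices packed;
    packed.data.reserve(total);
    packed.lodOffset.reserve(perLod.size());
    for (const auto& lod : perLod) {
        packed.lodOffset.push_back(packed.data.size());
        packed.data.insert(packed.data.end(), lod.begin(), lod.end());
    }
    assert(packed.data.size() == total);
    return packed;
}
```

`packed.data.data()` and `packed.data.size() * sizeof(std::uint16_t)` could then be passed to `glBufferDataARB` in place of `pTempBuff` and its size.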



this function handles state setup then enters the recursive render loop which draws all visible meshes.


int HeightMap::RenderTerrain() const
{
    glEnableClientState( GL_VERTEX_ARRAY );

    long nTri = 0;
    glBindBufferARB( GL_ARRAY_BUFFER_ARB, m_VBOVertID ); //this method works
    // glBindBufferARB( GL_ELEMENT_ARRAY_BUFFER_ARB, m_VBOIndexID ); //this method is slow as hell

    nTri = RecursivelyRenderTerrain( m_pMeshTree->GetRoot() );

    glDisableClientState( GL_VERTEX_ARRAY );

    return nTri;
}



k, here's the meat of it. pretty straightforward, if i'm on a leaf node, then specify the correct VBO offset then loop through each triangle strip for the current mesh LOD and render away! (as i mentioned, i'll be changing this to a glMultiDrawElementsEXT once this stuff is worked out..)


inline int HeightMap::RecursivelyRenderTerrain( const QuadTree<HeightMesh *>::QuadNode * pCurrentNode ) const
{
    if( pCurrentNode->Visible() == 0 )
        return 0;

    if( pCurrentNode->Child(0) )
        return RecursivelyRenderTerrain( pCurrentNode->Child(0) ) +
               RecursivelyRenderTerrain( pCurrentNode->Child(1) ) +
               RecursivelyRenderTerrain( pCurrentNode->Child(2) ) +
               RecursivelyRenderTerrain( pCurrentNode->Child(3) );

    HeightMesh * pMesh = pCurrentNode->Object(0);
    int lod = pMesh->m_CurrentLOD;
    unsigned int idx;
    int nTri = (NVertsPerStrip[lod] - 2) * NStrips[lod];

    glVertexPointer( 3, GL_FLOAT, 0, (char *)NULL + pMesh->m_ByteOffset );
    for(idx = 0; idx < NStrips[lod]; idx++)
    {
        glDrawElements( GL_TRIANGLE_STRIP, NVertsPerStrip[lod], GL_UNSIGNED_SHORT, //this works fine
                        pMesh->m_pCurrentIndices + NVertsPerStrip[lod]*idx );
        //glDrawElements( GL_TRIANGLE_STRIP, NVertsPerStrip[lod], GL_UNSIGNED_SHORT, //slow as all hell
        //                (char *)NULL + NVertsPerStrip[lod]*idx*sizeof(unsigned short) + LODOffset[lod]*sizeof(unsigned short));
    }

    return nTri;
}



the only thing i can possibly think of, is that for some strange reason reusing the same small index buffer for each mesh is what is causing my problems? the non-commented code is sourcing indices from the current, patched indices which are unique to each mesh. the commented (slow) code reuses the same indices for each mesh and doesn't worry about the cracks for now :P

thanks!
-darren

edit: getting the hang of source tags

well, (of course) another possible solution came to me while posting. although STATIC_DRAW_ARB seemed the obvious choice for usage hint for the index buffer, that was it! pretty much any of the DYNAMIC_*_ARB flags perform much better, the best choice seeming to be DYNAMIC_READ_ARB. this puzzles me a little. either i don't understand the usage hints correctly, or i have stumbled upon a case where they are extremely ineffective at choosing a good memory chunk to return.

anyone care to lend any insight? using DYNAMIC_COPY_ARB, performance is only slightly worse (difference between 190 fps and 186 fps), but i expected a significant performance gain from VBO'ing my indices, which this has obviously not provided... any ideas?


-darren f

hmmm the first thing which jumps out at me is that you construct the VBO with GL_ARRAY_BUFFER_ARB yet bind it again later for usage with GL_ELEMENT_ARRAY_BUFFER_ARB. This is only a shot in the dark but I am wondering if that could be part of the problem.

Try changing it to that and giving it another go with the static flags, at the very least it would be worth seeing if it does make a difference.

There is the outside chance there is a bug in the VBO implementation in your driver set. Not having an NV card I wouldn't know, however, but that's only something to keep in mind as a possible issue.

phantom - thanks! that was it, good guess! i somehow failed to realize that i should be using the GL_ELEMENT_ARRAY_BUFFER_ARB target both when creating and binding the buffer. the GL_STATIC_DRAW_ARB flag works fine with no performance problems!

again, thanks a lot!
-darren

yay this means soon i start on my regcom setup :D
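
The fix confirmed in this post comes down to using the element-array target when the index buffer is created and filled, not only when it is bound for drawing. A hedged sketch follows. To keep it compilable without OpenGL headers it takes the three ARB entry points as plain function pointers; the `BufferApi` struct, `uploadIndexBuffer`, and the simplified parameter types are assumptions made for this example, while the two enum values are the ones listed in the ARB_vertex_buffer_object specification.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Enum values from the ARB_vertex_buffer_object specification.
constexpr unsigned int kElementArrayBufferARB = 0x8893; // GL_ELEMENT_ARRAY_BUFFER_ARB
constexpr unsigned int kStaticDrawARB         = 0x88E4; // GL_STATIC_DRAW_ARB

// Hypothetical bundle of entry points shaped like glGenBuffersARB,
// glBindBufferARB and glBufferDataARB (plain C++ types stand in for GL typedefs).
struct BufferApi {
    void (*genBuffers)(int n, unsigned int* ids);
    void (*bindBuffer)(unsigned int target, unsigned int id);
    void (*bufferData)(unsigned int target, std::ptrdiff_t size,
                       const void* data, unsigned int usage);
};

// Creates and fills an index buffer, binding it to the element-array target
// for the upload, which is the change that resolved the slowdown in this thread.
unsigned int uploadIndexBuffer(const BufferApi& gl,
                               const std::vector<std::uint16_t>& indices)
{
    assert(!indices.empty());
    unsigned int id = 0;
    gl.genBuffers(1, &id);
    gl.bindBuffer(kElementArrayBufferARB, id);
    gl.bufferData(kElementArrayBufferARB,
                  static_cast<std::ptrdiff_t>(indices.size() * sizeof(std::uint16_t)),
                  indices.data(), kStaticDrawARB);
    return id;
}
```

In a real program the struct members would point at the loaded extension functions, possibly through thin wrappers where the platform's calling convention differs from a plain function pointer.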

hmm.. looks like one more quick question [smile]

does anyone know of a trick to use VBOs of indices with glMultiDrawElementsEXT ? i thought the transition from a for loop + glDrawElements to a single glMultiDrawElementsEXT would be trivial, but i forgot one important detail...

you have to give MultiDrawElements an array of pointers to index arrays, meaning there's gotta be a chunk of memory which points to all the other chunks containing the index arrays, meaning none of these pointers are going to be valid if i try and move the indices into a VBO? is there some sneaky trick to get around this? am i confused on how the array of pointers to index arrays works?

beginning to think it's gonna stay as a looped glDrawElements... [wink]

**edit** could it be that when an index array VBO is bound to ELEMENT_ARRAY_BUFFER, this array is treated as an array of offsets into the bound index VBO? hm either way it's starting to seem like leaving it alone is the way to go heh.

-darren

*edit* well, according to this paper from nvidia, binding a VBO changes ALL functions which accept pointers to arguments to interpret these as offsets. so my hunch was correct, i could do it, if i really really wanted to..

yeah, i'm over it. pretty sure it can't be done, or it would require some really funky pointer conversions/mapping or something.. either way, too much effort just to make the driver loop for me :D

thanks for the help!

[Edited by - darrenf on August 27, 2004 2:45:51 PM]
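
If the offset interpretation described in this post holds, the array passed to glMultiDrawElementsEXT stays in system memory, and each entry is simply a byte offset into the bound index buffer, cast to a pointer. A minimal sketch under that assumption follows; `buildStripOffsets` is a hypothetical helper, and its parameters mirror `LODOffset[lod]`, `NVertsPerStrip[lod]`, and `NStrips[lod]` from the draw loop earlier in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds one "pointer" per strip. Each entry is really a byte offset from the
// start of the index buffer bound to GL_ELEMENT_ARRAY_BUFFER_ARB.
std::vector<const void*> buildStripOffsets(std::size_t lodOffset,     // first index of this LOD
                                           std::size_t vertsPerStrip, // indices per strip
                                           std::size_t stripCount)
{
    assert(vertsPerStrip >= 3); // a triangle strip needs at least three indices
    std::vector<const void*> offsets;
    offsets.reserve(stripCount);
    for (std::size_t strip = 0; strip < stripCount; ++strip) {
        // Same arithmetic as the commented-out glDrawElements call,
        // assuming unsigned short indices.
        const std::size_t byteOffset =
            (lodOffset + strip * vertsPerStrip) * sizeof(std::uint16_t);
        offsets.push_back(reinterpret_cast<const void*>(
            static_cast<std::uintptr_t>(byteOffset)));
    }
    return offsets;
}
```

Together with a same-length array of counts (each equal to `vertsPerStrip`), `offsets.data()` would serve as the `indices` argument of glMultiDrawElementsEXT. Whether that beats the looped glDrawElements is something only a measurement on the target driver can answer.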
