# On indexed VBO performance...

##### Share on other sites
Ok, wow, I think I just answered my performance difference question. I changed the dynamic offset adding code to simply rebind the vertex buffer with the correct offset (none of the other proposed changes implemented) and the tris/sec jumped to over 35 million! So I guess the cost of dynamically adding the mesh offset to each index wasn't exactly "almost nothing" as I had thought [oh]
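For readers wondering why the two approaches are interchangeable, here is a minimal CPU-only sketch. It involves no OpenGL at all; `Vec3`, `fetchWithPatchedIndices` and `fetchWithBaseOffset` are hypothetical names made up for illustration, not code from this thread. It only shows that adding the mesh offset to every index fetches the same vertices as moving the base pointer once and leaving the indices alone.

```cpp
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Old approach: add the mesh's first-vertex offset to every index before use.
std::vector<Vec3> fetchWithPatchedIndices(const std::vector<Vec3>& verts,
                                          const std::vector<unsigned>& localIndices,
                                          unsigned meshFirstVertex) {
    std::vector<Vec3> out;
    out.reserve(localIndices.size());
    for (unsigned i : localIndices)
        out.push_back(verts[i + meshFirstVertex]);  // one addition per index
    return out;
}

// New approach: shift the base pointer once per mesh, use the indices untouched.
// This mirrors passing a byte offset to glVertexPointer while a VBO is bound.
std::vector<Vec3> fetchWithBaseOffset(const std::vector<Vec3>& verts,
                                      const std::vector<unsigned>& localIndices,
                                      unsigned meshFirstVertex) {
    const Vec3* base = verts.data() + meshFirstVertex;  // one offset per mesh
    std::vector<Vec3> out;
    out.reserve(localIndices.size());
    for (unsigned i : localIndices)
        out.push_back(base[i]);
    return out;
}
```

The second form does its offset work once per mesh instead of once per index, which is presumably where the saving comes from when the additions were being done on the CPU every frame.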

I'd still really appreciate any input on my design choices, especially if you have a better/faster/easier way of doing something - or even if it is just to let me know you've done something similar and it worked...

back to the code!

##### Share on other sites
Hmm.. Ok, this seems interesting.

*Change indices to no longer require the addition of an offset before use by instead specifying the offset when binding the VBO - works well, definite speed boost.

*Change indices to unsigned shorts rather than unsigned ints - works fine, obviously decreases memory usage, no real noticeable performance difference.
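A small sketch of the narrowing step, assuming the indices are mesh-local (so they only have to address the vertices of one mesh) and currently live in a 32-bit array. `narrowIndices` is a hypothetical helper, not something from the code in this thread; the range check is there because a 16-bit index silently wraps past 65535.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical helper: convert 32-bit indices to 16-bit ones,
// refusing any value that would not survive the conversion.
std::vector<std::uint16_t> narrowIndices(const std::vector<std::uint32_t>& wide) {
    std::vector<std::uint16_t> narrow;
    narrow.reserve(wide.size());
    for (std::uint32_t i : wide) {
        if (i > std::numeric_limits<std::uint16_t>::max())
            throw std::out_of_range("index does not fit in 16 bits");
        narrow.push_back(static_cast<std::uint16_t>(i));
    }
    return narrow;
}
```

The result halves the index storage, and the draw call then takes GL_UNSIGNED_SHORT instead of GL_UNSIGNED_INT as its type argument.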

However, when all I do is change this stuff:
*create a VBO and load the (unsigned short) indices into it. these are the indices which describe how to build one mesh at each LOD. this means there is one small group of them for all meshes (i'm not to the stage where i'll be copying the static indices over and patching for each mesh - ah yes, and I realized an alternative to that was to create a set of all possible patched vertices, i think that'd be better..)
*when binding the vertex VBO, I also bind the index VBO using the GL_ELEMENT_ARRAY_BUFFER_ARB target.
*change the glDrawElements offset passed to be relative to the VBO base address (i'm drawing multiple strips and not using glMultiDrawElements yet, so i have to pass offsets to each strip)

it KILLS my framerate! I go from chugging along in the 100's of fps to around 10 - 15! I thought, maybe it was the non word-aligned indices, but performance was fine using unsigned short indices until I tried moving the indices into a VBO and accessing them through it.. !? (I changed it back to unsigned ints just to be sure - didn't help.) Any insight? I'm kinda new to the whole indexed primitive thing in general, am I overlooking something basic? Is the GL state still not ready for my VBO'd indices without some other initial setup, and that's what's causing it to choke??

any help would be much appreciated

-darren

##### Share on other sites
can ya throw the setup code and code you use to draw (the relevant bits only) here? it might well help....

##### Share on other sites
Quote:
 can ya throw the setup code and code you use to draw (the relevant bits only) here? it might well help....

why, certainly :D

this function sets up all my VBOs. please excuse the ass backwards setup for the index buffer, i'm in the middle of changing the way that stuff works and my mesh index pointer setup isn't conducive to uploading to a VBO, hence the copying of the indices for each LOD into a single temporary buffer. again, these are the static set of indices which can be used for any mesh given the correct offset.

    void HeightMap::BuildVBOs()
    {
        if( !m_VBOSupported )
            return;

        // vertex positions: 3 floats per height sample
        glGenBuffersARB( 1, &m_VBOVertID );
        glBindBufferARB( GL_ARRAY_BUFFER_ARB, m_VBOVertID );
        glBufferDataARB( GL_ARRAY_BUFFER_ARB, m_DataSizeX*m_DataSizeY*3*sizeof(float), m_pVertices, GL_STATIC_DRAW_ARB );

        // copy the static indices for every LOD into one temporary buffer
        unsigned short * pTempBuff = new unsigned short[HMAP_TOTAL_NINDICES];
        unsigned int lod, idx;
        int buffidx = -1;
        for( lod = 0; lod < HMAP_NLOD; lod++ )
            for( idx = 0; idx < NVerts[lod]; idx++ )
                pTempBuff[++buffidx] = HeightMesh::s_ppIndices[lod][idx];

        glGenBuffersARB( 1, &m_VBOIndexID);
        glBindBufferARB( GL_ARRAY_BUFFER_ARB, m_VBOIndexID );
        glBufferDataARB( GL_ARRAY_BUFFER_ARB, HMAP_TOTAL_NINDICES*sizeof(unsigned short), pTempBuff, GL_STATIC_DRAW_ARB );

        delete[] pTempBuff;    // array form, since the buffer came from new[]
    }
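The packing step in the function above (all LODs copied back to back into one temporary buffer) could also be sketched with standard containers. This is only an illustration under assumptions: `PackedIndices` and `packLodIndices` are hypothetical names, and the per-LOD start positions recorded here are presumably the role that `LODOffset[lod]` plays in the draw code.

```cpp
#include <cstddef>
#include <vector>

struct PackedIndices {
    std::vector<unsigned short> data;    // indices of every LOD, back to back
    std::vector<std::size_t> lodOffset;  // element position where each LOD starts
};

// Hypothetical helper: concatenate the per-LOD index lists and remember
// where each one begins inside the combined buffer.
PackedIndices packLodIndices(const std::vector<std::vector<unsigned short>>& perLod) {
    PackedIndices packed;
    std::size_t total = 0;
    for (const auto& lod : perLod)
        total += lod.size();
    packed.data.reserve(total);

    for (const auto& lod : perLod) {
        packed.lodOffset.push_back(packed.data.size());  // start of this LOD
        packed.data.insert(packed.data.end(), lod.begin(), lod.end());
    }
    return packed;
}
```

`packed.data.data()` and `packed.data.size() * sizeof(unsigned short)` would then be the pointer and byte size handed to glBufferDataARB, and the vector cleans up after itself so there is no manual new/delete pair to get wrong.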

this function handles state setup then enters the recursive render loop which draws all visible meshes.

    int HeightMap::RenderTerrain() const
    {
        glEnableClientState( GL_VERTEX_ARRAY );

        long nTri = 0;

        glBindBufferARB( GL_ARRAY_BUFFER_ARB, m_VBOVertID );                  //this method works
        //  glBindBufferARB( GL_ELEMENT_ARRAY_BUFFER_ARB, m_VBOIndexID );     //this method is slow as hell

        nTri = RecursivelyRenderTerrain( m_pMeshTree->GetRoot() );

        glDisableClientState( GL_VERTEX_ARRAY );

        return nTri;
    }

k, here's the meat of it. pretty straightforward, if i'm on a leaf node, then specify the correct VBO offset then loop through each triangle strip for the current mesh LOD and render away! (as i mentioned, i'll be changing this to a glMultiDrawElementsEXT once this stuff is worked out..)

    inline int HeightMap::RecursivelyRenderTerrain( const QuadTree<HeightMesh *>::QuadNode * pCurrentNode ) const
    {
        if( pCurrentNode->Visible() == 0 )
            return 0;

        if( pCurrentNode->Child(0) )
            return  RecursivelyRenderTerrain( pCurrentNode->Child(0) ) +
                    RecursivelyRenderTerrain( pCurrentNode->Child(1) ) +
                    RecursivelyRenderTerrain( pCurrentNode->Child(2) ) +
                    RecursivelyRenderTerrain( pCurrentNode->Child(3) );

        HeightMesh * pMesh = pCurrentNode->Object(0);
        int lod = pMesh->m_CurrentLOD;
        unsigned int idx;
        int nTri = (NVertsPerStrip[lod] - 2) * NStrips[lod];

        glVertexPointer( 3, GL_FLOAT, 0, (char *)NULL + pMesh->m_ByteOffset );

        for(idx = 0; idx < NStrips[lod]; idx++)
        {
            glDrawElements( GL_TRIANGLE_STRIP, NVertsPerStrip[lod], GL_UNSIGNED_SHORT,      //this works fine
                            pMesh->m_pCurrentIndices + NVertsPerStrip[lod]*idx );
            //glDrawElements( GL_TRIANGLE_STRIP, NVertsPerStrip[lod], GL_UNSIGNED_SHORT,    //slow as all hell
            //              (char *)NULL + NVertsPerStrip[lod]*idx*sizeof(unsigned short) + LODOffset[lod]*sizeof(unsigned short));
        }

        return nTri;
    }
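The offset arithmetic in the commented-out glDrawElements call can be checked on its own, away from the renderer. A minimal sketch, assuming 16-bit indices and that `LODOffset[lod]` counts elements rather than bytes (which is how the expression above uses it); `stripByteOffset` and `asBufferOffset` are hypothetical helper names.

```cpp
#include <cstddef>

// Byte offset of strip `stripIdx` inside a shared index buffer, where the
// current LOD's indices begin at element `lodFirstIndex`.
constexpr std::size_t stripByteOffset(std::size_t lodFirstIndex,
                                      std::size_t vertsPerStrip,
                                      std::size_t stripIdx) {
    return (lodFirstIndex + vertsPerStrip * stripIdx) * sizeof(unsigned short);
}

// While a buffer is bound to GL_ELEMENT_ARRAY_BUFFER_ARB, the last argument of
// glDrawElements is read as a byte offset into that buffer, passed as a pointer.
inline const void* asBufferOffset(std::size_t byteOffset) {
    return reinterpret_cast<const void*>(byteOffset);
}
```

Keeping the multiplication by `sizeof(unsigned short)` in one place makes it harder to mix up element counts and byte counts, which is an easy mistake when switching between client-side index pointers and buffer offsets.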

the only thing i can possibly think of, is that for some strange reason reusing the same small index buffer for each mesh is what is causing my problems? the non-commented code is sourcing indices from the current, patched indices which are unique to each mesh. the commented (slow) code reuses the same indices for each mesh and doesn't worry about the cracks for now :P

thanks!
-darren

edit: getting the hang of source tags

##### Share on other sites
well, (of course) another possible solution came to me while posting. although STATIC_DRAW_ARB seemed the obvious choice for usage hint for the index buffer, that was it! pretty much any of the DYNAMIC_*_ARB flags perform much better, the best choice seeming to be DYNAMIC_READ_ARB. this puzzles me a little. either i don't understand the usage hints correctly, or i have stumbled upon a case where they are extremely ineffective at choosing a good memory chunk to return.

anyone care to lend any insight? using DYNAMIC_COPY_ARB, performance is only slightly worse (difference between 190 fps and 186 fps), but i expected a significant performance gain from VBO'ing my indices, which this has obviously not provided... any ideas?

-darren f

##### Share on other sites
hmmm the first thing which jumps out at me is that you construct the VBO with GL_ARRAY_BUFFER_ARB yet bind it again later for usage with GL_ELEMENT_ARRAY_BUFFER_ARB. This is only a shot in the dark but I am wondering if that could be part of the problem.

Try changing it to that and giving it another go with the static flags, at the very least it would be worth seeing if it does make a difference.

There is an outside chance there is a bug in the VBO implementation in your driver set; not having an NV card I wouldn't know, but that's only something to keep in mind as a possible issue.

##### Share on other sites
phantom - thanks! that was it, good guess! i somehow failed to realize that i should be using the GL_ELEMENT_ARRAY_BUFFER_ARB target both when creating and binding the buffer. the GL_STATIC_DRAW_ARB flag works fine with no performance problems!

again, thanks a lot!
-darren

yay this means soon i start on my regcom setup :D

##### Share on other sites
hmm.. looks like one more quick question [smile]

does anyone know of a trick to use VBOs of indices with glMultiDrawElementsEXT ? i thought the transition from a for loop + glDrawElements to a single glMultiDrawElementsEXT would be trivial, but i forgot one important detail...

you have to give MultiDrawElements an array of pointers to index arrays, meaning there's gotta be a chunk of memory which points to all the other chunks containing the index arrays, meaning none of these pointers are going to be valid if i try and move the indices into a VBO? is there some sneaky trick to get around this? am i confused on how the array of pointers to index arrays works?

beginning to think it's gonna stay as a looped glDrawElements... [wink]

**edit** could it be that when an index array VBO is bound to ELEMENT_ARRAY_BUFFER, this array is treated as an array of offsets into the bound index VBO? hm either way it's starting to seem like leaving it alone is the way to go heh.

-darren

##### Share on other sites
hmmm i can't think of a way off the top of my head how it could be done, the fact that binding a VBO as the index array overrides a normal indices array probably prevents it from working
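For what it's worth, the guess in darren's edit matches how the buffer object extension describes it: while a buffer is bound to GL_ELEMENT_ARRAY_BUFFER_ARB, each entry of the `indices` array given to glMultiDrawElements is read as a byte offset into that buffer rather than as a client memory address. Below is a minimal sketch of building those arrays on the CPU, assuming 16-bit indices and equally sized strips; `MultiDrawArgs` and `buildMultiDrawArgs` are hypothetical names, and `int` stands in for GLsizei so the snippet compiles without GL headers.

```cpp
#include <cstddef>
#include <vector>

struct MultiDrawArgs {
    std::vector<int> counts;           // indices per strip (GLsizei in real GL code)
    std::vector<const void*> offsets;  // byte offsets into the index VBO, as pointers
};

// Hypothetical helper: one count and one byte offset per strip of a LOD whose
// indices begin at element `lodFirstIndex` of the shared index buffer.
MultiDrawArgs buildMultiDrawArgs(std::size_t lodFirstIndex,
                                 std::size_t vertsPerStrip,
                                 std::size_t stripCount) {
    MultiDrawArgs args;
    args.counts.assign(stripCount, static_cast<int>(vertsPerStrip));
    args.offsets.reserve(stripCount);
    for (std::size_t s = 0; s < stripCount; ++s) {
        const std::size_t bytes =
            (lodFirstIndex + s * vertsPerStrip) * sizeof(unsigned short);
        args.offsets.push_back(reinterpret_cast<const void*>(bytes));
    }
    return args;
}
```

With the index VBO bound, `args.counts.data()` and `args.offsets.data()` would go in as the count and indices arguments, with `stripCount` as the primitive count. Since the arrays depend only on the LOD and not on the mesh, they could be built once per LOD at startup.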
