Why is VBO so slow?

24 comments, last by blizzard999 18 years, 8 months ago
Hi, I've been using vertex arrays (VA) to render my meshes; I load them as 3ds models (with l3ds) and everything is OK (vertices, normals, textures, ...). Today I had the 'brilliant' idea: why not use vertex buffer objects (VBO)? I've implemented a simple class to manage VBOs so I can switch between VA and VBO quite easily: if the renderer finds the VBOs initialized it uses them, otherwise it uses VA. The first execution was a crash (I had not bound the buffer before calling glVertexPointer :). Now everything works. In my mesh I use common structures like these:

class MeshVertex{
public:
	GL::Vector3f vertex;
	GL::Vector3f normal;
	GL::Vector2f texcoord;
	GL::Vector2f texcoord1;
};
and

struct Triangle{
	GLuint a;
	GLuint b;
	GLuint c;
};
In other words I need to 'interleave' my arrays. I learned how to do this (NVIDIA has posted a lot of nice tutorials): simply specify the stride and offset, as with plain vertex arrays. But that is not the question, because everything also works with VBO (no crash, no GL errors, proper lighting, textures, ...). The problem is that this approach is TOO SLOW!!! Not slow: STATIC!!! :) I can't quantify it exactly, but my approximate FPS goes from 100fps for a certain model to 0!!! (in other situations it is half that of VA). I'm using an 'old' GeForce MX440 on XP (I've not yet tested on the slightly better FX-family card that is in a different system); I know this is not a 'hard-core graphics card', but I suspect something is wrong in what I've done.

I call BufferData only when my (static) model changes (loading, modification); after that I only call BindBuffer (so I don't send the geometry again). I've tried both STATIC_DRAW and DYNAMIC_DRAW (with the same bad results!!!). I also use glDrawRangeElements(), and in my VBO management class I don't bind a buffer that is already bound. In other words I've tried every optimization to avoid this problem!

Have you experienced the same problem? Can you suggest a solution? ANY IDEA??? I've read a lot of tutorials in which no interleaving is used... is that the problem? (The NV spec says interleaving does not affect performance, and indeed it works slowly but correctly.) Thanks for any suggestion!!! Ciao
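For reference, the stride/offset setup described above looks something like this (a minimal sketch, not code from the tutorials; it assumes MeshVertex is a plain, tightly-packed struct so offsetof applies, and that the relevant client states are already enabled):

#include <cstddef>   // offsetof
#include <gl/gl.h>

// Sketch: interleaved vertex arrays. The stride is the size of the
// whole vertex struct; each pointer starts at the byte offset of the
// corresponding member inside that struct.
void DeclareInterleaved(const MeshVertex* v){
	const GLsizei stride=sizeof(MeshVertex);
	glVertexPointer  (3,GL_FLOAT,stride,(const char*)v+offsetof(MeshVertex,vertex));
	glNormalPointer  (  GL_FLOAT,stride,(const char*)v+offsetof(MeshVertex,normal));
	glTexCoordPointer(2,GL_FLOAT,stride,(const char*)v+offsetof(MeshVertex,texcoord));
}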
show us the setup and drawing code
Quote:Original post by phantom
show us the setup and drawing code


OK... I'll try to 'isolate' the offending code so as not to abuse the forum space and/or your patience! :)

Note that the code I'm posting 'works', so the problem must be either in
the mechanism or in the implementation (drivers?).

This is the simple VBO management class (in the .cpp file I only define
the static variable m_currentBufferID):

#ifndef AC_GL_VBO
#define AC_GL_VBO

#include "lib/glee/glee.h"
#include <gl/gl.h>
#include "ac_types.h"

namespace AC{ namespace GL{

class VertexBufferObject{
public:
	/// @brief void ctor
	VertexBufferObject(void){
		m_BufferID=0;
	}
	/// @brief Deallocates and destroys
	~VertexBufferObject(void){
		Delete();
	}
	/// @brief Deallocates
	void Delete(void){
		if(m_BufferID){
			if(m_currentBufferID==m_BufferID)
				m_currentBufferID=0;
			glDeleteBuffers(1,&m_BufferID);
			m_BufferID=0;
		}
	}
	/// @brief Generates a new buffer
	bool Gen(void){
		Delete();
		glGenBuffers(1,&m_BufferID);
		return m_BufferID!=0;
	}
	/// @brief Does the buffer exist?
	bool IsBuffer(void)const{
		return GetBuffer()!=0;
	}
	/// @brief Returns the buffer ID
	GLuint GetBuffer(void)const{
		return m_BufferID;
	}
	/// @brief BufferData
	/// target :
	///			- ARRAY_BUFFER
	///			- ELEMENT_ARRAY_BUFFER
	/// usage  :
	///			- STREAM_DRAW  / STREAM_READ  / STREAM_COPY
	///			- STATIC_DRAW  / STATIC_READ  / STATIC_COPY
	///			- DYNAMIC_DRAW / DYNAMIC_READ / DYNAMIC_COPY
	void Data(GLenum target, SIZET size, LPCVOID data, GLenum usage)const{
		if(IsBuffer()){
			Bind(target);
			glBufferData(target,size,data,usage);
		}
	}
	/// @brief BindBuffer
	///
	/// target can be
	/// - ARRAY_BUFFER
	/// - ELEMENT_ARRAY_BUFFER
	void Bind(GLenum target)const{
		// skip redundant binds; note the cached ID is shared by both targets
		if(m_BufferID&&(m_currentBufferID!=m_BufferID)){
			glBindBuffer(target, m_BufferID);
			m_currentBufferID=m_BufferID;
		}
	}
	/// @brief Is the extension supported?
	static bool IsSupported(void){
		return glGenBuffers   !=NULL &&
		       glBufferData   !=NULL &&
		       glDeleteBuffers!=NULL;
	}
private:
	// buffer id
	GLuint m_BufferID;
	static GLuint m_currentBufferID;
};

}} // end of ns GL,AC

#endif // end of #ifndef AC_GL_VBO
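For illustration, typical usage of the class might look like this (a sketch with placeholder names for the vertex data):

// Sketch: generate a buffer, upload the data once, then just Bind it
// each frame; Bind() skips the GL call if the buffer is already current.
AC::GL::VertexBufferObject vbo;

void SetupOnce(const MeshVertex* pVertices, size_t nVertices){
	if(AC::GL::VertexBufferObject::IsSupported() && vbo.Gen())
		vbo.Data(GL_ARRAY_BUFFER,
		         nVertices*sizeof(MeshVertex),
		         pVertices,
		         GL_STATIC_DRAW);
}

void EachFrame(void){
	vbo.Bind(GL_ARRAY_BUFFER); // no-op if already bound
	// ...gl*Pointer calls and draw calls go here...
}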


This is the inner loop of the MeshObject 'object'; a MeshObject is a vector of Mesh objects.

I also have a variable holding the required DrawMode (a combination of
LIGHTING, TEXTURE, TEXTURE1, WIREFRAME, ...).
I also use glee.h so I can use the extensions more easily...

The relevant functions in the code are
Mesh::DeclareVertexArray() and
Mesh::DrawElements()

void MeshObject::glset(void)const{
	Material currentMaterial;
	glPushAttrib(GL_COLOR_BUFFER_BIT|GL_TEXTURE_BIT); // save attrib
	glPushClientAttrib(GL_CLIENT_VERTEX_ARRAY_BIT); // save client attrib
	bool bFirstMesh = true;
	for(MESHES::const_iterator mesh=m_Meshes.begin();
	    mesh!=m_Meshes.end(); mesh++)
	{
		// setup material
		if((m_DrawMode&LIGHTING) &&
		   (bFirstMesh || currentMaterial!=mesh->m_Material)){
			bFirstMesh=false;
			mesh->m_Material.glset(GL_FRONT_AND_BACK);
		}
		glEnableClientState(GL_VERTEX_ARRAY);
		mesh->DeclareVertexArray(); // I use the same code for VA and VBO

		// this is for wireframe only and for wireframe 'overlay'
		if(m_DrawMode&WIREFRAME){
			glPushAttrib(GL_DEPTH_BUFFER_BIT |
			             GL_POLYGON_BIT      |
			             GL_TEXTURE_BIT      |
			             GL_LIGHTING_BIT     |
			             GL_LINE_BIT);
			glDisable(GL_LIGHTING);
			glDisableClientState(GL_NORMAL_ARRAY);
			glDisableClientState(GL_TEXTURE_COORD_ARRAY);
			glDepthFunc(GL_LEQUAL);
			glDisable(GL_TEXTURE_2D);
			glPolygonMode(GL_FRONT_AND_BACK,GL_LINE);
			mesh->m_colorWireframe.glset();
			mesh->DrawElements();  // also in this case I use the same code!
			glPopAttrib();
		}
		glEnableClientState(GL_NORMAL_ARRAY);
		mesh->DeclareNormalArray();
		if(m_DrawMode&TEXTURE){
			// NOTE : this part contains the routine to set up the texture
			// and multitexture, with and without lighting.
			// I removed it for brevity: if you need it I can post it!
		}
		// this is to force something to be drawn if everything is disabled!
		bool bForce = !((m_DrawMode&WIREFRAME)||
		                (m_DrawMode&SOLID)    ||
		                (m_DrawMode&TEXTURE)  );
		if(bForce || ((m_DrawMode&SOLID)||(m_DrawMode&TEXTURE))){
			mesh->m_colorSolid.glset();
			mesh->DrawElements();
		}
	}// next mesh
	glPopClientAttrib();
	glPopAttrib();
}


Now I'll post the (very short) code I use to set up the VBOs.

As I wrote in my first post, each Mesh object contains these structures.

Vector3f/2f are vectors of 3 and 2 floats that I use to store GL data
(they have very few member functions; I use them purely as storage):

	/// @brief Vertex
	struct MeshVertex{
		Vector3f   vertex;
		Vector3f   normal;
		Vector2f   texcoord;
		Vector2f   texcoord1;
	};

	/// @brief Vertices
	typedef std::vector<MeshVertex> VERTICES;
	/// @brief Vertices
	VERTICES     m_Vertices;

	/// @brief Triangle
	///
	/// Its vertices are indices into the global vertex list:
	/// the indices refer to m_Vertices
	struct Triangle{
		GLuint a;
		GLuint b;
		GLuint c;
	};

	/// @brief Triangles
	typedef std::vector<Triangle>   TRIANGLES;
	/// @brief Triangles
	TRIANGLES    m_Triangles;


This is the function used to generate the VBOs; it is called externally
every time the object changes (a few times, at load time).
The usage parameter can be, as you know, GL_STATIC_DRAW or GL_DYNAMIC_DRAW;
m_vboVertices and m_vboElements are VertexBufferObjects (see above).


// generate a VBO for every mesh in the vector
void MeshObject::GenerateVBO(GLenum usage){
	for(MESHES::iterator mesh=m_Meshes.begin();mesh!=m_Meshes.end();mesh++)
		mesh->GenerateVBO(usage);
}

// generate the VBOs for a single mesh
void Mesh::GenerateVBO(GLenum usage){
	if(!VertexBufferObject::IsSupported()) return; // not supported!
	m_vboVertices.Gen();
	m_vboElements.Gen();
	m_vboVertices.Data(GL_ARRAY_BUFFER,
	                   m_Vertices.size()*sizeof(Mesh::MeshVertex),
	                   &m_Vertices[0], // raw pointer (begin() is an iterator, not LPCVOID)
	                   usage);
	m_vboElements.Data(GL_ELEMENT_ARRAY_BUFFER,
	                   m_Triangles.size()*sizeof(Mesh::Triangle),
	                   &m_Triangles[0],
	                   usage);
}


Now the tricky part: how I use the same code for both VA and VBO.
If I have called MeshObject::GenerateVBO() the VBO path is used, otherwise VA.

Note the use of BUFFER_OFFSET to handle the interleaving: it works!

// if I previously called GenerateVBO() this function returns true
bool Mesh::VBOEnabled(void)const{
	return m_vboVertices.IsBuffer()&&m_vboElements.IsBuffer();
}

// macro to compute a 'starting' offset inside a VBO
#define BUFFER_OFFSET(i) ((char*)NULL + (i))

void Mesh::DeclareVertexArray(void)const{
	if(!VBOEnabled())
		glVertexPointer(3,GL_FLOAT,sizeof(MeshVertex),&(m_Vertices[0].vertex));
	else{
		m_vboVertices.Bind(GL_ARRAY_BUFFER); // if it is already bound this is a no-op
		glVertexPointer(3,GL_FLOAT,sizeof(MeshVertex),BUFFER_OFFSET(0));
	}
}

void Mesh::DeclareNormalArray(void)const{
	if(!VBOEnabled())
		glNormalPointer(GL_FLOAT,sizeof(MeshVertex),&(m_Vertices[0].normal));
	else{
		m_vboVertices.Bind(GL_ARRAY_BUFFER);
		glNormalPointer(GL_FLOAT,sizeof(MeshVertex),
		                BUFFER_OFFSET((const char*)&(m_Vertices[0].normal)-
		                              (const char*)&(m_Vertices[0].vertex)));
	}
}

// very similar
void Mesh::DeclareTextureArray(void)const{...}
void Mesh::DeclareTextureArray1(void)const{...}
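Mesh::DrawElements() is not shown above; a minimal sketch consistent with the description (glDrawRangeElements, the same code path for VA and VBO, 32-bit indices) would be:

void Mesh::DrawElements(void)const{
	if(m_Triangles.empty()) return;
	const GLsizei count=(GLsizei)m_Triangles.size()*3;
	const GLuint  last =(GLuint)m_Vertices.size()-1;
	if(!VBOEnabled()){
		// VA path: the indices are read from system memory
		glDrawRangeElements(GL_TRIANGLES,0,last,count,
		                    GL_UNSIGNED_INT,&m_Triangles[0]);
	}else{
		// VBO path: the indices live in the element buffer,
		// so the 'pointer' is just an offset from its start
		m_vboElements.Bind(GL_ELEMENT_ARRAY_BUFFER);
		glDrawRangeElements(GL_TRIANGLES,0,last,count,
		                    GL_UNSIGNED_INT,BUFFER_OFFSET(0));
	}
}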


I know this is a lot of code (in fact I tried to avoid posting it :) but it
may help in understanding the problem, or help someone implement similar
(and better) classes.
Remember: the problem is that the code works with both VA and VBO, but with VBO it is very, very slow.
Are you using 32-bit indices (GLuint)? Try 16-bit instead, if you can. Some/most hardware falls back to a particularly slow mode when using 32-bit (I think it involves reading the 32-bit numbers back from the GPU), so it's actually slower than just keeping the index buffers in RAM.
Quote:Original post by Fingers_
Are you using 32-bit indices (GLuint)? Try 16-bit instead, if you can. Some/most hardware falls back to a particularly slow mode when using 32-bit (I think it involves reading the 32-bit numbers back from the GPU), so it's actually slower than just keeping the index buffers in RAM.


This is a good suggestion... I've read something about this on the NVIDIA site. However, using glDrawRangeElements should reduce the problem (and converting a 32-bit integer to a 16-bit one is about the cheapest operation I can think of :)

To be sure this is not the problem, I added these lines to my gl_types.h header file:

//#define GL_LONG_INDEX

#ifdef GL_LONG_INDEX
	#define GL_INDEX              GL_UNSIGNED_INT
	typedef GLuint                GLindex;
#else
	#define GL_INDEX              GL_UNSIGNED_SHORT
	typedef GLushort              GLindex;
#endif


So I can switch between short and int indices... (when I need an index type I declare it as GLindex and/or use the corresponding GL_INDEX constant).
I've seen no difference with short instead of int!!! :(
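(One caveat, an assumption on my part since that part of the code isn't posted here: the typedef only shrinks the element buffer if the triangles/indices are actually stored and uploaded as GLindex; indices kept as GLuint would need repacking when the buffer is filled, e.g. with a hypothetical helper like this:)

#include <vector>

// Sketch: repack 32-bit indices as 16-bit ones before uploading the
// element buffer. Only valid while every index fits in 16 bits
// (fewer than 65536 vertices per mesh).
std::vector<GLindex> PackIndices(const std::vector<GLuint>& in){
	std::vector<GLindex> out(in.size());
	for(size_t i=0;i<in.size();++i)
		out[i]=(GLindex)in[i];
	return out;
}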

The driver I'm using is the 61.76 (July 2004); I know it's fairly old, but the latest version gave me problems when installed, so I kept this one. Still, VBO is a 'very old' approved extension, and the MX440 is not that new a graphics card!
One solution could be to deinterleave my arrays and use separate 'channels' for vertices, normals, and textures (the other is to throw this evil computer out of the window!!! :)

I suppose NVIDIA drivers nominally support every extension, but in practice some are implemented with bad, unoptimized code (I heard something similar about the earliest versions of T&L some years ago).

In this particular case (interleaving?) I can only conclude that VBO is not just "not fast": it is far slower than vertex arrays and display lists (which are VA-based).
In my opinion display lists are the best solution if there is no particular system-memory limit! They always work, they're always simple, there's no need to rearrange data, no extension is needed... good.

I cannot imagine what sort of transformation the driver performs on the VBO data I pass, but I'm sure it is not a simple gl[...]Pointer() wrapper!!! Why not??? That would be faster!!!

However, I'm waiting for an OpenGL guru to tell me: "you have understood nothing about VBOs! Try this!" :)

Thanks



Interleaving your data isn't really a "special case", and in fact it's quite possibly the preferred way of doing things. (See this thread)

The one thing that immediately jumps to mind here (since I can't see anything obviously wrong with your code) is that your buffer may potentially be too large. There was a post by YannL a while back where he said that VBOs should be kept under 8 megs, which seems to be the magical "performance wall" for most cards. That's still a hell of a lot of data for a single buffer (262,144 individual verts at 32 bytes each, if you're doing position/normal/uv), but I have seen people allocate bigger... With your card, though, I may suggest keeping the buffer size even smaller.
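If you want a cheap sanity check against that heuristic, something like this would do (a sketch, e.g. inside Mesh::GenerateVBO(); the 8 MB figure is community folklore, not a documented limit):

#include <cstdio>

// Sketch: warn when a single vertex buffer crosses the ~8 MB heuristic.
const size_t kSoftLimit=8u*1024u*1024u; // folklore threshold, not a spec value
const size_t bytes=m_Vertices.size()*sizeof(Mesh::MeshVertex);
if(bytes>kSoftLimit)
	fprintf(stderr,"warning: VBO is %lu bytes; consider splitting it\n",
	        (unsigned long)bytes);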

Now a quick question: when was the last time you updated your drivers? With a card that old, and your description of a complete performance crash, it sounds very much like you may be hitting some feature that your card is attempting to implement in a "software-like" mode. How exactly a hardware-oriented feature like VBOs would be software emulated I'm not sure, but I suppose it could happen. In any case, if you haven't for a while, try a driver upgrade.

One final little tip that you may or may not already know: NVIDIA cards tend to do all the heavy lifting for VBOs in the call to glVertexPointer(), so you want to set all other pointers before that one, and if you have multiple models in a single buffer try to call it only once per buffer.
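In code, that ordering tip looks like this (a sketch; the byte offsets assume the MeshVertex floats are tightly packed, so the normal starts at byte 12 and the first texcoord at byte 24):

// Sketch: set the secondary attribute pointers first and the vertex
// pointer last, since NVIDIA drivers reportedly do the heavy setup
// work inside glVertexPointer().
m_vboVertices.Bind(GL_ARRAY_BUFFER);
glNormalPointer(GL_FLOAT,sizeof(MeshVertex),BUFFER_OFFSET(12));     // offset of 'normal' (assumed)
glTexCoordPointer(2,GL_FLOAT,sizeof(MeshVertex),BUFFER_OFFSET(24)); // offset of 'texcoord' (assumed)
glVertexPointer(3,GL_FLOAT,sizeof(MeshVertex),BUFFER_OFFSET(0));    // vertex pointer last!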

Hope those help in some way! Good luck!
// The user formerly known as Tojiro67445, formerly known as Toji [smile]
Thanks very much Toji for your suggestions!

Quote:Original post by Toji
Interleaving your data isn't really a "special case", and in fact it's quite possibly the preferred way of doing things. (See this thread)

I think so too, and the code is more elegant and less error-prone.

Quote:
The one thing that immediately jumps to mind here (since I can't see anything obviously wrong with your code) is that your buffer may potentially be too large. There was a post by YannL a while back where he said that VBOs should be kept under 8 megs, which seems to be the magical "performance wall" for most cards. That's still a hell of a lot of data for a single buffer (262,144 individual verts at 32 bytes each, if you're doing position/normal/uv), but I have seen people allocate bigger... With your card, though, I may suggest keeping the buffer size even smaller.

I tried with different models... I don't remember the exact sizes, but they are definitely under 8 MB. I use these models for a cloth simulation, so I don't use large ones... and I noticed the same problem with a simple model too (about 100-200 polys max).

Quote:
Now a quick question: when was the last time you updated your drivers? With a card that old, and your description of a complete performance crash, it sounds very much like you may be hitting some feature that your card is attempting to implement in a "software-like" mode. How exactly a hardware-oriented feature like VBOs would be software emulated I'm not sure, but I suppose it could happen. In any case, if you haven't for a while, try a driver upgrade.

I tried today. The version I'm running is one year old because I tried the latest and it does not work with XP (why? black screen at startup). So I move from version to version... when one works, I keep it :).
However, we are talking about an extension that was already supported in July 2004 (I think VBO is the oldest extension after multitexturing and point sprites!!).
I agree with you: it must be a bad (very bad) software emulation. I suppose the driver copies the data somewhere as-is and then deinterleaves it every time! Or it calls a sleep function to waste my time :)

Quote:
One final little tip that you may or may not already know: NVIDIA cards tend to do all the heavy lifting for VBOs in the call to glVertexPointer(), so you want to set all other pointers before that one, and if you have multiple models in a single buffer try to call it only once per buffer.


I did not know this! I can try it! I 'declare' the vertex pointer every frame for each mesh (in practice once), but are you saying that declaring normals before vertices can make a difference? I also call Bind before every pointer declaration, but as you can see in the first class I posted, I use a static variable to avoid binding the same buffer twice.

Storing the vertices of different meshes in one single buffer (keeping only the triangles/indices in each mesh) could be more efficient... good idea (something like the sketch below)...
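A possible shape for that idea (a sketch with hypothetical helper names; it assumes the Mesh members are accessible and that all meshes share one vertex pool uploaded as a single VBO):

#include <vector>

// Sketch: append one mesh into the shared arrays, rebasing its indices.
// The shared arrays are later uploaded as one VBO pair, so per frame
// glVertexPointer is called once per buffer instead of once per mesh.
void PackMesh(const Mesh& mesh,
              std::vector<Mesh::MeshVertex>& packedVertices,
              std::vector<Mesh::Triangle>&   packedTriangles){
	const GLuint base=(GLuint)packedVertices.size();
	packedVertices.insert(packedVertices.end(),
	                      mesh.m_Vertices.begin(),mesh.m_Vertices.end());
	for(size_t i=0;i<mesh.m_Triangles.size();++i){
		Mesh::Triangle t=mesh.m_Triangles[i];
		t.a+=base; t.b+=base; t.c+=base; // rebase into the shared pool
		packedTriangles.push_back(t);
	}
}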
However, if I cannot even match the speed of VAs (declared the same way!) I see no other solution than disabling the extension and/or trying it on the FX (which I don't have here).

Quote:
Hope those help in some way! Good luck!


Thanks again Toji
Evil card!!!

I just downloaded NeHe Lesson #45 from the legendary site.

The mesh explodes!!! Half of the triangles are scattered all over space (like corrupted data); if I recompile with VBO off, the mesh is fine.
The strange things are:
- the NeHe code does not render correctly on my MX440, yet it runs at the same speed as VA (it uses glDrawArrays directly); with VBO it is slightly slower, because the 'explosion' generates more fill rate.

- my code works perfectly, but it is very slow (I use glDrawRangeElements).

I'm very disappointed, because I'm not asking for the latest shader features (obviously not supported by the MX card): I'm using a very old, approved extension on the XP OS.
I also use this card for less serious but intensive applications (games?? :) ) and I have never seen these performance crashes, so I think the problem lies in the great unified driver developed by NVIDIA.
In my search for a working driver I installed the 66.93 (November 2004). The previous one (July) gave me no results; with this one I get the same FPS as VA, so the problem should be solved (it is probably a fill-rate bottleneck now).
However, I see no advantage in using VBO with this card!
The NeHe code still has problems.
Thanks

This topic is closed to new replies.
