# Unusual VBO slowdown

I implemented vertex arrays and VBO in my planet generator experiment... To my great surprise, VBO is performing much slower than regular vertex array, and even slower than immediate mode. The polygons are divided into 20 batches (there are 20 unique textures), so the batch size in the following data varies from 1024 to 65536 triangles.
(Mtris/second with immediate mode/regular vertex array/VBO)
20480 tris:   2.56 / 4.09 / 1.86
81920 tris:   2.64 / 13.7 / 1.95
327680 tris:  2.69 / 2.73 / 2.05
1310720 tris: 2.65 / 2.76 / 0.1 (out of VRAM?)

Setup: P4/2.4Ghz, Radeon 9700 pro 128M

Things to note are that the vertex array performance has a sharp peak at 4096 tris/batch, but VBO is consistently slow. Can you see anything obviously wrong in the code below? (this is called once per frame for each quadrant [batch], the arrays are written to only once in an init function)
#ifdef USE_VAR
// enable the three client-state arrays used by both the VBO path and the plain vertex array path
glEnableClientState(GL_VERTEX_ARRAY);
glEnableClientState(GL_TEXTURE_COORD_ARRAY);
glEnableClientState(GL_NORMAL_ARRAY);

if (gfx.vbosup)
{
// VBO path: one buffer object per attribute, pointers are offsets into the bound buffer
glBindBufferARB( GL_ARRAY_BUFFER_ARB, orb->rants[r].nvar);
glVertexPointer(3, GL_INT, 0, NULL);
glBindBufferARB( GL_ARRAY_BUFFER_ARB, orb->rants[r].nnar);
glNormalPointer(GL_FLOAT, 0, NULL);
glBindBufferARB( GL_ARRAY_BUFFER_ARB, orb->rants[r].ntar);
glTexCoordPointer(2, GL_FLOAT, 0, NULL);
}
else
{
// plain vertex array path: pointers are addresses in system memory
glVertexPointer(3, GL_INT, 0, orb->rants[r].var);
glNormalPointer(GL_FLOAT, 0, orb->rants[r].nar);
glTexCoordPointer(2, GL_FLOAT, 0, orb->rants[r].tar);
}

glDrawElements(GL_TRIANGLES, orb->rants[r].ntris*3, GL_UNSIGNED_INT, orb->rants[r].iar); // array of vertex/normal/texcoord indices

glDisableClientState(GL_TEXTURE_COORD_ARRAY);
glDisableClientState(GL_NORMAL_ARRAY);
glDisableClientState(GL_VERTEX_ARRAY);
#else
glBegin(GL_TRIANGLES);
tris = orb->rants[r].tris;
verts = orb->rants[r].verts;
for (tri = 0; tri < orb->rants[r].ntris; tri++)
{
//glNormal3f(tris[tri].norm[0], tris[tri].norm[1], tris[tri].norm[2]);
glNormal3f(orb->rants[r].nar[tris[tri].v[0]*3+0], orb->rants[r].nar[tris[tri].v[0]*3+1], orb->rants[r].nar[tris[tri].v[0]*3+2]);
glTexCoord2f(orb->rants[r].tar[tris[tri].v[0]*2+0], orb->rants[r].tar[tris[tri].v[0]*2+1]);
glVertex3i(orb->rants[r].var[tris[tri].v[0]*3+0], orb->rants[r].var[tris[tri].v[0]*3+1], orb->rants[r].var[tris[tri].v[0]*3+2]);

glNormal3f(orb->rants[r].nar[tris[tri].v[1]*3+0], orb->rants[r].nar[tris[tri].v[1]*3+1], orb->rants[r].nar[tris[tri].v[1]*3+2]);
glTexCoord2f(orb->rants[r].tar[tris[tri].v[1]*2+0], orb->rants[r].tar[tris[tri].v[1]*2+1]);
glVertex3i(orb->rants[r].var[tris[tri].v[1]*3+0], orb->rants[r].var[tris[tri].v[1]*3+1], orb->rants[r].var[tris[tri].v[1]*3+2]);

glNormal3f(orb->rants[r].nar[tris[tri].v[2]*3+0], orb->rants[r].nar[tris[tri].v[2]*3+1], orb->rants[r].nar[tris[tri].v[2]*3+2]);
glTexCoord2f(orb->rants[r].tar[tris[tri].v[2]*2+0], orb->rants[r].tar[tris[tri].v[2]*2+1]);
glVertex3i(orb->rants[r].var[tris[tri].v[2]*3+0], orb->rants[r].var[tris[tri].v[2]*3+1], orb->rants[r].var[tris[tri].v[2]*3+2]);
pc++;
}
glEnd();
#endif


##### Share on other sites
It seems like you need to access three different vertex buffers for one vertex. I don't know if that's regular behaviour, but when I tried to place my texcoords somewhere else (i.e. the moment I was accessing more than one VB at once) the performance degraded horribly. In other words: don't. Allocate one big buffer and either use offsets for the different kinds of data or (probably better) store them interleaved.

Something like either this:
glBindBufferARB( GL_ARRAY_BUFFER_ARB, orb->rants[r].nvar);
glVertexPointer(3, GL_INT, 0, 0);
glNormalPointer(GL_FLOAT, 0, (char*)NormOffset);
glTexCoordPointer(2, GL_FLOAT, 0, (char*)TexOffset);

or this (with struct Vertex as {int x,y,z; float nx,ny,nz; float u,v;}):
glVertexPointer(3, GL_INT, sizeof(Vertex), 0);
glNormalPointer(GL_FLOAT, sizeof(Vertex), (char*)(3*sizeof(int)));
glTexCoordPointer(2, GL_FLOAT, sizeof(Vertex), (char*)(3* (sizeof(int)+sizeof(float)) ));

The idea behind method 2 is that the closer together you store your data, the less the hardware has to jump all over memory to collect it, though I wouldn't expect it to make much difference. Just try keeping everything for one draw call in one single VB.
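The byte offsets used in method 2 can be checked without any GL calls. The following is a minimal sketch, assuming the `Vertex` struct from the post above; the `VertexLayout` type and the `vertex_layout` function are hypothetical helpers introduced only for illustration. It uses `offsetof` so that any compiler padding would be reflected in the result rather than silently ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Interleaved vertex as described in the post: position, normal, texcoord. */
typedef struct Vertex {
    int   x, y, z;     /* position */
    float nx, ny, nz;  /* normal */
    float u, v;        /* texture coordinates */
} Vertex;

/* Hypothetical helper type: the stride and offsets to pass to gl*Pointer. */
typedef struct VertexLayout {
    size_t stride;
    size_t position_offset;
    size_t normal_offset;
    size_t texcoord_offset;
} VertexLayout;

/* Derive stride and per-attribute offsets from the struct definition itself. */
VertexLayout vertex_layout(void)
{
    VertexLayout l;
    l.stride          = sizeof(Vertex);
    l.position_offset = offsetof(Vertex, x);
    l.normal_offset   = offsetof(Vertex, nx);
    l.texcoord_offset = offsetof(Vertex, u);
    /* Sanity check: the last attribute must fit inside one stride. */
    assert(l.texcoord_offset + 2 * sizeof(float) <= l.stride);
    return l;
}
```

With a buffer bound, each offset would be cast to a pointer and passed as the last argument of the matching `gl*Pointer` call, with `stride` as the stride argument.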

[edited by - Trienco on November 16, 2003 7:39:52 AM]

##### Share on other sites
You should also use GL_ELEMENT_ARRAY_BUFFER_ARB to store indices in video memory. I know ATI prefers this over system-stored indices. Overall those numbers (tris/sec) are very low for that kind of card; you should be getting at least 5x more. The "out of VRAM" thing is probably because there is a 32 MB limit on VBO size (not written anywhere, but both NVIDIA and ATI fail to allocate this size in VRAM).


##### Share on other sites
glVertexPointer(3, GL_INT , sizeof(Vertex), 0);
Ints aren't optimised on most drivers.

quote:
Blocks of vertex array data may be stored in buffer objects with the
same format and layout options supported for client-side vertex
arrays. However, it is expected that GL implementations will (at
minimum) be optimized for data with all components represented as
floats, as well as for color data with components represented as
either floats or unsigned bytes.

_______________

Jester, studient programmer
The Jester Home in French

##### Share on other sites
jesterlecodeur is correct. Here's a table from ATI's OpenGL SDK (http://www.ati.com/developer/sdk/radeonsdk/Gl_sdk.zip):

| Type | Native | Alignment | Components | Range |
|---|---|---|---|---|
| GLdouble | No | | | |
| GLfloat | Yes | 32-bit | 1,2,3,4 | +/- MAX_FLOAT |
| GLuint | No | | | |
| GLint | No | | | |
| GLushort | Yes | 32-bit | 2,4 | [0,65535] |
| GLshort | Yes | 32-bit | 2,4 | [-32768,32767] |
| GLushort (normalized) | Yes | 32-bit | 2,4 | [0,1] |
| GLshort (normalized) | Yes | 32-bit | 2,4 | [-1,1] |
| GLubyte | Yes | 32-bit | 4 | [0,255] |
| GLbyte | Yes | 32-bit | 4 | [-128,127] |
| GLubyte (normalized) | Yes | 32-bit | 4 | [0,1] |
| GLbyte (normalized) | Yes | 32-bit | 4 | [-1,1] |
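Since the table lists GLint as non-native, one way to avoid it is to convert the integer positions to floats once, at init time, before the data is uploaded. This is only a sketch of that idea: the function name `convert_positions_to_float` and the `scale` parameter are assumptions made for illustration and do not come from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Convert `count` integer components to floats, multiplying each by `scale`.
 * Intended to run once at init, so the per-frame path only sees GL_FLOAT data. */
void convert_positions_to_float(const int *src, float *dst, size_t count, float scale)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < count; i++)
        dst[i] = (float)src[i] * scale;
}
```

The resulting array would then be described with `GL_FLOAT` instead of `GL_INT` in `glVertexPointer`. Note that a 32-bit float represents integers exactly only up to 2^24 in magnitude, so very large integer coordinates lose precision in the conversion.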

##### Share on other sites
Thanks for the good suggestions... In particular the lack of int would explain a lot. I'll try all of these and see how it turns out. I haven't used VBO before so this is all new to me.

##### Share on other sites
Ah, the smell of progress

(Mtris/second with VAR / VBO1 / VBO2)
20480 tris:   4.09 / 4.09 / 4.09
81920 tris:   13.7 / 16.4 / 16.4
327680 tris:  2.73 / 27.3 / 41.0
1310720 tris: 2.76 / 21.5 / 42.3

Replacing ints with floats alone increased the triangle rates dramatically (VBO1). Interleaving the vertex/normal/texcoord data had negligible effect (<1ms/frame). Adding a hardware buffer for indices caused another performance jump at the high end of poly counts (VBO2), although I may not end up using it if/when I implement some kind of a LOD scheme.
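To relate these triangle rates to frame times, the conversion is simple arithmetic. A minimal sketch follows; the helper name `frame_time_ms` is made up for this example.

```c
#include <assert.h>

/* Time in milliseconds to draw `triangles` at a rate given in Mtris/second. */
double frame_time_ms(double triangles, double mtris_per_second)
{
    assert(mtris_per_second > 0.0);
    return triangles / (mtris_per_second * 1.0e6) * 1000.0;
}
```

For the largest mesh (1310720 triangles), 2.76 Mtris/s works out to roughly 475 ms per frame, while 42.3 Mtris/s works out to roughly 31 ms per frame.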

Also it turns out that I'm not out of VRAM after all... I'm using ~21M for the vertex arrays at the highest detail level. Still, this means I'll have to cut the detail if I ever want to display more than one planet.
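A figure of this size can be sanity-checked with a rough estimate. The sketch below assumes 8 floats per vertex (3 position, 3 normal, 2 texcoord) and roughly one vertex per two triangles, which holds for a closed triangle mesh with fully shared vertices; both numbers are assumptions, not values stated in the thread, and the helper name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for `vertex_count` vertices of `floats_per_vertex` floats each. */
size_t vertex_buffer_bytes(size_t vertex_count, size_t floats_per_vertex)
{
    assert(floats_per_vertex > 0);
    return vertex_count * floats_per_vertex * sizeof(float);
}
```

With 1310720 triangles, about 655360 vertices at 8 floats each come to 20971520 bytes on a platform with 4-byte floats, which is in line with the ~21M quoted above. Vertices duplicated along batch boundaries and the index data would add to that.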

In case you're wondering what the thing looks like, here's a picture.

##### Share on other sites
Well, I don't think you're going to have multiple planets close enough together for such detail to need to be shown simultaneously on all...

##### Share on other sites
So you're saying you don't have any slowdowns when using multiple VBs for position/normals etc.? Hm, time to either get an ATI or hope newer drivers work better, because the current setup is horribly chaotic *g*

##### Share on other sites
Yes, it's interesting because what you said made a lot of sense. I'm keeping them all in a single array now anyway since it's easier to manage (and other hardware might not be as forgiving).

I did find that the indices themselves want to be as sequential as possible rather than jumping around within the array(s). And I guess it's easier for caching when subsequent triangles re-use vertices too. So ordering the triangles as if they formed a triangle strip seems to be the fastest to render.
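The effect of index ordering on vertex re-use can be estimated offline with a toy cache model. The sketch below counts misses in a simple FIFO cache; the FIFO policy, the 64-entry upper bound, and the function name `count_cache_misses` are assumptions for illustration, and real hardware caches may behave differently.

```c
#include <assert.h>
#include <stddef.h>

/* Count how many indices miss a FIFO vertex cache of `cache_size` entries.
 * Fewer misses means more vertices are re-used by neighbouring triangles. */
size_t count_cache_misses(const unsigned *indices, size_t count, size_t cache_size)
{
    unsigned cache[64];
    size_t used = 0;    /* number of valid entries */
    size_t next = 0;    /* slot that the next miss overwrites */
    size_t misses = 0;

    assert(cache_size > 0 && cache_size <= 64);
    for (size_t i = 0; i < count; i++) {
        int hit = 0;
        for (size_t j = 0; j < used; j++) {
            if (cache[j] == indices[i]) { hit = 1; break; }
        }
        if (!hit) {
            misses++;
            cache[next] = indices[i];          /* evict the oldest entry */
            next = (next + 1) % cache_size;
            if (used < cache_size) used++;
        }
    }
    return misses;
}
```

As a usage example, two adjacent triangles `{0,1,2, 2,1,3}` miss 4 times in a 3-entry cache, whereas placing an unrelated triangle between them, `{0,1,2, 4,5,6, 2,1,3}`, makes every one of the 9 indices miss.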
