Vertex arrays slower than direct glVertex?


I've got an ATI 9800 Pro. I'm rendering about 40000 quads with no texture and just basic lighting. The frame rate is pretty slow if I render them using vertex arrays and glDrawElements(). If I use a for-loop and call glNormal and glVertex functions in it, it's oddly much faster. (If I use a vertex object it's much faster than either, as it should be.) Any ideas why? Are ATI's drivers just crappy or am I missing something?

Vertex arrays should be faster if you are reusing any vertices. They're only transformed once, and require less memory (thus requiring less bus bandwidth).

Feel free to correct me if I'm wrong on this one.

*EDIT: forgot the word "and."*

Here's the whole thing:


#if 0
glVertexPointer(3,GL_FLOAT,sizeof(Vertex),vertex[0].v.ptr());
glNormalPointer(GL_FLOAT,sizeof(Vertex),vertex[0].n.ptr());
glEnableClientState(GL_VERTEX_ARRAY);
glEnableClientState(GL_NORMAL_ARRAY);
glDrawElements(GL_QUADS,count,GL_UNSIGNED_INT,index);
glDisableClientState(GL_VERTEX_ARRAY);
glDisableClientState(GL_NORMAL_ARRAY);
#else
glBegin(GL_QUADS);
for(int i=0;i<count;i++) {
    // look up each vertex through the index array
    glNormal3fv(vertex[index[i]].n.ptr());
    glVertex3fv(vertex[index[i]].v.ptr());
}
glEnd();
#endif




I'm rendering 65025 quads so count=260100. That's quite a few calls to glVertex3f and glNormal3f, and it's still faster.
Also it's a regular grid mesh. So each vertex is shared by four neighbouring quads.
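
For reference, here is a minimal sketch of how an index list for such a grid could be built. The function name and the row-major vertex layout (vertex (x, y) stored at y * side + x) are assumptions for illustration, not the poster's actual code. A 256x256 vertex grid gives 255*255 = 65025 quads and 260100 indices, which matches the numbers above.

```cpp
#include <cstddef>
#include <vector>

// Build GL_QUADS indices for a regular grid of side x side vertices.
// Assumption: vertex (x, y) is stored at index y * side + x.
std::vector<unsigned int> make_grid_quad_indices(unsigned int side) {
    std::vector<unsigned int> indices;
    if (side < 2) return indices;  // no quads without at least 2x2 vertices
    indices.reserve(static_cast<std::size_t>(side - 1) * (side - 1) * 4);
    for (unsigned int y = 0; y + 1 < side; ++y) {
        for (unsigned int x = 0; x + 1 < side; ++x) {
            unsigned int i0 = y * side + x;    // corner (x, y) of this quad
            indices.push_back(i0);             // (x,   y)
            indices.push_back(i0 + 1);         // (x+1, y)
            indices.push_back(i0 + side + 1);  // (x+1, y+1)
            indices.push_back(i0 + side);      // (x,   y+1)
        }
    }
    return indices;
}
```

Interior vertices appear in four different quads, so the index list is roughly four times longer than the vertex list.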

That'll be your problem then. Cards don't tend to react well to that many verts in one batch; really you want to keep your index count below around 65K (i.e. what fits in an unsigned short).
So the copying of the data, plus the massive index count, is causing you to fall off the fast path when using the VAs.

Ok.

I've replaced the call to glDrawElements with:

int step=3000;
for(int i=0;i<count;i+=step)
glDrawElements(GL_QUADS,min(count-i,step),GL_UNSIGNED_INT,index+i);

It's running much better.
But if I set the step size bigger, to say 4000, it's running pretty slow again.
I can always test to see what's the best value for my system, but how would I know what kind of values would run well on other people's hardware?
Seems kind of strange that the OpenGL drivers can't split large amounts into smaller ones on their own.
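
One detail of the splitting loop that is easy to miss: with GL_QUADS the step has to be a multiple of 4, or a quad gets cut in half at a batch boundary (3000 and 4000 both happen to be fine). A small sketch of the same loop with that rounding made explicit follows; `Batch` and `split_into_batches` are hypothetical names, and this only computes the ranges, it does not issue any GL calls.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One draw call's worth of indices: offset into the index array and count.
struct Batch {
    std::size_t first;
    std::size_t count;
};

// Split `count` indices into batches of at most `step` indices, rounding
// `step` down to a whole number of primitives (4 indices per quad).
std::vector<Batch> split_into_batches(std::size_t count, std::size_t step,
                                      std::size_t verts_per_prim) {
    std::vector<Batch> batches;
    if (verts_per_prim == 0) return batches;
    step -= step % verts_per_prim;            // never split a primitive
    if (step == 0) step = verts_per_prim;     // at least one primitive per batch
    for (std::size_t i = 0; i < count; i += step)
        batches.push_back({i, std::min(count - i, step)});
    return batches;
}
```

Each batch `b` would then map to one call like glDrawElements(GL_QUADS, b.count, GL_UNSIGNED_INT, index + b.first). For count=260100 and step=3000 that is 87 calls, the last one with 2100 indices.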

Well, as people have already said, splitting up the data set would help significantly. Also, pass unsigned shorts to glDrawElements (its preferred type) rather than integers.

The whole thing is just a proto, so I'll be cutting down on the sizes later. I was just wondering why something so simple isn't handled better by the library. I mean if I want to render all of that stuff, then why can't I make the API calls to do it at once and have the library figure out how to pass it to the hardware efficiently.

python: Using shorts sounds like a good idea. I hadn't thought about that. I'll remember to do that when I have fewer than 64k vertices per batch. It should cut the index copying overhead in half at no cost, implementation-wise or otherwise.

Quote:
Original post by FlowingOoze
python: Using shorts sounds like a good idea. I hadn't thought about that. I'll remember to do that when I have fewer than 64k vertices per batch. It should cut the index copying overhead in half at no cost, implementation-wise or otherwise.


If you have more than 64K vertices per batch, then you should split the batch up.
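
As a rough sketch of what that split can look like on the index side: if all the indices in one batch fall within a 65536-wide window, they can be rebased against the smallest index and stored as 16-bit values. The helper below is hypothetical (not from anyone's code in this thread) and assumes the caller then points glVertexPointer/glNormalPointer at the vertex numbered `base` before drawing with GL_UNSIGNED_SHORT.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Try to convert one batch of 32-bit indices to 16-bit indices relative to
// the smallest index in the batch. Returns false if the batch touches a
// vertex range too wide for 16 bits; in that case it needs splitting further.
bool rebase_to_u16(const std::vector<unsigned int>& indices,
                   std::vector<std::uint16_t>& out,
                   unsigned int& base) {
    out.clear();
    base = 0;
    if (indices.empty()) return true;
    auto mm = std::minmax_element(indices.begin(), indices.end());
    unsigned int lo = *mm.first;
    unsigned int hi = *mm.second;
    if (hi - lo > 0xFFFFu) return false;  // range does not fit in a short
    base = lo;
    out.reserve(indices.size());
    for (unsigned int i : indices)
        out.push_back(static_cast<std::uint16_t>(i - lo));
    return true;
}
```

For a grid mesh drawn a few rows at a time, each batch only touches a contiguous band of vertices, so the rebased range stays small.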

You can query the recommended maximums:

glGetIntegerv( GL_MAX_ELEMENTS_INDICES, &max_elements_indices );
glGetIntegerv( GL_MAX_ELEMENTS_VERTICES, &max_elements_vertices );
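
These two values are recommendations for glDrawRangeElements: a call may run at reduced performance if its index count exceeds GL_MAX_ELEMENTS_INDICES or if end - start + 1 exceeds GL_MAX_ELEMENTS_VERTICES. A small sketch of that check, with a made-up function name and the limits passed in as plain ints (as returned by the glGetIntegerv calls above):

```cpp
#include <cstddef>

// Check one glDrawRangeElements call against the driver's recommended limits.
// `start` and `end` are the inclusive index range the call refers to.
// This is only a hint check; staying within the limits does not guarantee
// that the call is fast on a given driver.
bool within_recommended_limits(std::size_t index_count,
                               unsigned int start, unsigned int end,
                               int max_elements_indices,
                               int max_elements_vertices) {
    if (end < start || max_elements_indices < 0 || max_elements_vertices < 0)
        return false;
    unsigned long long span =
        static_cast<unsigned long long>(end) - start + 1;  // vertices referenced
    return index_count <= static_cast<std::size_t>(max_elements_indices) &&
           span <= static_cast<unsigned long long>(max_elements_vertices);
}
```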

GL_MAX_ELEMENTS_INDICES is 65536
GL_MAX_ELEMENTS_VERTICES is 2147483647

My code is now:

glVertexPointer(3,GL_FLOAT,sizeof(Vertex),vertex[0].v.ptr());
glNormalPointer(GL_FLOAT,sizeof(Vertex),vertex[0].n.ptr());
glEnableClientState(GL_VERTEX_ARRAY);
glEnableClientState(GL_NORMAL_ARRAY);
int step=3000;
for(int i=0;i<count;i+=step)
glDrawRangeElements(GL_QUADS,0,256*256,min(count-i,step),
GL_UNSIGNED_INT,index+i);
glDisableClientState(GL_VERTEX_ARRAY);
glDisableClientState(GL_NORMAL_ARRAY);




256*256=65536, which is well below MAX_ELEMENTS_VERTICES.
The number of indices is at most 3000, which is well below MAX_ELEMENTS_INDICES.
It's running at 51 FPS. Now if I change step=3000 to step=4000, i.e. send 4000 indices at a time, it's running at only 12 FPS, even though 4000 < MAX_ELEMENTS_INDICES.
Also it's running 23 FPS with the glVertex3f for loop. So no matter how you look at it, 12 FPS is not acceptable.

Hmm, interesting. I've got a test program which renders a 512*512 terrain (with color map, simple lighting and a gfx overlay), either via VBO and glDrawArrays() (so I'm not even sure the post-T&L cache is being used) or via glVertex() commands, and it comes out at a constant 20 fps fullscreen (1024*768*32) with v-sync apparently off (and given the v-sync is 75 Hz, I'm happy to believe it's off), and I'm only using a 9800 XT.

Although, granted, I am drawing as triangles, not quads. Maybe you want to try breaking the quads into tris instead? Quads might not be processed as fast via a VA...
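
If you want to try that without touching the vertex data, the quad index list can be rewritten into a triangle index list. A minimal sketch (the function name is made up; it splits each quad a,b,c,d along the a-c diagonal, which assumes the quads are planar or that the choice of diagonal doesn't matter):

```cpp
#include <cstddef>
#include <vector>

// Convert GL_QUADS indices (4 per quad) into GL_TRIANGLES indices (6 per quad).
// Quad (a, b, c, d) becomes triangles (a, b, c) and (a, c, d), keeping winding.
std::vector<unsigned int> quads_to_triangles(const std::vector<unsigned int>& quads) {
    std::vector<unsigned int> tris;
    tris.reserve(quads.size() / 4 * 6);
    for (std::size_t i = 0; i + 3 < quads.size(); i += 4) {
        unsigned int a = quads[i], b = quads[i + 1];
        unsigned int c = quads[i + 2], d = quads[i + 3];
        tris.push_back(a); tris.push_back(b); tris.push_back(c);
        tris.push_back(a); tris.push_back(c); tris.push_back(d);
    }
    return tris;  // any trailing partial quad is ignored
}
```

The result is drawn with GL_TRIANGLES instead of GL_QUADS. The index count grows by half (260100 becomes 390150 for the mesh in this thread), so the batch size would need retuning.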
