The Rug

Vertex array slowness


I've been running some simple tests to see the benefits of vertex arrays and get a feel for them. Unfortunately, I seem to be getting slower performance with them than in immediate mode. I'm using a Radeon 9700 with the latest drivers. I'm pretty sure I must have missed something or be under the wrong impression about vertex arrays, but it doesn't hurt to ask, so... I have an array of vertices called heightValues, which, in my basic test, has a length of 750,000 (250,000 vertices, each with x, y and z coords). I also have an array of indices to be used as a triangle strip, which is 500,000 elements long. This is the code I use for the vertex array test:
    // width and height are each 500

    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, heightValues);
    glDrawElements(GL_TRIANGLE_STRIP, height * width * 2, GL_UNSIGNED_INT, indicies);
    glDisableClientState(GL_VERTEX_ARRAY);
And this is what I use for the immediate mode test:
    glBegin(GL_TRIANGLE_STRIP);
    for(int count=0; count<height * width * 2; ++count)
        glVertex3f(heightValues[ indicies[count]*3 ], heightValues[ indicies[count]*3 +1 ], heightValues[ indicies[count]*3 +2 ]);
    glEnd();
Aside from that, there isn't much going on. My rendering loop is pretty sparse, with just a couple of matrix transformations to position the output. The performance is about 36fps using vertex arrays, and 55fps using immediate mode. The issue is this: I wouldn't expect vertex arrays to be slower with such a large number of vertices when both pieces of code produce the same output. Any help settling this would be greatly appreciated.
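For comparing the two paths, frame time is often easier to reason about than fps: 36fps is roughly 27.8 ms per frame and 55fps roughly 18.2 ms. Below is a minimal timing sketch in C++, not taken from the original post. `averageFrameMillis` and `fpsToMillis` are hypothetical helper names, and the callable passed in is assumed to issue the draw calls and then call glFinish() so that the GPU work is included in the measurement.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: calls drawFrame `frames` times and returns the mean
// wall-clock time per call in milliseconds.
template <typename DrawFrame>
double averageFrameMillis(DrawFrame drawFrame, int frames) {
    assert(frames > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < frames; ++i) {
        drawFrame();  // assumption: issues the draw calls, then glFinish()
    }
    const std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / frames;
}

// Converts a frame rate into the duration of one frame in milliseconds.
inline double fpsToMillis(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}
```

Averaging over a few hundred frames for each path, with everything else in the loop unchanged, should give a steadier comparison than watching an fps counter.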

I would need to see more code, like how you are setting up the arrays. I'm not sure, but I think there is some kind of optimal index count and vertex count... Someone else will have to comment on that.

Thanks for your reply... This is where I initialise the arrays:

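    // Vertex positions: x, y, z per vertex, laid out row by row
    heightValues = new float[width * height * 3];

    for(int count=0; count<width*height*3; count+=3) {
        heightValues[count] = (count/3)%width;          // x: column
        heightValues[count+2] = (count/3 + 3)/width;    // z: roughly the row (integer division of vertex index + 3)
        heightValues[count+1] = sin(heightValues[count]) + cos(heightValues[count + 2]);    // y: height
    }

    // Triangle strip indices: each vertex paired with the one in the row below it
    indicies = new GLuint[width * height * 2];

    int c = 0;
    for(int count=0; count<width*height-width; ++count) {
        indicies[c] = count;
        indicies[c+1] = count + width;
        c+=2;
    }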

It's pretty poor, but it works.

I realise this isn't a very scientific test, but even so I would have thought vertex arrays would at least equal immediate mode performance... that's what's really bugging me.
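One detail worth checking in the setup above: the index loop runs over width*height - width vertices and writes two indices for each, so it fills 2 * (width*height - width) entries. For a 500x500 grid that is 499,000, while both draw paths read height * width * 2 = 500,000, so the last 1,000 entries of the `new GLuint[...]` array are never assigned. Here is a minimal sketch, assuming the same fill pattern, that sizes the array to exactly what the loop writes. `buildStripIndices` is a hypothetical name, not something from the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: same index pattern as the posted loop, but the
// container holds exactly the entries that are written.
std::vector<unsigned int> buildStripIndices(int width, int height) {
    assert(width > 0 && height > 1);
    std::vector<unsigned int> indices;
    indices.reserve(static_cast<std::size_t>(width) * (height - 1) * 2);
    for (int count = 0; count < width * height - width; ++count) {
        indices.push_back(static_cast<unsigned int>(count));          // vertex in this row
        indices.push_back(static_cast<unsigned int>(count + width));  // vertex in the row below
    }
    return indices;
}
```

With this, the count passed to glDrawElements (or used as the immediate mode loop bound) would be `indices.size()` rather than height * width * 2. Whether the unassigned entries explain the timing difference is not something the thread establishes.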

Radeon 9700

I really must be missing something here. I just tried setting width and height to 100, and immediate mode got ten times the speed of the vertex array code. So there must be an optimal number of vertices/indices, as MARS_999 said... (or a huge flaw in my code, but I've posted all there really is that is relevant)

Guest Anonymous Poster
Try reducing the number of triangles drawn to check the theory. As for how your application draws the vertices, you're doing it right; there's not much to get wrong here...

I am not too sure what is going on. My only real guess would be that rendering 250k vertices at once with gl*Pointer is reaching some sort of diminishing returns (as someone said previously).

There are some stats from Game Programming Gems I where they ran tests on various vertex submission techniques. For 300 vertices, immediate mode required ~163,154 CPU cycles, while gl*Pointer required ~51,212 CPU cycles. (Interleaved took ~72,821 cycles). Clearly immediate mode is the worst possible choice ;P But this is for 300 vertices, and you are using a lot more than 300.

I would try breaking your vertex array down into smaller parts and see what happens.
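One way to try this without touching the vertex data is to keep the single index array and issue several smaller draw calls over slices of it. The sketch below is an assumption-laden illustration, not the poster's code: it assumes the indices are stored row after row with 2 * width indices per row strip (as in the setup posted earlier), and `Batch` and `splitIntoRowBatches` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One draw call's slice of the index array.
struct Batch {
    std::size_t first;  // offset of the first index in the slice
    std::size_t count;  // number of indices in the slice
};

// Hypothetical helper: splits the (height - 1) row strips of a grid into
// batches of at most rowsPerBatch rows each.
std::vector<Batch> splitIntoRowBatches(int width, int height, int rowsPerBatch) {
    assert(width > 0 && height > 1 && rowsPerBatch > 0);
    const std::size_t indicesPerRow = 2 * static_cast<std::size_t>(width);
    const int stripRows = height - 1;
    std::vector<Batch> batches;
    for (int row = 0; row < stripRows; row += rowsPerBatch) {
        const int remaining = stripRows - row;
        const int rows = remaining < rowsPerBatch ? remaining : rowsPerBatch;
        batches.push_back({static_cast<std::size_t>(row) * indicesPerRow,
                           static_cast<std::size_t>(rows) * indicesPerRow});
    }
    return batches;
}
```

Each batch `b` would then be drawn with something like `glDrawElements(GL_TRIANGLE_STRIP, (GLsizei)b.count, GL_UNSIGNED_INT, indicies + b.first)`. Setting rowsPerBatch to 1 gives one strip per grid row; varying it is a cheap way to see where the frame time stops improving.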

Are you checking for GL errors? If not, I suggest you do so at the end of your rendering code, and if an error is being reported, add checks elsewhere to see exactly what is causing it.
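A minimal sketch of such a check follows. To keep it self-contained it takes the error query as a callable instead of including the GL headers; in a real program you would pass `glGetError`. The helper names `glErrorName` and `reportGlErrors` are hypothetical, the hex values are the standard codes from the GL headers, and the cap of 16 iterations is an arbitrary safety limit.

```cpp
#include <cassert>
#include <cstdio>

// Maps the standard OpenGL error codes to their names.
inline const char* glErrorName(unsigned int code) {
    switch (code) {
        case 0x0000: return "GL_NO_ERROR";
        case 0x0500: return "GL_INVALID_ENUM";
        case 0x0501: return "GL_INVALID_VALUE";
        case 0x0502: return "GL_INVALID_OPERATION";
        case 0x0503: return "GL_STACK_OVERFLOW";
        case 0x0504: return "GL_STACK_UNDERFLOW";
        case 0x0505: return "GL_OUT_OF_MEMORY";
        default:     return "unknown GL error";
    }
}

// Hypothetical helper: keeps querying until no error is pending (or the cap
// is reached), prints each error, and returns how many were reported.
template <typename GetError>
int reportGlErrors(GetError getError, const char* where) {
    assert(where != nullptr);
    int reported = 0;
    while (reported < 16) {
        const unsigned int code = getError();
        if (code == 0) break;  // GL_NO_ERROR: nothing more pending
        std::fprintf(stderr, "%s: %s (0x%04X)\n", where, glErrorName(code), code);
        ++reported;
    }
    return reported;
}
```

Calling `reportGlErrors(glGetError, "after glDrawElements")` at the end of the render function, and then moving the call earlier if something is reported, narrows down which call raised the error.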

The overhead of that many triangles at once could be the problem.
This PDF, while aimed at D3D, has a nice graph on page 20 showing performance as you increase batch sizes on a few cards. Even though the FX card goes off the top of the graph, I dare say it would also flatten off before it hit the number of tris you gave it.

Thanks everyone, I think I'm closing in on the problem. Using a 30x30 grid gets roughly equal performance, so I think it's just a matter of tweaking it from here. When that PDF finishes downloading (damn crappy internet connection *shakes fist*) I'll probably have more of a clue, thanks for that phantom [smile]
