My_Mind_Is_Going

Optimize model rendering

I'm building a demo that renders a bunch of MD2 models running around, and I'm looking for a way to speed up the rendering. Currently I interpolate between key frames every frame, so I'm constantly recalculating vertices and normals and then just using immediate mode calls to draw.

The two common optimizations are display lists and/or vertex arrays, right? Neither seems really applicable to my situation, though: I would have to recompile the display list every frame because all the GL calls have changed, and likewise I would have to rebuild a vertex array each frame. Vertex arrays seemed like they might help, but I've implemented them and my framerate actually got worse. Is there anything else I might try? I've heard people talk about VBOs; is that my only likely option? Thanks for the help as usual.

Another pretty effective way is to keep a few buffer slots (2 or 3), bind each one to its own vertex attribute, and then have the vertex shader interpolate between those keyframes according to a uniform weight.
This turned out a bit faster on the few systems I've tested, but your mileage may vary depending on your drivers and memory.

I made a Gauntlet game last century; from memory it ran at ~20 fps with ~200 MD2s onscreen, on an NVIDIA TNT1 and an AMD K6-300.
Moral of the story: MD2s contain very few verts, ~500 max I think. With plain VAs you should get good framerates even on old hardware, i.e. look elsewhere for the bottleneck.

That's the thing: generating the vertex array each frame (I'm not reallocating the memory each frame, don't worry) actually seems to be slower than just using immediate mode. I'm testing with about 100 models right now and the framerate is dipping down to around 15 fps. When I comment out the immediate mode calls, so that it's still performing all the interpolation calculations, the framerate jumps up to about 140-150. This is all on my laptop (1.5GHz Turion), which is pretty weak but not horribly outdated or anything; it should be faster than a K6, I would think.

edit: Just checked, this model has 670 triangles so that's 2010 vertices plus the weapon model adds another 366.

[Edited by - My_Mind_Is_Going on August 2, 2007 9:02:16 AM]

Quote:
Just checked, this model has 670 triangles so that's 2010 vertices

There's a problem right there.
The number of triangles should be quite close to the number of verts, i.e. within 20 percent.

A triangle does have 3 verts, but neighbouring triangles share 2 of them,
e.g. 2 triangles
<|>
have 4 different verts, not 6.
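
To make the sharing argument concrete, here is a minimal C++ sketch that counts how many distinct vertices an indexed triangle list actually references. The function name `countUniqueVertices` is made up for illustration and is not taken from anyone's code in this thread. For the two triangles in the `<|>` diagram, indexed as (0,1,2) and (2,1,3), it reports 4 rather than 6.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Counts the distinct vertex indices referenced by a triangle list.
// `indices` holds three entries per triangle (hypothetical layout).
std::size_t countUniqueVertices(const std::vector<unsigned int>& indices)
{
    assert(indices.size() % 3 == 0);  // must be whole triangles
    const std::set<unsigned int> unique(indices.begin(), indices.end());
    return unique.size();
}
```

Running something like this over the model's triangle list is a quick way to check whether the 2010 figure comes from unshared vertices or from the mesh itself.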

No offense to zedz, but that calculation doesn't matter, because either way the GPU draws the same number of triangles.

VBOs are in the GL 1.5 specification; you should have them even on a 5000-series NVIDIA card (no idea about ATI).

Seriously, VBOs are the way to go. But first:

How many frames of animation do you have?

One frame of animation will cost you 2010 vertices * (3 position floats + 3 normal floats) * 4 bytes, about 48 KB, which isn't that bad. If you have a lot of textures/models/etc., though, static frames are going to eat your GPU memory fast.
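
As a rough worked version of that estimate, the sketch below computes the storage for one key frame and for a whole set of frames. It assumes each vertex stores a 3-float position and a 3-float normal and ignores texture coordinates; the function names and the `frameCount` parameter are hypothetical, since the number of frames was not given in the thread.

```cpp
#include <cassert>
#include <cstddef>

// Bytes for one key frame, assuming a 3-float position and a 3-float
// normal per vertex (texture coordinates are not counted).
std::size_t keyFrameBytes(std::size_t vertexCount)
{
    const std::size_t floatsPerVertex = 3 + 3;
    return vertexCount * floatsPerVertex * sizeof(float);
}

// Bytes for `frameCount` static key frames (frameCount is a made-up
// parameter; the actual number of frames is unknown here).
std::size_t animationBytes(std::size_t vertexCount, std::size_t frameCount)
{
    assert(frameCount > 0);
    return keyFrameBytes(vertexCount) * frameCount;
}
```

With 4-byte floats, `keyFrameBytes(2010)` comes to 48,240 bytes, so the total grows quickly once it is multiplied by the number of frames and models.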

Another thing: are you interpolating for each model? If so, that's probably eating your CPU like mad.

I eliminated all the unnecessary vertices, but using a vertex array is still slower than immediate mode, which seems pretty strange.

I have to calculate all the interpolated quantities with either method, so the difference between the two is that in one I make ~2000 glVertex calls and in the other I write ~350 vertices to a vertex array and then send an index array of ~2000 values, the latter being the (slightly) slower of the two. Does this sound wrong to anyone else?

Edit: I thought it might be the CPU, but just commenting out the glVertex, glNormal, and glTexCoord calls in the inner loop causes the frame rate to jump by about 80-90 fps. I am interpolating for each model, but it seems like I have to, since I don't have any way to know which models are in which animation state (especially since they can have variable interpolation rates); they're all pretty much doing their own thing.

As for VBOs being part of the GL 1.5 specification: I've never used extensions for anything before, but don't I have to use them to access any features beyond GL 1.1 or so? I'm really not clear on how the whole Windows drivers vs. vendor drivers business works.

Quote:
Original post by dpadam450
VBOs are in the GL 1.5 specification; you should have them even on a 5000-series NVIDIA card (no idea about ATI).
The GF4Ti supported them, as well as the GF4MX and very probably all NV1x parts (but I'm not sure here). I have seen some R200/8500 cards sporting them, but I'm not sure how widespread that is (I've seen some really outdated drivers still around for those cards).
So, it's even better than this.

Not using VBO is simply pointless.

Quote:
No offense to zedz, but that calculation doesn't matter, because either way the GPU draws the same number of triangles.
None taken, but the most important thing is the number of vertices transformed, not the number of tris drawn.
As in my example
<|>
the card is smart enough not to recalculate a vertex position that it has just calculated, so sharing verts is often a big win performance-wise.

Transform-wise, drawing 2000 tris with 2000 verts is always slower (by a large amount) than drawing 2000 tris with 500 verts.

To My_Mind_Is_Going: I've found that if the framerate is ~60 or greater, lerping doesn't really matter too much (which is a pity).
With your VA stuff, comment out the glDrawElements(..) line. If the framerate doesn't markedly improve, then it's your CPU work that's slowing it down; if so, post that code and someone could help.
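
A complementary check, sketched below, is to time the CPU interpolation on its own with no GL calls involved. This is a rough stand-in under stated assumptions, not My_Mind_Is_Going's actual loop: the `Vec3` struct, `lerpFrame`, and `averageLerpMs` are invented names, and only positions are blended (no normals, no normalization).

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Blends two key frames on the CPU: out = a + (b - a) * t.
void lerpFrame(const std::vector<Vec3>& a, const std::vector<Vec3>& b,
               float t, std::vector<Vec3>& out)
{
    assert(a.size() == b.size());
    out.resize(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i].x = a[i].x + (b[i].x - a[i].x) * t;
        out[i].y = a[i].y + (b[i].y - a[i].y) * t;
        out[i].z = a[i].z + (b[i].z - a[i].z) * t;
    }
}

// Average wall-clock milliseconds per lerpFrame call over `iterations` runs.
double averageLerpMs(std::size_t vertexCount, int iterations)
{
    assert(iterations > 0);
    const Vec3 zero = {0.0f, 0.0f, 0.0f};
    const Vec3 one  = {1.0f, 1.0f, 1.0f};
    std::vector<Vec3> a(vertexCount, zero), b(vertexCount, one), out;
    volatile float sink = 0.0f;  // discourages the optimizer from dropping the work

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        lerpFrame(a, b, static_cast<float>(i) / iterations, out);
        if (!out.empty()) sink = sink + out[0].x;
    }
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

Multiplying the per-model figure by the number of models gives a CPU-only cost per frame, which can then be compared with the roughly 66 ms per frame that 15 fps implies.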

With glDrawElements commented out, the frame rate is approximately what it was with the immediate mode calls commented out. So it's not the CPU, then? It's not that I'm surprised that my laptop's integrated graphics are weak; it's just that there should be a difference between these two rendering methods, right?

// Blend the current and next key frames on the CPU for every vertex.
for ( int i = 0 ; i < ModelData->numVertices ; i++ )
{
    // Position: linear interpolation between the two key frames.
    x1 = vList[i].point[0] ;
    y1 = vList[i].point[1] ;
    z1 = vList[i].point[2] ;

    x2 = nextVList[i].point[0] ;
    y2 = nextVList[i].point[1] ;
    z2 = nextVList[i].point[2] ;

    vertexArray[i].point[0] = x1 + (x2-x1)*interpol ;
    vertexArray[i].point[1] = y1 + (y2-y1)*interpol ;
    vertexArray[i].point[2] = z1 + (z2-z1)*interpol ;

    // Normal: look both up in the shared normal table, blend, renormalize.
    x1 = normals[ nList[i] ].point[0] ;
    y1 = normals[ nList[i] ].point[1] ;
    z1 = normals[ nList[i] ].point[2] ;

    x2 = normals[ nextNList[i] ].point[0] ;
    y2 = normals[ nextNList[i] ].point[1] ;
    z2 = normals[ nextNList[i] ].point[2] ;

    normalArray[i].point[0] = x1 + (x2-x1)*interpol ;
    normalArray[i].point[1] = y1 + (y2-y1)*interpol ;
    normalArray[i].point[2] = z1 + (z2-z1)*interpol ;

    Normalize(normalArray[i].point) ;
}

// glDrawElements reads all three arrays with the same element index.
glVertexPointer(3,GL_FLOAT,0,vertexArray) ;
glNormalPointer(GL_FLOAT,0,normalArray) ;
glTexCoordPointer(2,GL_FLOAT,0,ModelData->textureCoords) ;

glDrawElements(GL_TRIANGLES,ModelData->numTriangles*3,
               GL_UNSIGNED_INT,vertexIndices) ;




Here's the easiest thing to do: search for gDEBugger; I think it's on the NVIDIA developer website if you can't find it. You can use it for a trial period, and it will show you how the CPU/GPU are doing and how many GL calls you make per frame.

zedz, yeah, after I posted I realized the CPU still interpolates the extra verts. I just deal with artists and they're like "This model is only 300 verts". Yeah, in memory, but it still draws like 1000 triangles (a bit exaggerated, but it could happen).

My_Mind_Is_Going, that's not ideal,
e.g.
x1 = vList[i].point[0];
copies the float over. Is it necessary?

(I'd ignore the normals for now. Also, normalizing isn't cheap: stick it in a vertex shader if possible, or let GL normalize the normals for you, e.g. with glEnable(GL_NORMALIZE).)

Quote:
Original post by zedz
My_Mind_Is_Going, that's not ideal,
e.g.
x1 = vList[i].point[0];
copies the float over. Is it necessary?

(I'd ignore the normals for now. Also, normalizing isn't cheap: stick it in a vertex shader if possible, or let GL normalize the normals for you, e.g. with glEnable(GL_NORMALIZE).)


Well, I can eliminate the intermediate x1 and x2 stuff; it was just there to make the code look cleaner. But this and the normalization issue are only going to free up the CPU, correct? The OpenGL calls seem to be the cause of the slowdown, because I can make those same CPU optimizations to my immediate mode routine as well.

Okay, here's a new problem. I can't get my texture coordinates to work correctly with the VA approach. The problem is (or seems to be) that there isn't a one-to-one correspondence between vertices and texture coordinates. The model has 358 vertices and 469 texture coordinates (s,t pairs). Each triangle data structure holds indices to the vertices and texture coordinates for that triangle, and these indices are used to pull the actual vertex/normal/texture coordinates out of the lists of vertex/normal/texture coord data.

I'm assuming that the difference between the number of texture coords and vertices is due to different triangles using the same vertices but with different texture coordinates. This means that I can't just feed glTexCoordPointer a list of all the texture coordinates, because when glDrawElements goes to draw element n it's going to grab vertex n and normal n from the vertex and normal arrays, but it's also going to grab texture coordinate pair n, which is probably not the correct pair to use for that vertex, especially since any of a number of different triangles could be using that vertex.

I'm not sure what to do about this, but given the lack of a performance boost from using VAs in the first place, I'm inclined to just go back to immediate mode for now and look into VBOs later.



[Edited by - My_Mind_Is_Going on August 4, 2007 3:36:05 PM]

VBOs are not going to improve the situation if VAs don't.

WRT 358 vertices and 469 texture coordinates:
you will need to duplicate some verts,
so that you have 469 verts + 469 TCs.

Was it even working visually correctly with VAs?
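
One possible way to do that duplication at load time is sketched below: every distinct (vertex index, texture coordinate index) pair becomes one entry in a unified list, and a single index buffer refers to those entries. The `Corner` and `UnifiedMesh` types and the `unifyIndices` function are hypothetical names, not the MD2 loader's real structures, and the input is assumed to be three corners per triangle.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// One triangle corner as stored in the file: separate indices into the
// vertex list and the texture coordinate list (hypothetical layout).
struct Corner { unsigned int vertexIndex; unsigned int texCoordIndex; };

struct UnifiedMesh {
    std::vector<Corner> corners;        // one entry per distinct pair
    std::vector<unsigned int> indices;  // three per triangle, into `corners`
};

// Collapses per-corner index pairs into one index usable by glDrawElements.
UnifiedMesh unifyIndices(const std::vector<Corner>& triangleCorners)
{
    assert(triangleCorners.size() % 3 == 0);  // must be whole triangles
    UnifiedMesh mesh;
    std::map<std::pair<unsigned int, unsigned int>, unsigned int> seen;
    for (const Corner& c : triangleCorners) {
        const auto key = std::make_pair(c.vertexIndex, c.texCoordIndex);
        auto it = seen.find(key);
        if (it == seen.end()) {
            // First time this pair appears: give it the next unified slot.
            const unsigned int slot =
                static_cast<unsigned int>(mesh.corners.size());
            it = seen.insert(std::make_pair(key, slot)).first;
            mesh.corners.push_back(c);
        }
        mesh.indices.push_back(it->second);
    }
    return mesh;
}
```

The texture coordinate array can then be built once from `corners[k].texCoordIndex`, while each frame's interpolated position and normal for slot k come from `corners[k].vertexIndex`. The number of unified entries is the number of distinct pairs actually used, which may differ somewhat from 469 depending on the model.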

Well no, the texture coordinates were never working correctly. I'm having a difficult time figuring out how to select the vertices needed for each frame based on the texture coords. I'm going to have to rework most of the code for loading the data as well as rendering. *sigh*

Edit: Okay, I rewrote some of the loading code and now it's doing what I want, except it's still very slow. What's going on here? It doesn't seem to matter how many vertices I pass in or how many GL calls I make; can my video card just not handle it?

Anyway, this has been an educational experience nonetheless. Thanks to everyone who contributed.

[Edited by - My_Mind_Is_Going on August 4, 2007 7:20:26 PM]

OK, here's what you should do, since you still don't know why it's slow. Try this:

Use Fraps to measure the framerate (or whatever, just don't use your own code for it).

Draw nothing on the screen and see what your framerate is.

Draw a 100-vert polygon. See how much it drops.

Draw multiple copies of that object using immediate mode or VAs. See how fast the framerate drops as you add them.

Do this without any animation.

Mate, you do realise that by not rendering the same thing with VAs as you are with immediate mode, you make benchmarking between them invalid?
You have to get your VAs working correctly first (I'm surprised the thing didn't crash).
