Cirian

I'm currently about to optimise my engine, and am hoping to get some suggestions from the all-powerful members of this forum. The fact of the matter is: my engine code produces some terrible framerates, and I'm at a loss to know exactly where to start optimising it -- the entire engine architecture is to blame in some respects.

Our benchmarking program generates 500 static objects and one particle emitter of 500 objects. They all share the same texture. This, it should be noted, is a 2D engine; all the objects are quads. The framerate is calculated right at the end of the program, and although marginally lower than the actual fps, it is pretty accurate (successful calls to the render loop / engine uptime).

Framerate with all bells and whistles on: 19 fps
Framerate with no alpha blending: 22 fps
Framerate rendering with glPolygonMode(..., GL_POINT): 25 fps
Framerate with no drawing code (everything other than glCallLists/glBegin-glEnd): 85 fps
Framerate with nothing on the screen: 300+ fps

I have forayed into the world of vertex arrays (for the particle system), but the framerate when rendering as a vertex array is 3-4 fps lower. I have disabled vsync, and although my 3D card (a TNT2 Ultra) is a bit on ye olde side, I do get about 120 fps in Quake III Arena.

As I said before, the engine architecture is in many ways to blame: static objects each have their own display list, and between 4 and 8 operations are carried out on each of them per frame. The particle array is drawn in immediate mode, simply because it gave a silly performance boost over vertex arrays. Objects are held in std::list containers, and each object executes its own Draw() function; a particle array is a single object with multiple polygons. The app is not running in software mode, and I have the latest drivers.

I know that something very screwy is going on, and I am going to do a complete overhaul of the rendering subsystem in the next few days. I would appreciate any ideas for improving performance; although it is undoubtedly hard for you to contribute without any source code, please do try. I will answer any questions about the rendering loop, and will post code snippets if asked (I'm not posting all ~5000 lines in one go, sorry =)

--Cirian
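
In rough sketch form, the static-object path described above looks something like this (MASIObj and Draw() appear in the profiler output later in the thread; the member variables and the exact per-object transforms here are only assumptions, not the actual engine code):

#include <windows.h>
#include <GL/gl.h>
#include <list>

struct MASIObj {
    GLuint displayList;   // one display list per static object, built at load time
    float  x, y, scale;

    void Draw() const {
        glPushMatrix();
        glTranslatef(x, y, 0.0f);      // a handful of per-object operations per frame
        glScalef(scale, scale, 1.0f);
        glCallList(displayList);       // quad + texcoords recorded in the display list
        glPopMatrix();
    }
};

std::list<MASIObj> staticObjects;

void DrawStatics() {
    // 500 objects -> 500 separate Draw() and glCallList calls every frame
    for (std::list<MASIObj>::iterator it = staticObjects.begin();
         it != staticObjects.end(); ++it)
        it->Draw();
}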

I just noticed that I neglected to include any sizes/resolutions... the test is windowed, at 640*480*32. Each static object is 32*32 pixels, and each PArray polygon is anywhere from ~2*2 pixels to ~128*128 pixels.

They all use a 32*32 grayscale texture.

I don't think 32x32 textures are properly aligned -- I remember the recommendations for optimal speed/performance are textures of 64x64, 128x128 and 256x256.

I tested it with a 64*64 texture, and the fps dropped to 11 (!)

Edit: larger power-of-two sizes lower the framerate even further...

Edit 2: I forgot to mention that glBindTexture is only being called once -- the engine works out that the texture is already bound, and never binds it again.
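
In sketch form, the bind-caching amounts to something like this (lastBoundTexture is just an illustrative name, not the engine's actual bookkeeping):

#include <windows.h>
#include <GL/gl.h>

static GLuint lastBoundTexture = 0;

void BindTextureCached(GLuint tex)
{
    if (tex != lastBoundTexture) {          // skip redundant glBindTexture calls
        glBindTexture(GL_TEXTURE_2D, tex);
        lastBoundTexture = tex;
    }
}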

[edited by - Cirian on June 8, 2002 10:05:19 AM]

Wait a second: you are dropping from 19 fps to 11, if all you do is replace the 32² texture with a 64² one?!

You definitely have a real problem somewhere. Now, while your engine structure itself seems to be inefficient (too many display lists, function calls, etc.), it would never create such a heavy performance breakdown on its own. Are you sure that you aren't uploading your texture every frame? But even if you were, it wouldn't drop that much; a 64² intensity texture is nothing.

Sorry to ask this (since you already mentioned it), but: are you sure that you are not in software mode? Your whole problem description literally screams software mode. Do a quick check: are your textures filtered or not?

/ Yann

quote:

To check if you're in software mode, use glGetString(GL_VENDOR). If the vendor is your graphics card maker then you're in hardware; if it's MS then you're in software.

That doesn't work if he's using a specific feature that is unsupported by his hardware and causes a software fallback. The texture filtering check is usually a good way to determine this (but not always either; it depends on the specific fallback).
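
A quick way to run the string check mentioned above is something like this (just a sketch, to be called once the GL context is current; "Microsoft" / "GDI Generic" strings mean a pure software context, while a hardware vendor string still doesn't rule out a per-feature fallback):

#include <windows.h>
#include <GL/gl.h>
#include <cstdio>

void PrintGLInfo()
{
    printf("Vendor:   %s\n", (const char*)glGetString(GL_VENDOR));
    printf("Renderer: %s\n", (const char*)glGetString(GL_RENDERER));
    printf("Version:  %s\n", (const char*)glGetString(GL_VERSION));
}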

Sorry about the glacial response time (Windows managed to orphan the system32 directory (!!!); thank God for CHKDSK and NTFS). glGetString produces "NVIDIA Corporation", and the texture loading routine is being called 501 times (500 of those calls return a pointer to the managed container, so the texture is only being loaded once).

The engine sets both the min and mag filters to GL_LINEAR, and it is certainly filtering the textures successfully (magnified textures are nice and antialiased). One thing to note, though: switching to GL_NEAREST lowered the fps by 2.
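
For reference, that filter setup boils down to something like this (a sketch; 'tex' is a placeholder for the engine's texture handle):

#include <windows.h>
#include <GL/gl.h>

void SetLinearFiltering(GLuint tex)
{
    glBindTexture(GL_TEXTURE_2D, tex);
    // GL_LINEAR for both filters: bilinear filtering, no mipmaps
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
}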

I have some more stats too (taken from an email my partner sent me -- he has a GeForce 2):
Test 1:
800 object particle array (size 16x16, scaled to 20x20)
500 static objects (size 64x64, no scale)
1 static object (size 64x64, scale to 90x134)
fps: 22 (at fullscreen: 22)

Test 2:
800 object particle array (size 16x16, scaled to 20x20)
500 static objects (size 64x64, scale to 16x16)
1 static object (size 64x64, scale to 90x134)
fps: 38 (at fullscreen: 37)

Test 3:
800 object particle array (size 16x16, scaled to 20x20)
1 static object (size 90x134, scale to 640x480)
fps: 40 (at fullscreen: 42)

Test 4:
800 object particle array (size 16x16, scaled to 20x20)
1000 static objects (size 64x64, scale to 16x16)
fps: 36 (at fullscreen: 37)

-----------

Depressing....

[edited by - Cirian on June 8, 2002 10:06:53 PM]

There is something very wrong with your code. OK, so you seem to be in hardware mode, but there is a huge CPU hog hidden somewhere in your rendering code.

Just to be sure: what screen mode do you use? If it is 16-bit, make sure that you do not request a stencil buffer! All nVidia cards below the GF3 will fall back to software emulation if you do (even if you don't actually use stencilling).
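
For illustration, the request in question looks roughly like this on Win32 (a sketch, not the actual setup code; the key line is cStencilBits = 0):

#include <windows.h>

void SetupPixelFormat(HDC hdc)
{
    PIXELFORMATDESCRIPTOR pfd = { sizeof(PIXELFORMATDESCRIPTOR), 1 };
    pfd.dwFlags      = PFD_DRAW_TO_WINDOW | PFD_SUPPORT_OPENGL | PFD_DOUBLEBUFFER;
    pfd.iPixelType   = PFD_TYPE_RGBA;
    pfd.cColorBits   = 32;   // should match the mode you actually run in
    pfd.cDepthBits   = 16;   // z-buffer depth
    pfd.cStencilBits = 0;    // the important part: no stencil buffer requested

    int fmt = ChoosePixelFormat(hdc, &pfd);
    SetPixelFormat(hdc, fmt, &pfd);
}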

Let's see:

Test 1:
800 object particle array (size 16x16, scaled to 20x20)
500 static objects (size 64x64, no scale)
1 static object (size 64x64, scale to 90x134)
fps: 22 (at fullscreen: 22)

Test 2:
800 object particle array (size 16x16, scaled to 20x20)
500 static objects (size 64x64, scale to 16x16)
1 static object (size 64x64, scale to 90x134)
fps: 38 (at fullscreen: 37)

OK, so if I got that right, then the only difference between those two tests is the 'scale' I highlighted. What do you mean by scale? What operations do you perform? If the sizes simply denote the pixel sizes of the rectangles you draw on the screen (I will assume that for now), then we have (for test 1):

Fillrate: 800*20*20 + 500*64*64 + 90*134 = 2.4 Mtexel/frame
Tricount: 800*2 + 500*2 + 2 = 2602 tris/frame

This scene should render at least at 200 fps on a GF2.

IMO, your best bet is to use a profiler to track down where all the CPU cycles disappear.

/ Yann

Scaling is the quad dimension stuff, as you correctly assumed... I'm just about to do some more profiling; I'll be back with the results.

The colour depth is 32-bit, like the desktop. I'm not requesting a stencil buffer, and I'm using a 16-bit z-buffer -- I set this to 0 bits and got an extra 2-3 fps.

Edit: Reinstalling VC++ (sigh), and my spelling leaves something to be desired....

[edited by - Cirian on June 8, 2002 11:21:56 PM]

Interesting... the profile results are below.

    
Func time (ms)   Func %   Func+child (ms)   F+C %   Hits     Function
14555.972        61.1     21180.102         89.0    548      MASICiX::Render(void)    (mcx2.obj)
3442.726         14.5     3442.726          14.5    273500   MASIObj::Draw(void)      (masiobj.obj)
1764.759         7.4      1764.759          7.4     547      PArray::Draw(void)       (particles.obj)
645.450          2.7      4088.176          17.2    547      MASICiX::DrawMAS(void)   (mcx2.obj)
4.214            0.0      1768.973          7.4     547      MASICiX::DrawPAR(void)   (mcx2.obj)


The Render() function took up 15 of the 23 seconds of running time -- yet its child functions (DrawMAS() and DrawPAR()) account for only about a third of that.

Render() simply calls glClear, the two drawing functions, glFlush and SwapBuffers. Any ideas?
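
One thing worth checking (sketch below, with QueryPerformanceCounter and stand-in names for the engine's calls): the driver queues most of the work issued by the draw functions and only pays for it when glFlush/SwapBuffers force completion, so the "missing" time inside Render() often shows up at the swap. Timing each stage separately makes that visible:

#include <windows.h>
#include <GL/gl.h>
#include <cstdio>

extern void DrawMAS();   // stand-ins for the engine's drawing functions
extern void DrawPAR();
extern HDC  g_hdc;       // assumed device context

static double ElapsedMs(LARGE_INTEGER a, LARGE_INTEGER b)
{
    LARGE_INTEGER f;
    QueryPerformanceFrequency(&f);
    return double(b.QuadPart - a.QuadPart) * 1000.0 / double(f.QuadPart);
}

void RenderTimed()
{
    LARGE_INTEGER t0, t1, t2, t3, t4;
    QueryPerformanceCounter(&t0);
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    QueryPerformanceCounter(&t1);
    DrawMAS();
    QueryPerformanceCounter(&t2);
    DrawPAR();
    QueryPerformanceCounter(&t3);
    glFlush();
    SwapBuffers(g_hdc);   // pending GPU/driver work is usually paid for here
    QueryPerformanceCounter(&t4);

    printf("clear %.2f ms, MAS %.2f ms, PAR %.2f ms, flush+swap %.2f ms\n",
           ElapsedMs(t0, t1), ElapsedMs(t1, t2),
           ElapsedMs(t2, t3), ElapsedMs(t3, t4));
}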

[edited by - Cirian on June 8, 2002 12:58:56 AM]

[edited by - Cirian on June 9, 2002 1:12:17 AM]

Try using the compiler's optimizer (MSVC++ Pro and above).
Oh yeah, are you using a high-performance rendering loop?

Here's an example:

    
MSG msg;
while (rendering) {
    // Process all pending messages before drawing the next frame
    while (PeekMessage(&msg, NULL, 0, 0, PM_NOREMOVE)) {
        if (GetMessage(&msg, NULL, 0, 0)) {
            TranslateMessage(&msg);
            DispatchMessage(&msg);
        } else {
            return TRUE;   // GetMessage returned 0: WM_QUIT, so bail out
        }
    }
    drawScene();
}


When I figured this out, I got an extra 85 fps, hehe, /me st00pid.
Really hope this helps.

[edited by - silvermace on June 9, 2002 1:56:09 AM]

quote:
Original post by Yann L
Wait a second: you are dropping from 19 fps to 11, if all you do is replace the 32² texture with a 64² one?!



Remember the image is being drawn 500 times, if that makes any difference.

quote:

Remember the image is being drawn 500 times, if that makes any difference.


No. If you use a higher texture resolution, it will get slower (due to cache misses), but a 64² texture is virtually nothing for a 3D card. I would expect a larger performance drop when going from 32² to something like 2048², not from 32² to 64². It will of course affect fillrate, but since his app is far from being fillrate limited, the problem is somewhere else.

Umm.. here is some alarming evidence that would _suggest_ the app is fillrate limited:

500 objects (quads) are being drawn. Four tests are shown here, each with exactly the same number of objects, calculations, etc. The only differences are that tests 2 and 4 include a call to glScale(4.0f), and tests 3 and 4 use a texture with transparency. Frame rate is measured here in Hz.

1.
tex => flame.bmp (square image, no transparency)
size => 16x16
Hz: 134

2.
change size => 64x64
Hz: 34

3.
change tex => lumiere2.png (square image, with transparency)
change size => 16x16
Hz: 127

4.
change size => 64x64
Hz: 30


Is this much of a jump _really_ to be expected?
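
For a rough sanity check on those numbers (assuming all 500 quads are fully visible and drawn once each):

Fill, 16x16 quads: 500 * 16 * 16 = 128 Ktexels/frame, i.e. ~17 Mtexels/s at 134 Hz
Fill, 64x64 quads: 500 * 64 * 64 ≈ 2.05 Mtexels/frame, i.e. ~70 Mtexels/s at 34 Hz

So the larger quads do push roughly four times the texels per second, but both figures are still well below the theoretical fillrate of either card mentioned in this thread.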

We also completely converted the Particle Arrays to use glInterleavedArrays / glDrawElements. Our framerate is identical to immediate mode. To see if there was a bottleneck within the particle update calculations, I disabled updates after 200 frames. The framerate changed marginally, getting us an extra two fps when no operations are performed other than the glDrawElements code.
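
For reference, the interleaved-array path boils down to something like this (a sketch; the GL_T2F_V3F layout and the names here are assumptions, not the engine's actual code):

#include <windows.h>
#include <GL/gl.h>
#include <vector>

struct Vertex { float u, v, x, y, z; };   // matches the GL_T2F_V3F interleaved format

void DrawQuadArray(const std::vector<Vertex>& verts,
                   const std::vector<GLuint>& indices)   // 4 indices per quad
{
    // glInterleavedArrays enables and points the texcoord/vertex arrays in one call
    glInterleavedArrays(GL_T2F_V3F, sizeof(Vertex), &verts[0]);
    glDrawElements(GL_QUADS, (GLsizei)indices.size(),
                   GL_UNSIGNED_INT, &indices[0]);
}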

[edited by - Cirian on June 18, 2002 3:58:29 AM]
