#### Archived

This topic is now archived and is closed to further replies.

# Killer framerates.

## 30 posts in this topic

Ok. I'm displaying 2048 tris, colored, and my framerates are awful. I'm getting 17 fps with colors, and about 26 fps without color. I'm not doing anything fancy, just glVertex()ing in a for loop. Basically, I'm running in windowed mode using GLUT, calling Main() each frame, which calls Render(), which calls map.Draw().

map.Draw() is this:

    //MAPSZ = 32
    glBegin(GL_TRIANGLES);
    // int i = 0, j = 0;
    for(int i = 0; i < MAPSZ - 1; i++) {
        for(int j = 0; j < MAPSZ - 1; j++) {
            //Tri 1.
            glColor3b(color[i][j].r, color[i][j].g, color[i][j].b);
            glVertex3f(i, j, zdata[i][j]);
            glColor3b(color[i+1][j].r, color[i+1][j].g, color[i+1][j].b);
            glVertex3f(i + 1, j, zdata[i+1][j]);
            glColor3b(color[i][j+1].r, color[i][j+1].g, color[i][j+1].b);
            glVertex3f(i, j + 1, zdata[i][j+1]);
            //Tri 2.
            glColor3b(color[i+1][j].r, color[i+1][j].g, color[i+1][j].b);
            glVertex3f(i + 1, j, zdata[i+1][j]);
            glColor3b(color[i+1][j+1].r, color[i+1][j+1].g, color[i+1][j+1].b);
            glVertex3f(i + 1, j + 1, zdata[i+1][j+1]);
            glColor3b(color[i][j+1].r, color[i][j+1].g, color[i][j+1].b);
            glVertex3f(i, j+1, zdata[i][j+1]);
        }
    }
    glEnd();

How come my framerate is so abysmal?

I came, I saw, I got programmers block.
~V'lion

##### Share on other sites
You're doing a lot of array referencing there. It'd take a while for me to tune an explanation to your code, so I'll just give a general explanation...

Things like this:

for(int a = 0; a < 32; a++) ArrayA[ a ] = ArrayB[ a ];

Are faster like this:

int* ArrayAptr = &ArrayA[0];
int* ArrayBptr = &ArrayB[0];
for(int a = 0; a < 32; a++) *ArrayAptr++ = *ArrayBptr++;

I don't know how much of an improvement you'd get, but it should be noticeable.

Edited by - Beer Hunter on October 4, 2001 7:42:13 PM

##### Share on other sites
A very good programmer told me that the "C-Pascal" version is better for modern compilers. I think he had looked at the assembler output.
If you want to produce fast loops for really old compilers, you should try to get rid of the index variable:

    for(ArrayAPtr=&ArrayA[0], ArrayBPtr=&ArrayB[0]; ArrayAPtr<&ArrayA[32]; *ArrayAPtr++ = *ArrayBPtr++);

It could probably be done even faster. Stuff like this was important a long time ago. The best way to copy arrays is to use the library functions, but the above was just to show something.
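To make the comparison concrete, here is a minimal sketch of the three copy styles mentioned so far: indexed, pointer-walking, and a library call. The function names and the fixed size `N` are made up for illustration. All three produce the same result, and which one is fastest depends on the compiler, so treat this as something to measure rather than a ranking.

```cpp
#include <algorithm>
#include <cassert>

const int N = 32;  // assumed array length, matching the 32-element examples above

// Indexed copy: the plain loop form.
void copyIndexed(const int* src, int* dst) {
    assert(src != dst);
    for (int a = 0; a < N; ++a) dst[a] = src[a];
}

// Pointer-walking copy: no index variable, both pointers advance together.
void copyPointers(const int* src, int* dst) {
    assert(src != dst);
    const int* end = src + N;
    while (src != end) *dst++ = *src++;
}

// Library copy: std::copy over the same range.
void copyLibrary(const int* src, int* dst) {
    assert(src != dst);
    std::copy(src, src + N, dst);
}
```

For plain `int` data, `std::memcpy(dst, src, N * sizeof(int))` would be another library option.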

Vlion, the best way to improve your code is to skip immediate mode and use vertex arrays instead.

##### Share on other sites
The problem is that you're calling several gl functions for every triangle. Just put all the data in one or two arrays, use a vertex array and a color array (glVertexPointer and glColorPointer), and draw it all at once.
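A rough sketch of the "put all data in one or two arrays" step, under some assumptions: the `Map`, `Color`, and `Mesh` types and the `buildMesh` name are hypothetical, colors are stored as unsigned bytes, and the vertex order copies the two-triangles-per-cell pattern from the first post. The arrays are built once on the CPU. The OpenGL 1.1 calls that would consume them are shown only as comments, because they need a live GL context.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int MAPSZ = 32;

struct Color { unsigned char r, g, b; };

// Hypothetical container for the heightmap data used in this thread.
struct Map {
    float z[MAPSZ][MAPSZ];
    Color color[MAPSZ][MAPSZ];
};

// Flat per-vertex arrays: 3 floats and 3 bytes per vertex.
struct Mesh {
    std::vector<float> positions;
    std::vector<unsigned char> colors;
    std::size_t vertexCount() const { return positions.size() / 3; }
};

// Appends the position and color of grid point (i, j).
static void pushVertex(Mesh& m, const Map& map, int i, int j) {
    assert(i >= 0 && i < MAPSZ && j >= 0 && j < MAPSZ);
    m.positions.push_back(static_cast<float>(i));
    m.positions.push_back(static_cast<float>(j));
    m.positions.push_back(map.z[i][j]);
    const Color& c = map.color[i][j];
    m.colors.push_back(c.r);
    m.colors.push_back(c.g);
    m.colors.push_back(c.b);
}

// Builds the triangle list once, instead of re-issuing it every frame.
Mesh buildMesh(const Map& map) {
    Mesh m;
    const std::size_t verts = static_cast<std::size_t>(MAPSZ - 1) * (MAPSZ - 1) * 6;
    m.positions.reserve(verts * 3);
    m.colors.reserve(verts * 3);
    for (int i = 0; i < MAPSZ - 1; ++i) {
        for (int j = 0; j < MAPSZ - 1; ++j) {
            // Tri 1
            pushVertex(m, map, i, j);
            pushVertex(m, map, i + 1, j);
            pushVertex(m, map, i, j + 1);
            // Tri 2
            pushVertex(m, map, i + 1, j);
            pushVertex(m, map, i + 1, j + 1);
            pushVertex(m, map, i, j + 1);
        }
    }
    return m;
}

// Per frame, with a current OpenGL 1.1 context (not executed here):
//   glEnableClientState(GL_VERTEX_ARRAY);
//   glEnableClientState(GL_COLOR_ARRAY);
//   glVertexPointer(3, GL_FLOAT, 0, mesh.positions.data());
//   glColorPointer(3, GL_UNSIGNED_BYTE, 0, mesh.colors.data());
//   glDrawArrays(GL_TRIANGLES, 0, (GLsizei)mesh.vertexCount());
```

The design point is that the per-vertex work moves out of the render loop: drawing becomes a handful of calls regardless of how many triangles the mesh holds.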

##### Share on other sites
I think what is really important is what both of you take for granted: don't use 2d-arrays!

Say you declared zdata like this:

GLfloat zdata[32][32];

With a common, non-optimizing compiler, a reference like

    zdata[ a ][ b+1 ]=1;

would compile to something like

    int index=a*32+b+1;
    zdata[ index ]=1;

So you see, for each reference you get a multiplication, which is slow. You have 24*32*32 = 24576 multiplications here just because of the 2D array... That shouldn't be enough to lower your framerate that much though... But you could fix it by using 1D arrays, like

    GLfloat zdata[32*32];

and then do the 2D indexing manually:

    int index=0;
    for(int i = 0; i < MAPSZ - 1; i++){
        for(int j = 0; j < MAPSZ - 1; j++){
            //Tri 1.
            glColor3b(color[index].r, color[index].g, color[index].b);
            glVertex3f(i, j, zdata[index]);

            glColor3b(color[index+32].r, color[index+32].g, color[index+32].b);
            glVertex3f(i + 1, j, zdata[index+32]);

            glColor3b(color[index+1].r, color[index+1].g, color[index+1].b);
            glVertex3f(i, j + 1, zdata[index+1]);

            //Tri 2. = the same, using index+32, index+33 and index+1
            index++;
        }
        index++; //step past the last column so index stays equal to i*32+j
    }

Should work. Can't promise you a 200+ framerate though, which I got with your original code...
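A small sketch of the manual indexing idea, assuming a row-major layout where element `[i][j]` lives at offset `i*MAPSZ + j`. The helper names are invented for illustration. One detail is easy to miss: the inner loop visits only `MAPSZ - 1` cells per row, so a running index has to take one extra step at the end of each row or it drifts away from `i*MAPSZ + j`.

```cpp
#include <cassert>

const int MAPSZ = 32;

// Row-major offset of element [i][j] in a MAPSZ x MAPSZ grid.
inline int flatIndex(int i, int j) {
    assert(i >= 0 && i < MAPSZ && j >= 0 && j < MAPSZ);
    return i * MAPSZ + j;
}

// Walks the cells with a running index and reports whether it always
// agrees with the computed offset.
bool runningIndexMatches() {
    int index = 0;
    for (int i = 0; i < MAPSZ - 1; ++i) {
        for (int j = 0; j < MAPSZ - 1; ++j) {
            if (index != flatIndex(i, j)) return false;
            // Neighbours: [i+1][j] is index + MAPSZ, [i][j+1] is index + 1.
            ++index;
        }
        ++index;  // skip the last column, which starts no cell
    }
    return true;
}
```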

Edited by - joke_dst on October 5, 2001 6:51:28 AM

##### Share on other sites
Are you using a Voodoo/Voodoo2?
If so, it could be that those cards DON'T support OpenGL
(by default). (Win98 will then use a software OpenGL driver.)

So you will have to load 3dfxvgl.dll or something like that
(that driver should be included in GLsetup).

You could also copy the dll to the same dir as the executable and rename it to opengl32.dll (it will only run on 3dfx cards though).

A new world, dark without the false light ....

##### Share on other sites
Voodoo 3,
and it definitely supports OpenGL.
Before my latest driver update I could tell the acceleration when I switched to certain modes.
Thanks so far.
I'll see what I can do with all that you have given me.


##### Share on other sites
Stop the crap about your tweaks, guys. They are NOT the bottleneck in this case. The problem is as rapso said: too many gl... calls. Vertex arrays will give you much better speed.

##### Share on other sites
Vlion.

Forget about these people telling you that your bottleneck is array referencing; they don't know what they're talking about. It could make a difference with a million-iteration loop, not with 2000 tris. Your application is not CPU-limited.

Forget about these people saying that your bottleneck is the number of gl calls per triangle. Again, it looks like they have no idea about graphics performance. And again, if you display thousands and thousands of triangles, it could make a difference, but that's not your case.

The fact that your application is a lot faster without colors is a hint of a fillrate problem; that is, you are drawing too many pixels. If you change the size of your window, you should see a big speed difference.

My suggestion is to check that you are really using hardware acceleration. Try to use glGetString to know more about your driver; if you see the Microsoft generic implementation, you are running in software, which could explain all the problems you see.

I'm also not sure that Voodoo3s accelerate rendering in windowed mode. Try in fullscreen to see if it makes a difference.

Y.
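The driver check suggested above could look roughly like this. The helper name `looksLikeSoftwareGL` is invented. It relies on Microsoft's generic software implementation identifying itself as "Microsoft Corporation" / "GDI Generic". The `glGetString` calls appear only as comments, since they need a current GL context.

```cpp
#include <cassert>
#include <string>

// Returns true when the vendor/renderer strings look like Microsoft's
// generic software OpenGL implementation rather than a hardware driver.
bool looksLikeSoftwareGL(const std::string& vendor, const std::string& renderer) {
    return vendor.find("Microsoft") != std::string::npos ||
           renderer.find("GDI Generic") != std::string::npos;
}

// With a current GL context (not executed here):
//   const char* vendor   = (const char*)glGetString(GL_VENDOR);
//   const char* renderer = (const char*)glGetString(GL_RENDERER);
//   if (vendor && renderer && looksLikeSoftwareGL(vendor, renderer)) {
//       /* running without hardware acceleration */
//   }
```

`glGetString` returns a null pointer when no context is current, hence the null checks in the commented usage.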

##### Share on other sites
Hmm...
The last GLUT release fixed the fullscreen problem I'd been having, but no dice.
I ran a quick routine to write FPS to a file while I was in fullscreen mode, and it only seems to increase speed by 1 FPS or so.

Changing window size does not do anything.
If anything, it decreases the FPS.

I've implemented a texture system, and it's dropped FPS by 5 frames or so. It's slightly erratic though. :-D

Anyway, I can play games like Unreal Tournament, Homeworld, X-Wing vs. TIE Fighter, blah blah blah.
So I know my card's capable.

I am going to try to check my implementation.
I might be running in software mode. I don't think so but...
~V'lion


##### Share on other sites
Here are the glGetString() results.
It's interesting... I didn't know I had 2 multi-texture extensions.
3Dfx Interactive Inc.
3Dfx/Voodoo3 (tm)/2 TMUs/16 MB SDRAM/ICD (Nov 2 2000)
1.1.0
GL_ARB_multitexture
GL_EXT_abgr
GL_EXT_bgra
GL_EXT_clip_volume_hint
GL_EXT_compiled_vertex_array
GL_EXT_packed_pixels
GL_EXT_point_parameters
GL_EXT_stencil_wrap
GL_EXT_vertex_array
GL_SGIS_texture_edge_clamp
GL_EXT_paletted_texture
GL_EXT_shared_texture_palette
GL_SGIS_multitexture
WGL_EXT_extensions_string
WGL_3DFX_gamma_control
WGL_EXT_swap_control
WGL_ARB_extensions_string


##### Share on other sites
I think it can be quite normal to have that low a frame rate if you're drawing 2000 triangles on a Voodoo 3... but I guess the drivers are fucked up... or your pixel format descriptor has some problems... ;o) (If the requested pixel format is not available, it'll use a software OGL implementation on some systems.) Maybe you're trying to get 32-bit color depth? That would be an obvious reason for the slowdown on the Voodoo 3...

cya,
Phil

Ah yes, I forgot: also try to use glVertex3fv() instead of glVertex3f()... you pass a pointer instead of the single float values, which might be a speedup too :o))

Visit Rarebyte!
and no!, there are NO kangaroos in Austria (I got this question a few times over in the States)

Edited by - phueppl1 on October 9, 2001 7:55:11 AM

##### Share on other sites
The Voodoo 3 only uses hardware acceleration in hardware mode, and even then you need to be at an 800x600 resolution. Are you using those settings?

##### Share on other sites
quote:
Original post by Anonymous Poster
The Voodoo 3 only uses hardware acceleration in hardware mode, and even then you need to be at an 800x600 resolution. Are you using those settings?

I meant to say only uses hardware acceleration in fullscreen mode. Duh.

##### Share on other sites
Nope. I had a file store framerates and it does not show an appreciable difference. Hmmm.
Could GLUT be the problem?
~V'lion


##### Share on other sites
    #define MAPSZ 32

    struct COLOR
    {
        GLubyte R, G, B;
    };

    struct ROW
    {
        COLOR vertColor[MAPSZ];
        float vertZ[MAPSZ];
    };

    struct MAP
    {
        ROW mapRows[MAPSZ];
    };

    MAP MyMap;

    void Render ()
    {
        int i, j;
        float i2, j2;
        ROW *Row1;
        ROW *Row2;

        glBegin (GL_TRIANGLES);
        for (i = 0; i < MAPSZ - 1; i++)
        {
            for (j = 0; j < MAPSZ - 1; j++)
            {
                i2 = (float) i;
                j2 = (float) j;
                Row1 = &MyMap.mapRows[i];
                Row2 = &MyMap.mapRows[i + 1];

                // glColor3ubv takes a pointer to three GLubytes;
                // R, G and B are contiguous, so &...R works.

                //Tri 1.
                glColor3ubv (&Row1->vertColor[j].R);
                glVertex3f (i2, j2, Row1->vertZ[j]);
                glColor3ubv (&Row2->vertColor[j].R);
                glVertex3f (i2 + 1, j2, Row2->vertZ[j]);
                glColor3ubv (&Row1->vertColor[j + 1].R);
                glVertex3f (i2, j2 + 1, Row1->vertZ[j + 1]);

                //Tri 2.
                glColor3ubv (&Row2->vertColor[j].R);
                glVertex3f (i2 + 1, j2, Row2->vertZ[j]);
                glColor3ubv (&Row2->vertColor[j + 1].R);
                glVertex3f (i2 + 1, j2 + 1, Row2->vertZ[j + 1]);
                glColor3ubv (&Row1->vertColor[j + 1].R);
                glVertex3f (i2, j2 + 1, Row1->vertZ[j + 1]);
            }
        }
        glEnd ();
    }

slightly more optimised. oh, int2float 32*32 times a frame is another way to kill the frame rate, so i added i2 and j2 (floats) which will be passed to OGL. cuts down the number of casts.

MENTAL

Edited by - MENTAL on October 13, 2001 2:55:30 PM

##### Share on other sites
Hi,

I think those people who mentioned that the gl calls are the bottleneck are right. However, such loops could also arise in different cases and should be avoided. The data also doesn't need to be recalculated dynamically; such things should be done one time only. (Then you can easily use vertex arrays, eventually compiled ones.) On to the non-bottlenecks that lower performance even more:

First, loops in loops aren't a very good thing, because the outer loop's index has to stay in a register all the time, or is permanently being reloaded and trashed, and that costs time. Avoid loops in loops, if possible.

Next, those array-indexing problems are compiler-dependent. MSVC++ optimizes repeated array-index accesses. Also, there is basically no difference between incrementing an array pointer and an index variable; the problem is that the compiler-generated code often stores the index AND the array position. If you pass an array element to a function, the compiler makes a pointer out of it (not always, that's a little complicated).

Another small thing is that
  for(; ; i++)

is slower than
  for(; ; ++i)

In the post-increment case, the compiler has to store the old value, because you told it to post-increment the variable, which is not necessary. Pre-increment is a bit faster, which will be noticeable in inner loops.

##### Share on other sites
wtf?!?!?!

++i is not faster than i++. ++i is the C-style version and i++ is C++ version. every compiler i know of uses exactly the same line of asm (inc i) for that operation.

MENTAL

##### Share on other sites
quote:
Original post by MENTAL
++i is the C-style version and i++ is C++ version.

That's not true. "++i" is the preincrement, and "i++" is the postincrement. One increments the value before it is evaluated (preincrement) and the other increments the value after it is evaluated (postincrement). They're both in C and C++.

[Resist Windows XP's Invasive Production Activation Technology!]

##### Share on other sites
Actually, ++i is faster than i++ because the compiler doesn't have to return the unincremented value. For integer types it probably won't make a difference, but with more complex types it becomes noticeable. Take a look at iterators in STL:

typedef std::list LLIST;
LLIST somelist;
LLIST::iterator itor;

//This is faster
for (itor = somelist.begin (); itor != somelist.end (); ++itor)

//This is slower
for (itor = somelist.begin (); itor != somelist.end (); itor++)

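To show where the extra cost for class types comes from, here is a minimal sketch with a made-up iterator-like type, `Cursor`, that counts its own copies. It is not how any particular STL implements its iterators; it only shows the general shape of the two operators. Post-increment has to hand back the old value, so it copies the object first, while pre-increment returns a reference to the object itself.

```cpp
#include <cassert>

// Hypothetical iterator-like type that counts how often it is copied.
struct Cursor {
    int pos;
    static int copies;

    explicit Cursor(int p = 0) : pos(p) {}
    Cursor(const Cursor& other) : pos(other.pos) { ++copies; }
    Cursor& operator=(const Cursor& other) { pos = other.pos; return *this; }

    // Pre-increment: advance, then return *this by reference. No copy.
    Cursor& operator++() { ++pos; return *this; }

    // Post-increment: save the old value, advance, return the saved copy.
    Cursor operator++(int) {
        Cursor old(*this);
        ++pos;
        return old;
    }
};

int Cursor::copies = 0;

// Number of copies made while advancing with ++c.
int copiesForPreIncrement(int steps) {
    Cursor::copies = 0;
    Cursor c;
    for (int k = 0; k < steps; ++k) ++c;
    return Cursor::copies;
}

// Number of copies made while advancing with c++.
int copiesForPostIncrement(int steps) {
    Cursor::copies = 0;
    Cursor c;
    for (int k = 0; k < steps; ++k) c++;
    return Cursor::copies;
}
```

For a plain `int` whose result is unused, there is no old value anyone needs, which is why compilers can emit the same instructions for both forms.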
For further reference, I suggest Question #5 at:

##### Share on other sites
Heheh.
Here's the VC++6 asm for i++ and ++j:
---
235: int i = 0, j = 0;
00401988 C7 45 FC 00 00 00 00 mov dword ptr [ebp-4],0
0040198F C7 45 F8 00 00 00 00 mov dword ptr [ebp-8],0
236: i++;
00401996 8B 45 FC mov eax,dword ptr [ebp-4]
00401999 83 C0 01 add eax,1
0040199C 89 45 FC mov dword ptr [ebp-4],eax
237: ++j;
0040199F 8B 4D F8 mov ecx,dword ptr [ebp-8]
004019A2 83 C1 01 add ecx,1
004019A5 89 4D F8 mov dword ptr [ebp-8],ecx
---
I see no difference.

MENTAL: I dropped your code into my program, and it works OK.
It holds to a constant framerate, about 22-23 fps.
Not the greatest, but a lot better than before.
Next I need to integrate it and work on it some.
Then I'll try for vertex arrays and whatever else I can squeeze out of OGL.

The goal is to display a 1056x1056 map at a reasonable framerate, along with some units.
I need to work out some stuff before I display stuff that big. :D
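For a sense of scale, here is a back-of-the-envelope sketch with made-up helper names. It assumes the same two-triangles-per-cell layout as the draw loop from the first post, and one color call plus one vertex call per triangle corner.

```cpp
#include <cassert>

// Triangles in a square heightmap with n x n grid points,
// drawn as two triangles per cell: 2 * (n - 1)^2.
long trianglesForMap(long n) {
    assert(n >= 1);
    return 2 * (n - 1) * (n - 1);
}

// Immediate-mode calls per frame if every corner of every triangle
// gets one color call and one vertex call.
long immediateModeCalls(long n) {
    return trianglesForMap(n) * 3 * 2;
}
```

With these assumptions a 32x32 map comes to 2 * 31 * 31 = 1,922 triangles, while a 1056x1056 map comes to 2 * 1055 * 1055 = 2,226,050 triangles and over 13 million calls per frame if drawn in full. That is why a map of that size needs both batching and some way of drawing only the visible part.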

~V'lion


##### Share on other sites
Like I said, ++i and i++ do the same thing. In ye olde dayse when C was new and compilers were crap, ++i might have been faster, but to be frank, I think people caught on to the fact that ++i and i++ do the same thing, so they gave them the same line of asm code, as V'lion showed.

    ; 6    :     i++;
        mov eax, DWORD PTR _i$[ebp]
        add eax, 1
        mov DWORD PTR _i$[ebp], eax
    ; 7    :     ++j;
        mov ecx, DWORD PTR _j$[ebp]
        add ecx, 1
        mov DWORD PTR _j$[ebp], ecx

My results, just to show that they are identical. I got a bit paranoid when I saw that i++ used eax and ++j used ecx, so I changed them around, and look:

    ; 6    :     ++j;
        mov eax, DWORD PTR _j$[ebp]
        add eax, 1
        mov DWORD PTR _j$[ebp], eax
    ; 7    :     i++;
        mov ecx, DWORD PTR _i$[ebp]
        add ecx, 1
        mov DWORD PTR _i$[ebp], ecx

Now ++j uses eax and i++ uses ecx.

heh i rule .

MENTAL

##### Share on other sites
I just looked at that URL and found out it was for Borland compilers.

Another reason why (unfortunately) Microsoft rule.

MENTAL

##### Share on other sites
MENTAL: you're wrong, both pre- and post-increment are defined in the ANSI C standard. But on a modern optimizing compiler it's quite possible that they produce the same code.
But in the general case, or when you overload the operators in C++, the preincrement (++i) version is faster because it removes a temporary object.

Please, before stating crap, know the facts, and don't treat an assembly listing from a compiler with all optimizations enabled as evidence of how the original program was written. Know the standard, know the code it should produce if not optimized, and then go on ranting. Before that you just make yourself look stupid.

##### Share on other sites
A tip could be to not render the tris that aren't visible, but I guess you have code for that already...
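One very simple form of this, sketched under strong assumptions: a top-down, axis-aligned view, grid points at integer coordinates as in the draw loops earlier in the thread, and a known visible interval along each axis. The `CellRange` type and `visibleCells` function are hypothetical. The function clamps a visible interval to the range of cells worth drawing along one axis, and would be called once for the i axis and once for the j axis.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Half-open range of cell indices: [first, last).
struct CellRange { int first, last; };

// Clamps the visible interval [viewMin, viewMax] (in map units) to valid
// cell indices for a map with mapSize grid points per side, that is,
// mapSize - 1 unit-sized cells.
CellRange visibleCells(float viewMin, float viewMax, int mapSize) {
    assert(mapSize >= 1);
    const int cells = mapSize - 1;
    int first = static_cast<int>(std::floor(viewMin));
    int last = static_cast<int>(std::ceil(viewMax));
    first = std::min(cells, std::max(0, first));
    last = std::min(cells, std::max(0, last));
    if (last < first) last = first;  // nothing visible
    CellRange r = {first, last};
    return r;
}
```

The nested draw loops would then run over `[first, last)` on each axis instead of over the whole map. A perspective camera would need a proper frustum test instead.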