
# Matrix class slower than GL?


## Recommended Posts

I'm implementing a simple matrix class that would support all of the basic transformations that OpenGL does, plus shearing. After typing part of it up, I realized that it might be a lot slower than OpenGL's... But without fully finishing it, I can't really test it. So my question is: does OpenGL accelerate the matrix construction at all? I'd still use the GL multiplication functions; I would just make my own matrices.

##### Share on other sites
OpenGL may be able to access hardware, so it is quite possible OGL is faster. However, if your class has many more features, then it can be much better regardless of speed differences.

##### Share on other sites
If you have a graphics card with hardware T&L (GeForce or later), your implementation WILL be slower than OpenGL's.

##### Share on other sites
If you're only multiplying a few matrices per frame, then I wouldn't worry about it. If you're multiplying hundreds, and you need to do it in software, consider using SSE instructions.
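For illustration, here is one way a software matrix-times-vector product can use SSE intrinsics. This is only a sketch: the function name `mat4_mul_vec4_sse` is mine, and the column-major layout is an assumption chosen to match OpenGL's convention.

```cpp
#include <xmmintrin.h>

// Multiply a column-major 4x4 matrix (OpenGL layout, 16 contiguous floats)
// by a 4-component vector using SSE: out = m * v.
void mat4_mul_vec4_sse(const float* m, const float* v, float* out)
{
    __m128 c0 = _mm_loadu_ps(m + 0);   // column 0
    __m128 c1 = _mm_loadu_ps(m + 4);   // column 1
    __m128 c2 = _mm_loadu_ps(m + 8);   // column 2
    __m128 c3 = _mm_loadu_ps(m + 12);  // column 3

    // out = c0*v.x + c1*v.y + c2*v.z + c3*v.w, four lanes at a time
    __m128 r = _mm_mul_ps(c0, _mm_set1_ps(v[0]));
    r = _mm_add_ps(r, _mm_mul_ps(c1, _mm_set1_ps(v[1])));
    r = _mm_add_ps(r, _mm_mul_ps(c2, _mm_set1_ps(v[2])));
    r = _mm_add_ps(r, _mm_mul_ps(c3, _mm_set1_ps(v[3])));
    _mm_storeu_ps(out, r);
}
```

The same column-at-a-time structure extends to a full 4x4-by-4x4 multiply (one call per column of the right-hand matrix).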

##### Share on other sites
Hmmm... Well, I guess I'll keep my class and finish it, and figure out a way to get OpenGL to do the transforms for me (I'm sure I can read the matrices back into my matrix class). Yeah, I'll just have to use the modelview matrix during construction to do my sets of operations...

##### Share on other sites
Don't read back the matrices. Reading back from the graphics card should be avoided as far as possible. This is something Yann used to emphasize as well, which is why he did OC in software.

Most cards with T&L will transform vertices in hardware. I believe glRotatef/glTranslatef are done by the driver on the CPU (I'm not 100% sure), likely very optimised. The vertices themselves, however, are transformed in hardware.

So if you have your own matrix, you should use glLoadMatrix/glMultMatrix (or something like that) to pass your matrix to the card. The vertex transformation will still be done in hardware.
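To sketch what that looks like: glLoadMatrixf expects 16 floats in column-major order, with the translation in elements 12-14. Building such an array yourself is cheap; the helper name below is hypothetical, not from the thread.

```cpp
#include <cstring>

// Fill a column-major 4x4 translation matrix in the layout glLoadMatrixf expects.
void build_translation(float tx, float ty, float tz, float m[16])
{
    static const float identity[16] = {
        1, 0, 0, 0,   // column 0
        0, 1, 0, 0,   // column 1
        0, 0, 1, 0,   // column 2
        0, 0, 0, 1    // column 3
    };
    std::memcpy(m, identity, sizeof(identity));
    m[12] = tx;  // column 3 holds the translation
    m[13] = ty;
    m[14] = tz;
}

// With a current GL context you would then upload it, e.g.:
//   glMatrixMode(GL_MODELVIEW);
//   glLoadMatrixf(m);   // or glMultMatrixf(m) to combine with the current matrix
```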

##### Share on other sites
quote:
Original post by GamerSg
Don't read back the matrices. Reading back from the graphics card should be avoided as far as possible.

I'd be very surprised if the drivers didn't store the matrix transformations in system memory as well as on the graphics card, so reading them back isn't going to be slow. That said, don't take my word for it; profile it.

(AFAIK, when you call things like glMultMatrix the actual multiplication is done on the CPU and only the final result is sent to the graphics card. It's the actual vertex transformation which benefits from the card doing T&L.)

##### Share on other sites
Well, I'm going to need them stored somehow, because I need to give each node of my scene graph class a transformation matrix. My matrix class can quickly load translations, scalings, and shears, but the multiplication and rotation aspects will definitely be slow. So now I'm left in a quandary: read it back, or try to do it all in software...
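As a sketch of why the "quick loads" are cheap: setting up a translation, scale, or shear only writes a few elements into an identity matrix, while a general 4x4 multiply costs 64 multiplies and 48 adds. A minimal version, assuming OpenGL's column-major layout (all names here are mine, not from the thread):

```cpp
#include <cstring>

// Minimal column-major 4x4 matrix, OpenGL-style layout.
struct Mat4
{
    float m[16];

    Mat4() { loadIdentity(); }

    void loadIdentity()
    {
        static const float id[16] = {1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1};
        std::memcpy(m, id, sizeof(m));
    }

    // Cheap loads: a handful of stores on top of the identity.
    void loadTranslation(float x, float y, float z)
    {
        loadIdentity();
        m[12] = x; m[13] = y; m[14] = z;
    }

    void loadScale(float x, float y, float z)
    {
        loadIdentity();
        m[0] = x; m[5] = y; m[10] = z;
    }

    // Shear x by y: a single off-diagonal element.
    void loadShearXY(float s)
    {
        loadIdentity();
        m[4] = s;
    }

    // General multiply: result = a * b (64 mults, 48 adds).
    static Mat4 multiply(const Mat4& a, const Mat4& b)
    {
        Mat4 r;
        for (int col = 0; col < 4; ++col)
            for (int row = 0; row < 4; ++row)
            {
                float sum = 0.0f;
                for (int k = 0; k < 4; ++k)
                    sum += a.m[k * 4 + row] * b.m[col * 4 + k];
                r.m[col * 4 + row] = sum;
            }
        return r;
    }
};
```

Each scene graph node would then hold one `Mat4`, with multiplies only needed when combining a node's transform with its parent's.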

##### Share on other sites
As someone said, figure out whether the matrix multiplications are even done in hardware by OpenGL (draw, say, 3000 simple rotating objects and do the math yourself in one test, then let OpenGL do it in another... or don't even draw them, just time how long the multiplication takes). If letting OpenGL do them is much slower, you can assume it's done in hardware and reading the matrices back IS slow.

Or profile the call to glGetFloatv alone and check how long it takes.

Edit: just did that and got results between 1.5-5µs just for glGetFloatv. Doesn't sound like much, but the whole frustum culling for my terrain takes 3-5µs. A 3D dot product, by the way, takes 0.2µs, while a whole rotation (push, load, rotate, get, pop) takes between 3-10µs.

So you can do 15-50 dot products in software in the time it takes to do the rotation through OpenGL. So the only interesting part is: how long will it take in software when you need sin, cos, and from time to time a few sqrts and cross products?

[edited by - Trienco on March 7, 2004 9:16:37 AM]

##### Share on other sites
quote:
Original post by Trienco
As someone said, figure out whether the matrix multiplications are even done in hardware by OpenGL (draw, say, 3000 simple rotating objects and do the math yourself in one test, then let OpenGL do it in another... or don't even draw them, just time how long the multiplication takes). If letting OpenGL do them is much slower, you can assume it's done in hardware and reading the matrices back IS slow.

Or profile the call to glGetFloatv alone and check how long it takes.

Edit: just did that and got results between 1.5-5µs just for glGetFloatv. Doesn't sound like much, but the whole frustum culling for my terrain takes 3-5µs. A 3D dot product, by the way, takes 0.2µs, while a whole rotation (push, load, rotate, get, pop) takes between 3-10µs.

So you can do 15-50 dot products in software in the time it takes to do the rotation through OpenGL. So the only interesting part is: how long will it take in software when you need sin, cos, and from time to time a few sqrts and cross products?

That was really, really helpful... I think I'll profile the sines now... I would appreciate it, though, if you tested the source I post on your computer, since results from two different machines don't compare well.

##### Share on other sites
Here is the source:
```cpp
#include <iostream>
#include <stdlib.h>
#include <windows.h>
#include <math.h>

int main()
{
    using namespace std;
    int time, dt;
    float f;
    float a = 31.142;

    time = GetTickCount();
    for (register int i = 0; i < 10000000; ++i)
    {
    }
    dt = GetTickCount() - time;
    cout << "None: " << (float) dt / 10000 << endl;

    time = GetTickCount();
    for (register int i = 0; i < 10000000; ++i)
    {
        f = sin(a);
    }
    dt = GetTickCount() - time;
    cout << "Sin: " << (float) dt / 10000 << endl;

    time = GetTickCount();
    for (register int i = 0; i < 10000000; ++i)
    {
        f = cos(a);
    }
    dt = GetTickCount() - time;
    cout << "Cos: " << (float) dt / 10000 << endl;

    system("pause");
}
```

clicky

##### Share on other sites
That won't work. You should use optimized builds to see how fast it can get, and that means the compiler will just throw away the loop (after all, what's the point of ten million sin calls if the result is never used? And if it is used, why not just calculate it once?)..

Also: AFAIK, GetTickCount has resolution coarser than a millisecond and some overhead. With ten million calculations that probably wouldn't matter, but optimizations might make the result worthless (for example, sqrt seems to perform a lot better in optimized builds... even fast enough that I couldn't find a decent replacement, as even interpolated table lookups weren't really faster).
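One way to keep the optimizer from discarding such a loop is to make the input depend on the loop counter and use the accumulated result afterwards. A portable sketch of that idea, using std::chrono rather than GetTickCount (std::chrono postdates this thread; the function name `bench_sin` is mine):

```cpp
#include <chrono>
#include <cmath>
#include <cstdio>

// Microbenchmark sketch that defeats dead-code elimination:
// the sin() input varies with i (so it can't be hoisted out of the loop),
// and the accumulated result is stored to a volatile and printed afterwards.
double bench_sin(int iterations)
{
    volatile double sink = 0.0;  // volatile forces the stores to actually happen
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        sink = sink + std::sin(0.001 * i);
    auto stop = std::chrono::steady_clock::now();

    std::chrono::duration<double, std::micro> elapsed = stop - start;
    double per_call = elapsed.count() / iterations;
    std::printf("sin: %.4f us/call (sink=%f)\n", per_call, (double)sink);
    return per_call;
}
```

The same pattern works for timing cos, sqrt, or a full matrix multiply: vary the input, consume the output.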