At last, anonymous, another constructive post. Yep, about load/store: it's crazy, since this logic has been the same for the FPU (since the first Pentium) and, more recently, the integer unit. Sadly, in Visual C++ 6 an Intel intrinsic like:
mpadd x, *pSomeData
compiles to:
movq mm1, [pSomeData]
mpadd mm0, mm1
This results in more code, more wasted cycles and unnecessary register pressure (while we're limited to 8 registers anyway). So after 10 years of hardcore coding I still wonder why enormous corporations that make billions in profit (MS, Intel) can't even produce a final compilation pass that reorders/rewrites the asm, instead of assuming the processor's reordering stage will cover for their laziness, which is usually false in practice.
I am not courageous enough to do it. But grr it's not our job.
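For what it's worth, the pattern above can be reproduced with MMX intrinsics today; this is only a sketch (the function name and data layout are mine, not from the post). A good backend folds the memory operand into the add itself, while the VC6-era codegen described above loaded it into a spare register first.

```c
#include <mmintrin.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration of the complaint above: an MMX add whose
   second operand comes from memory. A smart backend can emit
       paddw mm0, [b]
   while the codegen criticized above produced
       movq  mm1, [b]
       paddw mm0, mm1  */
static void add_words(const int16_t a[4], const int16_t b[4], int16_t out[4])
{
    __m64 va, vb, r;
    memcpy(&va, a, 8);          /* load 4 packed 16-bit ints */
    memcpy(&vb, b, 8);
    r = _mm_add_pi16(va, vb);   /* packed 16-bit add (paddw) */
    memcpy(out, &r, 8);
    _mm_empty();                /* emms: leave MMX state so the FPU works again */
}
```

Whether the load gets folded is up to the compiler; the intrinsic source is identical either way, which is exactly the point.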
[edited by - Charles B on November 29, 2003 3:12:46 PM]
Poll - plane equation convention.
Ironically, my real name is Charles. Anyway, it's not difficult to confuse people. It's being smart enough to teach dumb people that'd really make you stand out. Also notice I've successfully enticed you with every provoking post, malleable as clay.
Sure, my coder ego is of the same order of magnitude as my skills. But beyond everything it's code truth that counts. To me code truth is measured in 1/clock cycles, number of bugs, and the time required for someone to extend the code.
But ego is not the point; the point is, if you read well, that it entitled me to expand the scope of the subject. So thanks for your sneaky constructive approach. Else I also like the talion rule when someone pisses me off. But sure, not everyone is currently concerned with writing an ultra-fast portable math lib. And if I do it, it's simply because it does not exist, or is not affordable for our team project, which will be highly CPU-intensive and also scalable/portable. However, maybe one day soon I will release the math lib to the community. That's the main reason I made such an innocent poll.
This lets me see:
- first degree: the actual convention people use.
- second degree: their level of concern for actual math optimizations (any function is small, but many rivers...).
- third degree: catch the attention of coders who would be interested in the same subjects, especially those working on the PS2.
[edited by - Charles B on November 29, 2003 3:34:56 PM]
Hey, I think it's all cool stuff. I gave my reply on what I use, but I guess my reasoning wasn't really pertinent to your purpose (I guess you're trying to code it in the fastest way possible, whereas I just use the method that makes the most sense to me).
I definitely think it's important to squeeze as many clock cycles out of a math library as possible with the new technology we have in graphics. I admit I do a lot of stupid stuff in my own math library that most people would scoff at, e.g. passing things by value, which snags you with hidden copy constructors and things like that. I'm only at the point where I want to get stuff to work. (I'm only 17, I've only been coding this engine for about half a year, and I've taken no special math courses, but I'm bright and can figure stuff out on my own from various math books. I've implemented complex collision detection between moving objects while keeping track of where they are in the world based on the BSP tree, and I've developed a very interactive entity hierarchy which is actually turning into a real game with simple 3D animated enemies, comparable in complexity to GLQuake, with about as many lines of code.)
So, does it really make *that* huge a difference in performance whether you add or subtract the plane's distance? My engine runs wicked fast, upwards of 1700 FPS depending on the resolution and what you're looking at in the world (no joke about that number either, and I don't have the fastest system), so I've only done basic code profiling, and I don't worry too much about housekeeping stuff, only algorithms and highly complex math.
[edited by - Shadow12345 on November 29, 2003 3:59:42 PM]
[edited by - Shadow12345 on November 29, 2003 4:01:14 PM]
Well then, congratulations on your lone efforts. 17 is roughly the age when I started basic 3D. It was also more difficult in one sense, since it was all software. But managing collision detection and BSP trees at the age of 17 is a cool start. So my apologies for my harsh answers. But finally this leads to constructive exchanges.
Well, if you render Quake scenes, first, the poly count is that of old-generation 3D engines, so the amount of CPU processing on today's hardware is not that big. Today's state-of-the-art technologies should render 30K, 50K, 100K or more polys after all CLOD and culling (occlusion too) has been done.
To a first-order approximation you are right that it's wise to assume most work is processed by the 3D hardware. However, CLOD requires CPU work, and many other techniques do too. For instance, my terrain engine is infinite in precision and distance. To do this I need to feed the GPU with a lot of data, updated per frame or at a lower rate, that's computed on the CPU. You mentioned physics. Today's games are far from optimal in how they implement physics. Imagine a game where any object that could be animated in real life could also be animated in the virtual scene. For instance, you launch a rocket into a wall (made of individual bricks when you close up) and the bricks explode and may hit the player (collision detection).
So as you can see, if you want to use the GPU's capacity to the max, the CPU also becomes a serious bottleneck for the most ambitious new challenges. Personally, for instance, I generate new LODs of textures on the fly with all lighting, self-shadowing, etc. You can see details of one millimeter. The GPU, even with pixel shaders, would be strangled, because I use 16 passes of shadowing for each light to simulate penumbra. On the CPU I can divide the refresh rate by 100 and update once per second in texture space, with hardware blending between transitions. This is just one example.
Now for the math lib more specifically. Look at your most-used functions found with your profiler. There are certainly many, apart from one or two main loops. But everywhere you will find basic math: Sqrt, RSqrt, dot/cross products, quaternions, etc. OK, now here is something I tell you not out of vanity but because I was surprised myself, in one of the fields I consider myself to master. There is an incredible difference between optimal C/C++ code relying on asm or intrinsics (let's call it OPTI) and standard OOP math classes (call it STD). The factor is (read well):
- between 10 and 100 in STD debug mode
- between 2 and 20 in STD release mode
- OPTI is at worst only 50% slower than the most carefully scheduled handwritten assembly code (twenty years of asm coding experience in my bag), and it sometimes even beats it, because my C macros/functions let the compiler reorder code in big functions, which would be impossible using macros in asm coding, for instance.
So as you can see there is a lot of room for improvement in common math libs.
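To make the OPTI/STD distinction concrete, here is a minimal sketch of the OPTI style written with SSE intrinsics (the function name and layout are my own, hypothetical choices): the compiler still sees plain C it can schedule and reorder freely, but the multiplies happen four lanes at a time.

```c
#include <xmmintrin.h>

/* Hedged sketch of an "OPTI"-style dot product: SSE multiplies all four
   lanes at once; summing the first three lanes gives the 3D dot product
   (the w lane is simply ignored). */
static float dot3_sse(const float a[4], const float b[4])
{
    __m128 m = _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    float r[4];
    _mm_storeu_ps(r, m);        /* spill lanes for the horizontal add */
    return r[0] + r[1] + r[2];
}
```

An STD-style `operator*` chain over vector objects typically forces temporaries and per-component scalar code, which is where the factors quoted above come from.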
Now here is my conclusion. I always tend to resist the fashions of the day in computer science, such as the simplistic assertions made by "software engineers" who think, for instance, that compilers are so performant that they do everything for you once you code decently in C++. That's false. I always see layers in my software architecture. In the lower layers I still keep a fair amount of crazy hacks, but all in all my code compiles on a Mac, PC or Unix seamlessly, even when I code on Windows for a week. Now, today the illusion is that one can do everything with (vertex/pixel) shaders. Yep, everything in a demo. But you won't fill my infinite scenes with shaders only. You need algorithms. Remember that the CPU is still where the most freedom and creative potential lies, because it's a generic processor; that's where you design the highest-level, and thus the most crucial, optimizations. For instance, it used to be BSP trees; now ABTs or quadtrees.
So remember this: the GPU is the lair of brute force; the CPU is the lair of subtle force, of algorithms. Thus the real potential of the CPU may in fact still totally surpass equivalent GPU implementations, and code-level optimizations on the CPU, the purpose of my math lib, must be weighted by a huge factor: 100 in my terrain-textures example.
This assertion will only become false when the boundary between the GPU and CPU becomes very unclear. That is already the case in the very interesting but complex architecture of the PS2. On the PS2 you need to handle the CPU AND the GPU very precisely, not only the GPU. On a PC that's also what one should try to do.
(I read it again; sorry, it's late and my English sucks. I hope you understood what I meant.)
[edited by - Charles B on November 29, 2003 9:31:26 PM]
Everything you said makes sense (I did have a little trouble understanding what you meant, so I read what you wrote twice), and I'm not at the point where I have to worry about things like per-pixel lighting and stencil shadows, so I have plenty of processing power to do what I want. It's difficult enough to actually write a game engine, and there are no tutorials out there that say 'this is how you write a game engine'. You mentioned rendering in software. I was wondering, how many developers (aside from device-driver writers who implemented OpenGL and Direct3D) actually know how to write a software renderer? I know the basics of casting a ray out into the world to intersect objects, finding where it intersects the view plane, and interpolating across the polygon to visit each pixel on each scanline, applying the texture/color, but I haven't actually written a software renderer and I don't know if it is a good idea to spend the time to do so. Would that be something I should take the time to learn and implement? I'm wary of going into extremely difficult things like that because I'm afraid it will take too long, and I could be heavily focusing on my BSP game engine (which renders with OpenGL only right now).
Also, are you actually implementing square root yourself? That'd be very impressive. I've looked at Carmack's InvSqrt function but I don't understand it, and I've tried writing my own several times. Once was a 'guessing' program that refined a guess at the square root until it got within a certain value, with a certain percent accuracy; the second time I tried doing it using derivatives, only to find that in order to find the derivative of a sqrt, you need to call sqrt (the derivative of sqrt(x) is 1/(2*sqrt(x))), so that didn't work.
quote:
Such simplistic assertions made by "software engineers" who think for instance that compilers are so performant that they do everything for you once you code decently in C++.
Yeah, I definitely do not believe that, but at the same time I've tried finding the 'perfect tweaks' for the compiler and it seems next to impossible without dissecting things and rewriting everything in asm (which is what you are doing with the math lib).
I've got a question because you code on the PS2: exactly what is under the hood of a PS2, and who writes drivers for its CPU and GPU and fabricates the hardware? For example, I visited SGI this summer (yes, for real, I have pictures!) and they have their own custom supercomputers with special architecture (they have a low frequency, about 700 MHz per processor, but the actual computing power is easily that of a 2.2 GHz P4, plus they have upwards of 64 processors in one supercomputer; I think they made it low-frequency to be stable). However, SGI doesn't make their own processors; they only come up with the designs and then IBM actually produces the processors. I'm wondering if the PS2 is just a Pentium III for a CPU, like the Xbox, with a special integrated GPU of some sort melded with already-existing technology and software, or if the hardware and software is a custom design just like SGI's systems.
I've got a lot more to say/ask, but I think that's enough for right now.
Oh, also, I can send you the code for my project sometime if you want.
[edited by - Shadow12345 on November 29, 2003 10:00:40 PM]
[edited by - Shadow12345 on November 29, 2003 10:02:26 PM]
quote:
I've got a question because you code on PS2: exactly what is under the hood of a PS2 and who writes drivers for its CPU and GPU and fabricates the hardware? For example, I visited SGI this summer (Yes, for real, I have pictures!) and they have their own custom super computers with special architecture (they have a low frequency, about 700MHz per processor, but the actual computing power is that of a 2.2GHz P4 easy, plus they have upwards of 64 processors in one super computer, I think they made it low frequency to be stable), however SGI doesn't make their own processors, they only come up with the designs and then IBM actually produces the processors themselves. I'm wondering if the PS2 is just a pentium 3 for a CPU like the XBox, and with a special integrated GPU of some sort mended with already existing technology and software, or if the hardware and software is a custom design just like SGI's systems.
Basically in the PS2 you have a slow and cacheless CPU, two vector units (basically one (VU0) for doing stuff in collaboration with the CPU, and one (VU1) for transforming vertex streams and driving the graphics chip), and then the GS, the graphics chip, which is the nice part of the whole.
And a fast DMA controller to transfer data all around the place.
The GS has good brute-force potential and very good fillrate IMO, but the fact that you have to code so much on the VU1 is a real pain in the ass, as is the lack of cache. Parallelization makes it hard to develop and optimize, but PCs are beginning to have the same problem of balancing the workload with modern GPUs now...
For the sake of developing a math lib using VU0, you are at the mercy of the compiler allocating SIMD registers and optimizing copies, so there's not much to do if it doesn't (like CodeWarrior; SN Systems' compiler is much better, but it seems (I hope, since I work with CW) Metrowerks will catch up)...
BTW, Charles, I see you're French; at what game company are you working?
@Tramboi
I don't think there are many valuable companies left in France after the crash, though there are many skilled guys (like Yann L, for instance). Thus I currently work with my own funds (to survive) and I have started an international team to make a technically ambitious online game. (Check my profile if you want the link to Small Big Game For Real Coders.) Now, concerning my past experience, I was just fed up with being "exploited" by carpet sellers who know nothing about game dev and several times ruined my attempts to do something worth being released. You probably know how uncomfortable game dev can be anywhere; in France it was just hell.
@Shadow
You misunderstood me on two points:
- I don't write a software renderer today, though I have coded many in the past. I simply use the CPU for some things like generating textures (or light maps if you want) on the fly. It's not quite the same.
- I have never coded on a PS2, though I regret not knowing this interesting machine a bit more, because I am somewhat of a masochist coder. I am just interested in feedback from console coders, to see if there could be a way to make a totally cross-platform math lib. Currently it's MacOSX (AltiVec) and PC, Unix, Windows (MMX/3DNow/SSE/SSE2) compatible.
About RSqrt and Sqrt, I simply cut and pasted the "Carmack" square root code for the FPU version. I also use the fast RSQRT of 3DNow. All in all it's just 0.1% of the math code base. I am not sure Carmack created this code; such things have been known for decades. What counts is the implementation hacks depending on a certain generation of hardware. You also have to read about the IEEE 754 floating-point format. Otherwise it's based on series expansion (but I doubt you'll learn about that before you are 19 or 20, depending on your studies). However, it's not so complex to understand; yes, it's based on derivatives.
One of the secrets of the "Carmack" code is this:
In general (x^e)' = e*x^(e-1)
RSqrt(x) = x^(-0.5)
So:
(1) RSqrt(x+dx) = RSqrt(x) - 0.5*x^(-1.5)*dx + o(dx*dx).
So when dx is small, the second-order term o(dx*dx) is negligible. There are some mathematical properties of RSqrt which make this equation very useful, but that's too complex to explain here.
This equation is the basis of what's called Newton-Raphson refinement. You want to compute y = RSqrt(x). Once you know a rather good approximation (*), call it y' = RSqrt(x'), where y' is close to y and thus x' is close to x, with dx = x' - x, you can exploit equation (1).
And if you work a little with (1), you will find the explanation for this "magic" line (2):
y*=(1.5f-0.5f*x*y*y);
(*) study the code of Carmack and try to find what this approximation is.
i = magic - (i >> 1); // based on the IEEE 754 format, exponent and mantissa.
And Sqrt(x) = x*RSqrt(x).
BTW, (2) can be very useful to compute speedy exact point-light contributions on scanlines (on a lightmap row, for instance), because when you step u+du the light vector's length varies continuously. No RSqrt required, just (2).
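Putting the pieces above together, here is a plain-C sketch of the whole routine; the magic constant 0x5f3759df is the commonly circulated one, and the variable names are mine:

```c
#include <stdint.h>
#include <string.h>

/* The bit-level first guess plus one Newton-Raphson step (2), as
   described above. */
static float rsqrt(float x)
{
    float y = x;
    uint32_t i;
    memcpy(&i, &y, 4);              /* reinterpret the float's IEEE 754 bits */
    i = 0x5f3759df - (i >> 1);      /* initial guess from exponent/mantissa */
    memcpy(&y, &i, 4);
    y *= 1.5f - 0.5f * x * y * y;   /* refine: line (2), y *= (1.5 - 0.5*x*y*y) */
    return y;
}
```

One refinement step already gives roughly three significant digits; applying line (2) a second time tightens the result further.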
Personally, I use the formula:
Ax + By + Cz + Dw = 0;
That way, when I do Dot(Plane, Point) I get the distance from the plane to the point (knowing my plane's w = -dot(Normal, PointOnPlane) and my point's w = 1).
Because of this I have to use 4D dot products. I find this works perfectly: in the case Plane . Point the w's multiply to add D, and in everything else (Vector . Point, Vector . Plane, Vector . Vector, Quaternion . Vector (untested)) at least one w = 0, so the w term vanishes.
I assume this 4D stuff helps with SIMD, as SSE is 4x float, but I'm too lazy to code in asm; when the time comes I'll probably just get ICC or something and optimize it for SSE1, as everyone'll have it by then :-D
For argument's sake, I store vectors, quaternions, points and planes in the same type. It works rather well, as most of them have common functions (e.g. dot products), and because of that I code like there's no tomorrow (i.e. I do all my debugging immediately after writing a function, then never touch it again, only ever replace it entirely).
So I guess you could say I do "(1) Plane.N * P + Plane.d"; however, the + Plane.d is hidden in the dot product as Plane.d*1.
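A minimal sketch of the convention described above (the type and names are mine, hypothetical): a plane stored as (A,B,C,D) against a point stored as (x,y,z,1), in one 4D dot product.

```c
/* Homogeneous 4D dot product: with a plane (A,B,C,D) and a point
   (x,y,z,1), the w terms multiply to add D, giving the signed distance;
   with a direction (w = 0) the D term drops out automatically. */
typedef struct { float x, y, z, w; } vec4;

static float dot4(vec4 a, vec4 b)
{
    return a.x*b.x + a.y*b.y + a.z*b.z + a.w*b.w;
}
```

For the plane z = 5 stored as (0, 0, 1, -5), the point (0, 0, 7, 1) dots to 2, its signed distance, while the direction (1, 0, 0, 0) dots to 0.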
Fine, dreddlox. So if we decide to release our lib as GPL you'll be able to map your own routines to our perf macros/funcs, or search/replace in your code to update it. You'll benefit not only from the SSE but also the 3DNow optimizations, and even some FPU code (all scheduled C or asm).
For quaternions (q) I don't have the simplified formula in mind, but it's more complex than a dot product. I am sure there are some cross products in it, since for a vector v it's v' = q*v*q^-1.
I have only implemented the most common functions so far. I am almost sure it's quicker to convert quat->mat43 (transposed submatrix) and then compute matrix * vector in SIMD (especially SSE) once you have several vectors to rotate/translate.
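For the route suggested above (convert once, then reuse), here is a hedged plain-C sketch of the standard unit-quaternion-to-matrix conversion; the names and layout are my own, and it assumes q = (w,x,y,z) is normalized:

```c
/* Standard conversion of a unit quaternion (w,x,y,z) to a 3x3 rotation
   matrix; afterwards, rotating N vectors costs N matrix*vector products
   instead of N evaluations of q*v*q^-1. */
static void quat_to_mat3(float w, float x, float y, float z, float m[3][3])
{
    m[0][0] = 1 - 2*(y*y + z*z);  m[0][1] = 2*(x*y - w*z);      m[0][2] = 2*(x*z + w*y);
    m[1][0] = 2*(x*y + w*z);      m[1][1] = 1 - 2*(x*x + z*z);  m[1][2] = 2*(y*z - w*x);
    m[2][0] = 2*(x*z - w*y);      m[2][1] = 2*(y*z + w*x);      m[2][2] = 1 - 2*(x*x + y*y);
}
```

The conversion costs a handful of multiplies once, which is quickly amortized over several vectors; the matrix rows also map naturally onto SIMD registers.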
We wonder if people here would be interested in our very high-performance, portable math lib: for instance, a C dot product in 2 clock cycles, and highly scheduled vertex-array functions (best possible parallelism, unrolling, mostly hand-written). Since we put in a lot of effort for benefits that would later only concern our own devs, there are currently two possible strategies for us:
(1) A "light version" would be free for indies, with small royalties in case their products become commercial. What do you think about it?
(2) A full GPL version. Our benefit would mainly be having users expand the code base (win-win cooperative logic). Eventually some clock-cycle competitions would replace old asm or C code with more optimal code (though I doubt many can beat me).
I'll probably open a new thread about it. Would some of you be interested in (1) or (2)? Any comments?
This topic is closed to new replies.