Jump to content
  • Advertisement

Archived

This topic is now archived and is closed to further replies.

Mr_Burns

math in quake

This topic is 5890 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

I was browsing the quake 1 and 2 source code, and I noticed that Carmack uses macros for all his vector math. I figure he does everything as optimally as possible, so are macros really any faster than using inline overloaded operators? I would think that in some cases, e.g. if one of the parameters is an expression, macros would be slower because the expression is evaluated more than once when the macro is expanded. I also noticed that whenever he divides a vector by a scalar, he actually multiplies by 1/scaler. On today''s modern processors, is dividing so slow that doing 1 divide and 3 mults is faster than 3 divides? Or does this optimization only apply to the older processors of quake 2''s day? TIA for the input

Share this post


Link to post
Share on other sites
Advertisement
Intersesting.

I do think that 1 div and 3 muls is faster than 3 divs.

I don''t know about macro/inline speed tradeoffs.




Bugle4d

Share this post


Link to post
Share on other sites
1 div and 3 muls are definatly somewhat faster then 3 divs. you see divs are just like muls. repetetive use of subtraction. unfortunatly, it requires doing it many times more then the divsor and a conditional check to see when you are done. also you cant optimize a division well. plus you would always mulitple by the smaller number to reduce the number of additions.

see http://fatphil.org/x86/pentopt/17.html
for some info. the most improtant tidbit is that
quote:

Integer multiplication takes 4 clocks, floating point multiplication 5, and MMX multiplication 3 clocks. Integer and MMX multiplication is pipelined so that it can receive a new instruction every clock cycle. Floating point multiplication is partially pipelined: The execution unit can receive a new FMUL instruction two clocks after the preceding one, so that the maximum throughput is one FMUL per two clock cycles. The holes between the FMUL''s cannot be filled by integer multiplications because they use the same circuitry. XMM additions and multiplications take 3 and 4 clocks respectively, and are fully pipelined. But since each logical XMM register is implemented as two physical 64-bit registers, you need two uops for a packed XMM operation, and the throughput will then be one arithmetic XMM instruction every two clock cycles. XMM add and multiply instructions can execute in parallel because they don''t use the same execution port.

Integer and floating point division takes up to 39 clocks and is not pipelined. This means that the execution unit cannot begin a new division until the previous division is finished. The same applies to squareroot and transcendental functions.




this applies for ppro, p2, and p3 (though naturally mmx is p2/p3 and xmm is for p3).

as for macros, carmack is hoping that the compiler will place the result of the expression in a register. most decent optimizers will do this for many expressions by calculating it then pushing the answer on the stack till needed. again theory, see asm output from your compiler to see what really happens.

Share this post


Link to post
Share on other sites
Nowadays inline functions have made macros for math functions relatively useless. Also, remember the well-known macro-killing expression:
f(++x);

Share this post


Link to post
Share on other sites
Well, technically you can''t divide a vector by a scalar. It is undefined. I would be inclined to say that is why he does it more than optimization.

A core rule of optimization is that 10% of your code does 90% of your processing. That is in an optimized program. In an unoptimized one it it might be 1% doing 99%. What you do in the other 90% of the code doesn''t much matter. It isn''t 1 divide and 3 multiplies versus 3 divides that matters. Rather it is a billion divides and 3 billion multiplies versus 3 billion divides that matters. Even then that is just a petty issue. You can optimize every individual statement and still end up with performance that sucks. It is like hiring a programmer based on his typing speed. Well, he has to type the program in doesn''t he?

The micro level of optimization will get you like 10 to 20%. The macro level will get you 90, 95, 99% or perhaps even better. There is a time for that 10 to 20%. 10% can double your frame rate if you are just missing your vertical blanking. If you are running like crap on a top of the line machine 10 to 20% isn''t going to cut it. You most likely need 50% just to say it is servicable on the high end. To cover the mass market you are going to have to get a 90% or better improvement. As a general rule if performance is unacceptable then that means I need 10 to 1. There are things at the micro level that will get you that. Generally though 50% is pretty much the top end. Your user will notice an improvement of 50%. It would be foolish to pass up 50% or even 10%. That is unless getting that 50% causes you to miss the 90% you could have gotten. You don''t have an unlimited amount of time. The most highly optimized bubble sort of a million records still sucks and the most optimized sort of a million records that don''t need sorted is still just a waste.

Share this post


Link to post
Share on other sites
I could be wrong, but in the sense that your scalar could be a vector, i.e. a vector of vectors. You could not take a reciprocal of your scalar then because division by a vector is undefined. As far as I know scalars are not limited to real and complex numbers. Rather it is whatever is considered one element of your vector. It could be a vector, a matrix, a function, or anything for which the properties concerning scalars still hold. So division is only defined if it is defined for whatever you defined as your scalar.

Share this post


Link to post
Share on other sites
The primary definition of a scalar is that of a value, something with neither magnitude or direction. If you define scalar as the Game Dictionary does, it is a single element, which could conceivably be of any number of dimensions. However, in the above statement, I assumed the writer was referring to a vector as a 3D vector, in which case a scalar would be a single value, a single number. So division by such a number would not be undefined.

Later,
ZE.

//email me.//zealouselixir software.//msdn.//n00biez.//
miscellaneous links

[if you have a link proposal, email me.]

Share this post


Link to post
Share on other sites
Yes, true, but still most mathematics and physics textbooks will use (1/c)*u instead of u/c. That is the convention. I personally wouldn''t override the / operator for a vector. I would want to get a compile error if I tried to divide a vector by a scalar since the main reason I would attempt that is mistyping a formula. Oh yeah, that was suppose to be the magnitude of u, not u.

Share this post


Link to post
Share on other sites

  • Advertisement

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

Participate in the game development conversation and more when you create an account on GameDev.net!

Sign me up!