math in quake

Started by
17 comments, last by Mr_Burns 21 years, 11 months ago
I was browsing the Quake 1 and 2 source code, and I noticed that Carmack uses macros for all his vector math. I figure he does everything as optimally as possible, so are macros really any faster than using inline overloaded operators? I would think that in some cases, e.g. if one of the parameters is an expression, macros would be slower because the expression is evaluated more than once when the macro is expanded. I also noticed that whenever he divides a vector by a scalar, he actually multiplies by 1/scalar. On today's modern processors, is dividing so slow that doing 1 divide and 3 mults is faster than 3 divides? Or does this optimization only apply to the older processors of Quake 2's day? TIA for the input
Interesting.

I do think that 1 div and 3 muls is faster than 3 divs.

I don't know about the macro/inline speed tradeoffs.




~V'lion
Bugle4d
1 div and 3 muls are definitely somewhat faster than 3 divs. You see, divs are just like muls: repetitive use of subtraction instead of addition. Unfortunately, the subtraction has to be repeated many times depending on the divisor, with a conditional check to see when you are done. Also, you can't optimize a division well; with multiplication you can always multiply by the smaller number to reduce the number of additions.

see http://fatphil.org/x86/pentopt/17.html
for some info. The most important tidbit is that
quote:
Integer multiplication takes 4 clocks, floating point multiplication 5, and MMX multiplication 3 clocks. Integer and MMX multiplication is pipelined so that it can receive a new instruction every clock cycle. Floating point multiplication is partially pipelined: The execution unit can receive a new FMUL instruction two clocks after the preceding one, so that the maximum throughput is one FMUL per two clock cycles. The holes between the FMUL's cannot be filled by integer multiplications because they use the same circuitry. XMM additions and multiplications take 3 and 4 clocks respectively, and are fully pipelined. But since each logical XMM register is implemented as two physical 64-bit registers, you need two uops for a packed XMM operation, and the throughput will then be one arithmetic XMM instruction every two clock cycles. XMM add and multiply instructions can execute in parallel because they don't use the same execution port.

Integer and floating point division takes up to 39 clocks and is not pipelined. This means that the execution unit cannot begin a new division until the previous division is finished. The same applies to squareroot and transcendental functions.



This applies to the PPro, P2, and P3 (though naturally MMX is P2/P3 and XMM is P3 only). Plugging those numbers in: 3 serialized divides cost roughly 3 * 39 = 117 clocks, while 1 divide plus 3 partially pipelined multiplies is roughly 39 + 5 plus a few clocks of issue latency, call it around 48, so the reciprocal trick wins by better than 2x on that hardware.

As for macros, Carmack is hoping that the compiler will place the result of the expression in a register. Most decent optimizers will do this for many expressions by calculating it and then keeping the answer on the stack until needed. Again, that's theory; look at the asm output from your compiler to see what really happens.
Nowadays inline functions have made macros for math functions relatively useless. Also, remember the well-known macro-killing expression:
f(++x);
Well, technically you can't divide a vector by a scalar. It is undefined. I would be inclined to say that is why he does it, more than optimization.

A core rule of optimization is that 10% of your code does 90% of your processing. That is in an optimized program. In an unoptimized one it might be 1% doing 99%. What you do in the other 90% of the code doesn't much matter. It isn't 1 divide and 3 multiplies versus 3 divides that matters. Rather it is a billion divides and 3 billion multiplies versus 3 billion divides that matters. Even then that is just a petty issue. You can optimize every individual statement and still end up with performance that sucks. It is like hiring a programmer based on his typing speed. Well, he has to type the program in, doesn't he?

The micro level of optimization will get you maybe 10 to 20%. The macro level will get you 90, 95, 99% or perhaps even better. There is a time for that 10 to 20%: 10% can double your frame rate if you are just missing your vertical blanking. If you are running like crap on a top-of-the-line machine, though, 10 to 20% isn't going to cut it. You most likely need 50% just to say it is serviceable on the high end. To cover the mass market you are going to have to get a 90% or better improvement. As a general rule, if performance is unacceptable then that means I need 10 to 1. There are things at the micro level that will get you that, but generally 50% is pretty much the top end. Your user will notice an improvement of 50%. It would be foolish to pass up 50%, or even 10%. That is, unless getting that 50% causes you to miss the 90% you could have gotten. You don't have an unlimited amount of time. The most highly optimized bubble sort of a million records still sucks, and the most optimized sort of a million records that didn't need sorting is still just a waste.
Keys to success: Ability, ambition and opportunity.
quote:Original post by LilBudyWizer
Well, technically you can't divide a vector by a scalar. It is undefined. I would be inclined to say that is why he does it, more than optimization.


Undefined? By what line of reasoning?



//email me.//zealouselixir software.//msdn.//n00biez.//
miscellaneous links

[if you have a link proposal, email me.]

[twitter]warrenm[/twitter]

I could be wrong, but in the sense that your scalar could be a vector, i.e. a vector of vectors. You could not take a reciprocal of your scalar then because division by a vector is undefined. As far as I know scalars are not limited to real and complex numbers. Rather it is whatever is considered one element of your vector. It could be a vector, a matrix, a function, or anything for which the properties concerning scalars still hold. So division is only defined if it is defined for whatever you defined as your scalar.
Keys to success: Ability, ambition and opportunity.
The primary definition of a scalar is that of a plain value: something with magnitude but no direction. If you define scalar as the Game Dictionary does, it is a single element, which could conceivably be of any number of dimensions. However, in the above statement, I assumed the writer was referring to a vector as a 3D vector, in which case a scalar would be a single value, a single number. So division by such a number would not be undefined.

Later,
ZE.


Yes, true, but still most mathematics and physics textbooks will use (1/c)*u instead of u/c. That is the convention. I personally wouldn't overload the / operator for a vector. I would want to get a compile error if I tried to divide a vector by a scalar, since the main reason I would attempt that is mistyping a formula. Oh yeah, that was supposed to be the magnitude of u, not u.
Keys to success: Ability, ambition and opportunity.
Well, that's where we differ. The main reason I would divide by a scalar would be to decrease the magnitude of the vector. (Of course, it should then have a scalar != 0 check, but anyway...)

Later,
ZE.


This topic is closed to new replies.
