
faster using macros


Recommended Posts

Hi, I was just wondering: would it be faster to use macros for basic calculations like vector addition, dot and cross products, etc. than to use standard functions? I've seen this done in the Quake source code, but what are the benefits?

It would depend on what language you're using.

Using a macro for this has the same benefits and caveats as using one for everything. The code is copied every time it's used so there is no function call overhead, but there is a size overhead for the same reason.

If you're using this very often (inside a loop for example) then I'd probably use an inline function or macro because the call overhead in that case may be significant, but if the loop isn't unrolled (which it may not be) then the size overhead will be insignificant.

A macro is just a piece of text with placeholders in it; when the macro is expanded, the placeholders are replaced with the text you pass in. An inline function is a piece of code that is inserted at the point of invocation, so there is no function call overhead.

You don't use macros for this anymore because the inline keyword is supported by modern compilers, and it's simply safer. (E.g. when you define a macro #define square(x) ((x)*(x)) and you invoke it with int bla=square(i++), the expanded code is int bla=((i++)*(i++)), which is obviously not what you want, and is in fact undefined behavior. An inline function does what is expected and is just as fast.)

Also note that function calls on CPUs like Core etc. are so cheap that function inlining may actually hurt performance because the larger code might not fit anymore into the instruction cache.

Quote:
Original post by staticVoid2
so whats the difference between macro and inline?


(Assuming C++)
Using macros, you force the inlining, lose features like namespaces and type checking, and get the possibility of hard-to-spot bugs because arguments are literally copied. In general, use the inline keyword; then the compiler can decide whether inlining will improve performance at all.

Quote:
Original post by CTar
Quote:
Original post by staticVoid2
so whats the difference between macro and inline?


(Assuming C++)
Using macros you force the inlining, lose features like namespaces and type checking, and get the possibility of hard to spot bugs because arguments are literally copied. In general use the inline keyword then the compiler can decide whether it will improve performance at all to inline it.


I believe that in recent MSVC and GCC versions, inline has little practical impact: compilers will inline anything they deem profitable (when optimization is requested). Even more, MSVC 2008 has such aggressive inlining analysis that it's capable of deducing certain simple algorithms and performing them at compile time. For example, summing a set of compile-time constants results in the calculation being performed by the compiler. For this reason it's preferable to use functions, since macros may obscure the algorithm and confuse the compiler, resulting in less optimized code.

On modern compilers for modern applications, macros offer no performance benefit whatsoever, at the expense of a huge loss of code clarity.

There are certain situations where macros may improve performance, but those are unrelated to the code itself and have more to do with code organization, mostly certain forms of value access across compilation units. Even that kind of problem can usually be solved with templates, which allow additional optimization to be performed.

Use macros only when you *need* literally copy-pasted code: for example, calling conventions and other function decoration, long names and namespaces for automatically generated classes and functions, or various forms of meta information (RPC, serialization, ORM), and only when that code is handled in an (almost) completely automated fashion. (Almost) never use them for actual logic.


#define sign(a) ((a) < 0 ? -1 : 1)

inline int sign(int a)
{
    if(a < 0) return -1;
    else return 1;
}

so, in this case, which one would be better (more efficient, faster)?

The function is better in all ways.

EDIT: And although it is entirely irrelevant to the generated code with almost 100% certainty, you've made the function "look" more complex, perhaps from an unconscious bias toward the macro. However, since macros are simply text substitutions, you may write the sign function as:

inline int sign( int a )
{
    return a < 0 ? -1 : 1;
}


Although for more complex operations I'd prefer your original if/else for readability, most likely.

The compiler knows which is faster. By using inline you give the compiler more information in making that decision.

A bad test case. Even if both were macros the code inside is different, so it's hard to tell. It could depend on what compiler you're using, and even what hardware you're using.

I don't think this subject will ever be as black and white as that. However, I would use inline functions simply because they're typesafe and less prone to errors such as those previously mentioned.

When Quake was written, compilers might not have been as good. The macro is applied before the compiler ever sees the file, so it can compensate for a compiler that doesn't support inlining.

Quote:

A bad test case. Even if both were macros the code inside is different


Would they not produce the same machine code, though? I thought (condition) ? true : false was just an alternative way to write an if/else statement when you are returning one value anyway.

Quote:
Original post by Vorpy
When quake was written, compilers might not have been as good. The macro is applied before the compiler ever sees the file so it can compensate for a compiler that doesn't support inlining.


More to the point, Quake was written in a programming language without the inline keyword. C89 does not support inline functions.

These micro-optimizations won't get you anywhere. Getting rid of the conditionals should be your major concern.

This function yields -1/+1 for negative/zero-or-positive:

int sign(int i)
{
    return ((i >> 30) & ~1) + 1;
}


And this function yields -1/0/+1 for negative/zero/positive:

int sign(int i)
{
    unsigned u = -i;
    return (i >> 31) | (u >> 31);
}

You just think about how numbers are represented in two's complement. The sign is the bit at position 31, so if you shift a signed number 31 to the right, you get either 00000000000000000000000000000000 (0) for positive numbers or 11111111111111111111111111111111 (-1) for negative numbers.

But since you don't want 0 and -1 but instead 1 and -1, you just multiply by 2 and add 1 (2*0+1 = 1 and 2*-1+1 = -1). That would be (i >> 31) * 2 + 1 or ((i >> 31) << 1) + 1.

I personally don't like to shift right by 31 and then shift left by one, so I just shift right by 30 and mask the least significant bit out with &~1 which is essentially the same.

You basically learn this stuff when you program in assembly language where you can do even cooler stuff because you can access the status flags.

Here's another gem for computing the absolute value of a signed integer:


int abs(int i)
{
    int m = i >> 31;     // m is either 0 (i positive) or -1 (i negative)
    return (i ^ m) - m;  // invert bits and subtract -1 (add 1) if i was negative
}

To invert the sign of a number in two's complement, you invert all the bits and add 1. Since we only want to do this here if i is negative, we compute the sign and use that as a mask.
If i was negative, (i ^ -1) - (-1) yields -i.
If i was positive, (i ^ 0) - 0 yields i.

AFAIK the result of right-shifting a signed integer type, in particular when its value is negative, is implementation-defined according to the C standard.

Another important theoretical point: any logical relation (and, by extension, any computation) can be realized using one single operator.

The implication is that if the only gate you can manufacture is, for example, NAND, you can realize any digital circuit you want using only that, from a JK flip-flop right up to a quad-core AMD equivalent.

The same applies to boolean expressions in code. Any boolean relation can be rewritten in multiple equivalent forms, including ones that use only a subset of the logical operators.

Quote:
Original post by DevFred
These micro-optimizations won't get you anywhere. Getting rid of the conditionals should be your major concern.

This function yields -1/+1 for negative/zero-or-positive:

int sign(int i)
{
    return ((i >> 30) & ~1) + 1;
}


And this function yields -1/0/+1 for negative/zero/positive:

int sign(int i)
{
    unsigned u = -i;
    return (i >> 31) | (u >> 31);
}


The funny thing is that from the Pentium Pro onwards, cmov and similar branchless conditional instructions should have made that kind of micro-optimization obsolete.

Quote:
Original post by Konfusius
The funny thing is that from the Pentium Pro onwards, cmov and similar branchless conditional instructions should have made that kind of micro-optimization obsolete.
It certainly has. It is very hard (if at all possible) to outperform a decent compiler with such tricks on reasonably modern hardware.
Maybe they can buy you a few microseconds on a 16-bit microcontroller, but on typical desktop processors they usually run just as fast as, or even slower than, the instructions the compiler would emit anyway.
