
What is your longest C++ macro


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

17 replies to this topic

#1 heron3d   Members   -  Reputation: 132


Posted 12 December 2011 - 05:53 PM

I'm considering making a macro that is 27 lines long with an 8-character name. It is all trigonometry stuff that I use over and over again within the same iteration.
I'm thinking of doing this, instead of writing a function, to make my program run faster. Do you guys think this is a good idea?


#2 colinhect   Members   -  Reputation: 193


Posted 12 December 2011 - 06:02 PM

I would implement it as a function first. If you find it is a performance bottleneck then put it in a macro (or inline function, or __fastcall, etc).

Unless you are calling the function a lot of times per frame I doubt the function calling overhead will affect much. In general, macros in C++ should be avoided if possible.
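For comparison, here is what the two options look like side by side. This is just a sketch with an invented trig helper, not the OP's actual 27-line macro:

```cpp
#include <cmath>

// Macro version: expanded textually at every call site, with no type
// checking, and every argument re-evaluated wherever it appears.
#define ROT_Y(angle, x, z) ((x) * std::cos(angle) - (z) * std::sin(angle))

// Inline function version: type-checked, debuggable, and still a
// candidate for inlining if the compiler judges it profitable.
inline double rot_y(double angle, double x, double z) {
    return x * std::cos(angle) - z * std::sin(angle);
}
```

In the common case both compile to the same code; the function just keeps the usual tooling (debugger, error messages) working.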

#3 Madhed   Crossbones+   -  Reputation: 2452


Posted 12 December 2011 - 06:02 PM

Show us your macro.

I'm pretty sure you can turn it into an inline function instead. The compiler might even be smart enough to do that for you.

Edit: Dang, too slow.

#4 Chris_F   Members   -  Reputation: 1907


Posted 12 December 2011 - 06:44 PM

I don't use macros in C++, because they can be a nightmare to debug and there is usually a better alternative.

#5 magicstix   Members   -  Reputation: 191


Posted 12 December 2011 - 07:25 PM

I'm considering making a macro that is 27 lines long with an 8-character name. It is all trigonometry stuff that I use over and over again within the same iteration.
I'm thinking of doing this, instead of writing a function, to make my program run faster. Do you guys think this is a good idea?


However fast you think your code is, someone has written it faster off the shelf...

#6 ApochPiQ   Moderators   -  Reputation: 14103


Posted 12 December 2011 - 07:44 PM

This might backfire badly.

Calling a function can actually be faster than inlining the same code all over the place, because of the way instruction caches work on modern CPUs. You never know unless you profile; and, generally, unless your profiler is telling you that your function call itself is a serious bottleneck, it's best not to waste time on it.

#7 jjd   Crossbones+   -  Reputation: 2062


Posted 12 December 2011 - 08:00 PM

This might backfire badly.

Calling a function can actually be faster than inlining the same code all over the place, because of the way instruction caches work on modern CPUs. You never know unless you profile; and, generally, unless your profiler is telling you that your function call itself is a serious bottleneck, it's best not to waste time on it.


I'd be really interested in finding out more about this topic. Is there a good source online that you would recommend?

-Josh

--www.physicaluncertainty.com
--linkedin
--irc.freenode.net#gdnet


#8 kilah   Members   -  Reputation: 382


Posted 12 December 2011 - 08:22 PM

ApochPiQ's is the best approach from my point of view. Aggressively inlined functions may fail to fit the code within the instruction cache; the compiler is quite clever about this, though. If your macro is used extensively within another function, the best initial approach is to create a function near your caller and call it in a straightforward manner. The inline hint is most likely worth using on small functions (ones that generate a small amount of assembly), or on functions that are not called extensively from a single caller -- not a 27-line behemoth, from my point of view. Of course, deciding for sure would require both profiling and inspecting the assembly generated in each case.

#9 Hodgman   Moderators   -  Reputation: 26991


Posted 12 December 2011 - 08:30 PM

I'd be really interested in finding out more about this topic. Is there a good source online that you would recommend?

Wikipedia ;)

Modern compilers actually treat inline as a hint, not a demand.

http://msdn.microsof...y/z8y1yy88.aspx
The insertion (called inline expansion or inlining) occurs only if the compiler's cost/benefit analysis show it to be profitable. Inline expansion alleviates the function-call overhead at the potential cost of larger code size.

So if you feel that some code is too small + used too often to be placed into a function, you can just put it into an inline function -- If the compiler agrees with you, it will inline that code (i.e. the same as if you'd used the OP's macro approach), otherwise, it will compile it as a regular function. In either case, the inline function is superior to the macro, as it's more maintainable, readable, debuggable, etc...
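One concrete way the inline function is more maintainable: a macro re-evaluates each argument everywhere it appears in the expansion, which matters as soon as an argument has side effects. A toy illustration (not from the thread):

```cpp
// SQUARE(next_value()) expands to ((next_value()) * (next_value())),
// calling the function twice; square(next_value()) calls it only once.
#define SQUARE(x) ((x) * (x))

inline int square(int x) { return x * x; }

int call_count = 0;
int next_value() { ++call_count; return 2; }  // side effect: counts its calls
```

The macro silently doubles the side effect; the inline function behaves like any other function.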

#10 jjd   Crossbones+   -  Reputation: 2062


Posted 12 December 2011 - 08:42 PM

I'd be really interested in finding out more about this topic. Is there a good source online that you would recommend?

Wikipedia ;)

Modern compilers actually treat inline as a hint, not a demand.

http://msdn.microsof...y/z8y1yy88.aspx
The insertion (called inline expansion or inlining) occurs only if the compiler's cost/benefit analysis show it to be profitable. Inline expansion alleviates the function-call overhead at the potential cost of larger code size.

So if you feel that some code is too small + used too often to be placed into a function, you can just put it into an inline function -- If the compiler agrees with you, it will inline that code (i.e. the same as if you'd used the OP's macro approach), otherwise, it will compile it as a regular function. In either case, the inline function is superior to the macro, as it's more maintainable, readable, debuggable, etc...


Sorry, I wasn't clear in my reply. I know the generic effect of the 'inline' keyword, i.e. that it's a hint etc. I'm actually more interested in the details of how instruction cache design determines when inlining is beneficial or not.

-Josh

--www.physicaluncertainty.com
--linkedin
--irc.freenode.net#gdnet


#11 Promit   Moderators   -  Reputation: 5909


Posted 12 December 2011 - 09:05 PM

It's like any cache: macros increase code size, and each expanded instance is considered distinct, so each takes up new slots in the cache. A function can be stored once and referred to repeatedly. The trade-off is the memory used by repeating the code versus the overhead of stack management in the call.

#12 AllEightUp   Moderators   -  Reputation: 4066


Posted 12 December 2011 - 10:38 PM


Sorry, I wasn't clear in my reply. I know the generic effect of the 'inline' keyword, i.e. that it's a hint etc. I'm actually more interested in the details of how instruction cache design determines when inlining is beneficial or not.

-Josh


This has become hugely complicated now that multicore chips share various parts of the cache. But at the most basic level, the lowest-level cache is relatively small, and if a function gets too large (especially if it contains a loop), the CPU has to request more code from a higher-level cache, which is a time-consuming operation. Normally this isn't a problem, because the pipelined nature of most CPUs today means code is requested before it is actually required. But if you make a loop with too many instructions and external calls to fit in the code cache, you can give the CPU a tizzy fit quite easily: it loads up the start of the loop and starts executing, oops, call to another function, start loading that before the uops get there, hmm, had to evict the start of the loop to do it, keep processing, damn, more calls evict some older functions and more of the starting loop, keep going... Eventually, when execution comes back around to the top of the loop, nothing usable is left in the code cache, and there is too much to fetch before the pipeline drains, so you get a big nasty stall while the start of the loop and the first functions it calls are reloaded. (That's ignoring the worse case of branch misprediction, where the cache started loading for *after* the loop instead of predicting a restart.)


Under normal circumstances none of this causes actual stalls. Overuse of __forceinline (and the attribute equivalents on GCC/Clang) can cause it, though, because you are bypassing the compiler's smarts and claiming you are smarter. You are almost *NEVER* smarter than the compiler, outside of very specific cases where you know a specific (short) function is massively used and should always be inlined. Even then, the compiler usually gets it right, so you are just stating the obvious. If you use forced-inline functions, you should probably consider writing them in assembly instead: if they really are short and concise, that should not be a problem, and rewriting one as a naked function can beat what the compiler does. If you can't do that effectively, I highly suggest not using forced inlining, because you are inhibiting the compiler's optimizations instead of benefiting from them.

#13 wqking   Members   -  Reputation: 756


Posted 13 December 2011 - 08:08 AM

Sorry, I wasn't clear in my reply. I know the generic effect of the 'inline' keyword, i.e. that it's a hint etc., I'm actually more interested in the details of how instruction cache design determines when it is beneficial or not.

-Josh


Yup, if you are a game developer and didn't know about the CPU cache, now is the time to learn.
Googling "cpu cache optimization" (without the quotes) will turn up a lot of good topics.

For a quick start,
GDC 2003 — Memory Optimization
is a very, very good starting point.

Though it was written 8 years ago, it's still a must-read until the current CPU architecture dies. :-)
I decided to learn some CPU cache optimization myself after reading it.

http://www.cpgf.org/
cpgf library -- free C++ open source library for reflection, serialization, script binding, callbacks, and meta data for OpenGL, Box2D, SFML and Irrlicht.
v1.5.5 was released. Now supports tween and timeline for easing animation.


#14 Slavik81   Members   -  Reputation: 360


Posted 13 December 2011 - 09:49 PM

The largest macro I use is 9 lines long and is used to begin the definition of a wrapper class that forwards function calls made on it into a thread-safe message queue. Essentially, it gave me an easy way to make my threading invisible, as long as I kept my interfaces asynchronous.

However, I would agree with the people here that a set of trigonometric computations could probably best be extracted into a function. You can make an inline function every bit as performant as a macro (and likely better). Though, I'd be rather surprised if you saw a significant drop in performance even from something as expensive as a virtual call if you're already doing 27 lines worth of trig.
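For the curious, a macro like the one described might look roughly like this. Everything here is invented for illustration (Slavik81's actual macro is not shown), and a real version would lock the queue:

```cpp
#include <functional>
#include <queue>

// Hypothetical sketch: a tiny queue of deferred calls, plus macros that
// open a wrapper class forwarding method calls into it. All names invented.
struct CallQueue {
    std::queue<std::function<void()>> q;   // a real version would lock this
    void post(std::function<void()> f) { q.push(std::move(f)); }
    void drain() { while (!q.empty()) { q.front()(); q.pop(); } }
};

#define BEGIN_ASYNC_WRAPPER(Name, Target)                 \
    class Name {                                          \
        Target& t_;                                       \
        CallQueue& q_;                                    \
    public:                                               \
        Name(Target& t, CallQueue& q) : t_(t), q_(q) {}

#define ASYNC_METHOD(method)                              \
        template <class... Args>                          \
        void method(Args... args) {                       \
            q_.post([=] { t_.method(args...); });         \
        }

#define END_ASYNC_WRAPPER() };

// Example target and generated wrapper:
struct Logger { int lines = 0; void log(int) { ++lines; } };

BEGIN_ASYNC_WRAPPER(AsyncLogger, Logger)
ASYNC_METHOD(log)
END_ASYNC_WRAPPER()
```

The appeal is that each new wrapper costs a few macro lines instead of a hand-written forwarding class; calls queue up until someone drains the queue on the worker thread.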

#15 NightCreature83   Crossbones+   -  Reputation: 2652


Posted 14 December 2011 - 05:44 AM

The largest macro I use is 9 lines long and is used to begin the definition of a wrapper class that forwards function calls made on it into a thread-safe message queue. Essentially, it gave me an easy way to make my threading invisible, as long as I kept my interfaces asynchronous.

However, I would agree with the people here that a set of trigonometric computations could probably best be extracted into a function. You can make an inline function every bit as performant as a macro (and likely better). Though, I'd be rather surprised if you saw a significant drop in performance even from something as expensive as a virtual call if you're already doing 27 lines worth of trig.


A macro causes explicitly inlined code to be generated; function calls can be better, especially for maths functions, as you can guarantee that values stay within SSE2 registers when using that optimisation. The load-hit-store on marshalling between normal floating point and SSE2 registers is far worse than the function call, for example.
Worked on titles: CMR:DiRT2, DiRT 3, DiRT: Showdown, GRID 2, Mad Max

#16 japro   Members   -  Reputation: 887


Posted 14 December 2011 - 06:01 AM

I'd never use a macro in the C sense to inline a function for performance. I don't see any advantage of that over just having an inline function (possibly a template). If you don't like the "hint" character of inline, there is usually some compiler-specific way to force inlining. With high enough optimization settings, the compiler will also spot things like values that can be kept in registers. I did some auto-vectorization with templates lately, where most SSE intrinsics sat in their own inline function, and the compiler literally optimized away EVERYTHING.

#17 samoth   Crossbones+   -  Reputation: 4465


Posted 14 December 2011 - 06:54 AM

Cryptographers like to write macro-abused code like that; see for example this snippet from AES:
#define ROUND(i,d,s) \
d##0 = TE0(s##0) ^ TE1(s##1) ^ TE2(s##2) ^ TE3(s##3) ^ rk[4 * i]; \
d##1 = TE0(s##1) ^ TE1(s##2) ^ TE2(s##3) ^ TE3(s##0) ^ rk[4 * i + 1]; \
d##2 = TE0(s##2) ^ TE1(s##3) ^ TE2(s##0) ^ TE3(s##1) ^ rk[4 * i + 2]; \
d##3 = TE0(s##3) ^ TE1(s##0) ^ TE2(s##1) ^ TE3(s##2) ^ rk[4 * i + 3]
Note that each TE... is a macro itself, and ROUND is called several times. Now try and debug that.

It's a matter of style; there are people who write code like this (taken from the "russian range coder") too:
while((low_ ^ low_ + range_) < TOP || range_ < BOT && ((range_ = (0 - low_) & BOT - 1), 1))
Without any doubt, the code is perfectly correct, and some people will even consider code like this "cool".

However, it's not obvious to the casual observer what's going on. Personally, I prefer that what code does is immediately obvious (to me and anyone else reading it). This includes, among other things, not having to read a line three times before understanding what it does, and error messages pointing to the correct location in the source and not being totally bogus. Macros tend to make code "ungraspable" and tend to generate bogus (or at least hard to pinpoint) errors.

Also, at several times in the past, I've found that writing clear and obvious code is not only much easier, but in fact generates faster code than a totally unreadable hand "optimized" version.
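For what it's worth, samoth's range-coder one-liner depends entirely on operator precedence (`+` binds tighter than `^`, `&&` tighter than `||`, and the comma operator makes the assignment yield true). Spelled out as a function it might look like this, with the TOP/BOT values assumed to be the conventional `1 << 24` and `1 << 16`:

```cpp
#include <cstdint>

// Assumed renormalization thresholds from the usual range-coder layout.
const uint32_t TOP = 1u << 24;
const uint32_t BOT = 1u << 16;

// Equivalent to:
//   (low ^ low + range) < TOP || range < BOT && ((range = (0 - low) & BOT - 1), 1)
// with the precedence made explicit and each step visible to a debugger.
inline bool needs_renormalization(uint32_t& low, uint32_t& range) {
    if ((low ^ (low + range)) < TOP)   // top byte of low and low+range agree
        return true;
    if (range < BOT) {                 // range underflow: clamp it
        range = (0 - low) & (BOT - 1);
        return true;
    }
    return false;
}
```

Same behavior, but the short-circuiting and the hidden assignment are now impossible to miss.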

#18 heron3d   Members   -  Reputation: 132


Posted 15 December 2011 - 02:25 AM

Thank you for all the responses guys. A function it is.

N



