Archived

This topic is now archived and is closed to further replies.

tom76

Good ways to optimise/optimize code?

Recommended Posts

I know about making functions inline if they''re to be accessed frequently, but are there some good tips for speeding up code? Mine currently has a lot of if statements which I suspect slows it down. "I envy you, who has seen it all" "And I, young sir, envy you, who have yet to see it for the first time..." - Daniel Glenfield 1st October 2001

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
quote:
Original post by tom76
I know about making functions inline if they''re to be accessed frequently, but are there some good tips for speeding up code?

Mine currently has a lot of if statements which I suspect slows it down.




Not really. If your if statements usually go the same way they''ll be branch predicted anyway.

General code optimizing tips: Always use an optimizing compiler. (MSVC Professional, not Standard or Student.)

Try to access data structures the way they''re layed out in memory. (Fill 2d arrays by row and not by column, etc)

Precalculate as much as possible outside of loops.

Run a profiler against your code to find out where your bottlenecks really are. (Studies show that programmers almost always optimize in the places that do the least good.)

Share this post


Link to post
Share on other sites
I don''t know assembly, just C and C++

"I envy you, who has seen it all"
"And I, young sir, envy you, who have yet to see it for the first time..."
- Daniel Glenfield
1st October 2001

Share this post


Link to post
Share on other sites
I think the most obvious optimization is to use the right data structure for the right job.

If you have a bad algorithm or data structure your program will perform badly. You could put all the effort in the world to optimize it but you could''t not compete against a better algorithm (and it doesn''t necessarily mean more complicated).

So the morale is, keep it simple stupid (aka KISS) and straight to the point.

Patrick Lafleur

System Engineer
Creation Objet Inc.

Share this post


Link to post
Share on other sites
Has anyone an opinion on AoS vs SoA? (array of structures and reverse)The AoS seems like the C++ way to do things, but the SoA can be considerably faster if the program needs to go through the whole data and perform operations only on some (or just one) of the fields in the structure because of the cache speed-up.

Edited by - Diodor on November 6, 2001 11:54:08 PM

Share this post


Link to post
Share on other sites
Lots of tricks to do this, but one thing you will want to remember is the 80 - 20 rule. That comes up alot in life btw. Anyway, 80% of the time the user will run 20% of the code. So optimize that.

You can unroll loops, that takes some serious typing to get any significant increase in preformance though. You could also tell the compiler to keep certin variables in the CPU register. This helps some, but if you use an optimize compiler it might do this for you. I''m not sure.

Best thing to do has already been told though, profile.

Glandalf
Just a thought

Share this post


Link to post
Share on other sites
quote:
Original post by TheFez
Another thing that can help is assembly. It makes your code longer but it will speed it up (as long as you write your code well).


Unless you really know what you''re doing, assembly isn''t going to speed things up, but it might actually slow your program down. An optimizing compiler today does a much better job than a poor assemblyly programmer.

Share this post


Link to post
Share on other sites
quote:
Original post by Anonymous Poster
General code optimizing tips: Always use an optimizing compiler. (MSVC Professional, not Standard or Student.)



Just a note, the student version of VC++ is exactly the same as professional, it''s just a lot cheaper. It''s optimizing, it has a profiler, and everything else that''s in professional. Of course, you need to be a full-time student or work for a teaching institution to be able to get it.


codeka.com - Just click it.

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
I think they probably meant the learning/intro edition, which is not the same as the professional edition.

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
Student versions are cheaper because the license agreement probhits distribution of software made with it.

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
Check out this link: Programming Optimization. It''s filled with fantastic insights - especially the end section.

Another good link is: Optimizing in Higher Programming Languages.

Don''t buy books unless you really need to, because there''s loads of stuff on the ''Net. Just think - that collected mass of nerd-dom has got to produce some useful info

Alistair Keys, .Net advocate (.Not)

P.S. Spot the odd one out: "More specifically, there are five areas where Microsoft is building the .NET platform today, namely: Tools, Servers, XML Web services, Clients, and .NET experiences."

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
Oops, forgot this... inlining functions won''t save you very much at all (if anything), according to Optimize to the Max - Winning the Passing Game. Use macros instead, it suggests.

A quote: "Looking at the design of modern CPUs it is easy to see that inlining does not provide any performance improvement."

Alistair Keys

.Net defined

Share this post


Link to post
Share on other sites
quote:
Original post by Anonymous Poster
Oops, forgot this... inlining functions won''t save you very much at all (if anything), according to Optimize to the Max - Winning the Passing Game. Use macros instead, it suggests.

A quote: "Looking at the design of modern CPUs it is easy to see that inlining does not provide any performance improvement."

Alistair Keys

.Net defined


The article you quote doesn''t even say anything about macros. How on earth did you reach the conclusion that macros should be used instead of inline functions?

Share this post


Link to post
Share on other sites
quote:
Original post by Anonymous Poster
A quote: "Looking at the design of modern CPUs it is easy to see that inlining does not provide any performance improvement."



Seeing as how compilers generate the exact same code for inlined functions that they would have generated for macros (given the function is inlinable), this is simply laughable. It''s a typical ploy, though: the quoted author, knowing he doesn''t know what he''s talking about, throws in gobbledegook about CPU design, figuring his audience will be ignorant of CPU design and take his word on faith.

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
NuffSaid: You''re entirely right! I was thinking of one of the author''s other columns: Video Codec on Fast Forward. That''s the one where he says he replaced inlines with macros and got a major speed boost. My bad!

Stoffel: I must say, I''m more convinced by a columnist for a respected journal than someone saying "it''s all rubbish". Explain why you think it''s rubbish, and argue with his results. If you don''t agree because of some fact, fine. Otherwise, hold your peace.

Alistair Keys

Share this post


Link to post
Share on other sites
quote:
Original post by Anonymous Poster
Stoffel: I must say, I'm more convinced by a columnist for a respected journal than someone saying "it's all rubbish". Explain why you think it's rubbish, and argue with his results. If you don't agree because of some fact, fine. Otherwise, hold your peace.


I did. When the compiler inlines a function, the assembly code generated by inline functions is exactly the same as the assembly code generated by macros. The same instructions go to the CPU, so the CPU has no way to tell the difference between what is inlined and what is not. Arguing that the CPU is more efficient at running macros than inlined code is nonsense--there is no difference from the CPU's point of view.

Is that enough of an explanation, or do I need to re-re-quote my last post?

Want proof? See "Effective STL", Meyers, Item 46: Consider function objects instead of functions as algorithm parameters. He clocked several platforms on calls to sort using an out-of-line function and a function object which can be inlined. At worst, the object version was 50% faster; at best 160%.

If the column was from a respected journalist and assuming he knows what he's talking about, then I highly suspect he was either misquoted or was trying to say something other than what he ended up saying. Just because a journal is respected doesn't mean that everything in it is true. Classic example: IEEE Signal Processing journal had an article called "Breaking the Nyquist Limit" (hint--it can't be broken, the author made many critical mistakes, and there was fallout for many issues to come).

PS: I believe I provided enough empiric proof (instructions look the same to the CPU regardless of inlining) in my first post, but nevertheless, I do not appreciate being asked to keep my peace.

Edited by - Stoffel on November 12, 2001 2:38:06 PM

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
I''m confused. The author clearly says, "Looking at the design of modern CPUs it is easy to see that inlining does not provide any performance improvement. In most cases, call and ret instructions are processed in zero cycles due to successful static branch prediction and instruction prefetching. However, excessive inlining may blow up the size of the code and ultimately reduce performance due to the increased likelihood of instruction cache misses. Perhaps the only reason to use inlining is when the routines are extremely compact (just a few operations) and called very frequently". What did he mean?

Incidentally, I''m not arguing a non-inlined function versus an inlined function. I''m arguing (or rather, that author is) a macro (i.e. NO function) versus inlined functions. He reckons it''s quicker. Why would he lie? See the second link about the video codec.

I wrote a quick test program. In debug there was a big difference, but in the release version there was sod all. Maybe he is talking nonsense, but he says it with real belief. It doesn''t really matter, because the smart option would be to try both for yourself and see.

And remember: they can''t break you if you don''t have a backbone.

Alistair Keys

P.S. I thought Dr. Dobb''s Journal was respected. Is it?

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
Because I love the sound of my own voice (clearly), I thought I''d clarify something. You might be making the assumption I''m talking about inlining no-parameter functions. It''s easy to imagine, of course, that these can be treated as type-safe macros (and they probably are, as you say). However, the author is talking about inlining functions *with parameters*. This, of course, is not so clear cut. Read this snippet, if you haven''t already:

"As I''ve mentioned in my "Winning the Passing Game" column, even if a method is inlined, the parameters are still pushed onto the stack or stack-simulating temporary variables. If the method is called or inlined, excessive parameter passing can degrade performance dramatically. I witnessed a very good example of this unfortunate truth. My first action was to add the /Oy compiler option for frame pointer omission. Later, I tried an even better solution — I replaced all of the "convenience" methods with macros, thus totally eliminating calling/inlining and the parameter-passing hassle".

He then goes on to give sample code, which you should investigate. I''m too lazy to do this, but if you do be sure to report the results.

Here''s some info about the columnist. Decide for yourself whether he sounds credible:

Max is a software developer specializing in code optimization, signal processing and 3D graphics. He''s a Ph.D. candidate and his research interests include algorithm and code optimization, compiler design and ultrasound imaging. Max can be reached at maxf@webzone.net.

When I hear your argument, it sounds like religious dogma, of the "I''ve been taught this and I believe it more than you!" school - just like my argument, but from a different perspective. Are you willing to change your opinions when presented with new evidence?

Alistair Keys

P.S. I''ll give you a heart attack yet! Wait until I get onto "goto is okay for optimised code" (joking ).

Share this post


Link to post
Share on other sites
quote:
Original post by Anonymous Poster
even if a method is inlined, the parameters are still pushed onto the stack or stack-simulating temporary variables. If the method is called or inlined, excessive parameter passing can degrade performance dramatically.

Really?
      

#include <iostream.h>

#include <time.h>

#include <stdlib.h>


#define maxDef(a, b) (a > b ? a : b)


inline int maxFunc(int a, int b)
{
return (a > b ? a : b);
}

int main()
{
srand((unsigned int)time(0));

int x = rand() % 100;
int y = rand() % 100;

int max = maxDef(x, y);
int max2 = maxFunc(x, y);

// Use the variables, or the above code will be optimised out

cout << max << endl;
cout << max2 << endl;

return 0;
}

Now let's have a look at the generated code:

; 18 : int x = rand() % 100;

call _rand
cdq
mov ecx, 100 ; 00000064H
idiv ecx
mov esi, edx

; 19 : int y = rand() % 100;

call _rand
cdq
mov ecx, 100 ; 00000064H
idiv ecx

; 20 :
; 21 : int max = maxDef(x, y);

mov eax, esi
cmp esi, edx
jg SHORT $L1712
mov eax, edx
$L1712:

; 22 : int max2 = maxFunc(x, y);

cmp esi, edx
jg SHORT $L1719
mov esi, edx
$L1719:

I don't see a lot of pushing and popping. Now maybe this simple example isn't representative of the average case, but it certainly looks like inline functions are, well, inlined.

~~~~~~~~~~
"Usewhitespacetoenhancereadability" - Steve McConnell, Code Complete

Edited by - Martee on November 12, 2001 4:10:56 PM

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
You''re definitely winning this fight. I knew that it would be tough, but I like the pain . Anyway, maybe this will sway the argument in my favour...

Your sample code doesn''t look like the version he used. For the record: ** this code is a conversion of the author''s, and is faster without classes **. Sorry, but I haven''t disassembled it.


#include
#include

const unsigned int MAX_TEST_COUNT = 160000000; // any big number

#define WITH_CLASSES

#ifdef WITH_CLASSES
class CFoo {
public:
CFoo() { x = 0; }
int x;
int sum(int a, int b) { return x + a + b; }
};
#else
struct CFoo {
int x;
};
#define foo_sum(foo_class, a, b) (foo_class.x + (a) + (b))

#endif // WITH_CLASSES


int main()
{
CFoo foo;
DWORD startTime = GetTickCount();
int myVar = 0, a = 2, b = 3;

#ifdef WITH_CLASSES
for (int i = 0; i < MAX_TEST_COUNT; i++)
myVar += foo.sum(++a, --b);
#else
for (int i = 0; i < MAX_TEST_COUNT; i++)
myVar = foo_sum(foo, ++a, --b);

#endif // WITH_CLASSES

DWORD endTime = GetTickCount();

std::cout << "Time taken = " << endTime - startTime << std::endl;
std::cout << "For the record, myVar = " << myVar << std::endl;
return 0;
}


The tabbing will probably come out all wrong, sorry! Incidentally, I added the ++ and -- in to make it take a decent amount of time. Without it, it does it in 0 msecs (!).

It looks like the author could be on to something... but probably not. After all, the conversion is ugly and slow. Also, the difference is hardly that great.

I just like arguments. It will be interesting to see how this one turns out.

Alistair Keys

P.S. Some people would pay for punishment like this, but I get it for free on GameDev .

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
Doh! Another day, another fight. After playing around with the test program (and making it less pants) it''s pretty apparent that there is ** no difference ** between the two methods at all. Sigh. Things would have been so much easier for me if I''d agreed with you (Stoffel) so many posts ago, instead of blindly fighting a lost cause.

Just to reiterate: ** I lost, and I will put a gypsy curse on that lying columnist, damn his eyes! **.

What a letdown! Oh well, I''ll let you lot investigate this further. Maybe you lot can find some merit in it. Well, just remember that shaking up the system occasionally can be good (and fun ).

Oh well, we will meet again some day, my arch nemesis.

Alistair Keys

Share this post


Link to post
Share on other sites
quote:
Original post by Anonymous Poster
P.S. I''ll give you a heart attack yet! Wait until I get onto "goto is okay for optimised code" (joking ).


In the absence of exceptions, that''s perfectly acceptable. No part of the C/C++ language is without purpose or utility (except perhaps keyword auto).

quote:

You might be making the assumption I''m talking about inlining no-parameter functions.


I wasn''t.

quote:

"As I''ve mentioned in my "Winning the Passing Game" column, even if a method is inlined, the parameters are still pushed onto the stack or stack-simulating temporary variables. If the method is called or inlined, excessive parameter passing can degrade performance dramatically. I witnessed a very good example of this unfortunate truth. My first action was to add the /Oy compiler option for frame pointer omission. Later, I tried an even better solution — I replaced all of the "convenience" methods with macros, thus totally eliminating calling/inlining and the parameter-passing hassle".


The author was clearly not working with a modern compiler. Here''s my first shot at code:
  
#include <stdlib.h>

inline int multiplyAccumulate (int mac_reg, int tap, int input)
{
return mac_reg + tap * input;
}

int main ()
{
return (int) multiplyAccumulate (3, 5, 2);
}
//dissassembly

PUBLIC _main
; COMDAT _main
_TEXT SEGMENT
_main PROC NEAR ; COMDAT

; 10 : return (int) multiplyAccumulate (3, 5, 2);

push 13 ; 0000000dH
pop eax

; 11 : }

ret 0
_main ENDP
_TEXT ENDS
END

The compiler evaluated the multiplyAccumulate command at compile time--the program simply returns 13, since that is 3 + 5 * 2.
Second try:
  
#include <stdlib.h>

inline int multiplyAccumulate (int mac_reg, int tap, int input)
{
return mac_reg + tap * input;
}

int main ()
{
return (int) multiplyAccumulate (rand (), rand (), rand ());
}
//disassembly

PUBLIC _main
EXTRN _rand:NEAR
; COMDAT _main
_TEXT SEGMENT
_main PROC NEAR ; COMDAT

; 9 : {

push esi
push edi

; 10 : return (int) multiplyAccumulate (rand (), rand (), rand ());

call _rand
mov edi, eax
call _rand
mov esi, eax
call _rand
imul esi, edi
add eax, esi
pop edi
pop esi

; 11 : }

ret 0
_main ENDP
_TEXT ENDS
END

The math is completely done in registers. The only pushes and pops used are those to straighten the stack (boilerplate for any function call, including main). There are no "stack-simulating temporary variables", unless one includes registers as overhead, in which case one has to put down the crack pipe.

quote:

He then goes on to give sample code, which you should investigate. I''m too lazy to do this, but if you do be sure to report the results.


Yeah, that seems fair. You raise the stink and expect me to disprove it. Well, I already have above to my satisfaction; if you need more proof, do it yourself.

quote:

When I hear your argument, it sounds like religious dogma, of the "I''ve been taught this and I believe it more than you!" school - just like my argument, but from a different perspective.


It may sound that way to you, but my opinions are backed by real-world experience and experimentation. And for future reference, I was taught precious little about programming: my degree and current study is in the area of signal processing, though my profession is software tool development.

quote:

Are you willing to change your opinions when presented with new evidence?


Well, you''ve already been disproven. I hate to get pedantic, but since I''m in a proof-related class currently, that''s where my mindset is:
- Your postulate is that there is no gain from inlining, reason being that inlined functions still use temporary variables that simulate the push/fetch/pop method used in regular function calls.
- Since your postulate is absolute ("there is no gain" instead of "there is sometimes no gain"), a single instance of counter-proof is all that is needed to disprove your postulate.
- Counter-proof provided, postulate disproved.

I wouldn''t normally be so nasty about this, but instead of arguing my statements with code and examples, you''ve tried to paint me as a hard-headded zealot. Optimization is my bread and butter, and I have no reason to ignore an innovation or quirk that allows my code to be run faster. In fact, that''s the very reason I visit these boards, and I do find about one thing a month I never knew before. However, it''s up to me to experiment on my own and decide whether each new bit of knowledge actually works. My job depends on my NOT taking things on faith. This whole "inlining sux0rs, macros rulez0rs" thread comes up about once every two months here. It''s rubbish. There are dependency consequences for inlining, and are therefore to be used sparingly and only when necessary, but that''s not what anybody here was talking about, was it?

Share this post


Link to post
Share on other sites