Sign in to follow this  
eGamer

Code Optimization in visual studio

Recommended Posts

given the following code:
void Quat_To_Matrix1(Quaternion& quat,Matrix& matrix)
{
  const float x = quat[0];
  const float y = quat[1];
  const float z = quat[2];
  const float w = quat[3];

  matrix[0][0] = w*w + x*x - y*y - z*z;
  matrix[0][1] = 2*x*y + 2*w*z;
  matrix[0][2] = 2*x*z - 2*w*y;
  matrix[0][3] = 0.0f;

  matrix[1][0] = 2*x*y-2*w*z;
  matrix[1][1] = w*w - x*x + y*y - z*z;
  matrix[1][2] = 2*y*z + 2*w*x;
  matrix[1][3] = 0.0f;

  matrix[2][0] = 2*x*z + 2*w*y;
  matrix[2][1] = 2*y*z - 2*w*x;
  matrix[2][2] = w*w - x*x - y*y + z*z;
  matrix[2][3] = 0.0f;

  matrix[3][0] = 0.0f;
  matrix[3][1] = 0.0f;
  matrix[3][2] = 0.0f;
  matrix[3][3] = w*w + x*x + y*y + z*z;
}


void Quat_To_Matrix2(Quaternion& quat,Matrix& matrix)
{
  const float x = quat[0];
  const float y = quat[1];
  const float z = quat[2];
  const float w = quat[3];

  float _w = w*w;
  float _x = x*x;
  float _y = y*y;
  float _z = z*z;
  
  matrix[0][0] = _w + _x - _y - _z;
  matrix[0][1] = 2*x*y + 2*w*z;
  matrix[0][2] = 2*x*z - 2*w*y;
  matrix[0][3] = 0.0f;

  matrix[1][0] = 2*x*y-2*w*z;
  matrix[1][1] = _w - _x + _y - _z;
  matrix[1][2] = 2*y*z + 2*w*x;
  matrix[1][3] = 0.0f;

  matrix[2][0] = 2*x*z + 2*w*y;
  matrix[2][1] = 2*y*z - 2*w*x;
  matrix[2][2] = _w - _x - _y + _z;
  matrix[2][3] = 0.0f;

  matrix[3][0] = 0.0f;
  matrix[3][1] = 0.0f;
  matrix[3][2] = 0.0f;
  matrix[3][3] = _w + _x + _y + _z;
}
Suppose i am using Visual Studio 2008, is there any need to the optimization in the second function ? i mean do visual studio reaches an assembly in function one that is optimized without making the manual optimization in the second function ? [Edited by - ApochPiQ on June 3, 2009 8:56:02 AM]

Share this post


Link to post
Share on other sites
Generally it's better to ask the compiler if it does an optimization rather than a human being. You can use the /FA family of switches to get MSVC to output the assembly it generates for a given C++ file.

Share this post


Link to post
Share on other sites
Quote:
Original post by SiCrane
Generally it's better to ask the compiler if it does an optimization rather than a human being. You can use the /FA family of switches to get MSVC to output the assembly it generates for a given C++ file.



i am asking this question to know the level of optimization we have to do manually
and what are the issues modern c++ compilers take care of.


the Code Generator of Maple generates function two, while i saw function one
in very large free source projects that are realy reliable.

please show some respect in your answers, otherwise it is useless in this forum.

Share this post


Link to post
Share on other sites
Quote:
Original post by eGamer
please show some respect in your answers, otherwise it is useless in this forum.


I was really about giving you some links about modern optimization. But then I read that phrase where you request respect manually.

...

On the other hand, let me give you a very helpful link anyways: click. And as it turns out, there appear all the links I wanted to recommend.


edit: Out of curiosity, where exactly was someone disrespectful? I fail to find that.

Share this post


Link to post
Share on other sites
Quote:
Original post by eGamer
please show some respect in your answers, otherwise it is useless in this forum.


I have no idea how it is that you sense any disrespect here.

Share this post


Link to post
Share on other sites
Quote:
Original post by eGamer

i mean do visual studio reaches an assembly in function one that is optimized without making the manual optimization in the second function ?


Both functions are completely unoptimized.

You first need to rewrite the code as SIMD (SSE - SSE4, depending on target platform) in assembly, then you need to run a decent profiler, to determine if instruction scheduling works as expected, or whether there are any other issues. Of course, profile against non-SIMD version, it might be faster.

In addition, the data needs to be properly aligned and prefetched in cache, and the loop calling it must be either cache oblivious or take into consideration cache line sizes.

Lastly, you need to profile the whole application to determine if bottlenecks occur elsewhere, whether inlining the function is possible or beneficial, and whether the rest of application doesn't introduce unexpected bottlenecks.

Share this post


Link to post
Share on other sites
Your manual optimizations are (usually) unnecessary, a decent compiler will perform them, or at least consider them. It's called common sub-expression elimination; expressions like "x*x" and "2*x*y" can be calculated once and used in other expressions.

The opposite is rematerialization; sometimes you don't want to do too much CSE if you don't have enough registers to store all the CSEs and don't want to spill to memory, so you recalculate expressions because it's cheaper than keeping them around.

Share this post


Link to post
Share on other sites
Quote:
Original post by outRider
Your manual optimizations are (usually) unnecessary, a decent compiler will perform them, or at least consider them. It's called common sub-expression elimination; expressions like "x*x" and "2*x*y" can be calculated once and used in other expressions.
Be careful with such assumptions.
C compilers often refuse to do any kind of algebraic transformation on floating point expressions, so I don't expect it to work for things that aren't exactly identical. That is it may well fail to expose certain possible common sub-expressions within the sums (e.g. things like x - y and 2 + x - y) .

The usual aliasing issues apply here too of course. Caching the quaternion members (as the OP is doing here) is a good idea for reasons other than saving on typing.

Share this post


Link to post
Share on other sites
Quote:
Original post by implicit
Quote:
Original post by outRider
Your manual optimizations are (usually) unnecessary, a decent compiler will perform them, or at least consider them. It's called common sub-expression elimination; expressions like "x*x" and "2*x*y" can be calculated once and used in other expressions.
Be careful with such assumptions.
C compilers often refuse to do any kind of algebraic transformation on floating point expressions, so I don't expect it to work for things that aren't exactly identical. That is it may well fail to expose certain possible common sub-expressions within the sums (e.g. things like x - y and 2 + x - y) .


Hence the "usually" and "at least consider them" qualifiers. I'm aware of the FP complications. Most compilers I've used have a switch for fast vs precise FP consistency and some use fast at high opt.

Quote:
Original post by implicit
The usual aliasing issues apply here too of course. Caching the quaternion members (as the OP is doing here) is a good idea for reasons other than saving on typing.


Aliasing is irrelevant to the OPs question, the difference between the two snippets is the explicit calculation of the squared sub-expressions, all the variables are pinned. But yes, in general aliasing is a factor in correctly identifying CSEs and invariants.

Share this post


Link to post
Share on other sites
Quote:
Original post by outRider
Quote:
Original post by implicit
Quote:
Original post by outRider
Your manual optimizations are (usually) unnecessary, a decent compiler will perform them, or at least consider them. It's called common sub-expression elimination; expressions like "x*x" and "2*x*y" can be calculated once and used in other expressions.
Be careful with such assumptions.
C compilers often refuse to do any kind of algebraic transformation on floating point expressions, so I don't expect it to work for things that aren't exactly identical. That is it may well fail to expose certain possible common sub-expressions within the sums (e.g. things like x - y and 2 + x - y) .


Hence the "usually" and "at least consider them" qualifiers. I'm aware of the FP complications. Most compilers I've used have a switch for fast vs precise FP consistency and some use fast at high opt.

Quote:
Original post by implicit
The usual aliasing issues apply here too of course. Caching the quaternion members (as the OP is doing here) is a good idea for reasons other than saving on typing.


Aliasing is irrelevant to the OPs question, the difference between the two snippets is the explicit calculation of the squared sub-expressions, all the variables are pinned. But yes, in general aliasing is a factor in correctly identifying CSEs and invariants.


Also note that operand reordering is skipped in an IEEE conforming compiler, which also means that tons of constant-folding may be skipped, which means operand reordering might be skipped, which means ..... .

GCC has -fassociative-math to enable reordering where the result may vary due to that reordering. Dunno about VC.


Quote:
It's called common sub-expression elimination

Oh, that is just one of many, many possible optimization passes. Also, CSE is quite difficult (read: mostly impossible) for function calls whose callee's definition is invisible to the compiler (which for most functions in C and C++ is the case), to be sure, make the functions inline, or as a special option, use whole program optimization (to help the compiler with that, you could provide an amalgamized version of your program code).

Note that in GCC there are the function attributes "const" and "pure", with which you can promise the compiler that two consequtive calls to a function with the same arguments will yield the exact same return value, and that calling them will have no side effects. Also note that that attribute const is not related to C++ const keyword.

For a glimpse about existing optimization passes, have a look at this very incomplete overview: clicky.

Share this post


Link to post
Share on other sites
Quote:
Original post by Zahlman
Quote:
Original post by eGamer
please show some respect in your answers, otherwise it is useless in this forum.


I have no idea how it is that you sense any disrespect here.



do you think: "Ask The Compiler?" is a kind of respect?
so why do i ask C++ Gurus here in this forum ?
i've read articles about code optimization, but i wanted to know what optimization
tips are still applicable in visual studio, especially that you can
change the project settings so that you check the Optimize Code in the IDE.


by the way my mother language is French and may be i misunderstood his answer.

Share this post


Link to post
Share on other sites
Quote:
Original post by eGamer
Quote:
Original post by Zahlman
Quote:
Original post by eGamer
please show some respect in your answers, otherwise it is useless in this forum.


I have no idea how it is that you sense any disrespect here.

do you think: "Ask The Compiler?" is a kind of respect?


But it is generally better. To see which code will be optimized better, well, check the code the compiler emits. There is no general answer to the question which code is better.

Btw, apart from asking the compiler, don't forget to ask the profiler.


Quote:
i've read articles about code optimization, but i wanted to know what optimization tips are still applicable in visual studio, especially that you can
change the project settings so that you check the Optimize Code in the IDE.

But what you just said is "asking the compiler", yet you think such an advice is disrespectful.

To conform to the rule: There is no general rule for optimization. The best is you get modern information about x86 code optimization. For that, links have already fallen in this thread. We could, btw, also just recommended you to rtfm.

Share this post


Link to post
Share on other sites
Quote:
Original post by eGamer
do you think: "Ask The Compiler?" is a kind of respect?
so why do i ask C++ Gurus here in this forum ?
i've read articles about code optimization, but i wanted to know what optimization
tips are still applicable in visual studio, especially that you can
change the project settings so that you check the Optimize Code in the IDE.

And that's exactly why SiCrane told you to ask the compiler. The compiler is the one who knows best whether or not it applied optimizations. People here can only speculate. The compiler knows it. So ask him by looking at the generated assembly.

In the future, I would suggest you better read the provided answers before calling someone disrespectful.

Share this post


Link to post
Share on other sites
Quote:
Original post by phresnel
Quote:
Original post by eGamer
Quote:
Original post by Zahlman
Quote:
Original post by eGamer
please show some respect in your answers, otherwise it is useless in this forum.


I have no idea how it is that you sense any disrespect here.

do you think: "Ask The Compiler?" is a kind of respect?


But it is generally better. To see which code will be optimized better, well, check the code the compiler emits. There is no general answer to the question which code is better.

Btw, apart from asking the compiler, don't forget to ask the profiler.


Quote:
i've read articles about code optimization, but i wanted to know what optimization tips are still applicable in visual studio, especially that you can
change the project settings so that you check the Optimize Code in the IDE.

But what you just said is "asking the compiler", yet you think such an advice is disrespectful.

To conform to the rule: There is no general rule for optimization. The best is you get modern information about x86 code optimization. For that, links have already fallen in this thread. We could, btw, also just recommended you to rtfm.




The best choice is to apply all the optimization tips to
reach an efficient exe, even if some compilers will automatically
do some tips.

i think the books:
"C++ Footprint and Performance Optimization"
"Thinking in C++ Vol2"
are excellent ones for this.

Thank you all.

Share this post


Link to post
Share on other sites
Quote:
Original post by eGamer
The best choice is to apply all the optimization tips


Which in a commercial world will only work if you can spend 48hrs+, each day. Approximately.

Plus it will make your code unreadable, unmaintainable, unportable, you will heavily infringe the DRY principle, et cetera et cetera etc (hey, you said "all optimization tips, even if the compiler does that already").

One question: Do you even know what a profiler is?

Share this post


Link to post
Share on other sites
A profiler is a program that gives u the time each method has
taken to execute, besides it is usefull in showing u where is the
bottleneck method or function.

right ?

i meant "all the optimization tips", the way u are accustomed in writing
fluently efficient c++ code like:

int x = 40;
int y = x >> 2;

instead of:

int x = 40;
int y = x / 2;



or like:
void Vector::Normalize()
{
float _m = 1 / m_magnitude;
x *= _m;
y *= _m;
z *= _m;
}
instead of:

void Vector::Normalize()
{
x = x / m_magnitude;
y = y / m_magnitude;
z = z / m_magnitude;
}

etc...

i don't think that writing efficient code will require me 48 hours a day.
but efficient as far as i am fluent not 100% efficient.

Share this post


Link to post
Share on other sites
Quote:
Original post by eGamer
i meant "all the optimization tips", the way u are accustomed in writing
fluently efficient c++ code like:

int x = 40;
int y = x >> 2;

instead of:

int x = 40;
int y = x / 2;

A prime example of why you generally should write code that described intent, not code that obscures intent with a possibly more optimized code. Your optimized variant does not match the original one, and that is something that may not be obvious at first and may take some time to find. Although this may be a trivial example, it clearly demonstrates the principle how "clever tricks" can go wrong easily.

Share this post


Link to post
Share on other sites
Quote:
Original post by Brother Bob
Quote:
Original post by eGamer
i meant "all the optimization tips", the way u are accustomed in writing
fluently efficient c++ code like:

int x = 40;
int y = x >> 2;

instead of:

int x = 40;
int y = x / 2;

A prime example of why you generally should write code that described intent, not code that obscures intent with a possibly more optimized code. Your optimized variant does not match the original one, and that is something that may not be obvious at first and may take some time to find. Although this may be a trivial example, it clearly demonstrates the principle how "clever tricks" can go wrong easily.


Apart from that optimisation resulting in wrong code, micro (un)optimisations like that are often, very often (and in case of the examples you posted, always), already done by the compiler, and hence, as Brother Bob already pointed out, it's absolutely useless obfuscation.

Share this post


Link to post
Share on other sites
Quote:
Original post by Brother Bob
Quote:
Original post by eGamer
i meant "all the optimization tips", the way u are accustomed in writing
fluently efficient c++ code like:

int x = 40;
int y = x >> 2;

instead of:

int x = 40;
int y = x / 2;

A prime example of why you generally should write code that described intent, not code that obscures intent with a possibly more optimized code. Your optimized variant does not match the original one, and that is something that may not be obvious at first and may take some time to find. Although this may be a trivial example, it clearly demonstrates the principle how "clever tricks" can go wrong easily.



int x = 40;
int y = (x >> 1);

instead of:

int x = 40;
int y = x / 2;



also:
most of commercial used libraries has code that dosn't describe intents
but more optimized.

check qt macros ?
check string header file of Microsoft?
check ODE library math code ?
does it show intents?

it is optimized and only C++ Gurus can understand it.

Share this post


Link to post
Share on other sites
Quote:
Original post by eGamer
Quote:
Original post by Brother Bob
Quote:
Original post by eGamer
i meant "all the optimization tips", the way u are accustomed in writing
fluently efficient c++ code like:

int x = 40;
int y = x >> 2;

instead of:

int x = 40;
int y = x / 2;

A prime example of why you generally should write code that described intent, not code that obscures intent with a possibly more optimized code. Your optimized variant does not match the original one, and that is something that may not be obvious at first and may take some time to find. Although this may be a trivial example, it clearly demonstrates the principle how "clever tricks" can go wrong easily.



int x = 40;
int y = (x >> 1);

instead of:

int x = 40;
int y = x / 2;



also:
most of commercial used libraries has code that dosn't describe intents
but more optimized.

check qt macros ?
check string header file of Microsoft?
check ODE library math code ?
does it show intents?

it is optimized and only C++ Gurus can understand it.

Since you did the error when posting here, there's a chance your trick will be wrong in your code as well. But in your code, where it actually matters, there aren't any one around to point it out for you.

The examples you have play in a different league. The macros themselves may not describe intent, but the external interface, the one you as the user is supposed to see, probably does. The difference being that commercial libraries also have more backing up to notice errors in clever tricks. You, alone, may not.

Share this post


Link to post
Share on other sites
Quote:
Original post by eGamer
it is optimized and only C++ Gurus can understand it.


Let's ask the compiler [smile]




#include <cstdio>
#include <cmath>
int main () {
int x, y;
scanf ("%i %i", &x, &y);

asm volatile ("; y = x / 2;");
y = x / 2;
asm volatile ("; --");
printf ("%i %i", x, y);


asm volatile ("; y = x * 2;");
y = x * 2;
asm volatile ("; --");
printf ("%i %i", x, y);

asm volatile ("; y = x * 3;");
y = x * 3;
asm volatile ("; --");
printf ("%i %i", x, y);


asm volatile ("; y = x * 17;");
y = x * 17;
asm volatile ("; --");
printf ("%i %i", x, y);
}


Results with an old version of GCC, i.e. MinGW 3.x.


; y = x / 2;
movl -4(%ebp), %edx
movl %edx, %eax
sarl $31, %eax
shrl $31, %eax
leal (%edx,%eax), %eax
sarl %eax
movl %eax, -8(%ebp)

; y = x * 2;
movl -4(%ebp), %eax
addl %eax, %eax
movl %eax, -8(%ebp)

; y = x * 3;
movl -4(%ebp), %edx
movl %edx, %eax
addl %eax, %eax
addl %edx, %eax
movl %eax, -8(%ebp)

; y = x * 17;
movl -4(%ebp), %edx
movl %edx, %eax
sall $4, %eax
addl %edx, %eax
movl %eax, -8(%ebp)


Irrelevant parts cut out. As you see, no mul or div has been harmed, not even in the last statement.

As you see, the compiler replaces the last statement
y = x * 17;
automatically by
y = x<<4 + x;
.


Quote:
Qt, Microsoft, ...,

it is optimized and only C++ Gurus can understand it.


Hey, someone nearly got it. Now that you know that most microoptimizations are already done for you by the compiler, which sane reasoning remains to do it manually?

Share this post


Link to post
Share on other sites
Quote:
Original post by phresnel
Hey, someone nearly got it. Now that you know that most microoptimizations are already done for you by the compiler, which sane reasoning remains to do it manually?
Except it doesn't perform this particular micro-optimization automatically.
That is GCC produces a 5-instruction bithack to deal with rounding towards zero rather than the single arithmetic right shift a manual optimization would have yielded. Granted, perhaps the signed behavior was necessary and the shift optimization a bug, but I rather doubt it.

Share this post


Link to post
Share on other sites
Quote:
Original post by phresnel
Let's ask the compiler [smile]

There is more interesting stuff:

#include <cstdio>
#include <cmath>
int main () {
int x, y;
scanf ("%i %i", &x, &y);

asm volatile ("; y = x / 5;");
y = x / 5;
asm volatile ("; --");
printf ("%i %i", x, y);


asm volatile ("; y = x / 17;");
y = x / 17;
asm volatile ("; --");
printf ("%i %i", x, y);
}


Asm also with GCC:

; y = x / 5;

movl -8(%ebp), %ecx
movl $1717986919, %edx
movl %ecx, %eax
imull %edx
sarl %edx
movl %ecx, %eax
sarl $31, %eax
subl %eax, %edx
movl %edx, -12(%ebp)

; y = x / 17;

movl -8(%ebp), %ecx
movl $2021161081, %edx
movl %ecx, %eax
imull %edx
sarl $3, %edx
movl %ecx, %eax
sarl $31, %eax
subl %eax, %edx
movl %edx, -12(%ebp)


No divs! Compiler replaced division with multiplication with magic constant, shifting and sub. Try to optimize that manually ;)

Share this post


Link to post
Share on other sites
Quote:
Original post by bubu LV
Quote:
Original post by phresnel
Let's ask the compiler [smile]

There is more interesting stuff:
*** Source Snippet Removed ***

Asm also with GCC:

** snip **


No divs! Compiler replaced division with multiplication with magic constant, shifting and sub. Try to optimize that manually ;)


Yeah, really great stuff is happening. Latest GCC has just merged the Graphite Branch into main, which itself is a "a new framework for loop optimizations based on a polyhedral intermediate representation".

Initially I also wanted to include some example about CSE and how GCC will emit a fsincos x87 instruction for "x = sin(a); y = cos(b);", but then I lacked some minutes to share :D

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this