# Code Optimization in visual studio

This topic is 3184 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

given the following code:
void Quat_To_Matrix1(Quaternion& quat,Matrix& matrix)
{
const float x = quat[0];
const float y = quat[1];
const float z = quat[2];
const float w = quat[3];

matrix[0][0] = w*w + x*x - y*y - z*z;
matrix[0][1] = 2*x*y + 2*w*z;
matrix[0][2] = 2*x*z - 2*w*y;
matrix[0][3] = 0.0f;

matrix[1][0] = 2*x*y-2*w*z;
matrix[1][1] = w*w - x*x + y*y - z*z;
matrix[1][2] = 2*y*z + 2*w*x;
matrix[1][3] = 0.0f;

matrix[2][0] = 2*x*z + 2*w*y;
matrix[2][1] = 2*y*z - 2*w*x;
matrix[2][2] = w*w - x*x - y*y + z*z;
matrix[2][3] = 0.0f;

matrix[3][0] = 0.0f;
matrix[3][1] = 0.0f;
matrix[3][2] = 0.0f;
matrix[3][3] = w*w + x*x + y*y + z*z;
}

void Quat_To_Matrix2(Quaternion& quat,Matrix& matrix)
{
const float x = quat[0];
const float y = quat[1];
const float z = quat[2];
const float w = quat[3];

float _w = w*w;
float _x = x*x;
float _y = y*y;
float _z = z*z;

matrix[0][0] = _w + _x - _y - _z;
matrix[0][1] = 2*x*y + 2*w*z;
matrix[0][2] = 2*x*z - 2*w*y;
matrix[0][3] = 0.0f;

matrix[1][0] = 2*x*y-2*w*z;
matrix[1][1] = _w - _x + _y - _z;
matrix[1][2] = 2*y*z + 2*w*x;
matrix[1][3] = 0.0f;

matrix[2][0] = 2*x*z + 2*w*y;
matrix[2][1] = 2*y*z - 2*w*x;
matrix[2][2] = _w - _x - _y + _z;
matrix[2][3] = 0.0f;

matrix[3][0] = 0.0f;
matrix[3][1] = 0.0f;
matrix[3][2] = 0.0f;
matrix[3][3] = _w + _x + _y + _z;
}

Suppose i am using Visual Studio 2008, is there any need to the optimization in the second function ? i mean do visual studio reaches an assembly in function one that is optimized without making the manual optimization in the second function ? [Edited by - ApochPiQ on June 3, 2009 8:56:02 AM]

##### Share on other sites
Generally it's better to ask the compiler if it does an optimization rather than a human being. You can use the /FA family of switches to get MSVC to output the assembly it generates for a given C++ file.

##### Share on other sites
Measure it with a profiler. If your program does not spend the majority of its time in this function, the answer to "do I need to optimise it" is unequivocally no.

##### Share on other sites
Quote:
 Original post by SiCraneGenerally it's better to ask the compiler if it does an optimization rather than a human being. You can use the /FA family of switches to get MSVC to output the assembly it generates for a given C++ file.

i am asking this question to know the level of optimization we have to do manually
and what are the issues modern c++ compilers take care of.

the Code Generator of Maple generates function two, while i saw function one
in very large free source projects that are realy reliable.

##### Share on other sites
Quote:
 Original post by eGamerplease show some respect in your answers, otherwise it is useless in this forum.

...

On the other hand, let me give you a very helpful link anyways: click. And as it turns out, there appear all the links I wanted to recommend.

edit: Out of curiosity, where exactly was someone disrespectful? I fail to find that.

##### Share on other sites
Quote:
 Original post by eGamerplease show some respect in your answers, otherwise it is useless in this forum.

I have no idea how it is that you sense any disrespect here.

##### Share on other sites
Quote:
 Original post by eGameri mean do visual studio reaches an assembly in function one that is optimized without making the manual optimization in the second function ?

Both functions are completely unoptimized.

You first need to rewrite the code as SIMD (SSE - SSE4, depending on target platform) in assembly, then you need to run a decent profiler, to determine if instruction scheduling works as expected, or whether there are any other issues. Of course, profile against non-SIMD version, it might be faster.

In addition, the data needs to be properly aligned and prefetched in cache, and the loop calling it must be either cache oblivious or take into consideration cache line sizes.

Lastly, you need to profile the whole application to determine if bottlenecks occur elsewhere, whether inlining the function is possible or beneficial, and whether the rest of application doesn't introduce unexpected bottlenecks.

##### Share on other sites
Your manual optimizations are (usually) unnecessary, a decent compiler will perform them, or at least consider them. It's called common sub-expression elimination; expressions like "x*x" and "2*x*y" can be calculated once and used in other expressions.

The opposite is rematerialization; sometimes you don't want to do too much CSE if you don't have enough registers to store all the CSEs and don't want to spill to memory, so you recalculate expressions because it's cheaper than keeping them around.

##### Share on other sites
Quote:
 Original post by outRiderYour manual optimizations are (usually) unnecessary, a decent compiler will perform them, or at least consider them. It's called common sub-expression elimination; expressions like "x*x" and "2*x*y" can be calculated once and used in other expressions.
Be careful with such assumptions.
C compilers often refuse to do any kind of algebraic transformation on floating point expressions, so I don't expect it to work for things that aren't exactly identical. That is it may well fail to expose certain possible common sub-expressions within the sums (e.g. things like x - y and 2 + x - y) .

The usual aliasing issues apply here too of course. Caching the quaternion members (as the OP is doing here) is a good idea for reasons other than saving on typing.

##### Share on other sites
Quote:
Original post by implicit
Quote:
 Original post by outRiderYour manual optimizations are (usually) unnecessary, a decent compiler will perform them, or at least consider them. It's called common sub-expression elimination; expressions like "x*x" and "2*x*y" can be calculated once and used in other expressions.
Be careful with such assumptions.
C compilers often refuse to do any kind of algebraic transformation on floating point expressions, so I don't expect it to work for things that aren't exactly identical. That is it may well fail to expose certain possible common sub-expressions within the sums (e.g. things like x - y and 2 + x - y) .

Hence the "usually" and "at least consider them" qualifiers. I'm aware of the FP complications. Most compilers I've used have a switch for fast vs precise FP consistency and some use fast at high opt.

Quote:
 Original post by implicitThe usual aliasing issues apply here too of course. Caching the quaternion members (as the OP is doing here) is a good idea for reasons other than saving on typing.

Aliasing is irrelevant to the OPs question, the difference between the two snippets is the explicit calculation of the squared sub-expressions, all the variables are pinned. But yes, in general aliasing is a factor in correctly identifying CSEs and invariants.

##### Share on other sites
Quote:
Original post by outRider
Quote:
Original post by implicit
Quote:
 Original post by outRiderYour manual optimizations are (usually) unnecessary, a decent compiler will perform them, or at least consider them. It's called common sub-expression elimination; expressions like "x*x" and "2*x*y" can be calculated once and used in other expressions.
Be careful with such assumptions.
C compilers often refuse to do any kind of algebraic transformation on floating point expressions, so I don't expect it to work for things that aren't exactly identical. That is it may well fail to expose certain possible common sub-expressions within the sums (e.g. things like x - y and 2 + x - y) .

Hence the "usually" and "at least consider them" qualifiers. I'm aware of the FP complications. Most compilers I've used have a switch for fast vs precise FP consistency and some use fast at high opt.

Quote:
 Original post by implicitThe usual aliasing issues apply here too of course. Caching the quaternion members (as the OP is doing here) is a good idea for reasons other than saving on typing.

Aliasing is irrelevant to the OPs question, the difference between the two snippets is the explicit calculation of the squared sub-expressions, all the variables are pinned. But yes, in general aliasing is a factor in correctly identifying CSEs and invariants.

Also note that operand reordering is skipped in an IEEE conforming compiler, which also means that tons of constant-folding may be skipped, which means operand reordering might be skipped, which means ..... .

GCC has -fassociative-math to enable reordering where the result may vary due to that reordering. Dunno about VC.

Quote:
 It's called common sub-expression elimination

Oh, that is just one of many, many possible optimization passes. Also, CSE is quite difficult (read: mostly impossible) for function calls whose callee's definition is invisible to the compiler (which for most functions in C and C++ is the case), to be sure, make the functions inline, or as a special option, use whole program optimization (to help the compiler with that, you could provide an amalgamized version of your program code).

Note that in GCC there are the function attributes "const" and "pure", with which you can promise the compiler that two consequtive calls to a function with the same arguments will yield the exact same return value, and that calling them will have no side effects. Also note that that attribute const is not related to C++ const keyword.

For a glimpse about existing optimization passes, have a look at this very incomplete overview: clicky.

##### Share on other sites
Quote:
Original post by Zahlman
Quote:
 Original post by eGamerplease show some respect in your answers, otherwise it is useless in this forum.

I have no idea how it is that you sense any disrespect here.

do you think: "Ask The Compiler?" is a kind of respect?
so why do i ask C++ Gurus here in this forum ?
i've read articles about code optimization, but i wanted to know what optimization
tips are still applicable in visual studio, especially that you can
change the project settings so that you check the Optimize Code in the IDE.

by the way my mother language is French and may be i misunderstood his answer.

##### Share on other sites
Quote:
Original post by eGamer
Quote:
Original post by Zahlman
Quote:
 Original post by eGamerplease show some respect in your answers, otherwise it is useless in this forum.

I have no idea how it is that you sense any disrespect here.

do you think: "Ask The Compiler?" is a kind of respect?

But it is generally better. To see which code will be optimized better, well, check the code the compiler emits. There is no general answer to the question which code is better.

Btw, apart from asking the compiler, don't forget to ask the profiler.

Quote:
 i've read articles about code optimization, but i wanted to know what optimization tips are still applicable in visual studio, especially that you canchange the project settings so that you check the Optimize Code in the IDE.

But what you just said is "asking the compiler", yet you think such an advice is disrespectful.

To conform to the rule: There is no general rule for optimization. The best is you get modern information about x86 code optimization. For that, links have already fallen in this thread. We could, btw, also just recommended you to rtfm.

##### Share on other sites
Quote:
 Original post by eGamerdo you think: "Ask The Compiler?" is a kind of respect?so why do i ask C++ Gurus here in this forum ? i've read articles about code optimization, but i wanted to know what optimizationtips are still applicable in visual studio, especially that you canchange the project settings so that you check the Optimize Code in the IDE.

And that's exactly why SiCrane told you to ask the compiler. The compiler is the one who knows best whether or not it applied optimizations. People here can only speculate. The compiler knows it. So ask him by looking at the generated assembly.

In the future, I would suggest you better read the provided answers before calling someone disrespectful.

##### Share on other sites
Quote:
Original post by phresnel
Quote:
Original post by eGamer
Quote:
Original post by Zahlman
Quote:
 Original post by eGamerplease show some respect in your answers, otherwise it is useless in this forum.

I have no idea how it is that you sense any disrespect here.

do you think: "Ask The Compiler?" is a kind of respect?

But it is generally better. To see which code will be optimized better, well, check the code the compiler emits. There is no general answer to the question which code is better.

Btw, apart from asking the compiler, don't forget to ask the profiler.

Quote:
 i've read articles about code optimization, but i wanted to know what optimization tips are still applicable in visual studio, especially that you canchange the project settings so that you check the Optimize Code in the IDE.

But what you just said is "asking the compiler", yet you think such an advice is disrespectful.

To conform to the rule: There is no general rule for optimization. The best is you get modern information about x86 code optimization. For that, links have already fallen in this thread. We could, btw, also just recommended you to rtfm.

The best choice is to apply all the optimization tips to
reach an efficient exe, even if some compilers will automatically
do some tips.

i think the books:
"C++ Footprint and Performance Optimization"
"Thinking in C++ Vol2"
are excellent ones for this.

Thank you all.

##### Share on other sites
Quote:
 Original post by eGamerThe best choice is to apply all the optimization tips

Which in a commercial world will only work if you can spend 48hrs+, each day. Approximately.

Plus it will make your code unreadable, unmaintainable, unportable, you will heavily infringe the DRY principle, et cetera et cetera etc (hey, you said "all optimization tips, even if the compiler does that already").

One question: Do you even know what a profiler is?

##### Share on other sites
A profiler is a program that gives u the time each method has
taken to execute, besides it is usefull in showing u where is the
bottleneck method or function.

right ?

i meant "all the optimization tips", the way u are accustomed in writing
fluently efficient c++ code like:

int x = 40;
int y = x >> 2;

int x = 40;
int y = x / 2;

or like:
void Vector::Normalize()
{
float _m = 1 / m_magnitude;
x *= _m;
y *= _m;
z *= _m;
}

void Vector::Normalize()
{
x = x / m_magnitude;
y = y / m_magnitude;
z = z / m_magnitude;
}

etc...

i don't think that writing efficient code will require me 48 hours a day.
but efficient as far as i am fluent not 100% efficient.

##### Share on other sites
Quote:
 Original post by eGameri meant "all the optimization tips", the way u are accustomed in writingfluently efficient c++ code like:int x = 40;int y = x >> 2;instead of:int x = 40;int y = x / 2;

A prime example of why you generally should write code that described intent, not code that obscures intent with a possibly more optimized code. Your optimized variant does not match the original one, and that is something that may not be obvious at first and may take some time to find. Although this may be a trivial example, it clearly demonstrates the principle how "clever tricks" can go wrong easily.

##### Share on other sites
Quote:
Original post by Brother Bob
Quote:
 Original post by eGameri meant "all the optimization tips", the way u are accustomed in writingfluently efficient c++ code like:int x = 40;int y = x >> 2;instead of:int x = 40;int y = x / 2;

A prime example of why you generally should write code that described intent, not code that obscures intent with a possibly more optimized code. Your optimized variant does not match the original one, and that is something that may not be obvious at first and may take some time to find. Although this may be a trivial example, it clearly demonstrates the principle how "clever tricks" can go wrong easily.

Apart from that optimisation resulting in wrong code, micro (un)optimisations like that are often, very often (and in case of the examples you posted, always), already done by the compiler, and hence, as Brother Bob already pointed out, it's absolutely useless obfuscation.

##### Share on other sites
Quote:
Original post by Brother Bob
Quote:
 Original post by eGameri meant "all the optimization tips", the way u are accustomed in writingfluently efficient c++ code like:int x = 40;int y = x >> 2;instead of:int x = 40;int y = x / 2;

A prime example of why you generally should write code that described intent, not code that obscures intent with a possibly more optimized code. Your optimized variant does not match the original one, and that is something that may not be obvious at first and may take some time to find. Although this may be a trivial example, it clearly demonstrates the principle how "clever tricks" can go wrong easily.

int x = 40;
int y = (x >> 1);

int x = 40;
int y = x / 2;

also:
most of commercial used libraries has code that dosn't describe intents
but more optimized.

check qt macros ?
check string header file of Microsoft?
check ODE library math code ?
does it show intents?

it is optimized and only C++ Gurus can understand it.

##### Share on other sites
Quote:
Original post by eGamer
Quote:
Original post by Brother Bob
Quote:
 Original post by eGameri meant "all the optimization tips", the way u are accustomed in writingfluently efficient c++ code like:int x = 40;int y = x >> 2;instead of:int x = 40;int y = x / 2;

A prime example of why you generally should write code that described intent, not code that obscures intent with a possibly more optimized code. Your optimized variant does not match the original one, and that is something that may not be obvious at first and may take some time to find. Although this may be a trivial example, it clearly demonstrates the principle how "clever tricks" can go wrong easily.

int x = 40;
int y = (x >> 1);

int x = 40;
int y = x / 2;

also:
most of commercial used libraries has code that dosn't describe intents
but more optimized.

check qt macros ?
check string header file of Microsoft?
check ODE library math code ?
does it show intents?

it is optimized and only C++ Gurus can understand it.

Since you did the error when posting here, there's a chance your trick will be wrong in your code as well. But in your code, where it actually matters, there aren't any one around to point it out for you.

The examples you have play in a different league. The macros themselves may not describe intent, but the external interface, the one you as the user is supposed to see, probably does. The difference being that commercial libraries also have more backing up to notice errors in clever tricks. You, alone, may not.

##### Share on other sites
Quote:
 Original post by eGamerit is optimized and only C++ Gurus can understand it.

#include <cstdio>#include <cmath>int main () {        int x, y;        scanf ("%i %i", &x, &y);        asm volatile ("; y = x / 2;");        y = x / 2;        asm volatile ("; --");        printf ("%i %i", x, y);        asm volatile ("; y = x * 2;");        y = x * 2;        asm volatile ("; --");        printf ("%i %i", x, y);        asm volatile ("; y = x * 3;");        y = x * 3;        asm volatile ("; --");        printf ("%i %i", x, y);        asm volatile ("; y = x * 17;");        y = x * 17;        asm volatile ("; --");        printf ("%i %i", x, y);}

Results with an old version of GCC, i.e. MinGW 3.x.

	; y = x / 2;	movl	-4(%ebp), %edx	movl	%edx, %eax	sarl	$31, %eax shrl$31, %eax	leal	(%edx,%eax), %eax	sarl	%eax	movl	%eax, -8(%ebp)	; y = x * 2;	movl	-4(%ebp), %eax	addl	%eax, %eax	movl	%eax, -8(%ebp)        ; y = x * 3;	movl	-4(%ebp), %edx	movl	%edx, %eax	addl	%eax, %eax	addl	%edx, %eax	movl	%eax, -8(%ebp)        ; y = x * 17;	movl	-4(%ebp), %edx	movl	%edx, %eax	sall	$4, %eax addl %edx, %eax movl %eax, -8(%ebp) Irrelevant parts cut out. As you see, no mul or div has been harmed, not even in the last statement. As you see, the compiler replaces the last statement y = x * 17; automatically by y = x<<4 + x; . Quote:  Qt, Microsoft, ...,it is optimized and only C++ Gurus can understand it. Hey, someone nearly got it. Now that you know that most microoptimizations are already done for you by the compiler, which sane reasoning remains to do it manually? #### Share this post ##### Link to post ##### Share on other sites Quote:  Original post by phresnelHey, someone nearly got it. Now that you know that most microoptimizations are already done for you by the compiler, which sane reasoning remains to do it manually? Except it doesn't perform this particular micro-optimization automatically. That is GCC produces a 5-instruction bithack to deal with rounding towards zero rather than the single arithmetic right shift a manual optimization would have yielded. Granted, perhaps the signed behavior was necessary and the shift optimization a bug, but I rather doubt it. #### Share this post ##### Link to post ##### Share on other sites Quote:  Original post by phresnelLet's ask the compiler [smile] There is more interesting stuff: #include <cstdio>#include <cmath>int main () { int x, y; scanf ("%i %i", &x, &y); asm volatile ("; y = x / 5;"); y = x / 5; asm volatile ("; --"); printf ("%i %i", x, y); asm volatile ("; y = x / 17;"); y = x / 17; asm volatile ("; --"); printf ("%i %i", x, y);} Asm also with GCC:  ; y = x / 5; movl -8(%ebp), %ecx movl$1717986919, %edx	movl	%ecx, %eax	imull	%edx	sarl	%edx	movl	%ecx, %eax	sarl	$31, %eax subl %eax, %edx movl %edx, -12(%ebp) ; y = x / 17; movl -8(%ebp), %ecx movl$2021161081, %edx	movl	%ecx, %eax	imull	%edx	sarl	$3, %edx movl %ecx, %eax sarl$31, %eax	subl	%eax, %edx	movl	%edx, -12(%ebp)

No divs! Compiler replaced division with multiplication with magic constant, shifting and sub. Try to optimize that manually ;)

##### Share on other sites
Quote:
Original post by bubu LV
Quote:
 Original post by phresnelLet's ask the compiler [smile]

There is more interesting stuff:
*** Source Snippet Removed ***

Asm also with GCC:
** snip **

No divs! Compiler replaced division with multiplication with magic constant, shifting and sub. Try to optimize that manually ;)

Yeah, really great stuff is happening. Latest GCC has just merged the Graphite Branch into main, which itself is a "a new framework for loop optimizations based on a polyhedral intermediate representation".

Initially I also wanted to include some example about CSE and how GCC will emit a fsincos x87 instruction for "x = sin(a); y = cos(b);", but then I lacked some minutes to share :D