[C++] how to recompile the program with optimized asm code

6 comments, last by frob 14 years, 5 months ago
Hi, I am optimizing a program in Visual Studio 2008, and there is a program I need to optimize by hand at the assembly level. First, I got the compiler-generated assembly listing (extension .asm) from Visual Studio. Then I manually changed the contents of that listing for the sake of optimization. Now the problem is that I do not know how to rebuild the program from the modified listing. Do I need to decompile the optimized assembly back to C++ before compiling? If so, how? If not, what should I do to get the optimized program to run? Thanks very much.
If it is just a section of assembly, you can put it inline:

void f() {
    __asm {
        nop
    }
}


If it's an entire file, you can add it to your project, and create a custom build step for it (if a rule isn't already defined for assembly files), to build it using the assembler provided with MSVS (ml.exe). See here.
Starting with the assembly that was generated would be rather difficult because it is full of relative stack addresses etc. instead of variable references: "[ebp+24h]" and the like will be the equivalent of "myvariable". So doing this with anything but a tiny piece of code would be very difficult and tedious.
The next point is that all high-level optimisations should be considered before writing asm.

Let's face it, you'd prefer your code to be many times faster than it is without having to do any assembly optimisation yourself, right? Who wouldn't! You're convinced that the C++ code can't be optimised by hand by any significant amount further, right? Guess what, you're almost guaranteed to be wrong there. To presume that nobody else out there knows a way of doing what you're doing in a significantly faster way would be extremely naive...

One way to tune the code is to make a change to the C++ code and then examine the resulting asm. By going through several iterations of this you can get much faster code without resorting to writing asm and all the downsides that come with it, if you know how to go about it.

Say you've decided that you need to use SIMD instructions and therefore need assembly code, right? Wrong! Most compilers have functions called intrinsics that give you the benefits of assembly code without most of the downsides.
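To make the intrinsics point concrete, here is a minimal sketch, assuming an x86 or x64 target with SSE available. The function name add_arrays is made up for this illustration; the intrinsics themselves (_mm_loadu_ps, _mm_add_ps, _mm_storeu_ps) come from <xmmintrin.h>, which MSVC, GCC and the Intel compiler all ship.

```cpp
#include <cassert>
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics (x86/x64 only)

// Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n).
void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr && out != nullptr));

    std::size_t i = 0;
    // Main loop: four floats per iteration with a single SSE add.
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);  // unaligned load of 4 floats
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
    // Scalar tail for the remaining 0 to 3 elements.
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

The compiler still does the register allocation and scheduling here, and the function stays ordinary C++ that can be inlined and debugged, which is most of what you give up with a hand-written .asm file.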

You'll probably find that your only option is to first try and convince us of your need to use asm. You'll be asked for profiling information that shows where your bottlenecks are etc. If you haven't provided this then many people will insist on it. Why do most of us react this way? Experience shows that it usually gives a better outcome for people such as yourself.
If your goal is speed, then let us help you with that goal, without forcing yourself into any nasty assembly programming.
"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms
plug.
plug.
plug.

Summary:

The compiler has more optimization experience than most of us (including me).


If you can't answer e.g. the following questions without looking at Wikipedia or the like, then inline assembler is not for you (the compiler can):
* what is instruction scheduling for?
* can function inlining be bad for performance?
* what is loop unswitching? loop interchanging?
* how can you reduce latency in that expression: a = (b+c)/2
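For anyone unsure about the third question, here is a minimal sketch of loop unswitching done by hand. The function names and the negate/double operation are invented for the example; an optimizing compiler may well perform this transformation on its own.

```cpp
#include <cassert>
#include <cstddef>

// Before: the flag never changes inside the loop, yet it is tested
// on every iteration.
void apply_plain(float* data, std::size_t n, bool negate) {
    assert(n == 0 || data != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        if (negate) {
            data[i] = -data[i];
        } else {
            data[i] = data[i] * 2.0f;
        }
    }
}

// After unswitching: the invariant test is hoisted out and the loop is
// duplicated, so each copy has a branch-free body.
void apply_unswitched(float* data, std::size_t n, bool negate) {
    assert(n == 0 || data != nullptr);
    if (negate) {
        for (std::size_t i = 0; i < n; ++i) {
            data[i] = -data[i];
        }
    } else {
        for (std::size_t i = 0; i < n; ++i) {
            data[i] = data[i] * 2.0f;
        }
    }
}
```

The price is code size, since the loop body now exists twice, which is the same kind of trade-off the inlining question is hinting at.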


I really don't want to sound harsh. Being able to read assembler is great when you want to "read" the optimizations done by the compiler. But applying them yourself is generally a dead-end approach for most people in the modern world.
At one time, doing it all yourself was alright. The output of VC++ 6 was awful. Having looked at VC++ 2005's optimized output, there's nothing that you could really improve upon from what I've seen. It avoids stalls, unnecessary shifting between registers and memory etc. I haven't really seen how good it is at eliminating branching, but I'm sure they have it all covered.

Note that this only applies to the optimized output; if you're looking at the normal ASM generation, it's going to have problems. Turn up the optimization as much as possible and see if you're still dissatisfied with its output, then come back.

[edit]

One thing that I've never seen is the compiler doing loop unrolling, but you can do that in the C/++ code itself anyway.
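As a rough sketch of what unrolling in the source looks like (sum_unrolled and the unroll factor of 4 are arbitrary choices for the example, not something measured):

```cpp
#include <cassert>
#include <cstddef>

// Sums n ints, unrolled by hand four times. Four separate accumulators
// keep the additions independent of one another.
long long sum_unrolled(const int* v, std::size_t n) {
    assert(n == 0 || v != nullptr);

    long long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    long long total = s0 + s1 + s2 + s3;
    // Leftover 0 to 3 elements that did not fill a group of four.
    for (; i < n; ++i) {
        total += v[i];
    }
    return total;
}
```

Whether this beats the plain loop depends on the compiler and the optimization settings, so compare the generated asm and profile both versions before keeping it.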

You might want to consider changing to the Intel C++ compiler which supposedly has amazing optimization techniques (including unrolling) and you can get it to work with Visual Studio (or you could do once upon a time).
Quote:Original post by Guinnie
One thing that I've never seen is the compiler doing loop unrolling
I haven't ever bothered to check MSVC, but GCC does extensive loop *and* recursion unrolling at -O3.

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

Quote:Original post by Guinnie
One thing that I've never seen is the compiler doing loop unrolling, but you can do that in the C/++ code itself anyway.


Note that in my other post above, there were links to examples where the compiler unrolled and vectorized and whatnot optimized the code (g++).
Quote:Original post by swiftcoder
Quote:Original post by Guinnie
One thing that I've never seen is the compiler doing loop unrolling
I haven't ever bothered to check MSVC, but GCC does extensive loop *and* recursion unrolling at -O3.

MSVC does unroll loops, but it is less aggressive at all optimizations than GCC, icc or other optimization-centric compilers.
