# I need your opinion!!! ASM vs Plain C++

## Recommended Posts

I need your opinion or knowledge on a subject. The question, in a few words, is: "Why is my inline asm code, which is the same as the code that VC produces, slower than the same code without asm?"

Let me explain what I'm talking about. Yesterday I found out that VC can export a listing file, with asm and source code in it, for a specific .c or .cpp file. So I compiled my Vector class with this option enabled and began to study the generated asm file.

First of all, let me clear something up. I am not an assembly guru, and the only things I know about assembly are how to do math calculations and some other basic stuff (like searching a string for a char). That's why I compiled the Vector class to study it.

The first function I saw was the default constructor. This will be the function in my example. In plain C++ the constructor looked like this:

    GEVector3D::GEVector3D()
    {
        x = 0.0f;
        y = 0.0f;
        z = 0.0f;
    }

When I compiled this, the asm output was:

    // Initialization stuff for the {
    ; 14 : x = 0;
    mov eax, DWORD PTR _this$[ebp]
    mov DWORD PTR [eax], 0
    ; 15 : y = 0;
    mov ecx, DWORD PTR _this$[ebp]
    mov DWORD PTR [ecx+4], 0
    ; 16 : z = 0;
    mov edx, DWORD PTR _this$[ebp]
    mov DWORD PTR [edx+8], 0
    // deinitialization stuff for the }

When I saw it, I realized that there were unnecessary mov instructions in it. So I tried to rewrite it with inline asm, using only the necessary instructions. The result was this:

    GEVector3D::GEVector3D()
    {
        _asm
        {
            mov eax, DWORD PTR _this$[ebp]
            mov DWORD PTR [eax], 0
            mov DWORD PTR [eax+4], 0
            mov DWORD PTR [eax+8], 0
        }
    }

The generated listing file had the same init and uninit stuff for the two versions of the function. So I started to rewrite the whole class with inline asm. Then I made a kind of benchmark. It was a big loop, inside of which there were some vector initializations (constructor) and some vector math like dot products, cross products, normalization etc.

When I timed the loop with the inline asm version of the code, it output "Averange time for 10000 loops of 100 vectors calcs : 1.2 sec", and this was in the release version, with inline function substitution enabled. When I timed the same loop, with the same code for the calculations and for the timing, but with the plain C++ code, it output "Averange time for 10000 loops of 100 vectors calcs : 0.3 sec", again in the release version.

How can this be explained?????? And if I am making some kind of mistake in the big asm functions (cross products, normalization etc.), why is it still slower when I use asm only for the small functions I am confident about (like constructors)????????

I need your opinion on that. Because I'm trying to make my apps run faster, I need some advice on it. Thanks in advance for your attention, and forgive my terrible English.

HellRaiZer
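
For readers who want to reproduce this kind of measurement, here is a minimal sketch of a timing loop. It is not the poster's benchmark: `Vec3`, `dot`, `BenchResult` and `run_benchmark` are hypothetical names, and it assumes a C++11 compiler with `<chrono>` (which VC 6.0 does not have). The results are summed into a checksum that is returned to the caller, so an optimizing build cannot simply delete the work being timed.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical stand-in for the poster's GEVector3D (only what the loop needs).
struct Vec3 {
    float x, y, z;
    Vec3() : x(0.0f), y(0.0f), z(0.0f) {}
    Vec3(float x_, float y_, float z_) : x(x_), y(y_), z(z_) {}
};

inline float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

struct BenchResult {
    double seconds;  // wall-clock time for all loops
    float checksum;  // sum of all results, returned so the work is observable
};

// Runs `loops` passes of `vectors_per_loop` constructor + dot product calls.
BenchResult run_benchmark(std::size_t loops, std::size_t vectors_per_loop) {
    assert(vectors_per_loop > 0);
    float checksum = 0.0f;
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t l = 0; l < loops; ++l) {
        for (std::size_t i = 0; i < vectors_per_loop; ++i) {
            Vec3 a(1.0f, 2.0f, 3.0f);
            Vec3 b(static_cast<float>(i % 4), 1.0f, 0.0f);
            checksum += dot(a, b);
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double> elapsed = stop - start;
    return {elapsed.count(), checksum};
}
```

Calling `run_benchmark(10000, 100)` once for each version of the class, in the same release configuration, gives two `seconds` values that can be compared directly.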

##### Share on other sites
Maybe each time you use an _asm block, the compiler pushes all the vars, then pops them once you're done to restore its state.

ToohrVyk
-------------
Extatica - a free 3d game engine
Available soon!

##### Share on other sites
Visual C++'s method of setting 3 variables to zero was too slow for you? Yoiks.

If you want your applications to run faster, then profile them first and find out where your real bottlenecks are. Most good optimisations don't require assembly language at any rate.

##### Share on other sites
No, setting 3 vars to 0.0f is not too slow for me!!!!! But what about cross products? When you have 1000 of them in a single frame for example??? Are these two extra movs a problem????

I know that asm is not always the best optimization, but the thing is that I am trying to learn something from it, so I experiment!!!!

HellRaiZer

##### Share on other sites
Have you tried playing around with the SSE/SSE2 intrinsics in the new (MS) C++ compiler?

Regards

Thomas Tomiczek
THONA Consulting Ltd.
(Microsoft MVP C#/.NET)

##### Share on other sites
I have an AMD Duron 700, so I can't use SSE/SSE2. But I searched for tutorials on 3DNow! instructions and found nothing. If you have something to recommend, I'd be happy to hear it. And I'm using VC 6.0, so I would have to download 70 MB for the appropriate "patch" for my compiler!!!

Thanks again.

HellRaiZer

##### Share on other sites
You are seriously underestimating the VC optimizer.

Even though the constructor and other functions in your vector class may look silly, in most cases the optimizer will replace calls to them with inline code, specifically optimized for the current situation.

Skip this way of "optimizing", there's just no gain.

On the other hand, if you have a big loop doing _LOTS_ of vector processing, in some cases you may gain some speed by rewriting the entire loop in asm, or gain even more by using SSE, but rewriting little functions like this just makes the optimizer's job harder.
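
To illustrate the inlining point, here is a small sketch of a vector class written in plain C++ with its members defined in the class body, which makes them implicitly inline and leaves the optimizer free to substitute and simplify them at each call site. `Vector3` is a hypothetical stand-in for the poster's GEVector3D, not the actual class from the thread.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for GEVector3D; every member is defined in the class
// body, so the compiler may inline it wherever it is called.
class Vector3 {
public:
    float x, y, z;

    Vector3() : x(0.0f), y(0.0f), z(0.0f) {}
    Vector3(float x_, float y_, float z_) : x(x_), y(y_), z(z_) {}

    float dot(const Vector3& o) const {
        return x * o.x + y * o.y + z * o.z;
    }

    Vector3 cross(const Vector3& o) const {
        return Vector3(y * o.z - z * o.y,
                       z * o.x - x * o.z,
                       x * o.y - y * o.x);
    }

    // Returns a unit-length copy; the vector must not have zero length.
    Vector3 normalized() const {
        const float len = std::sqrt(dot(*this));
        assert(len > 0.0f);
        return Vector3(x / len, y / len, z / len);
    }
};
```

An `_asm` block inside any of these members would hide the body from the optimizer, which is one plausible reason the hand-written version in this thread ended up slower.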
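
As a rough sketch of what "the entire loop" with SSE could look like, the function below computes many dot products at once, four per iteration, using the SSE intrinsics from `<xmmintrin.h>`. The function name `dot_batch`, the `HAVE_SSE` macro and the separate x/y/z array layout are assumptions made for this example, not something from the thread. On targets without SSE the same function falls back to a plain scalar loop.

```cpp
#include <cassert>
#include <cstddef>

// HAVE_SSE is a hypothetical macro for this sketch: 1 when SSE intrinsics
// are available on the target, 0 otherwise.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

// Computes out[i] = a[i] . b[i] for n vector pairs whose components are
// stored in separate x, y and z arrays.
void dot_batch(const float* ax, const float* ay, const float* az,
               const float* bx, const float* by, const float* bz,
               float* out, std::size_t n) {
    assert(out != nullptr || n == 0);
    std::size_t i = 0;
#if HAVE_SSE
    // Four dot products per iteration; unaligned loads keep the sketch simple.
    for (; i + 4 <= n; i += 4) {
        const __m128 x = _mm_mul_ps(_mm_loadu_ps(ax + i), _mm_loadu_ps(bx + i));
        const __m128 y = _mm_mul_ps(_mm_loadu_ps(ay + i), _mm_loadu_ps(by + i));
        const __m128 z = _mm_mul_ps(_mm_loadu_ps(az + i), _mm_loadu_ps(bz + i));
        _mm_storeu_ps(out + i, _mm_add_ps(_mm_add_ps(x, y), z));
    }
#endif
    // Scalar loop: handles the tail, or everything when SSE is unavailable.
    for (; i < n; ++i) {
        out[i] = ax[i] * bx[i] + ay[i] * by[i] + az[i] * bz[i];
    }
}
```

Whether this beats the compiler's own code depends on the data layout and the CPU, so it should be timed against the plain C++ loop before being adopted.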

##### Share on other sites
Generally, unless you're amazing at asm, the compiler can do a much better optimization job than you can. And as another poster said, don't optimize until you've found the bottlenecks using the profiler. This will save you time and keep your code readable.

##### Share on other sites
Thanks for the replies.

It was just an experiment, and I just shared my results with you. It is not that I am going to write, for example, a BSP algorithm in asm. I am just telling you what I figured out.

Conclusion: Let the compiler optimize the code for you. I got it.
Thanks again.

PS: Speaking about bottlenecks, I profiled my Particle Editor. I let it run for 10 sec on the first screen without doing anything, just letting it render the GUI (full screen OpenGL with a custom GUI, not Windows' GUI), the axis and the grid. After 10 secs, I read the profiler's results. The damn Font class took 45% of the whole time. Is there some way to make font rendering faster in OpenGL? I know this has nothing to do with the post but...
And I only render the FPS and two other strings!!!!! And it is a texture font. Not a raster-bitmap or 3D font. TEXTURE FONT!!!!!
I must be making a real mistake there!!!!!

##### Share on other sites
The compiler will pair instructions to make the best use of the pipeline. Those movs you found unnecessary are in fact not. They get executed at the same time because none of them relies on the others. A faster way to clear a register is xor eax,eax, but this is a very, very slight improvement and not very useful on today's machines. A 486 is a different matter. Many assembler programmers think in terms of old optimization tricks and are not really up to date with how today's CPUs work.

Trying to outdo the compiler on an instruction-by-instruction basis is virtually impossible today. You can, however, beat the compiler without too much effort by writing whole functions from scratch in asm and by making use of the CPU extensions. But then you have the issue of brands. It really is hard work to write code better than the compiler, and I would suggest you spend your time on developing your application in C++ instead.

But if you really want to do this, you must go to Intel's homepage and download all the related material, and do the same at AMD. Then sit down, look through the problem closely, and write paths for both CPU brands (this will take time). Don't forget to compile your asm code by itself in an asm file, absolutely not as inline asm, since with inline asm the compiler cannot do its best and the resulting code is degraded. Read the docs on how to compile an asm file.
