obhi

Choosing between inline assembly and full assembly compilation

Hi, I've been using inline assembly for math code for a very long time now, and the library has grown pretty fat. It includes support for SSE/SSE2 and a fallback C emulation for vector processing.

Recently I stumbled upon a book on vector processing whose code is written entirely in assembly, to be assembled with MASM. That makes me wonder: should I rewrite my inline assembly as standalone assembly, considering there could be unnecessary saving and restoring of registers around every inline assembly block? That would be a lot of work! So my question is: is inline assembly comparable to fully assembled code? I tried Google, but it doesn't seem to answer this question in a straightforward way.

On a side note, I would like to know whether stack variables can be aligned by using the align keyword found in C compilers (like __declspec(align(16))), or whether they should be explicitly aligned using macros. And is the first method reliable, considering the stack pointer?

Thanks for any help.

This is a hard question with no easy answer.

High performance math libraries require you to carefully measure and test their performance in many ways.


It looks as though you may know this already, but it bears repeating. Simply replacing math functions with SIMD assembly instructions may result in a performance penalty rather than a speedup. You must measure before and after and keep whichever is faster. Further, the result can change from one usage context to another.

Math libraries also need to be aware of the cost of calling a function, which is more than just the extra instructions to prepare for the call and clean up after it. The CPU is deeply pipelined and has many caches. Depending on the situation, function calls can stall the pipeline until values are retired, and they increase the number of cache misses. Both will hurt performance.

But used correctly these CPU features can improve performance far more than the cost of calling them.

So there is no simple answer to the question. You must measure your own code and decide on a case-by-case basis.
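As a rough illustration of "measure before and after", here is a minimal timing sketch in C. The names `kernel_fn`, `scale_ref`, and `time_kernel` are made up for this example and are not from the poster's library. It uses the standard `clock()` function, which is coarse, so it only gives meaningful numbers with many repetitions; a profiler or a cycle counter would be more precise. Also keep in mind that an optimizing compiler may hoist or remove work whose result is never used.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical signature shared by every variant (C, intrinsics, asm)
   of one library routine, so they can be timed with the same harness. */
typedef void (*kernel_fn)(float *out, const float *in, size_t n);

/* Plain C reference version: out[i] = 2 * in[i]. */
void scale_ref(float *out, const float *in, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] * 2.0f;
}

/* Runs fn `reps` times and returns the elapsed CPU time in seconds. */
double time_kernel(kernel_fn fn, float *out, const float *in,
                   size_t n, int reps)
{
    clock_t start = clock();
    for (int r = 0; r < reps; ++r)
        fn(out, in, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Timing each variant with the same inputs and the same `reps`, in an optimized build, gives a like-for-like comparison; the numbers are only valid for that data size and call pattern.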


>> should I rewrite my inline assembly codes to normal assembly considering there could be unnecessary saving and restoring of registers before every inline assembly block? That will be a lot of work!

That is one of many drawbacks to using inline assembly. To make matters worse, the optimizer may ignore your inline assembly blocks for some or all optimizations.

Have you looked into compiler intrinsic functions? They give you access to the SIMD instructions while avoiding most of the problems of inline assembly.
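For readers who have not used intrinsics, here is a small sketch of what such a routine can look like. The function name `vec_madd` and the `HAVE_SSE` macro are invented for this example. It assumes an x86 compiler that provides `<xmmintrin.h>`, and falls back to plain C elsewhere, similar in spirit to the C emulation path mentioned in the question. Unaligned loads and stores are used so the sketch makes no assumption about the caller's buffers.

```c
#include <stddef.h>

/* Use SSE when the compiler reports it; otherwise plain C only. */
#if defined(__SSE__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* out[i] = a[i] * b[i] + c[i] for n floats. */
void vec_madd(float *out, const float *a, const float *b,
              const float *c, size_t n)
{
    size_t i = 0;
#if HAVE_SSE
    /* Four floats per iteration; the compiler picks the registers. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        __m128 vc = _mm_loadu_ps(c + i);
        _mm_storeu_ps(out + i, _mm_add_ps(_mm_mul_ps(va, vb), vc));
    }
#endif
    /* Scalar tail, and the whole loop when SSE is unavailable. */
    for (; i < n; ++i)
        out[i] = a[i] * b[i] + c[i];
}
```

Because there is no hand-written register allocation here, the compiler is free to inline the function and schedule it together with the surrounding code.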


>> Is inline assembly comparable to full assembly compilations?

No. You already pointed out a few differences, but there are other sources of overhead including the cost of function calls.

>> On a side note, I would like to know whether stack variables can be aligned by using the align keyword found in C compilers (like __declspec(align(16))), or whether they should be explicitly aligned using macros. And is the first method reliable, considering the stack pointer?

That is a compiler-specific extension. You must look at both the compiler documentation and output to be certain.
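One way to check this on your own compiler is to test the address at run time. The sketch below is only an illustration: `ALIGN16`, `is_aligned16`, and `stack_vec_is_aligned` are names made up here, and the macro assumes either MSVC (`__declspec(align(16))`) or a GCC-compatible compiler (`__attribute__((aligned(16)))`).

```c
#include <stdint.h>

/* Assumed spellings of the alignment extension per compiler family. */
#if defined(_MSC_VER)
#define ALIGN16 __declspec(align(16))
#else
#define ALIGN16 __attribute__((aligned(16)))
#endif

/* Returns nonzero when p is a multiple of 16 bytes. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Declares an aligned local and reports whether the compiler
   actually placed it on a 16-byte boundary. */
int stack_vec_is_aligned(void)
{
    ALIGN16 float v[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    return is_aligned16(v);
}
```

A check like this can sit in a debug assertion at the top of routines that use aligned loads, so a misaligned buffer is caught early rather than faulting inside the SIMD code.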

Quote:
Original post by frob
Have you looked into compiler intrinsic functions? They give you access to the SIMD instructions while avoiding most of the problems of inline assembly.

Hearken to him. Intrinsics are not only more comfortable and take care of the dirty details, but they're superior in every way. They'll let the compiler run full optimisations across and inside all your code, which is not possible with either inline assembly or asm files.

Yeah! In fact, my library has three versions of every function: asm, intrinsics, and C. But I haven't had a chance to do any tuning yet. Maybe I will use VTune soon.
Well, maybe intrinsics do allow compiler optimizations, but often when I look at the disassembly (or the assembly generated from them) I start doubting them. They generate more instructions than needed. Maybe it's because of the debug build. Currently both the C and asm versions are disabled in my library; only intrinsics are being used. But at times I feel like going back to asm.
Can I call for a vote: based on sheer speed alone, how many would support intrinsics over asm?
P.S.: I hope I'm correct that intrinsics generate different code from the same sequence of instructions written in asm. (But I'll be happy if I am wrong. It will save me a lot of work :))
