Finalspace

Is there an inline assembler overhead?

Yesterday I had a discussion with a colleague about inline assembly and how CPUs handle memory and such.

He is an "I don't care about memory or performance whatsoever" person, so it's hard to get that stuff through...

Anyway, we got to the topic of inline assembly in C and other languages like Delphi, and he stated that:

There is always an overhead when you switch from "normal code built by the compiler" to "code which is inline assembly": all registers must be fully saved and cleared before the inline assembly instructions are executed, so that you can use registers like "eax", "ebx", etc. however you want. There is no such overhead when the compiler does its thing...

I was like, this cannot be true! I would guess the inline assembly part has the same initial register state as the normal code would have, so there is literally no overhead? I am not an expert on assembly programming, so I was unsure whether this is true or not.

The inline assembly part fully replaces the code the compiler would have produced in the first place, right? And you must know which registers you can use and which you can't, right?

My assembly knowledge is just enough to inspect some state in my program, but nothing more, so I cannot really verify that statement. I am sure there are great experts here who can help me understand these things much better and give my colleague a proper answer.

Thanks


Depends on your compiler. He's right that for many compilers, yep, they make zero assumptions about what your inline asm is going to do, so they have to do a bunch of book-keeping work on either side of your asm block, and also cannot perform any optimizations between "their" asm and "your" asm.

Hopefully a modern compiler will throw all the asm together and optimize it seamlessly.


GCC will blend that asm seamlessly into C/C++ code.

See https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html

You can specify inputs/outputs, clobbered registers, source/input location restrictions, etc. Then GCC does its magic. GCC may even move an asm statement, or drop it when its outputs are unused, unless you mark that inline block as volatile.
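To make the inputs/outputs/clobbers idea concrete, here is a minimal sketch of GCC-style extended asm. It assumes an x86 target and a GCC-compatible compiler (it falls back to plain C otherwise), and the function name `add_asm` is made up for illustration.

```c
/* Hypothetical helper: adds b to a with one x86 instruction.
 * The constraints tell the compiler what the asm reads, writes and
 * clobbers, so it can pick the registers itself instead of saving
 * and restoring everything around the block. */
int add_asm(int a, int b)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
    __asm__ ("addl %1, %0"   /* AT&T syntax: %0 = %0 + %1 */
             : "+r"(a)       /* %0: read-write, any general register */
             : "r"(b)        /* %1: input, any general register */
             : "cc");        /* the flags register is modified */
#else
    a += b;                  /* fallback for other compilers/targets */
#endif
    return a;
}
```

Because no specific register such as eax is named, the register allocator is free to place `a` and `b` wherever it likes; a call such as `add_asm(2, 3)` can be inlined and scheduled like ordinary code.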

For some compilers (not GCC -- unless you clobber "memory", which you must do explicitly) it's a full load/store barrier in any case, even if there is no other "setup" happening. That's usually a rather minor thing, though.
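For the GCC case, the explicit "memory" clobber could be sketched roughly like this. It assumes a GCC-compatible compiler; the macro and variable names are made up for illustration, and this is a compiler-level barrier only, not a CPU fence.

```c
/* An empty asm statement with a "memory" clobber: no instruction is
 * emitted, but the compiler must assume memory was read and written,
 * so it does not move loads/stores across it or keep memory values
 * cached in registers over it. */
#if defined(__GNUC__) || defined(__clang__)
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")
#else
#define COMPILER_BARRIER() ((void)0)  /* assumption: no-op elsewhere */
#endif

int payload;  /* hypothetical shared data */
int ready;    /* hypothetical flag */

void publish(int value)
{
    payload = value;
    COMPILER_BARRIER();  /* the store to payload is emitted before the store to ready */
    ready = 1;
}
```

Without the clobber, GCC treats the asm as touching only the operands you listed, which is why it is not a load/store barrier by default.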

In theory, optimizations do not happen across or inside asm blocks, but in practice at least register coloring and instruction scheduling work perfectly well (... with GCC and clang at least, couldn't tell about MS).

You should however assume that no high-level optimizations such as CSE or invariant moves and unrolling happen, since the compiler cannot easily tell what's going on inside that black box at the time it's looking at the AST. What finally happens in the backend/assembler is a different thing, so unsurprisingly those micro-optimizations still work.


So if I understand that correctly, your asm code is most likely not touched by the compiler's optimizations at all, unless you use extended asm, which automatically maps named local variables to registers like EAX, EBX, etc. Some compilers may have some bookkeeping work to do, but with minimal runtime overhead, right?

Anyway, thanks. Now it's a bit clearer, but I should double-check how VC++ handles that stuff, because this is the compiler of our/my choice.


I should double-check how VC++ handles that stuff, because this is the compiler of our/my choice.

MSVC only lets you use inline asm for x86; it isn't supported for ARM or x64 targets at all. (https://msdn.microsoft.com/en-us/library/4ks26t93.aspx) So even if you find a case where you absolutely need to use inline asm, MSVC probably won't really cooperate with you.



That's a good thing in my view. Inline assembly is a landmine waiting to explode. It isn't optimized by the compiler, and compilers may or may not put barriers around it. Something that was good this year becomes a nightmare next year, and there is no clear indication why.

When we have functions that absolutely must be in assembly language -- and that number is extremely small on modern systems -- the assembly code goes into its own little library. It generates its own little object file that gets linked in just like any other library.

Use compiler intrinsics where you can. You may be surprised when you first read the full list of intrinsic functions; the list is quite long. Some are shocked to discover that many standard library functions are quietly replaced by the compiler with intrinsic operations rather than standard library calls. Most math functions, many memory and string functions, and several bitwise operations are handled this way. There is really no point in setting up a stack frame, passing the parameters to the cos() function, letting it run, and retrieving the results when the compiler can just emit the instruction directly. That is in addition to all the SIMD or advanced/quirky hardware operations.
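As a small illustration of the intrinsics point, here is a sketch that counts set bits with the GCC/Clang builtin `__builtin_popcount` and with a plain loop. The two wrapper function names are made up, and whether the builtin becomes a single hardware instruction depends on the target and compiler flags.

```c
/* Portable reference: clears the lowest set bit until none remain. */
unsigned popcount_loop(unsigned x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1;  /* drop the lowest set bit */
        n++;
    }
    return n;
}

/* Uses the compiler builtin where available; the compiler sees
 * through it and can optimize around it, unlike an asm block. */
unsigned popcount_intrinsic(unsigned x)
{
#if defined(__GNUC__) || defined(__clang__)
    return (unsigned)__builtin_popcount(x);
#else
    return popcount_loop(x);  /* assumption: no builtin elsewhere */
#endif
}
```

Both functions return the same result for every input; the difference is only in what the compiler is allowed to know about the operation.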

It's not that I want to write inline assembly code; I'd rather use intrinsics if I have to.

But I wanted to understand what happens when you actually write inline assembly code, how the register handling works, and how it differs from code generated by the compiler.

In addition, there are languages which do not have any intrinsics at all: Object Pascal / Delphi.
In this language you absolutely have to use inline assembly. If I remember correctly, there is not even an "rdtsc" intrinsic in Delphi...

Just look at this code someone I know has written. It's insane how much assembly there is, but it's pretty interesting as well.
https://github.com/BeRo1985/kraft/blob/master/src/kraft.pas
https://github.com/BeRo1985/pasmp/blob/master/src/PasMP.pas

So assembly still has its merits, especially when it is required to support special instructions on other platforms.


In addition, there are languages which do not have any intrinsics at all: Object Pascal / Delphi. In this language you absolutely have to use inline assembly. If I remember correctly, there is not even an "rdtsc" intrinsic in Delphi...

Then don't use those languages? Or at least, don't use them in problem domains where intrinsics are the right solution.

Not all languages cover all use cases. I wouldn't write a web application in C++ and I wouldn't write a AAA game engine in PHP (actually I wouldn't write anything in PHP, but ya know...) 


In addition, there are languages which do not have any intrinsics at all: Object Pascal / Delphi.
In this language you absolutely have to use inline assembly. If I remember correctly, there is not even an "rdtsc" intrinsic in Delphi...

You don't absolutely have to use asm. It would be far preferable to call out to a C function instead of an inline asm block (and Delphi can call C functions).
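A rough sketch of what such a C function could look like for the rdtsc case, assuming GCC or Clang on x86, where `<x86intrin.h>` declares `__rdtsc()`. The function name `read_ticks` and the `clock()` fallback are made up for illustration, and the Delphi-side declaration and linking are not shown.

```c
#include <stdint.h>
#include <time.h>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
#include <x86intrin.h>  /* declares __rdtsc() on GCC and Clang */
#define HAVE_RDTSC 1
#endif

/* Hypothetical exported function: returns a raw tick count that a
 * caller in another language can bind to like any C function. */
uint64_t read_ticks(void)
{
#ifdef HAVE_RDTSC
    return __rdtsc();          /* intrinsic for the rdtsc instruction */
#else
    return (uint64_t)clock();  /* coarse fallback for other targets */
#endif
}
```

The only cost on the calling side is an ordinary function call, and the assembly-level detail stays behind a plain C interface.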


Virtually every language out there today can use C bindings and link with the proper object files or dynamic libraries.

If you need something in another language, by all means do it.  Compile to something you can link with, and link them together.

I think we've got six languages in my current project, all living happily in the same final executable. 

Typically the overhead cost is only that of a function call which is almost nothing at all.  Sometimes there is a cost of data marshaling, but that too is generally minimal. 

