What is the difference between % and %%

Started by xinsan
4 comments, last by xinsan 15 years, 10 months ago
I am confused about % and %% when writing inline assembly. For example: asm volatile("mov %eax, %ecx"); versus asm volatile("mov %%eax, %%ecx" :: "a"(n)); Why is "%" used in some places and "%%" in others? Another question: is there a good forum for x86 inline assembly? Thanks to everyone willing to help me.
One is dereference, the other is double-dereference.



Also, you generally should not use inline assembly, especially for such small blocks of code.

There is a huge performance penalty for using inline assembly: it forces the code generator to bring values into a particular "known" state, it disables the optimizer for that section of code (because you are stating that you know better), it emits your code exactly as written, and afterwards it re-enables the optimizer and potentially has to restore the previous state.

Your two lines of code potentially do all of that work twice, and don't accomplish anything more than "*x = *y;" for the first one, and assigning to **x from a virtual function for the second one. (It looks like it is just calling a member function and assigning the result to a member variable.)

Instead of inline assembly, you should generally use regular language code and compiler intrinsics whenever possible. Those do not incur the performance penalties.
When you use extended assembly you need to differentiate between operands (the C/C++ variables you want to reference within your assembly block) and registers. Operands are referenced with a single % (e.g. %0, %1), while literal register names are prefixed with %%. In a basic asm() block with no operand lists, the % character is not treated specially, so a single % before a register name is all you need.
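For concreteness, here is a minimal sketch (assuming GCC or Clang on x86-64 with AT&T syntax; the variable names are purely illustrative) that uses both prefixes in one extended asm block:

#include <stdio.h>

int main(void)
{
    int n = 42;
    int out;

    /* Extended asm: %0 and %1 refer to the operands listed below,
       so the literal register eax has to be written as %%eax. */
    __asm__ volatile(
        "movl %1, %%eax\n\t"
        "movl %%eax, %0"
        : "=r"(out)   /* %0: output operand */
        : "r"(n)      /* %1: input operand  */
        : "eax");     /* clobber list       */

    printf("%d\n", out);  /* prints 42 */
    return 0;
}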
Thanks for the help. It is true that inline assembly may be slower than plain C, but what I really want is to learn how to use inline assembly. After that, I want to learn how to optimize loop code with SIMD instructions (SSE/SSE2/SSE3) inserted through inline assembly. All of this work is aimed at optimizing a math library, hypre, and in particular its BLAS routines. I have read that the miniL1BLAS library implemented Level 1 BLAS through inline assembly and achieved a performance boost of about 30%. I want to learn its methods and apply them to the other BLAS functions and to the rest of the hypre library.
Quote: Original post by xinsan
After that, I want to learn how to optimize loop code with SIMD instructions (SSE/SSE2/SSE3) inserted through inline assembly.
Use SIMD intrinsics instead. They are not inline assembly (meaning no performance penalty), and they are optimized by the compiler (meaning a performance benefit).
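As a rough sketch of what that looks like (assuming an SSE-capable x86 target; the function name and the simplification that n is a multiple of 4 are just for illustration), here is a Level 1 BLAS style axpy loop written with intrinsics rather than inline assembly:

#include <xmmintrin.h>   /* SSE intrinsics */

/* Computes y[i] += a * x[i], four floats per iteration.
   Assumes n is a multiple of 4 and the arrays do not overlap. */
void saxpy_sse(int n, float a, const float *x, float *y)
{
    __m128 va = _mm_set1_ps(a);              /* broadcast a into all four lanes */
    for (int i = 0; i < n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);     /* load 4 floats from x */
        __m128 vy = _mm_loadu_ps(y + i);     /* load 4 floats from y */
        vy = _mm_add_ps(vy, _mm_mul_ps(va, vx));
        _mm_storeu_ps(y + i, vy);            /* store the updated 4 floats */
    }
}

The compiler is free to unroll or re-schedule this loop, which is exactly the kind of optimization it cannot do around an inline assembly block.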

The people who write compiler optimizers know their stuff. They reorder instructions to generate the fastest code. They use many clever tricks to ensure the code passes quickly and easily through the pipeline, such as reordering so multiple instructions can be decoded at a time, choosing instructions that decode into fewer micro-ops, and finding ways to use bigger coarse instructions that decode into better internal micro-ops for the out-of-order core. Learn to trust the compiler writers, and pretend you haven't heard of inline assembly unless you have hard evidence that you can do better.
Quote: Original post by frob

The people who write compiler optimizers know their stuff. They reorder instructions to generate the fastest code. They use many clever tricks to ensure the code passes quickly and easily through the pipeline, such as reordering so multiple instructions can be decoded at a time, choosing instructions that decode into fewer micro-ops, and finding ways to use bigger coarse instructions that decode into better internal micro-ops for the out-of-order core.


Yes, you are right. It is true that it is hard to get good performance directly from inline assembly, but that is exactly what my work requires. Of course, I am still a beginner, and there are many things for me to learn. Glad to discuss this with you.

This topic is closed to new replies.
