x86 instructions movs,cmps,...

Started by
2 comments, last by frob 12 years, 9 months ago
I'm curious whether the x86 instructions movs, cmps, scas, stos, and lods are really emitted as single instructions by an assembler, or whether they get broken down into more than one instruction with the same effect.

For example, if I compile a C++ program that uses strings, will I ever encounter these instructions (movs, cmps, scas, stos, and lods) in the disassembly?



Maybe.

They don't really have much to do with text strings. "String" has its literal meaning here: a sequence of bytes (or words) in memory.

These instructions are something of a legacy of CISC (rich instruction set) design. Combined with a REP prefix, they were the assembly version of a for loop, a very high-level construct. Part of the reasoning was that, with such high-level knowledge, the CPU could execute the loop better, knowing that it is a loop. In the big picture, though, this never panned out.
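To make the "for loop in one instruction" idea concrete, here is a minimal C++ sketch of what rep movsb and rep stosb do, assuming the direction flag is clear (pointers move upward). The StringRegs struct and the function names are hypothetical, made up for illustration; only the register roles (ESI, EDI, ECX, AL) come from the instruction set itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of the registers the string instructions use implicitly.
struct StringRegs {
    const std::uint8_t* esi;  // ESI: source pointer (movs, lods, cmps)
    std::uint8_t*       edi;  // EDI: destination pointer (movs, stos, scas)
    std::size_t         ecx;  // ECX: repeat count consumed by the REP prefix
    std::uint8_t        al;   // AL: byte value written by stosb
};

// Models "rep movsb": copy ECX bytes from [ESI] to [EDI].
void rep_movsb(StringRegs& r) {
    assert(r.ecx == 0 || (r.esi != nullptr && r.edi != nullptr));
    while (r.ecx != 0) {       // REP: repeat until the count reaches zero
        *r.edi++ = *r.esi++;   // movsb: copy one byte, advance both pointers
        --r.ecx;
    }
}

// Models "rep stosb": store AL into ECX consecutive bytes at [EDI].
void rep_stosb(StringRegs& r) {
    assert(r.ecx == 0 || r.edi != nullptr);
    while (r.ecx != 0) {
        *r.edi++ = r.al;       // stosb: store one byte, advance the pointer
        --r.ecx;
    }
}
```

After either call the registers are left as the hardware would leave them: the count is zero and the pointers sit just past the last byte touched.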

As branch prediction and much of the rest of the architecture changed and advanced, and as compilers became the norm, these instructions lost their original purpose and became somewhat limiting. These days just about all strings are Unicode and may require conditional per-byte processing, so the instructions above are a poor fit for text. Streaming processing, meanwhile, is the domain of SIMD or pipeline-tuned code.

As I recall, even back in the 386 era, memcpy implementations had stopped using these instructions in favor of trickier approaches that ran faster.

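Out-of-order execution doesn't favor them either. Because they take their operands in specific registers (in 32-bit code, ESI for the source, EDI for the destination, ECX for the repeat count, and AL/AX/EAX for the data), they interfere with optimal register allocation.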




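Yes, those are real x86 instructions. They have one-byte opcodes in the A4 through AF range: A4/A5 for movs, A6/A7 for cmps, AA/AB for stos, AC/AD for lods, and AE/AF for scas (A8/A9, in the middle of that range, encode test). The REPNZ (F2) and REPZ (F3) prefixes can also be used with them; F3 acts as a plain REP on movs, stos, and lods.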
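As a rough illustration of those encodings, the following C++ sketch maps an optional F2/F3 prefix plus a one-byte opcode to a mnemonic. The function names are hypothetical, and the sketch assumes 32-bit operand size with no other prefixes, so the odd opcodes are shown with a "d" suffix. It is not a real disassembler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Returns the instruction family for a string-instruction opcode,
// or an empty string for any other byte.
std::string string_op_family(std::uint8_t opcode) {
    switch (opcode) {
        case 0xA4: case 0xA5: return "movs";
        case 0xA6: case 0xA7: return "cmps";
        case 0xAA: case 0xAB: return "stos";
        case 0xAC: case 0xAD: return "lods";
        case 0xAE: case 0xAF: return "scas";
        default:              return "";  // includes A8/A9, which encode test
    }
}

// Describes an optional F2/F3 prefix followed by a string opcode.
// Returns an empty string if the bytes are not a string instruction.
std::string describe(const std::vector<std::uint8_t>& bytes) {
    std::size_t i = 0;
    std::uint8_t prefix = 0;
    if (!bytes.empty() && (bytes[0] == 0xF2 || bytes[0] == 0xF3)) {
        prefix = bytes[0];
        i = 1;
    }
    if (i >= bytes.size()) return "";

    const std::uint8_t opcode = bytes[i];
    const std::string family = string_op_family(opcode);
    if (family.empty()) return "";
    assert(opcode >= 0xA4 && opcode <= 0xAF);  // the range named in the post

    // The zero-flag condition only matters for the comparing instructions,
    // so F3 is printed as "repe" for cmps/scas and as plain "rep" otherwise.
    const bool compares = (family == "cmps" || family == "scas");
    std::string text;
    if (prefix == 0xF2)      text = "repne ";
    else if (prefix == 0xF3) text = compares ? "repe " : "rep ";

    text += family;
    text += (opcode & 1) ? 'd' : 'b';  // even opcode: byte; odd: dword (assumed)
    return text;
}
```

For instance, the bytes F3 A4 come out as "rep movsb" and F2 AE as "repne scasb", which are the forms typically seen in inlined copy and scan code.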

A good online reference for instructions is x86asm.net's Instruction Table.

Generally, if you see something on that table, it's a real instruction or a prefix of some kind. Something that has a real representation in machine code, anyway.


I have personally seen scas, movs, and stos used when a compiler inlines the strlen, strcpy, memcpy, and memset functions, so yes, C and/or C++ compilers will use them sometimes.
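For the strlen case, the classic inline sequence is: zero AL, set ECX to -1, run repne scasb, then compute the length with not ecx / dec ecx. The C++ sketch below models that sequence step by step. The function name is hypothetical, and this is only one well-known idiom, not necessarily what any particular compiler emits.

```cpp
#include <cassert>
#include <cstdint>

// Models the "repne scasb" strlen idiom with ordinary variables standing in
// for the registers.
std::uint32_t strlen_via_scasb(const char* s) {
    assert(s != nullptr);

    const char*   edi = s;            // EDI: scan pointer
    std::uint32_t ecx = 0xFFFFFFFFu;  // ECX: -1, the largest possible count
    const char    al  = '\0';         // AL: the byte being searched for

    // repne scasb: while ECX != 0, decrement ECX, compare AL with [EDI],
    // advance EDI, and stop as soon as the compare reports "equal".
    while (ecx != 0) {
        --ecx;
        const bool equal = (*edi == al);
        ++edi;
        if (equal) break;
    }

    // not ecx ; dec ecx
    // ECX was decremented once per byte scanned, terminator included,
    // so ~ECX is (length + 1) and subtracting one gives the length.
    return ~ecx - 1u;
}
```

Tracing "abc": four bytes are scanned (a, b, c, and the terminator), ECX ends at 0xFFFFFFFB, ~ECX is 4, and the result is 3.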


That is entirely up to the compiler writers.

It is their job to know the inner details of the hardware, to know that one pattern executes faster or slower than another. The optimizing compiler may even generate code for multiple ways to run a block of code, then calculate the processing time of each, then select the best option in that specific instance.



There is a very easy way to check it:

Compile your program with release (fully optimized) settings.

Open it in the debugger.

Switch to a disassembly view.

Look at the actual instructions that were generated.
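If you want something small to try those steps on, a file along these lines should do. The function names are hypothetical, and whether any string instructions show up depends entirely on the compiler, its version, and the flags; many toolchains will emit a call to the library routine or SIMD code instead.

```cpp
#include <cstddef>
#include <cstring>

// Small, externally visible wrappers so each one is easy to find in the
// disassembly. With GCC, "g++ -O2 -S file.cpp" writes the generated assembly
// to a .s file; "objdump -d" on the compiled object shows the same thing.
// Look for "rep movs", "rep stos", or "repne scas" in the output.

// Candidate for an inlined scan (scas) or a call to strlen.
std::size_t text_length(const char* s) {
    return std::strlen(s);
}

// Candidate for an inlined fill (stos) or a call to memset.
void clear_block(unsigned char* p, std::size_t n) {
    std::memset(p, 0, n);
}

// Candidate for an inlined copy (movs) or a call to memcpy.
void copy_block(void* dst, const void* src, std::size_t n) {
    std::memcpy(dst, src, n);
}
```

Trying different optimization levels (for example size-optimized versus speed-optimized builds) is a quick way to see how much the choice varies.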

