I've got a bunch of topics to write about queued up, so check back every once in a while if you're interested.
Jump to content
movapd movaps movdqa *** movupd movups movdqu *** lddquAll of these instructions move 128-bits worth of data. Breaking it down further, the first three instructions work with aligned data, whereas the next three are the unaligned versions of the first (we'll talk about the last one in a minute, since it's a bit special). The aligned versions offer better performance, but if you haven't ensured that your data is allocated on a 16-byte boundary, you'll have to use one of the unaligned instructions in order to load. When doing register-to-register (reg-reg) moves, it's best to use the aligned versions.
movntdqa *** movntdq movntpd movntpsThese are the non-temporal loads and stores, so named since they hint to the processor that they are one-off in the current block of code and should not require bringing the associated memory into the cache. Thus, you should only use these when you're sure that you won't be doing more than one read or write into the given cache line. The first instruction, movntdqa, is the only non-temporal load, so it's what you have to use even when loading floating point data. The other three are data-specific stores from an XMM register into memory, one each for integers, doubles, and singles. All of these instructions only operate on aligned addresses; there are no unaligned non-temporal moves
movd / movq movss / movsd *** movlps / movlpd movhps / movhpdThe first instruction in each pair listed above operates on singles (ie. 32 bits of data) and the second works on doubles, which is 64 bits of data. The first set, comprising the first four instructions, generally fill the extra bits in the XMM register with zero. The second set does not; it leaves them as they are. I'll discuss in a moment why this is not necessarily a good thing. movd moves 32 bits between memory and a register. It cannot, however, move between two XMM registers, which is an oddity that the rest of the instructions listed here do not share. movq will always zero extend during any move, including between memory and between registers. movd and movq are meant for integer data.