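```cpp
// MSVC 32-bit x86 inline assembly (__asm is not supported on x64 or by GCC/Clang).
// Inside an __asm block a reference parameter holds an address, so load it first.
inline void swap(int& a, int& b)
{
    __asm {
        mov ecx, a      ; ecx = address of a
        mov edx, b      ; edx = address of b
        mov eax, [ecx]  ; load a
        mov ebx, [edx]  ; load b
        mov [edx], eax  ; store old a into b
        mov [ecx], ebx  ; store old b into a
    }
}
```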
You'll notice that the two loads (into eax and ebx) are completely independent of each other, so the processor can execute them at roughly the same time. The two stores are also independent of each other and can likewise overlap. The stores do have to wait for the loads, since each one writes out a value that a load just fetched, but within each pair the two instructions can run simultaneously. Superscalar, out-of-order processors have exploited this kind of instruction-level parallelism for a long time, and both processors and compilers keep getting better at it.
The XOR method (`a ^= b; b ^= a; a ^= b;`) does not allow this. The second XOR reads the result of the first, so it cannot start until the first has finished, and the third XOR likewise has to wait for the second. While each step waits, the processor's other execution units sit idle.
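For comparison, here is a minimal portable C++ sketch of both swaps without inline assembly. The names `swap_temp` and `swap_xor` are hypothetical, not from the original. The comments mark which steps depend on which; an optimizing compiler is free to turn `swap_temp` into the same kind of independent loads and stores as the assembly version. The XOR version also needs a guard for the case where both references name the same variable, since `a ^= a` would zero it.

```cpp
// Temporary-based swap: the two loads are independent of each other,
// and so are the two stores, so a superscalar CPU can overlap each pair.
inline void swap_temp(int& a, int& b)
{
    int ta = a;  // load a  } independent of each other
    int tb = b;  // load b  }
    a = tb;      // store   } independent of each other;
    b = ta;      // store   } each needs only one of the loads
}

// XOR swap: every step reads the result of the step before it,
// so the three operations form a serial dependency chain.
inline void swap_xor(int& a, int& b)
{
    if (&a == &b) return;  // same object: XOR-ing it with itself would zero it
    a ^= b;  // step 1
    b ^= a;  // step 2: needs the new a from step 1
    a ^= b;  // step 3: needs the new b from step 2
}
```

With `int x = 1, y = 2;`, calling `swap_temp(x, y)` leaves `x == 2` and `y == 1`, and a following `swap_xor(x, y)` restores the original values.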
Graphically, with time running from left to right, the first one would look like this:
    [**** OPERATION 1 ****] [**** OPERATION 3 ****]
    [**** OPERATION 2 ****] [**** OPERATION 4 ****]

The second one (XOR) would look more like this:

    [**** OPERATION 1 ****][**** OPERATION 2 ****][**** OPERATION 3 ****]
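So you can see how the XOR version ends up taking longer even though it has one fewer operation: its three operations form a chain that needs three time steps, while the four moves finish in two.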
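The diagrams can be reproduced with a toy model. The sketch below is entirely hypothetical (the `Op` struct, `critical_path`, `temp_swap_ops` and `xor_swap_ops` are illustrative names, not a real CPU model). It assumes every instruction takes one time step, execution units are unlimited, and, as an out-of-order CPU largely arranges, an instruction waits only for the most recent earlier instruction that wrote something it reads. It counts dependency depth, not real cycles.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy model: an instruction is the locations it reads and writes.
struct Op {
    std::vector<std::string> reads;
    std::vector<std::string> writes;
};

// Number of time steps needed when each op takes one step, units are
// unlimited, and an op waits only for true (read-after-write) dependencies.
std::size_t critical_path(const std::vector<Op>& ops)
{
    std::vector<std::size_t> finish(ops.size(), 0);  // step at which op i completes
    std::size_t total = 0;
    for (std::size_t i = 0; i < ops.size(); ++i) {
        std::size_t start = 0;
        for (const std::string& loc : ops[i].reads) {
            // Walk backwards to the latest earlier op that wrote this location.
            for (std::size_t j = i; j-- > 0;) {
                const auto& w = ops[j].writes;
                if (std::find(w.begin(), w.end(), loc) != w.end()) {
                    start = std::max(start, finish[j]);
                    break;
                }
            }
        }
        finish[i] = start + 1;
        total = std::max(total, finish[i]);
    }
    return total;
}

// The four moves of the temp-register swap.
const std::vector<Op> temp_swap_ops = {
    {{"a"}, {"eax"}},    // mov eax, a
    {{"b"}, {"ebx"}},    // mov ebx, b
    {{"eax"}, {"b"}},    // mov b, eax
    {{"ebx"}, {"a"}},    // mov a, ebx
};

// The three steps of the XOR swap (a and b stand for wherever the values live).
const std::vector<Op> xor_swap_ops = {
    {{"a", "b"}, {"a"}}, // a ^= b
    {{"b", "a"}, {"b"}}, // b ^= a
    {{"a", "b"}, {"a"}}, // a ^= b
};
```

Under these assumptions `critical_path(temp_swap_ops)` is 2 and `critical_path(xor_swap_ops)` is 3, matching the two diagrams: more instructions, but a shorter chain.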