So what about Vista or Windows 7 with DirectX 10/11, where flip sequential might not be available, or when using, say, discard as the presentation model?
It will use an additional BitBlt() to copy the DX surface to an intermediate DWM surface, which is slower. But the same principle holds: if the swap-chain format matches the screen format, this operation will be faster. When using the bitblt mode, you also have more options for swap-chain formats.
This is not completely true for Windows 7. As long as your back buffer matches your front buffer and your swap chain is set to full screen, it will not bitblt; it will swap the back buffer with the front buffer. The only time it bitblts is when the swap chain is in windowed mode, or when the swap chain's back buffer does not match the front buffer's width/height and format.
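
To make the rule above concrete, here is a minimal sketch that encodes it as a plain predicate. It only models the condition as stated in this reply and is not DXGI code. Every name in it (`PixelFormat`, `PresentPath`, `DisplayMode`, `SwapChainState`, `predictPresentPath`) is hypothetical and made up for illustration, and the real runtime or driver may take other factors into account.

```cpp
// Hypothetical model of the rule described above. None of these types are
// part of DXGI; they exist only to spell out the stated condition.
enum class PixelFormat { R8G8B8A8, B8G8R8A8, R10G10B10A2 };
enum class PresentPath { Flip, Blt };

// The current front buffer (display mode) as the reply describes it.
struct DisplayMode {
    unsigned width;
    unsigned height;
    PixelFormat format;
};

// The properties of the swap chain that the rule depends on.
struct SwapChainState {
    bool fullscreen;
    unsigned width;
    unsigned height;
    PixelFormat format;
};

// Flip only when the swap chain is full screen AND the back buffer matches
// the front buffer in width, height, and format. Otherwise assume a blt.
constexpr PresentPath predictPresentPath(SwapChainState sc, DisplayMode front) {
    return (sc.fullscreen &&
            sc.width == front.width &&
            sc.height == front.height &&
            sc.format == front.format)
               ? PresentPath::Flip
               : PresentPath::Blt;
}
```

Under this model, a full-screen 1920x1080 swap chain whose format matches the display gives `PresentPath::Flip`, while the same swap chain in windowed mode, or with a different size or format, gives `PresentPath::Blt`.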