*except on some MS compilers, where on x86, volatile reads/writes are generated using the LOCK instruction prefix
MS doesn't use the LOCK instruction for volatile reads and writes. LOCK would provide sequential consistency, but MS volatile only guarantees acquire/release. On x86, reads naturally have acquire semantics and writes naturally have release semantics (assuming they're not non-temporal). The MS volatile just ensures that the compiler doesn't re-order or optimize out instructions in a way that would violate the acquire/release semantics.
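For anyone following along, here is a minimal sketch of the acquire/release pairing described above, written with standard C++11 atomics rather than MS volatile. The names (`run_handoff`, `payload`, `ready`) are made up for illustration. On x86 the release store and acquire load should compile to plain MOVs with no LOCK prefix; the memory orders mainly stop the compiler from reordering around them.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical example: hand one value from a producer to a consumer.
int run_handoff() {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // flag that publishes the data
    int seen = -1;

    std::thread producer([&] {
        payload = 42;                                  // 1. write the data
        ready.store(true, std::memory_order_release);  // 2. publish it
    });

    std::thread consumer([&] {
        // Spin until the flag is set. The acquire load pairs with the
        // release store, so the write to payload is visible afterwards.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        assert(payload == 42);
        seen = payload;
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Calling `run_handoff()` should always return 42. Swapping the atomic flag for a plain `bool` would make it a data race.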
Yeah, you're right - I've struck that bit out in my above post. From memory I thought that they'd gone as far as silently using their Interlocked* intrinsics on volatile integers, but it's a lot weaker than that. I even gave it a go in my compiler just now and couldn't get it to emit a LOCK prefix except when calling InterlockedCompareExchange/InterlockedIncrement manually.
This means that even with MS's stricter form of volatile, it would be very hard to use it to write correct inter-thread synchronization (i.e. you should still only see it deep in the guts of synchronization primitives, and not in user code).
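To illustrate the read-modify-write side of this, here is a rough portable sketch using `std::atomic` instead of the Interlocked* intrinsics. `count_with_atomic` is a made-up helper, not anything from the posts above. On x86, `fetch_add` is the kind of operation that typically ends up as a LOCK-prefixed instruction, which a plain `volatile int` increment does not give you.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: several threads bump one shared counter.
int count_with_atomic(int threads, int iterations) {
    assert(threads > 0 && iterations >= 0);
    std::atomic<int> counter{0};
    std::vector<std::thread> pool;

    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                // Atomic read-modify-write: no increment can be lost.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return counter.load();
}
```

With four threads and 10000 iterations each, the result is exactly 40000. The same loop over a `volatile int` with `++` would be a data race and can lose updates.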
As a general note on volatiles, I also ran a test for fun. I took the scheduler for my distribution system and added a single volatile to the head index of the lazy ring buffer. I changed nothing else; it is still accessed through explicit atomic loads/stores. It slowed the loop down by about 10%, which is quite a bit worse than my worst guess. This was on a dual Xeon, compiled with Clang; I'd be terrified to see what happens with the MS hackery on volatiles. As a note: I believe there is now an option in VC2015 (/volatile:iso) to disable the MS-specific behavior, so it may not be any worse than Clang with that set.
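For context, this is roughly the shape of thing being talked about, as a minimal sketch only: a single-producer/single-consumer ring whose indices are `std::atomic` and accessed with explicit loads and stores, with no volatile anywhere. It is not the actual scheduler code; `SpscRing` and `sum_through_ring` are hypothetical names, and the power-of-two capacity is an assumption to keep the index math simple.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical single-producer/single-consumer ring buffer.
template <typename T, std::size_t N>
class SpscRing {
    static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");

public:
    // Producer thread only.
    bool push(const T& value) {
        const std::size_t t = tail_.load(std::memory_order_relaxed);
        if (t - head_.load(std::memory_order_acquire) == N) {
            return false;  // full
        }
        slots_[t & (N - 1)] = value;
        tail_.store(t + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer thread only.
    bool pop(T& out) {
        const std::size_t h = head_.load(std::memory_order_relaxed);
        if (h == tail_.load(std::memory_order_acquire)) {
            return false;  // empty
        }
        out = slots_[h & (N - 1)];
        head_.store(h + 1, std::memory_order_release);  // free the slot
        return true;
    }

private:
    T slots_[N];
    std::atomic<std::size_t> head_{0};  // atomic, not volatile
    std::atomic<std::size_t> tail_{0};
};

// Pushes 1..count through the ring on one thread and sums on another.
long sum_through_ring(int count) {
    assert(count >= 0);
    SpscRing<int, 64> ring;
    long sum = 0;

    std::thread producer([&] {
        for (int i = 1; i <= count; ++i) {
            while (!ring.push(i)) std::this_thread::yield();
        }
    });
    std::thread consumer([&] {
        int v = 0;
        for (int i = 0; i < count; ++i) {
            while (!ring.pop(v)) std::this_thread::yield();
            sum += v;
        }
    });

    producer.join();
    consumer.join();
    return sum;
}
```

Declaring `head_` as `volatile std::atomic<std::size_t>` would still compile, but it selects the volatile overloads of `load`/`store` and limits what the optimizer may do with those accesses, which is the kind of change the 10% figure above refers to.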
As to volatiles and threading in general, I don't believe I use the keyword volatile anywhere in my code, at home or at work, and it is fully multi-core from the ground up. Unlike what I called out above, I'm not using it just to ship; it is a fundamental design goal of the overall architecture.