[quote]Is the following C++ pseudocode (assuming C++03) bad/evil/dangerous? ... Instruction reordering isn't bad in this example[/quote]
That's probably the most common/acceptable use of volatile -- telling the compiler that it must actually read that boolean on each iteration, instead of optimising the loop down to a single read -- especially in cases where reordering isn't a concern.
This only works in practice though, and relies on assumptions about your hardware. C++03 has no notion of threads at all, so there's no requirement that when one thread writes 'true' to the boolean, the value will ever become visible to other threads, volatile or not.
[quote]since reads/writes are atomic[/quote]
This is another hardware-specific detail, not specified by C++03.
[quote]And what C++11 data types would be appropriate here?[/quote]
In your case, [font=courier new,courier,monospace]std::atomic<bool>[/font] and [font=courier new,courier,monospace]memory_order_relaxed[/font]. Deciding that locks are too slow, though, is an optimisation issue.
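As a rough sketch of what that could look like (the original pseudocode isn't reproduced here, so [font=courier new,courier,monospace]g_stop[/font], [font=courier new,courier,monospace]worker_loop[/font] and [font=courier new,courier,monospace]run_stop_flag_demo[/font] are made-up names, and the loop body is just a placeholder for real work):

```cpp
#include <atomic>
#include <thread>

// Hypothetical stop flag shared between the two threads.
std::atomic<bool> g_stop(false);

// Worker: re-reads the flag on every iteration. Relaxed ordering is enough
// here because nothing else is published through the flag.
unsigned long worker_loop() {
    unsigned long iterations = 0;
    while (!g_stop.load(std::memory_order_relaxed)) {
        ++iterations; // stand-in for the real per-iteration work
    }
    return iterations;
}

// Starts the worker, asks it to stop, and waits for it to exit.
// Returns the final value of the flag (true once the stop was requested).
bool run_stop_flag_demo() {
    g_stop.store(false, std::memory_order_relaxed);
    std::thread worker([] { worker_loop(); });
    g_stop.store(true, std::memory_order_relaxed);
    worker.join();
    return g_stop.load(std::memory_order_relaxed);
}
```

Relaxed ordering only gives you atomicity of the flag itself. If the worker also needs to see other data written before the flag was set, relaxed is not enough.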
One thing that bugs me is that MSVC's [font=courier new,courier,monospace]volatile[/font] has had acquire/release semantics since VS2005, so it acts much like C++11's [font=courier new,courier,monospace]atomic[/font] with [font=courier new,courier,monospace]memory_order_acquire[/font] loads and [font=courier new,courier,monospace]memory_order_release[/font] stores. On x86 this costs no extra instructions, because ordinary loads and stores are already ordered that strongly by the hardware; it only restricts how the compiler may reorder code around the access. This is because too many people wrote bad [font=courier new,courier,monospace]volatile[/font]-based code that should be wrong due to re-ordering issues, so Microsoft changed the meaning of [font=courier new,courier,monospace]volatile[/font] to include ordering guarantees (no read/write after a volatile read can be moved before it, and no read/write before a volatile write can be moved after it), to fix people's buggy code, which just encourages people to write more buggy code that will break on other C++ compilers...
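For contrast with a simple stop flag, here's a sketch of the kind of code where re-ordering does matter: a flag that publishes some other, non-atomic data. With a plain standard [font=courier new,courier,monospace]volatile[/font] flag this pattern isn't guaranteed to work; in C++11 you'd spell out the ordering explicitly. All names here ([font=courier new,courier,monospace]g_payload[/font], [font=courier new,courier,monospace]g_ready[/font], etc.) are invented for illustration.

```cpp
#include <atomic>
#include <thread>

int g_payload = 0;                // plain, non-atomic data
std::atomic<bool> g_ready(false); // flag that publishes g_payload

void producer() {
    g_payload = 42; // 1. write the data
    // 2. publish it: the write above cannot be reordered after this store
    g_ready.store(true, std::memory_order_release);
}

int consumer() {
    // Spin until published: the read below cannot be reordered before this load
    while (!g_ready.load(std::memory_order_acquire)) {
    }
    return g_payload; // sees the producer's write
}

// Runs one producer and one consumer; returns what the consumer saw.
int run_publish_demo() {
    g_payload = 0;
    g_ready.store(false);
    int seen = 0;
    std::thread c([&seen] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    return seen;
}
```

If the two memory orders were relaxed instead, the consumer could legally observe the flag as set and still read a stale payload.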
With your loop, using MSVC's [font=courier new,courier,monospace]volatile[/font] or C++11's [font=courier new,courier,monospace]atomic[/font], you get one fenced read per iteration. Using locks, and assuming no contention, you get a fenced read, a regular read, and a fenced write per iteration, which isn't much different. Taking contention into account though, you might also get a busy-wait with repeated fenced reads, and possibly a context switch.
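For comparison, a lock-based version of the same flag check might look roughly like this. The comments map the steps onto the "fenced read, regular read, fenced write" breakdown above only approximately (what a lock really does is up to the implementation), and the names are again made up.

```cpp
#include <mutex>
#include <thread>

std::mutex g_stop_mutex;
bool g_stop_locked = false; // plain bool, only touched while holding the mutex

// One flag check, as the worker would do once per iteration.
bool should_stop() {
    std::lock_guard<std::mutex> lock(g_stop_mutex); // lock: acquire operation
    return g_stop_locked;                           // regular read
}                                                   // unlock: release operation

void request_stop() {
    std::lock_guard<std::mutex> lock(g_stop_mutex);
    g_stop_locked = true;
}

// Starts a worker that polls the flag, asks it to stop, waits for it to exit.
bool run_locked_demo() {
    {
        std::lock_guard<std::mutex> lock(g_stop_mutex);
        g_stop_locked = false;
    }
    std::thread worker([] {
        while (!should_stop()) {
            // stand-in for the real per-iteration work
        }
    });
    request_stop();
    worker.join();
    return should_stop();
}
```

If another thread holds the mutex when the worker calls [font=courier new,courier,monospace]should_stop[/font], the worker has to wait, which is where the extra busy-waiting or context switch comes from.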
Aside from these performance differences, there are sometimes theoretical reasons to want a particular kind of non-blocking guarantee, which is a better reason to avoid locks. N.B. some kinds of lock-free systems have worse performance than locking ones, but are used anyway because the guarantee itself is required for whatever reason.
[quote]almost all of those volatile overloads I mentioned are for the C++11 threading library[/quote]
Are there any valid use-cases for [font=courier new,courier,monospace]volatile[/font] in multi-threaded code, aside from ones like the above that can be replaced with [font=courier new,courier,monospace]atomic[/font]?