Hodgman

Posted 07 December 2012 - 08:39 PM

Is the following C++ pseudocode (assuming C++03) bad/evil/dangerous? ... Instruction reordering isn't bad in this example

That's probably the most common/acceptable use of volatile -- telling the compiler that it should definitely read that boolean each iteration, instead of optimising it to a single read -- especially in cases where reordering isn't a concern.
This is only something that works in practice, though, and relies on assumptions about your hardware. There's no requirement in C++03 that when one thread writes 'true' to the boolean, the value will ever become visible to other threads, volatile or not.
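For reference, the pattern in question looks something like this (a minimal sketch -- the names are mine, not from the original question):

```cpp
// Hypothetical flag: set by one thread, polled by another.
volatile bool g_quit = false;

void WorkerLoop()
{
    // volatile forces a fresh read of g_quit each iteration, but in
    // C++03 it gives no visibility or ordering guarantees whatsoever
    while( !g_quit )
    {
        // ... do work ...
    }
}

void RequestQuit()
{
    g_quit = true; // often works on common hardware; not guaranteed by C++03
}
```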

since reads/writes are atomic

This is another hardware-specific detail, not specified by C++03.

And what C++11 data types would be appropriate here?

In your case, std::atomic with memory_order_relaxed. Deciding that locks are too slow, though, is an optimisation issue.
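A sketch of what that looks like in C++11 (illustrative names again):

```cpp
#include <atomic>

std::atomic<bool> g_quit(false);

void WorkerLoop()
{
    // relaxed ordering is enough for a plain quit flag: we need the
    // write to become visible eventually, not to order any other data
    while( !g_quit.load(std::memory_order_relaxed) )
    {
        // ... do work ...
    }
}

void RequestQuit()
{
    g_quit.store(true, std::memory_order_relaxed);
}
```

On x86 a relaxed load compiles down to a plain mov, so this costs about the same as the volatile version while actually being guaranteed to work.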

One thing that bugs me is that MSVC's volatile has acted like C++11's atomic (with memory_order_seq_cst) since VS2005 -- i.e. on x86 it uses cmpxchg-type instructions. Too many people wrote volatile-based code that, by rights, should be broken by re-ordering issues, so Microsoft changed the meaning of volatile to include a full memory fence (no read/write can be reordered past a volatile read/write) in order to fix people's buggy code -- which just encourages people to write more buggy code that will break on other C++ compilers...
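In portable terms, code that relies on MSVC's strengthened volatile corresponds roughly to std::atomic with its default ordering (a sketch, assuming the seq_cst behaviour described above; the names are illustrative):

```cpp
#include <atomic>

std::atomic<bool> g_flag(false);

bool Poll()
{
    // the default ordering is memory_order_seq_cst: no surrounding
    // reads/writes may be reordered across this load, which matches
    // the fence MSVC attaches to volatile accesses
    return g_flag.load();
}
```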

With your loop, using MSVC's volatile or C++11's atomic, you get one fully-fenced read per iteration. Using locks, and assuming no contention, you get a fenced read, a regular read, and a fenced write per iteration, which isn't much different. Taking contention into account, though, you might also get a busy-wait with repeated fenced reads, and possibly a context switch.
Aside from these performance differences, there are sometimes theoretical reasons to want a particular kind of non-blocking guarantee, which is a better reason to avoid locks. N.B. some lock-free systems actually perform worse than locking ones, but are used anyway because they provide that guarantee.
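For comparison, the lock-based version being costed above would look roughly like this (a sketch using std::mutex; the names are illustrative):

```cpp
#include <mutex>

bool       g_quit = false; // plain bool, protected by g_quitMutex
std::mutex g_quitMutex;

bool ShouldQuit()
{
    // lock() is the fenced read, the flag access is the regular read,
    // and unlock() is the fenced write mentioned above
    std::lock_guard<std::mutex> lock(g_quitMutex);
    return g_quit;
}

void WorkerLoop()
{
    while( !ShouldQuit() )
    {
        // ... do work ...
    }
}
```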

almost all of those volatile overloads I mentioned are for the C++11 threading library

Are there any valid use-cases for volatile in multi-threaded code, aside from ones like the above that can be replaced with atomic?
