### King Mir

Posted 04 November 2013 - 12:14 PM

@L. Spiro
That applies to many architectures, but it doesn't apply to x86. In x86 any MOV from memory by itself has acquire semantics. Any MOV into memory has release semantics. You don't need a fence. You only need a fence to implement sequential consistency, usually placed after each store, but with two threads, that's not needed, because the difference between acquire-release semantics and sequential consistency is only apparent with more threads.

So you're correct from the standpoint of writing portable C++, but on x86 the code will work as is. Because of the above, x86 will allow code that is technically undefined C++ to work as one would expect. (Let me reiterate that: the code as written is undefined behavior)

That's an oversimplification.
Section 8.2.2 of the Intel 64 and IA-32 Architectures Software Developer’s Manual is very clear that:
"Reads may be reordered with older writes to different locations but not with older writes to the same location."
Here's even an example of how such reordering can affect the logic of an application even when just running 2 threads. Whether this issue affects the OP's code can only be determined by inspecting the assembly.

No, the code you linked to can be analyzed in a way that doesn't require looking at the disassembly to spot the problem, and so can the OP's. In the OP it's trivial to see that each access to the lock is an acquire-read on one variable, followed by some critical section code, followed by a release-write on the same variable. That pattern is tried and true.

zeroWants = true; //release on zeroWants
victim = 0; //release on victim
while (oneWants && victim == 0) // acquire on oneWants and victim
    continue;
// critical code

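It's a release followed by an acquire, which imposes no ordering between the two: the loads in the while condition may be reordered ahead of the two stores. The acquire by itself would be usable to create a lock, but clearly here this lock does not work if you remove the first two lines.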
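For comparison, here is a minimal sketch of how the linked lock could be written in portable C++ so that the stores cannot pass the loads. It assumes the linked code is the usual two-thread Peterson lock; `PetersonLock`, `wants` and `run_peterson_demo` are hypothetical names of mine, with `wants[0]` and `wants[1]` standing in for `zeroWants` and `oneWants`. All accesses use the default `std::memory_order_seq_cst`, which on x86 typically costs one fenced or locked instruction per store.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical two-thread Peterson lock; wants[0] / wants[1] play the
// role of zeroWants / oneWants from the snippet above.
struct PetersonLock {
    std::atomic<bool> wants[2];
    std::atomic<int> victim;

    PetersonLock() : victim(0) {
        wants[0] = false;
        wants[1] = false;
    }

    void lock(int self) {
        assert(self == 0 || self == 1);
        const int other = 1 - self;
        wants[self].store(true);  // seq_cst: cannot be reordered after the loads below
        victim.store(self);       // seq_cst
        while (wants[other].load() && victim.load() == self) {
            std::this_thread::yield();  // spin politely until it is our turn
        }
    }

    void unlock(int self) {
        wants[self].store(false);
    }
};

// Two threads increment a plain (non-atomic) counter under the lock.
// Returns the final counter value.
long run_peterson_demo(int iterations) {
    PetersonLock lock;
    long counter = 0;  // protected only by the lock
    auto worker = [&](int self) {
        for (int i = 0; i < iterations; ++i) {
            lock.lock(self);
            ++counter;  // critical code
            lock.unlock(self);
        }
    };
    std::thread t0(worker, 0);
    std::thread t1(worker, 1);
    t0.join();
    t1.join();
    return counter;
}
```

Calling `run_peterson_demo(n)` should return `2 * n`, since no increment can be lost while mutual exclusion holds. Weakening the stores to release and the loads to acquire would permit exactly the store-then-load reordering described above.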

But conceded, that does show that acquire-release semantics can differ from sequential consistency even with two threads. Thank you for the example.
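The two-thread difference can be boiled down to the classic store-buffering litmus test. The sketch below is my own illustration, not code from the thread, and `count_both_zero` is a hypothetical helper: each thread stores to one variable and then loads the other. Under sequential consistency at least one thread must observe the other's store; with only release stores and acquire loads, both loads are allowed to return 0.

```cpp
#include <atomic>
#include <thread>

// Runs the store-buffering litmus test `trials` times with the given
// memory orders and returns how many runs ended with BOTH loads seeing 0.
int count_both_zero(int trials,
                    std::memory_order store_order,
                    std::memory_order load_order) {
    int both_zero = 0;
    for (int i = 0; i < trials; ++i) {
        std::atomic<int> x{0};
        std::atomic<int> y{0};
        int r0 = -1;
        int r1 = -1;
        // Thread 0: write x, then read y.
        std::thread t0([&] {
            x.store(1, store_order);
            r0 = y.load(load_order);
        });
        // Thread 1: write y, then read x.
        std::thread t1([&] {
            y.store(1, store_order);
            r1 = x.load(load_order);
        });
        t0.join();
        t1.join();
        if (r0 == 0 && r1 == 0) {
            ++both_zero;
        }
    }
    return both_zero;
}
```

With `std::memory_order_seq_cst` for both arguments the function must return 0. With `std::memory_order_release` and `std::memory_order_acquire` a nonzero result is permitted, on x86 included, although the thread start-up cost in this simple harness may make it rare to actually observe.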
